Fine-tuning is the process of taking a pre-trained large language model (LLM) and further training it on a smaller, specific dataset. This process adapts the model to a particular task or domain, improving its performance and accuracy for your use case. This guide explains how to use Runpod’s fine-tuning feature, powered by Axolotl, to customize an LLM. You’ll learn how to select a base model, choose a dataset, configure your training environment, and deploy your fine-tuned model. For more information about fine-tuning with Axolotl, see the Axolotl Documentation.

Requirements

Before you begin, you’ll need:
  • A Runpod account.
  • (Optional) A Hugging Face account and an access token if you plan to use gated models or upload your fine-tuned model.

Select a base model and dataset

The base model is the starting point for your fine-tuning process, while the dataset provides the specific knowledge needed to adapt the base model to your task. You can choose from thousands of models and datasets on Hugging Face.

Deploy a fine-tuning Pod

1. Go to the Fine-Tuning page

Navigate to the Fine-Tuning section in the Runpod console.
2. Specify the base model and dataset

In the Base Model field, enter the Hugging Face model ID. In the Dataset field, enter the Hugging Face dataset ID. If this is your first time fine-tuning and you’re just experimenting, try:
# Base model
TinyLlama/TinyLlama_v1.1

# Dataset (alpaca)
mhenrichsen/alpaca_2k_test
3. Provide a Hugging Face token (if needed)

If you’re using a gated model that requires special access, generate a Hugging Face token with the necessary permissions and add it to the Hugging Face Access Token field.
4. Continue to the next step

Click Deploy the Fine-Tuning Pod to start configuring your fine-tuning Pod.
5. Choose a GPU for the Pod

Select a GPU instance based on your model’s requirements. Larger models and datasets require GPUs with more memory.
6. Deploy the Pod

Finish configuring the Pod, then click Deploy on-demand. This should open the detail pane for your Pod automatically.
7. Monitor Pod deployment

Click Logs to monitor the system logs for deployment progress. Wait for the success message: "You've successfully configured your training environment!" Depending on the size of your model and dataset, this may take some time.
8. Connect to your training environment

Once your training environment is ready, you can connect to it to configure and start the fine-tuning process. Click Connect and choose your preferred connection method:
  • Jupyter Notebook: A browser-based notebook interface.
  • Web Terminal: A browser-based terminal.
  • SSH: A secure connection from your local machine.
To use SSH, add your public SSH key in your account settings. The system automatically adds your key to the Pod’s authorized_keys file. For more information, see Connect to a Pod with SSH.
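For reference, once your key is added, a direct SSH connection from your local machine typically looks like the sketch below. The IP address, port, and key path are placeholders; copy the exact command shown in your Pod’s Connect menu rather than typing it from memory:
ssh root@<POD_IP> -p <SSH_PORT> -i ~/.ssh/id_ed25519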

Configure your environment

For a list of working configuration examples, check out the Axolotl examples repository (also available in your training environment at /workspace/fine-tuning/examples/).
Your training environment is located in the /workspace/fine-tuning/ directory and has the following structure:
  • examples/: Sample configurations and scripts.
  • outputs/: Where your training results and model outputs will be saved.
  • config.yaml: The main configuration file for your training parameters.
The system generates an initial config.yaml based on your selected base model and dataset. This is where you define all the hyperparameters for your fine-tuning job. You may need to experiment with these settings to achieve the best results.
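As a quick sanity check, you can list the directory from a terminal to confirm this layout (the exact contents may vary slightly depending on your setup):
cd /workspace/fine-tuning
ls
You should see the examples/ and outputs/ directories and the config.yaml file described above.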
1. Open the configuration file

Navigate to the fine-tuning directory (/workspace/fine-tuning/) and open the configuration file (config.yaml) in JupyterLab or your preferred text editor to review and adjust the fine-tuning parameters. If you’re using the web terminal, the fine-tuning directory should open automatically. Use nano to edit the config.yaml file:
nano config.yaml
The config.yaml file will look something like this (base_model and datasets will be replaced with the model and dataset you selected in Step 2):
adapter: lora
base_model: TinyLlama/TinyLlama_v1.1
bf16: auto
datasets:
- path: mhenrichsen/alpaca_2k_test
  type: null
gradient_accumulation_steps: 1
learning_rate: 0.0002
load_in_8bit: true
lora_alpha: 16
lora_dropout: 0.05
lora_r: 8
lora_target_modules:
- q_proj
- v_proj
- k_proj
- o_proj
- gate_proj
- down_proj
- up_proj
micro_batch_size: 16
num_epochs: 1
optimizer: adamw_bnb_8bit
output_dir: ./outputs/mymodel
sequence_len: 4096
train_on_inputs: false
Here’s a breakdown of the config.yaml file:
Model and precision:
  • base_model: The base model you want to fine-tune.
  • bf16: auto: This tells the GPU to use Bfloat16 precision if it can. It’s more stable than standard FP16 and helps prevent the model’s math from “overflowing” (exploding) during training.
  • load_in_8bit: true: This is a memory-saving trick. It squashes the base model weights into 8 bits so they take up less VRAM, allowing you to train on smaller GPUs.
LoRA settings:
  • lora_r: 8: The rank of the LoRA adapter. 8 is a standard starting point; higher numbers (like 16 or 32) let the model learn more complex patterns but use more VRAM.
  • lora_alpha: 16: Scales the adapter’s learned weights; the effective scaling factor is lora_alpha / lora_r.
  • lora_target_modules: This list tells Axolotl exactly which parts of the Transformer architecture to attach the adapters to.
Dataset logic:
  • path: Where the data is coming from (Hugging Face).
  • type: null: This tells Axolotl how to format the text into prompts. You’ll need to change this value depending on the dataset you selected; see the next step for details.
  • train_on_inputs: false: Don’t compute loss on the user’s prompt; only learn to predict the assistant’s answer. This focuses training on the responses.
  • sequence_len: 4096: The maximum length of text (in tokens) the model can “read” at once.
Training mechanics:
  • micro_batch_size: 16: How many examples the GPU processes at a time (see the note on effective batch size after this list).
  • gradient_accumulation_steps: 1: How many batches to “save up” before actually updating the model’s weights.
  • learning_rate: 0.0002: How fast the model changes. Too high and it “forgets” everything; too low and it never learns.
  • optimizer: adamw_bnb_8bit: A special version of the AdamW optimizer that uses 8-bit math to save even more VRAM.
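A useful relationship when adjusting these values: the effective batch size per weight update is micro_batch_size × gradient_accumulation_steps × the number of GPUs. With the settings above on a single GPU, that’s 16 × 1 × 1 = 16. If you hit out-of-memory errors, a common adjustment (a sketch, not a universal recommendation) is to halve micro_batch_size and double gradient_accumulation_steps, which keeps the effective batch size the same while lowering peak VRAM use:
micro_batch_size: 8
gradient_accumulation_steps: 2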
2. Update the dataset type

The dataset type is set to null by default. You’ll need to change this value depending on the dataset you selected. For example, if you selected the mhenrichsen/alpaca_2k_test dataset, you’ll need to change type: null to type: alpaca to load the dataset correctly. Once you’ve changed the dataset type, save the file (config.yaml) and continue to the next step. If you’re not sure what dataset type to use, you can find an overview of common dataset types below:
chat_template for chat-based datasets:
{
 "messages" : [
   {"role": "user", "content": "What is the capital of France?"},
   {"role": "assistant", "content": "The capital of France is Paris."}
 ]
}
You’ll also need to add the field_messages key to datasets to specify the field that contains the messages:
datasets:
  - path: your/dataset
    type: chat_template
    field_messages: messages
completion for raw text datasets:
{
  "text": "The quick brown fox jumps over the lazy dog."
}
input_output for template-free datasets:
{
  "input": "User: What is the capital of France?\nAssistant: ",
  "output": "The capital is Paris.</s>"
}
alpaca for instruction-following datasets:
{
 "instruction": "Summarize the following text.",
 "input": "The sun is a star at the center of the Solar System.",
 "output": "The sun is the central star of our solar system."
}
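Because the example in this guide uses an alpaca-format dataset, the matching datasets entry only needs the type changed (no extra field keys are required for this format):
datasets:
  - path: mhenrichsen/alpaca_2k_test
    type: alpaca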
sharegpt for conversational datasets:
{
 "conversations": [
   {
     "from": "human",
     "value": "What are the three laws of thermodynamics?"
   },
   {
     "from": "gpt",
     "value": "1. Energy cannot be created or destroyed. 2. Entropy always increases. 3. Absolute zero cannot be reached."
   }
 ]
}
You’ll also need to add the conversation key to datasets to specify the name of the list field that contains the messages:
datasets:
  - path: your/dataset
    type: sharegpt
    conversation: conversations
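Optionally, before launching a full run, you can have Axolotl tokenize and validate your dataset ahead of time to catch configuration mistakes (such as a wrong dataset type) early. Recent Axolotl versions expose this as a preprocess subcommand; if it isn’t available in your environment, skip straight to training:
axolotl preprocess config.yaml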

Start the fine-tuning process

Once you’re satisfied with your configuration, you can start the training. Run the following command in your terminal:
axolotl train config.yaml
Monitor the training progress in your terminal. The output will show the training loss, validation loss, and other metrics.
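When training finishes, the LoRA adapter is written to the output_dir from your config (./outputs/mymodel in this example). As a quick check, list that directory; for a LoRA run you would typically expect files such as adapter_config.json and adapter_model.safetensors, though exact filenames can vary with Axolotl and PEFT versions:
ls /workspace/fine-tuning/outputs/mymodel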

Test your model with vLLM

Once the fine-tuning process is complete, you can test the inference capabilities of your fine-tuned model with vLLM. For example, to serve the fine-tuned TinyLlama model used in the examples above, you would follow these steps:
1. Serve your model

To serve your fine-tuned model, run the following command:
vllm serve TinyLlama/TinyLlama_v1.1 --enable-lora --lora-modules my-adapter=/workspace/fine-tuning/outputs/mymodel --port 8000
2. Test your model

To test your model, first you’ll need to start a new terminal window, tab, or pane. If you’re using the web terminal, tmux is already installed, and you can create a new horizontal pane by running:
tmux split-window -h
In the new window/tab/pane, you can send a test request to the vLLM server using curl:
curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "my-adapter",
    "prompt": "### Instruction:\nExplain gravity in one sentence.\n\n### Response:\n",
    "max_tokens": 50
  }'
You should see the response from your model in the terminal.
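The response is a standard OpenAI-style completions payload, with the generated text in choices[0].text. If jq happens to be installed in your environment, you can pipe the same request through it to print only the generated text (an optional convenience, not required for testing):
curl -s http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "my-adapter", "prompt": "### Instruction:\nExplain gravity in one sentence.\n\n### Response:\n", "max_tokens": 50}' \
  | jq -r '.choices[0].text'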

Push your model to Hugging Face

After the fine-tuning process is complete, you can upload your model to the Hugging Face Hub to share it with the community or use it in your applications.
1. Log in to Hugging Face

Run this command to log in to your Hugging Face account:
huggingface-cli login
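You’ll be prompted to paste a Hugging Face access token with write permission. To confirm the login worked, you can run:
huggingface-cli whoami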
2. Upload your model files

To upload your model files to the Hugging Face Hub, run this command:
huggingface-cli upload YOUR_USERNAME/MODEL_NAME ./outputs/mymodel
Replace YOUR_USERNAME with your Hugging Face username and MODEL_NAME with your desired model name.
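If you’d rather publish only the LoRA adapter instead of everything in the output directory (checkpoints, logs, and so on), recent versions of huggingface-cli upload accept include patterns. A sketch, assuming your adapter files follow the usual adapter_* naming used by PEFT:
huggingface-cli upload YOUR_USERNAME/MODEL_NAME ./outputs/mymodel --include "adapter_*"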

Next steps

Now that you’ve successfully fine-tuned a model, you can deploy it for inference using Runpod Serverless. If you’ve uploaded your model to Hugging Face, you can deploy it as a cached model to reduce cost and cold start times.