Requirements
Before you begin, you’ll need:
- A Runpod account.
- (Optional) A Hugging Face account and an access token if you plan to use gated models or upload your fine-tuned model.
Select a base model and dataset
The base model is the starting point for your fine-tuning process, while the dataset provides the specific knowledge needed to adapt the base model to your task. You can choose from thousands of models and datasets on Hugging Face.
Deploy a fine-tuning Pod
Go to the Fine-Tuning page
Navigate to the Fine-Tuning section in the Runpod console.
Specify the base model and dataset
In the Base Model field, enter the Hugging Face model ID. In the Dataset field, enter the Hugging Face dataset ID. If this is your first time fine-tuning and you’re just experimenting, try a small model and dataset, such as the TinyLlama model and the mhenrichsen/alpaca_2k_test dataset used in the examples later in this guide.
Provide a Hugging Face token (if needed)
If you’re using a gated model that requires special access, generate a Hugging Face token with the necessary permissions and add it to the Hugging Face Access Token field.
Continue to the next step
Click Deploy the Fine-Tuning Pod to start configuring your fine-tuning Pod.
Choose a GPU for the Pod
Select a GPU instance based on your model’s requirements. Larger models and datasets require GPUs with more memory.
Deploy the Pod
Finish configuring the Pod, then click Deploy on-demand. This should open the detail pane for your Pod automatically.
Monitor Pod deployment
Click Logs to monitor the system logs for deployment progress. Wait for the success message:
"You've successfully configured your training environment!" Depending on the size of your model and dataset, this may take some time.Connect to your training environment
Once your training environment is ready, you can connect to it to configure and start the fine-tuning process. Click Connect and choose your preferred connection method:
- Jupyter Notebook: A browser-based notebook interface.
- Web Terminal: A browser-based terminal.
- SSH: A secure connection from your local machine.
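For example, an SSH connection from your local machine looks roughly like the following; the real host, port, and key path are shown in your Pod’s Connect menu, so the values here are placeholders:

```bash
# Placeholders only: copy the exact command from the Pod's Connect menu.
ssh root@<pod-ip> -p <ssh-port> -i ~/.ssh/id_ed25519
```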
Configure your environment
Your training environment is located in the /workspace/fine-tuning/ directory and has the following structure:
- examples/: Sample configurations and scripts.
- outputs/: Where your training results and model outputs will be saved.
- config.yaml: The main configuration file for your training parameters.
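To confirm the layout from a terminal, you can list the directory (the comment shows the expected entries):

```bash
ls /workspace/fine-tuning
# config.yaml  examples/  outputs/
```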
The config.yaml file is pre-populated based on your selected base model and dataset. This is where you define all the hyperparameters for your fine-tuning job. You may need to experiment with these settings to achieve the best results.
Open the configuration file
Navigate to the fine-tuning directory (/workspace/fine-tuning/) and open the configuration file (config.yaml) in JupyterLab or your preferred text editor to review and adjust the fine-tuning parameters. If you’re using the web terminal, the fine-tuning directory should open automatically; use nano to edit the config.yaml file:
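```bash
cd /workspace/fine-tuning   # the web terminal usually starts here already
nano config.yaml
```

The config.yaml file will look something like this (base_model and datasets will be replaced with the model and dataset you selected in Step 2). The file below is an illustrative sketch rather than the exact generated file; keys not covered in the breakdown that follows (adapter, lora_dropout, the specific target modules, num_epochs, and output_dir) are typical Axolotl settings included here as assumptions:

```yaml
base_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0   # replaced with your selected model

load_in_8bit: true
bf16: auto

adapter: lora
lora_r: 8
lora_alpha: 16
lora_dropout: 0.05
lora_target_modules:
  - q_proj
  - k_proj
  - v_proj
  - o_proj

datasets:
  - path: mhenrichsen/alpaca_2k_test   # replaced with your selected dataset
    type: null

train_on_inputs: false
sequence_len: 4096

micro_batch_size: 16
gradient_accumulation_steps: 1
num_epochs: 1
learning_rate: 0.0002
optimizer: adamw_bnb_8bit

output_dir: ./outputs/lora-out
```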
Configuration breakdown
Model and precision:
- base_model: The base model you want to fine-tune.
- bf16: auto: This tells the GPU to use Bfloat16 precision if it can. It’s more stable than standard FP16 and helps prevent the model’s math from “overflowing” (exploding) during training.
- load_in_8bit: true: This is a memory-saving trick. It squashes the base model weights into 8 bits so they take up less VRAM, allowing you to train on smaller GPUs.
LoRA adapter:
- lora_r: 8: The rank of the LoRA adapter. 8 is a standard starting point; higher values (like 16 or 32) let the model learn more complex patterns but use more VRAM.
- lora_alpha: 16: This scales the learned weights.
- lora_target_modules: This list tells Axolotl exactly which parts of the Transformer architecture to attach the adapters to.
Dataset:
- path: Where the data is coming from (Hugging Face).
- type: Tells Axolotl how to format the text into prompts. It’s set to null by default and is updated in the next step.
Training behavior:
- train_on_inputs: false: This is a smart setting. It tells the model: “Don’t try to predict the user’s question; only learn how to predict the assistant’s answer.” This focuses the “learning energy” on the responses.
- sequence_len: 4096: The maximum length of text the model can “read” at once.
Batching and optimization:
- micro_batch_size: 16: How many examples the GPU processes at a time.
- gradient_accumulation_steps: 1: How many batches to “save up” before actually updating the model’s weights. The effective batch size is micro_batch_size × gradient_accumulation_steps (here, 16 × 1 = 16).
- learning_rate: 0.0002: How fast the model changes. Too high and it “forgets” everything; too low and it never learns.
- optimizer: adamw_bnb_8bit: A special version of the AdamW optimizer that uses 8-bit math to save even more VRAM.
Update the dataset type
The dataset type is set to null by default. You’ll need to change this value depending on the dataset you selected. For example, if you selected the mhenrichsen/alpaca_2k_test dataset, you’ll need to change type: null to type: alpaca to load the dataset correctly.
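In config.yaml, that change looks like this (a minimal sketch of the datasets block):

```yaml
datasets:
  - path: mhenrichsen/alpaca_2k_test
    type: alpaca   # was `type: null` in the generated file
```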
Once you’ve changed the dataset type, save the file (config.yaml) and continue to the next step. If you’re not sure what dataset type to use, you can find an overview of common dataset types below.
Common dataset types
- chat_template for chat-based datasets. You’ll also need to add the field_messages key to datasets to specify the field that contains the messages.
- completion for raw text datasets.
- input_output for template-free datasets.
- alpaca for instruction-following datasets.
- sharegpt for conversational datasets. You’ll also need to add the conversation key to datasets to specify the name of the list field that contains the messages.
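For illustration, here’s how the chat_template type might look in config.yaml (a sketch only; the dataset ID is a hypothetical placeholder):

```yaml
datasets:
  - path: your-org/your-chat-dataset   # placeholder dataset ID
    type: chat_template
    field_messages: messages           # the field in each row that holds the messages
```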
Start the fine-tuning process
Once you’re satisfied with your configuration, you can start the training. Run the following command in your terminal:
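The original command isn’t reproduced in this extract, so treat the following as an assumption about the entry point; it shows the typical way to launch an Axolotl training run from the fine-tuning directory:

```bash
# Run from /workspace/fine-tuning, where config.yaml lives.
axolotl train config.yaml

# On older Axolotl releases, the equivalent is:
# accelerate launch -m axolotl.cli.train config.yaml
```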
Test your model with vLLM
Once the fine-tuning process is complete, you can test the inference capabilities of your fine-tuned model with vLLM. For example, to serve the fine-tuned TinyLlama model used in the examples above, you would follow these steps:
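The exact steps aren’t reproduced in this extract, so the following is a minimal sketch, assuming training produced a LoRA adapter under ./outputs/ and that you serve it with vLLM’s OpenAI-compatible server (the base-model ID and adapter path are illustrative):

```bash
# Install vLLM if it isn't already available in the environment.
pip install vllm

# Serve the base model with the fine-tuned LoRA adapter attached.
# Model ID and adapter path are illustrative; point them at your own run.
vllm serve TinyLlama/TinyLlama-1.1B-Chat-v1.0 \
  --enable-lora \
  --lora-modules my-finetune=./outputs/lora-out
```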
Test your model
To test your model, first you’ll need to start a new terminal window, tab, or pane. If you’re using the web terminal, tmux is already installed, and you can create a new horizontal pane (see the commands below). In the new window, tab, or pane, you can send a test request to the vLLM server using curl. You should see the response from your model in the terminal.
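The commands below are a sketch rather than the guide’s originals: standard tmux usage for the pane, and a curl request against vLLM’s default OpenAI-compatible endpoint on port 8000. The model name must match what you’re serving; here it’s the my-finetune LoRA module from the previous step.

```bash
# In the web terminal: start tmux, then split the window into a second pane.
tmux                   # skip if you're already inside a tmux session
tmux split-window -h   # create a pane next to the current one
```

```bash
# Send a test request to the vLLM server and print the completion.
curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "my-finetune",
        "prompt": "What is the capital of France?",
        "max_tokens": 64
      }'
```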