How to Fine-Tune a Small Language Model (Step-by-Step Guide)
Fine-tuning a small language model (SLM) lets you take a general-purpose model — like Llama 3.2 1B or Qwen3 0.6B — and specialise it for your exact task. The result is a model that’s faster, cheaper, and often more accurate than prompting a large language model.
This guide walks you through the entire process, from preparing your data to deploying a production-ready model.
Why Fine-Tune a Small Language Model?
Large language models are impressive generalists, but they come with trade-offs:
- Latency — frontier-scale models are too slow for many real-time applications
- Cost — API calls to frontier models add up quickly at scale
- Privacy — sending sensitive data to third-party APIs isn’t always an option
- Reliability — general-purpose models can be unpredictable on narrow tasks
A fine-tuned SLM addresses all of these. By training a compact model on task-specific data, you get fast, consistent, and private inference — often matching or exceeding the accuracy of the teacher LLM on your domain.
Step 1: Define Your Task
Before touching any code, clearly define what you want the model to do. Fine-tuning works best when the task is well-scoped:
- Classification — assign one of N labels to an input (e.g. sentiment, intent, topic)
- Information extraction — pull structured fields from unstructured text (e.g. NER, PII redaction)
- Question answering — answer questions given a context document (open-book) or from learned knowledge (closed-book)
- Tool calling — decide which function to call and with what arguments
The clearer your task definition, the better your fine-tuned model will perform.
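As a concrete illustration, a classification task definition can be captured as a small spec before any training code is written. The field names below are illustrative, not a required schema:

```python
# Illustrative task spec for a support-ticket intent classifier.
# Field names are hypothetical -- adapt them to your own tooling.
task_spec = {
    "task": "classification",
    "input": "a customer support message (plain text)",
    "labels": ["billing", "technical", "account", "other"],
    "output": "exactly one label from the list above",
}

def is_valid_label(prediction: str) -> bool:
    """A well-scoped task makes outputs trivially checkable."""
    return prediction in task_spec["labels"]

print(is_valid_label("billing"))   # True
print(is_valid_label("refunds"))   # False -- not in the label set
```

A spec like this doubles as a validity check at inference time: any output outside the label set can be rejected or retried automatically.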
Step 2: Prepare Your Data
You need examples that demonstrate the input-output behaviour you want. Each example should have:
- A question (the input your model will receive)
- An answer (the output you expect)
- Optionally, a context (background information for open-book tasks)
You don’t need thousands of examples to start. With knowledge distillation, you can begin with as few as 10 seed examples and use a teacher model to generate synthetic training data.
```jsonl
{"question": "What is the return policy?", "answer": "30-day money-back guarantee"}
{"question": "How do I reset my password?", "answer": "Go to Settings > Security > Reset Password"}
```
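A few lines of standard-library Python can validate that every training example has the required fields before you launch a run — a sketch, assuming the question/answer/context schema described above:

```python
import json

REQUIRED = {"question", "answer"}
OPTIONAL = {"context"}

def validate_jsonl(lines):
    """Parse JSONL training data and check each record's fields.

    Returns the parsed examples; raises ValueError on the first
    malformed record, reporting its line number.
    """
    examples = []
    for i, line in enumerate(lines, start=1):
        record = json.loads(line)
        missing = REQUIRED - record.keys()
        if missing:
            raise ValueError(f"line {i}: missing fields {sorted(missing)}")
        unknown = record.keys() - REQUIRED - OPTIONAL
        if unknown:
            raise ValueError(f"line {i}: unexpected fields {sorted(unknown)}")
        examples.append(record)
    return examples

data = [
    '{"question": "What is the return policy?", "answer": "30-day money-back guarantee"}',
    '{"question": "How do I reset my password?", "answer": "Go to Settings > Security > Reset Password"}',
]
print(len(validate_jsonl(data)))  # 2
```

Catching a missing field here is far cheaper than discovering it halfway through a training job.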
How Much Data Do You Need?
| Approach | Examples needed | When to use |
|---|---|---|
| Few-shot distillation | 10–50 | Starting from scratch, rapid prototyping |
| Standard fine-tuning | 500–2,000 | You have labelled data available |
| Full dataset | 5,000+ | Maximum accuracy on complex tasks |
Step 3: Choose a Base Model
Pick a student model based on your deployment constraints:
| Model | Parameters | Best for |
|---|---|---|
| SmolLM2 135M | 135M | Ultra-low latency, edge devices |
| Qwen3 0.6B | 600M | Good balance of size and capability |
| Llama 3.2 1B | 1B | Strong general-purpose baseline |
| Llama 3.2 3B | 3B | Complex tasks needing more capacity |
| Gemma 3 4B | 4B | High accuracy with moderate resources |
Step 4: Configure Training
Key hyperparameters to set:
- Learning rate — start with `2e-4` for LoRA, `5e-5` for full fine-tuning
- Epochs — 3–5 epochs is typical; watch for overfitting on small datasets
- LoRA rank — `r=16` is a safe default; increase to 32–64 for complex tasks
- Batch size — as large as your GPU memory allows
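To see why LoRA is so light, you can estimate the trainable parameters it adds: each adapted weight matrix of shape (d_out, d_in) gains two low-rank factors totalling r·d_in + r·d_out parameters. The layer dimensions below are illustrative, not the actual Qwen3 0.6B architecture:

```python
def lora_param_count(shapes, r):
    """Trainable params added by LoRA: for each (d_out, d_in) matrix,
    factor A is (r, d_in) and factor B is (d_out, r)."""
    return sum(r * d_in + d_out * r for d_out, d_in in shapes)

# Hypothetical example: adapt the four attention projections
# (q, k, v, o), each 1024x1024, across 28 transformer layers.
shapes = [(1024, 1024)] * 4 * 28
print(lora_param_count(shapes, r=16))  # 3,670,016
print(lora_param_count(shapes, r=64))  # 4x more at rank 64
```

Under these assumptions, rank 16 adds roughly 3.7M trainable parameters — well under 1% of a 600M-parameter student — which is why doubling the rank for complex tasks is usually an affordable experiment.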
LoRA vs Full Fine-Tuning
For most use cases, LoRA (Low-Rank Adaptation) is the right choice:
- Uses a fraction of the memory
- Trains significantly faster
- Produces results comparable to full fine-tuning
- Makes it easy to swap adapters for different tasks
Full fine-tuning is worth considering only when you have large datasets (10k+ examples) and need to squeeze out every last percentage point of accuracy.
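Much of the memory gap comes from optimizer state: AdamW keeps two fp32 moment tensors per trainable parameter, on top of the gradients. A rough back-of-the-envelope comparison, under simplified assumptions (fp32 gradients and moments, ignoring weights, activations, and framework overhead):

```python
def training_state_gb(trainable_params, bytes_per_param=12):
    """Approximate gradient + optimizer memory: ~4 bytes for fp32
    gradients plus ~8 bytes for AdamW's two fp32 moments, per
    trainable parameter. Weights and activations are excluded."""
    return trainable_params * bytes_per_param / 1024**3

full_ft = training_state_gb(600_000_000)  # all 600M params trainable
lora = training_state_gb(4_000_000)       # ~4M adapter params (rank 16)
print(f"full fine-tuning: {full_ft:.1f} GB, LoRA: {lora:.2f} GB")
```

Even with these crude numbers, LoRA's optimizer footprint is two orders of magnitude smaller, which is what lets it fit on a single consumer GPU.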
Step 5: Train the Model
With distil labs, training is as simple as providing your data and configuration:
```yaml
base:
  task: classification
  student_model_name: Qwen3-0.6B
  teacher_model_name: Llama-3.3-70B-Instruct
tuning:
  num_train_epochs: 4
  use_lora: true
synthgen:
  generation_target: 1000
```
The platform handles the heavy lifting: synthetic data generation from a teacher model, training with LoRA adapters, and evaluation against your test set.
Step 6: Evaluate
Always evaluate your fine-tuned model against a held-out test set. Key metrics to track:
- Task accuracy — does the model produce correct outputs?
- Latency — how fast is inference?
- Consistency — does the model behave predictably across similar inputs?
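These metrics are straightforward to compute once you can call your model on a held-out set. A minimal sketch in plain Python — `predict` is a stand-in for your actual model call, and exact-match accuracy is the simplest scoring choice:

```python
import time

def evaluate(predict, test_set):
    """Compute exact-match accuracy and latency percentiles.

    `predict` is any callable mapping a question string to an answer;
    `test_set` is a list of {"question": ..., "answer": ...} dicts.
    """
    correct, latencies = 0, []
    for example in test_set:
        start = time.perf_counter()
        prediction = predict(example["question"])
        latencies.append(time.perf_counter() - start)
        correct += prediction == example["answer"]
    latencies.sort()
    return {
        "accuracy": correct / len(test_set),
        "p50_latency_s": latencies[len(latencies) // 2],
        "p95_latency_s": latencies[int(len(latencies) * 0.95)],
    }

# Stand-in model for demonstration: always answers "yes".
test_set = [{"question": "q1", "answer": "yes"},
            {"question": "q2", "answer": "no"}]
print(evaluate(lambda q: "yes", test_set)["accuracy"])  # 0.5
```

For extraction or generation tasks you would swap exact match for a task-appropriate score (field-level F1, normalised string match), but the harness shape stays the same.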
Compare your fine-tuned SLM against the teacher model. In our benchmarks, distilled students match or exceed the teacher on 8 of 10 datasets.
Step 7: Deploy
Once you’re satisfied with evaluation results, deploy your model. Fine-tuned SLMs are small enough to run almost anywhere:
- Cloud API — deploy to a serverless endpoint for easy integration
- On-premises — run on your own infrastructure for maximum data privacy
- Edge devices — models under 3B parameters can run on mobile and IoT hardware
What’s Next?
Fine-tuning is an iterative process. Start with a small number of examples, evaluate, and improve your dataset based on where the model struggles. With each iteration, your SLM gets closer to production-ready performance.