
How to Fine-Tune a Small Language Model (Step-by-Step Guide)

Fine-tuning a small language model (SLM) lets you take a general-purpose model — like Llama 3.2 1B or Qwen3 0.6B — and specialise it for your exact task. The result is a model that’s faster, cheaper, and often more accurate than prompting a large language model.

This guide walks you through the entire process, from preparing your data to deploying a production-ready model.

Why Fine-Tune a Small Language Model?

Large language models are impressive generalists, but they come with trade-offs:

  • Latency — models with tens or hundreds of billions of parameters are too slow for many real-time applications
  • Cost — API calls to frontier models add up quickly at scale
  • Privacy — sending sensitive data to third-party APIs isn’t always an option
  • Reliability — general-purpose models can be unpredictable on narrow tasks

A fine-tuned SLM addresses all four of these trade-offs. By training a compact model on task-specific data, you get consistent, fast, and private inference that often matches or exceeds the accuracy of a much larger teacher LLM on your domain.

Step 1: Define Your Task

Before touching any code, clearly define what you want the model to do. Fine-tuning works best when the task is well-scoped:

  • Classification — assign one of N labels to an input (e.g. sentiment, intent, topic)
  • Information extraction — pull structured fields from unstructured text (e.g. NER, PII redaction)
  • Question answering — answer questions given a context document (open-book) or from learned knowledge (closed-book)
  • Tool calling — decide which function to call and with what arguments

The clearer your task definition, the better your fine-tuned model will perform.

Step 2: Prepare Your Data

You need examples that demonstrate the input-output behaviour you want. Each example should have:

  • A question (the input your model will receive)
  • An answer (the output you expect)
  • Optionally, a context (background information for open-book tasks)

You don’t need thousands of examples to start. With knowledge distillation, you can begin with as few as 10 seed examples and use a teacher model to generate synthetic training data.

{"question": "What is the return policy?", "answer": "30-day money-back guarantee"}
{"question": "How do I reset my password?", "answer": "Go to Settings > Security > Reset Password"}
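Training examples in this JSONL format can be loaded and validated with a short stdlib helper. This is an illustrative sketch, not part of any specific library: it checks that every record has the required `question` and `answer` fields and leaves `context` optional.

```python
import json

def load_examples(path):
    """Load and validate JSONL training examples.

    Each line must contain a "question" and an "answer";
    "context" is optional (used for open-book tasks).
    """
    examples = []
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            missing = {"question", "answer"} - record.keys()
            if missing:
                raise ValueError(f"line {line_no}: missing fields {missing}")
            examples.append(record)
    return examples
```

Validating up front is cheap insurance: a single malformed record discovered mid-training wastes a whole run.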

How Much Data Do You Need?

  Approach                Examples needed   When to use
  Few-shot distillation   10–50             Starting from scratch, rapid prototyping
  Standard fine-tuning    500–2,000         You have labelled data available
  Full dataset            5,000+            Maximum accuracy on complex tasks
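However many examples you have, hold a portion out for evaluation in Step 6. A minimal stdlib sketch for a reproducible shuffle-and-split (the function name and 80/20 default are illustrative):

```python
import random

def split_dataset(examples, test_fraction=0.2, seed=42):
    """Shuffle and split examples into train and held-out test sets.

    A fixed seed keeps the split reproducible across runs, so
    evaluation numbers stay comparable between iterations.
    """
    rng = random.Random(seed)
    shuffled = examples[:]
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]
```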

Step 3: Choose a Base Model

Pick a student model based on your deployment constraints:

  Model          Parameters   Best for
  SmolLM2 135M   135M         Ultra-low latency, edge devices
  Qwen3 0.6B     600M         Good balance of size and capability
  Llama 3.2 1B   1B           Strong general-purpose baseline
  Llama 3.2 3B   3B           Complex tasks needing more capacity
  Gemma 3 4B     4B           High accuracy with moderate resources

Step 4: Configure Training

Key hyperparameters to set:

  • Learning rate — start with 2e-4 for LoRA, 5e-5 for full fine-tuning
  • Epochs — 3–5 epochs is typical; watch for overfitting on small datasets
  • LoRA rank — r=16 is a safe default; increase to 32–64 for complex tasks
  • Batch size — as large as your GPU memory allows
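These hyperparameters map directly onto common open-source training stacks. Here is a sketch assuming the Hugging Face peft and transformers libraries; the target modules and output directory are illustrative choices, not a prescribed setup:

```python
# Sketch: LoRA fine-tuning configuration with Hugging Face peft + transformers.
# Values mirror the defaults suggested above; adjust for your task and GPU.
from peft import LoraConfig
from transformers import TrainingArguments

lora_config = LoraConfig(
    r=16,                                  # LoRA rank: safe default
    lora_alpha=32,                         # common choice: 2x the rank
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # illustrative: attention projections
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="./slm-finetune",           # hypothetical output path
    learning_rate=2e-4,                    # LoRA starting point (5e-5 for full FT)
    num_train_epochs=4,                    # within the typical 3-5 range
    per_device_train_batch_size=8,         # raise until GPU memory is full
)
```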

LoRA vs Full Fine-Tuning

For most use cases, LoRA (Low-Rank Adaptation) is the right choice:

  • Uses a fraction of the memory
  • Trains significantly faster
  • Produces results comparable to full fine-tuning
  • Makes it easy to swap adapters for different tasks

Full fine-tuning is worth considering only when you have large datasets (10k+ examples) and need to squeeze out every last percentage point of accuracy.
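The memory saving is easy to quantify. LoRA freezes each original weight matrix W (shape d_out x d_in) and trains only two small factors, A (r x d_in) and B (d_out x r), so the trainable parameter count per matrix drops from d_in * d_out to r * (d_in + d_out):

```python
def lora_trainable_params(d_in, d_out, r):
    """Trainable parameters LoRA adds for one weight matrix:
    A is (r x d_in) and B is (d_out x r)."""
    return r * (d_in + d_out)

# Example: a 1024x1024 attention projection with rank r=16.
full = 1024 * 1024                            # 1,048,576 params if fully trained
lora = lora_trainable_params(1024, 1024, 16)  # 32,768 params, ~3% of full
```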

Step 5: Train the Model

With distil labs, training is as simple as providing your data and configuration:

base:
  task: classification
  student_model_name: Qwen3-0.6B
  teacher_model_name: Llama-3.3-70B-Instruct
tuning:
  num_train_epochs: 4
  use_lora: true
synthgen:
  generation_target: 1000

The platform handles the heavy lifting: synthetic data generation from a teacher model, training with LoRA adapters, and evaluation against your test set.

Step 6: Evaluate

Always evaluate your fine-tuned model against a held-out test set. Key metrics to track:

  • Task accuracy — does the model produce correct outputs?
  • Latency — how fast is inference?
  • Consistency — does the model behave predictably across similar inputs?
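Accuracy and latency can be measured together with a small harness. This is an illustrative sketch: the `predict` callable stands in for whatever inference call your deployment uses, and exact match is a strict metric best suited to classification-style outputs (for free-form answers, consider token-level F1 or an LLM judge instead):

```python
import time

def evaluate(predict, test_set):
    """Score a predict(question) -> answer function on a held-out test set.

    Returns exact-match accuracy and mean per-example latency in ms.
    """
    correct = 0
    total_time = 0.0
    for example in test_set:
        start = time.perf_counter()
        prediction = predict(example["question"])
        total_time += time.perf_counter() - start
        if prediction.strip() == example["answer"].strip():
            correct += 1
    n = len(test_set)
    return {"accuracy": correct / n, "mean_latency_ms": 1000 * total_time / n}
```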

Compare your fine-tuned SLM against the teacher model. In our benchmarks, distilled students match or exceed the teacher on 8 of 10 datasets.

Step 7: Deploy

Once you’re satisfied with evaluation results, deploy your model. Fine-tuned SLMs are small enough to run almost anywhere:

  • Cloud API — deploy to a serverless endpoint for easy integration
  • On-premises — run on your own infrastructure for maximum data privacy
  • Edge devices — models under 3B parameters can run on mobile and IoT hardware
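As an illustration of how little code serving takes, a locally saved checkpoint can be loaded with the Hugging Face transformers pipeline API (the checkpoint path below is hypothetical; this is one deployment option among many, not a required setup):

```python
from transformers import pipeline

# Load a fine-tuned SLM saved locally; path is hypothetical.
generator = pipeline(
    "text-generation",
    model="./my-finetuned-slm",
    device_map="auto",      # use GPU if available, otherwise CPU
)

result = generator(
    "How do I reset my password?",
    max_new_tokens=64,
    do_sample=False,        # greedy decoding for consistent outputs
)
print(result[0]["generated_text"])
```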

What’s Next?

Fine-tuning is an iterative process. Start with a small number of examples, evaluate, and improve your dataset based on where the model struggles. With each iteration, your SLM gets closer to production-ready performance.