Few-Shot Fine-Tuning: Train a Model with 10 Examples
Large language models are impressive few-shot learners — you give them a handful of examples in the prompt and they figure out the pattern. But what happens when you need consistent, fast, and private inference at scale? In-context learning starts to break down: prompts get expensive, latency climbs, and you’re still sending data to a third-party API.
Few-shot fine-tuning flips the script. Instead of stuffing examples into every prompt, you bake them directly into a small language model’s weights. The result is a compact, task-specific model that runs locally, responds in milliseconds, and doesn’t need examples at inference time.
What is few-shot fine-tuning?
Few-shot fine-tuning is the process of adapting a pre-trained language model using a very small labeled dataset — often between 5 and 50 examples. Unlike traditional fine-tuning that assumes hundreds or thousands of training samples, few-shot fine-tuning leverages the knowledge already embedded in the base model and nudges it toward your specific task with minimal data.
This approach works especially well with modern small language models (SLMs) like Llama 3.2 1B, Qwen3 0.6B, and Gemma 3 1B, which have been pre-trained on broad, diverse corpora.
Few-shot fine-tuning vs in-context learning
| | In-Context Learning | Few-Shot Fine-Tuning |
|---|---|---|
| Examples needed at inference | Yes (in every prompt) | No |
| Latency | Higher (longer prompts) | Lower (no examples in prompt) |
| Cost per request | Higher (more tokens) | Lower (smaller model, shorter prompts) |
| Consistency | Variable | High |
| Privacy | Data sent to API | Runs locally |
| Setup effort | Minimal | Requires training step |
The key insight: in-context learning is great for prototyping, but few-shot fine-tuning is better for production. Once you’ve validated that a task is solvable with a few examples, fine-tuning locks in that performance permanently.
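To make the token savings concrete, here is a back-of-the-envelope comparison. Every number below (prompt sizes, example lengths) is an illustrative assumption, not a measured figure:

```python
# Back-of-the-envelope prompt-size comparison: in-context learning vs. a
# fine-tuned small model. All token counts are illustrative assumptions.

def request_tokens(system_prompt: int, examples: int,
                   tokens_per_example: int, query: int) -> int:
    """Total prompt tokens sent for one request."""
    return system_prompt + examples * tokens_per_example + query

# In-context learning: 10 examples travel with every single request.
icl_tokens = request_tokens(system_prompt=200, examples=10,
                            tokens_per_example=120, query=80)

# Fine-tuned model: the examples live in the weights, not the prompt.
ft_tokens = request_tokens(system_prompt=200, examples=0,
                           tokens_per_example=120, query=80)

print(icl_tokens, ft_tokens)  # 1480 vs 280 prompt tokens
print(f"{icl_tokens / ft_tokens:.1f}x fewer prompt tokens per request")
```

With these assumed sizes the fine-tuned model sends roughly 5x fewer prompt tokens per request, and the gap widens as you add more in-context examples.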
How few-shot fine-tuning works with synthetic data
The secret to making few-shot fine-tuning work reliably is synthetic data generation. Here’s the workflow:
1. Start with seed examples — Provide as few as 10 labeled examples that represent your task.
2. Generate synthetic training data — A teacher model (like Llama 3.3 70B) uses your seed examples to generate hundreds or thousands of diverse, high-quality training samples.
3. Fine-tune the student model — A small language model is fine-tuned on the synthetic dataset using LoRA or full fine-tuning.
4. Evaluate — The fine-tuned student is benchmarked against the teacher to verify quality.
This is the core workflow behind distil labs — you bring a prompt and a few examples, and the platform handles synthetic data generation and fine-tuning automatically.
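The synthetic-data step can be sketched in a few lines. This is a minimal illustration, not the distil labs internals: the prompt template, the seed schema, and the `call_teacher` placeholder are all assumptions you would adapt to your own teacher-model API:

```python
import json
import random

def build_generation_prompt(seed_examples: list[dict], n_new: int = 5) -> str:
    """Assemble a prompt asking a teacher model to produce new labeled
    samples in the same style as the seeds (illustrative template)."""
    shown = random.sample(seed_examples, k=min(3, len(seed_examples)))
    lines = [f"Generate {n_new} new labeled examples in the same JSON format:"]
    for ex in shown:
        lines.append(json.dumps(ex))
    return "\n".join(lines)

# Hypothetical seed examples for a support-ticket classifier.
seeds = [
    {"text": "Refund my last order", "label": "billing"},
    {"text": "The app crashes on startup", "label": "bug_report"},
    {"text": "How do I reset my password?", "label": "account"},
]

prompt = build_generation_prompt(seeds)

# `call_teacher` is a placeholder for whatever inference API you use:
# synthetic = [json.loads(line) for line in call_teacher(prompt).splitlines()]
```

In practice you would repeat this with different seed subsets and temperatures to get diversity, then filter the teacher's outputs before fine-tuning the student on them.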
When does few-shot fine-tuning outperform prompting?
Few-shot fine-tuning tends to win in these scenarios:
- Structured output tasks — Classification, NER, information extraction, and tool calling benefit from the consistency that fine-tuning provides.
- High-volume inference — When you’re making thousands of predictions per hour, the cost savings from a smaller model compound quickly.
- Latency-sensitive applications — A 1B-parameter model served locally typically responds 10–50x faster than a 70B model behind a remote API.
- Privacy-constrained environments — Healthcare, finance, and government use cases where data cannot leave your infrastructure.
- Edge deployment — Running models on devices, on-prem servers, or in air-gapped environments.
Getting started
The fastest way to try few-shot fine-tuning is with the distil labs CLI:
    pip install distil-cli
    distil train --task classification --examples my_examples.jsonl
You provide your labeled examples in JSONL format, and the platform handles teacher evaluation, synthetic data generation, and student fine-tuning — all from a single command.
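JSONL means one JSON object per line. As an assumed illustration (the field names `text` and `label` are a guess — check the distil labs docs for the expected schema), you could write a seed file with Python's standard library:

```python
import json

# Illustrative seed examples for a classification task. The schema
# ("text"/"label") is an assumption, not the documented distil labs format.
examples = [
    {"text": "Card was charged twice this month", "label": "billing"},
    {"text": "Dark mode would be a great addition", "label": "feature_request"},
    {"text": "App freezes when I upload a photo", "label": "bug_report"},
]

with open("my_examples.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")  # one JSON object per line
```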
Further reading
- Small Expert Agents from 10 Examples — See how 10 seed examples can produce a production-ready agent.
- Vibe-Tuning: Fine-Tuning SLMs with a Prompt — Learn about prompt-driven fine-tuning without writing any code.