
Few-Shot Fine-Tuning: Train a Model with 10 Examples

Large language models are impressive few-shot learners — you give them a handful of examples in the prompt and they figure out the pattern. But what happens when you need consistent, fast, and private inference at scale? In-context learning starts to break down: prompts get expensive, latency climbs, and you’re still sending data to a third-party API.

Few-shot fine-tuning flips the script. Instead of stuffing examples into every prompt, you bake them directly into a small language model’s weights. The result is a compact, task-specific model that runs locally, responds in milliseconds, and doesn’t need examples at inference time.

What is few-shot fine-tuning?

Few-shot fine-tuning is the process of adapting a pre-trained language model using a very small labeled dataset — often between 5 and 50 examples. Unlike traditional fine-tuning that assumes hundreds or thousands of training samples, few-shot fine-tuning leverages the knowledge already embedded in the base model and nudges it toward your specific task with minimal data.

This approach works especially well with modern small language models (SLMs) like Llama 3.2 1B, Qwen3 0.6B, and Gemma 3 1B, which have been pre-trained on broad, diverse corpora.

Few-shot fine-tuning vs in-context learning

                               In-Context Learning        Few-Shot Fine-Tuning
  Examples needed at inference Yes (in every prompt)      No
  Latency                      Higher (longer prompts)    Lower (no examples in prompt)
  Cost per request             Higher (more tokens)       Lower (smaller model, shorter prompts)
  Consistency                  Variable                   High
  Privacy                      Data sent to API           Runs locally
  Setup effort                 Minimal                    Requires training step

The key insight: in-context learning is great for prototyping, but few-shot fine-tuning is better for production. Once you’ve validated that a task is solvable with a handful of examples, fine-tuning bakes that capability into the model’s weights, so you no longer pay for it on every request.
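To see why the prompt-size difference matters, here is a rough illustration. The example texts and the four-characters-per-token heuristic are assumptions for illustration, not measurements from any particular tokenizer or model:

```python
# Compare prompt sizes: in-context learning carries all examples on
# every request; a fine-tuned model carries none of them.

seed_examples = [
    ("The delivery was two weeks late.", "negative"),
    ("Great support, answered within minutes.", "positive"),
] * 5  # pretend we have 10 labeled examples

query = "The product stopped working after a day."

# In-context learning: the full example set is prepended to every query.
icl_prompt = "\n".join(f"Text: {t}\nLabel: {l}" for t, l in seed_examples)
icl_prompt += f"\nText: {query}\nLabel:"

# Fine-tuned model: the examples live in the weights, not the prompt.
ft_prompt = f"Text: {query}\nLabel:"

def approx_tokens(s):
    return len(s) // 4  # crude heuristic: ~4 characters per token

print(approx_tokens(icl_prompt), approx_tokens(ft_prompt))
```

Every token in the in-context prompt is billed and processed on every single request, which is exactly the overhead fine-tuning eliminates.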

How few-shot fine-tuning works with synthetic data

The secret to making few-shot fine-tuning work reliably is synthetic data generation. Here’s the workflow:

  1. Start with seed examples — Provide as few as 10 labeled examples that represent your task.
  2. Generate synthetic training data — A teacher model (like Llama 3.3 70B) uses your seed examples to generate hundreds or thousands of diverse, high-quality training samples.
  3. Fine-tune the student model — A small language model is fine-tuned on the synthetic dataset using LoRA or full fine-tuning.
  4. Evaluate — The fine-tuned student is benchmarked against the teacher to verify quality.
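The loop above can be sketched in a few lines of Python. Note that `teacher_generate` below is a placeholder, not part of any real SDK: a real implementation would call a large teacher model (such as Llama 3.3 70B) behind an inference API to paraphrase and diversify each seed.

```python
import random

def teacher_generate(seed, n=5):
    """Placeholder for a teacher-model call that produces n synthetic
    variants of one labeled seed example. Here we fake it with string
    templates; a real teacher would generate genuinely diverse text."""
    text, label = seed
    return [(f"[variant {i}] {text}", label) for i in range(n)]

def build_synthetic_dataset(seeds, per_seed=5):
    # Step 1: start from a handful of labeled seed examples.
    # Step 2: expand each seed into many synthetic training samples.
    dataset = []
    for seed in seeds:
        dataset.extend(teacher_generate(seed, per_seed))
    random.shuffle(dataset)
    return dataset

seeds = [("Refund my order", "billing"), ("App crashes on login", "bug")]
synthetic = build_synthetic_dataset(seeds, per_seed=50)
print(len(synthetic))  # 2 seeds x 50 variants = 100 samples
```

Steps 3 and 4 (fine-tuning the student on `synthetic` and benchmarking it against the teacher) would then run on this expanded dataset rather than the raw seeds.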

This is the core workflow behind distil labs — you bring a prompt and a few examples, and the platform handles synthetic data generation and fine-tuning automatically.

When does few-shot fine-tuning outperform prompting?

Few-shot fine-tuning tends to win in these scenarios:

  • Structured output tasks — Classification, NER, information extraction, and tool calling benefit from the consistency that fine-tuning provides.
  • High-volume inference — When you’re making thousands of predictions per hour, the cost savings from a smaller model compound quickly.
  • Latency-sensitive applications — A 1B parameter model responds 10–50x faster than a 70B model behind an API.
  • Privacy-constrained environments — Healthcare, finance, and government use cases where data cannot leave your infrastructure.
  • Edge deployment — Running models on devices, on-prem servers, or in air-gapped environments.
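The high-volume cost argument is easy to check with back-of-the-envelope arithmetic. All prices and token counts below are assumptions chosen for illustration, not quotes from any provider:

```python
# Hourly cost comparison: prompting a hosted large model with examples
# vs. serving a fine-tuned small model with bare prompts.

requests_per_hour = 10_000

# In-context learning against a hosted large model:
icl_tokens_per_request = 1_200   # prompt padded with few-shot examples
icl_price_per_mtok = 0.60        # assumed $/1M input tokens

# Fine-tuned 1B model on your own hardware:
ft_tokens_per_request = 150      # bare prompt, no examples
ft_price_per_mtok = 0.03         # assumed amortized serving cost

def hourly_cost(tokens_per_request, price_per_mtok):
    return requests_per_hour * tokens_per_request / 1_000_000 * price_per_mtok

icl = hourly_cost(icl_tokens_per_request, icl_price_per_mtok)
ft = hourly_cost(ft_tokens_per_request, ft_price_per_mtok)
print(f"ICL: ${icl:.2f}/h   fine-tuned: ${ft:.2f}/h")
```

Under these assumptions the gap comes from two compounding factors: fewer tokens per request and a cheaper per-token rate, which is why savings grow quickly at scale.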

Getting started

The fastest way to try few-shot fine-tuning is with the distil labs CLI:

pip install distil-cli
distil train --task classification --examples my_examples.jsonl

You provide your labeled examples in JSONL format, and the platform handles teacher evaluation, synthetic data generation, and student fine-tuning — all from a single command.
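As a sketch, a classification examples file could be written like this. The `text` and `label` field names are assumptions for illustration; check the distil labs documentation for the exact schema that `distil train` expects:

```python
import json
import os
import tempfile

# Hypothetical labeled examples for a support-ticket classifier.
examples = [
    {"text": "Where is my refund?", "label": "billing"},
    {"text": "The app crashes on startup.", "label": "bug_report"},
    {"text": "Can you add dark mode?", "label": "feature_request"},
]

path = os.path.join(tempfile.gettempdir(), "my_examples.jsonl")
with open(path, "w") as f:
    for row in examples:
        f.write(json.dumps(row) + "\n")  # JSONL: one JSON object per line

# Sanity check: every line should parse back into a dict.
with open(path) as f:
    parsed = [json.loads(line) for line in f]
print(len(parsed))
```

JSONL is convenient here because each line is an independent record: files can be appended to, streamed, and validated line by line.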

Further reading