
LoRA vs Full Fine-Tuning: When to Use What

Fine-tuning a language model means updating its weights so it performs better on your specific task. But you don’t always need to update every weight. That’s the core idea behind LoRA (Low-Rank Adaptation) — and understanding when it’s enough (and when it isn’t) can save you significant time and compute.

What Is Full Fine-Tuning?

Full fine-tuning updates all of the model’s parameters during training. For a 1B-parameter model, that means all one billion weights are adjusted to fit your dataset.

Pros:

  • Maximum flexibility — the model can change its behaviour significantly
  • Often delivers the best possible accuracy on a given task
  • Well-understood, standard approach

Cons:

  • High GPU memory requirements — you need to store gradients and optimizer states for every parameter
  • Slower training iterations
  • Risk of catastrophic forgetting on general capabilities
  • Produces a full-size model checkpoint for every experiment
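The memory point above is worth making concrete. A rough back-of-the-envelope sketch, assuming fp32 training with Adam (which keeps two extra moment buffers per parameter; mixed-precision setups differ in detail but not in order of magnitude):

```python
# Rough memory estimate for full fine-tuning with Adam in fp32.
# Assumed breakdown (illustrative, not from the article):
#   4 bytes weights + 4 bytes gradients + 8 bytes Adam moments = 16 bytes/param.
def full_ft_memory_gb(num_params: int, bytes_per_param: int = 16) -> float:
    """Optimizer-state memory only; activations come on top of this."""
    return num_params * bytes_per_param / 1e9

print(f"{full_ft_memory_gb(1_000_000_000):.0f} GB")  # 16 GB before activations
```

Activations, KV caches, and framework overhead push the real figure higher still, which is why even a 1B model can strain a 24 GB card under full fine-tuning.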

What Is LoRA?

LoRA freezes the original model weights and injects small trainable matrices into each transformer layer. Instead of updating a large weight matrix W directly, LoRA learns two small matrices A and B such that the update is ΔW = B × A, scaled by a factor α/r. The rank of these matrices (the “r” in LoRA) controls how expressive the adaptation is, and alpha (α) controls how strongly the learned update is applied.
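A minimal numpy sketch of the mechanism (illustrative only — real implementations insert these adapters into specific layers, typically the attention projections):

```python
import numpy as np

d, r = 1024, 8          # hidden size and LoRA rank (illustrative values)
alpha = 16              # scaling hyperparameter

W = np.random.randn(d, d)          # frozen pre-trained weight
A = np.random.randn(r, d) * 0.01   # trainable, small random init
B = np.zeros((d, r))               # trainable, initialised to zero

def lora_forward(x):
    # Frozen path plus low-rank update ΔW = B @ A, scaled by alpha / r.
    return x @ W.T + (x @ A.T @ B.T) * (alpha / r)

x = np.random.randn(1, d)
y = lora_forward(x)

# Because B starts at zero, the adapted layer initially matches the base layer.
assert np.allclose(y, x @ W.T)

# Trainable parameters: 2 * d * r for the adapter vs d * d for the full matrix.
print(2 * d * r, "trainable vs", d * d, "full")  # 16384 vs 1048576
```

At rank 8 the adapter trains roughly 1.5% of the parameters of the full matrix, which is where the memory and checkpoint-size savings come from.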

Pros:

  • Dramatically lower memory usage — often 5–10× less than full fine-tuning
  • Faster training iterations
  • Tiny adapter files (often < 100 MB) instead of full model checkpoints
  • Easy to swap adapters for different tasks on the same base model

Cons:

  • Slightly lower ceiling on task accuracy for complex tasks
  • Choosing the right rank and alpha requires some experimentation
  • Not every architecture benefits equally
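Adapter size grows linearly with rank, which is part of why rank choice matters. A quick sketch, assuming adapters on four attention projection matrices per layer stored in fp16 (the model shape and targeted modules here are illustrative, not prescribed by the article):

```python
def adapter_size_mb(hidden=2048, layers=24, rank=8,
                    targeted_matrices=4, bytes_per_value=2):
    # Each adapted square matrix adds two factors of shape (hidden, rank),
    # i.e. 2 * hidden * rank extra parameters.
    params = layers * targeted_matrices * 2 * hidden * rank
    return params * bytes_per_value / 1e6

for r in (4, 8, 16, 64):
    print(f"r={r}: {adapter_size_mb(rank=r):.0f} MB")  # ~3, 6, 13, 50 MB
```

Even at rank 64 the adapter stays far below the multi-gigabyte size of a full checkpoint, though the exact figure depends on which modules you target.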

When to Choose LoRA

LoRA is the right default for most practical scenarios:

  • Limited GPU budget — you can fine-tune a 3B model on a single consumer GPU with LoRA
  • Multiple tasks on one base model — swap lightweight adapters instead of managing multiple full checkpoints
  • Rapid experimentation — shorter training cycles mean faster iteration on data and hyperparameters
  • The task is well-scoped — classification, NER, and structured extraction tasks rarely need full fine-tuning to reach production quality

When to Choose Full Fine-Tuning

Full fine-tuning still makes sense in specific situations:

  • The task requires deep behavioural change — for example, teaching a model a new output format or reasoning style that differs substantially from its pre-training
  • You have ample compute and data — if cost isn’t a constraint and you want to squeeze out every last point of accuracy
  • Very small models — for models under 500M parameters, the memory savings of LoRA are less meaningful, and full fine-tuning is straightforward

Practical Recommendations

| Factor | LoRA | Full Fine-Tuning |
| --- | --- | --- |
| GPU memory (1B model) | ~6–8 GB | ~24+ GB |
| Training speed | Faster per step | Slower per step |
| Adapter size | 10–100 MB | Full model (2–16 GB) |
| Accuracy ceiling | Very close to full FT | Highest possible |
| Best for | Scoped tasks, fast iteration | Deep adaptation, small models |

For most teams fine-tuning small language models for production tasks like classification, question answering, or tool calling, LoRA is the recommended starting point. You can always fall back to full fine-tuning if you hit an accuracy wall — but in our benchmarks across dozens of tasks, that rarely happens.

How distil labs Handles This

When you fine-tune a model with distil labs, LoRA is enabled by default. The platform automatically selects a sensible rank and alpha based on the model size and task type, so you don’t need to tune these hyperparameters yourself. If you need full fine-tuning, you can disable LoRA in your configuration — but we recommend trying LoRA first.