Which Small Language Model Is Best for Fine-Tuning?

March 16, 2026
We benchmarked 15 small language models across 9 tasks and found that Qwen3-8B delivers the best fine-tuned performance, Liquid AI's LFM2-350M is the most tunable model despite having just 350M parameters, and a fine-tuned Qwen3-4B can match or beat a 120B+ teacher on 8 of 9 benchmarks at a fraction of the cost.

Full-Stack Production Language Models: Expert Model Optimization Meets Scalable GPU Infrastructure

March 6, 2026
Distil Labs is the developer platform for building custom small language models. Cerebrium is the serverless GPU infrastructure platform that powers Distil Labs' production workloads. The result is an end-to-end offering for any company looking to move from bloated, expensive LLM inference to lean, production-grade small-model deployments—without building or managing either the ML pipeline or the infrastructure themselves.

The 10x inference tax you don't have to pay

March 6, 2026
Small specialized models are faster and cheaper than the models hosted by frontier LLM labs, and match their quality on many real-world tasks. Importantly, you can reach these results with as few as 50 training examples, and you can self-host these small models on your own infrastructure.

When Does Reinforcement Learning Help Small Language Models?

February 26, 2026
We tested an RL stage on top of fine-tuned small language models across 12 datasets. The results split cleanly in two: text generation tasks (QA, documentation, PII redaction) gained +2.0pp on average, with every dataset improving; structured tasks (classification, function calling) lost 0.7pp on average, with two regressions and no consistent wins. Simple decision rule: classification or function calling? SFT alone. QA, documentation, or extraction? Add RLVR.

The LLM in Your Voice Assistant Is the Latency Bottleneck. Replace It with an SLM.

February 21, 2026
Voice assistants use cloud LLMs for intent routing, but the LLM stage alone adds 375–750ms per turn and sends customer data off-premise. For defined workflows, a fine-tuned SLM is a better fit: our 0.6B model runs in ~40ms, scores 90.9% accuracy versus 87.5% for the 120B teacher, and cuts total pipeline latency from 680–1300ms to ~315ms.

Making FunctionGemma Work: Multi-Turn Tool Calling at 270M Parameters

February 26, 2026
Google's FunctionGemma is a 270M-parameter model purpose-built for function calling, small enough to run on a phone CPU at 125 tokens/sec. But it ships untrained for multi-turn use cases, and our benchmarks show it scores just 10-39% on multi-turn tool calling. We fine-tuned it using the Distil Labs platform on three tasks and pushed accuracy to 90-97%, matching or exceeding a 120B teacher model while staying 445× smaller.

How Knowunity Used Distil Labs to Cut Their LLM Bill by 50%

March 5, 2026
We show how Knowunity uses the Distil Labs platform to fine-tune and deploy their own models, significantly reducing their LLM costs.

Teaching Small Language Models New Skills - Training a Local Cybersecurity Agent

February 11, 2026
Learn how fine-tuning a Small Language Model (SLM) delivers superior cybersecurity log analysis and threat classification compared to massive LLMs. By specializing in the MITRE ATT&CK framework, this secure solution ensures total data privacy while outperforming generalist models in local environments.

Train Your SLM with the distill-cli Claude Skill

January 27, 2026
Train a custom Text2SQL model by chatting with Claude and the Distil Labs skill: no ML expertise, no data labeling, just a conversation and a few examples.

We benchmarked 12 small language models across 8 tasks to find the best base model for fine-tuning

December 10, 2025
We fine-tuned 12 small models to find which are most tunable and which perform best after fine-tuning. Surprise finding: Llama-3.2-1B showed the biggest improvement (most tunable), while Qwen3-4B delivered the best final performance, matching a 120B teacher on 7 of 8 tasks and outperforming it by 19 points on the SQuAD 2.0 dataset.

Small expert agents from 10 examples

October 15, 2025
Distil Labs turns a prompt and a few dozen examples into a small, accurate expert agent. Our platform automates data generation, curation, fine-tuning, and evaluation—so you can reach LLM-level results with models 50–400× smaller, deployable almost anywhere, in hours.