Guides
How-tos and strategic perspectives on building with small language models.
Full-Stack Production Language Models: Expert Model Optimization Meets Scalable GPU Infrastructure
How distil labs and Cerebrium combine expert model optimization with serverless GPU infrastructure to deliver an end-to-end stack for replacing expensive LLM inference with lean, production-grade small-model deployments.
From Production Traces to a Faster, Cheaper, More Accurate Model
Learn how to turn your production LLM agent traces into a compact specialist model that outperforms the original — with zero manual annotation and deployment in under 12 hours.
How SLMs Can Enable On-Device RAG - Making Industrial Machinery More Usable
Fine-tuned 1B parameter models can match the accuracy of 3B base models on domain-specific documentation — making on-device RAG viable for industrial equipment without expensive AI-optimized hardware. We tested this on a Siemens PLC manual and achieved a +16 percentage point accuracy gain through distillation.
The LLM in Your Voice Assistant Is the Latency Bottleneck. Replace It with an SLM.
Voice assistants that rely on cloud LLMs add 700+ ms of latency per turn. A fine-tuned small language model drops the reasoning stage to ~40 ms while matching or exceeding LLM accuracy on bounded tasks, with full data privacy.
Vibe-Tuning: The Art of Fine-Tuning Small Language Models with a Prompt
Fine-tuning is a pain — you need datasets, ML expertise, and a stack of GPUs just to get started. Not anymore. With vibe-tuning, you go from prompt to production-ready model without any of these headaches. This post shows you exactly how to build one, starting with just a prompt.
Train Your SLM with the distil labs Claude Skill
A step-by-step walkthrough of training a Text2SQL small language model using the distil labs Claude Code skill, going from raw conversation data to a working local model in a single conversation.
distil-PII: Family of PII Redaction SLMs
We trained and released a family of small language models specialized for policy-aware PII redaction that dramatically outperform their pre-trained base models.
distil labs: Small Models, Big Wins – Using SLMs in Agentic AI
How small language models can match or beat much larger LLMs when fine-tuned to well-scoped tasks, enabling faster, cheaper, and more private agentic AI workflows.
distil labs: Small Expert Agents from 10 Examples
An overview of how distil labs turns a prompt and a few dozen examples into a small, accurate expert agent that matches LLM-level results with models 50-400x smaller.