Blog & Demos

Tutorials, case studies, benchmarks, and open-source demos — everything you need to build with small language models.

Distil PII Redactor: an OpenClaw Skill
Guide · Information Extraction · On-Prem / Edge

Locally redact PII from text using a fine-tuned 1B parameter model packaged as an OpenClaw skill. Your sensitive data never leaves your machine.

Autonomous Bug Fixing Agent with distil labs' SLM and Warp Oz
Demo · Agentic AI · On-Prem / Edge

A self-healing loop that diagnoses production failures with a fine-tuned 0.6B SLM and applies the fix with Warp Oz, closing incidents in seconds with no humans paged.

Train an SLM from your production traces with the distil labs Claude skill
Guide · Tool Calling · Agentic AI

A walkthrough of using the distil labs Claude skill to turn 327 noisy production traces into a fine-tuned Qwen3-1.7B multi-turn tool-calling model, deployed on a managed endpoint in a single conversation.

Why training on production traces fails (and what to do instead)
Benchmark · Tool Calling · Agentic AI

Training directly on production traces doesn't work as well as you'd expect. We tested five scenarios, and synthetic data generated from traces scores up to 26 percentage points higher in accuracy.

Fine-Tuning Liquid's LFM2.5: Accurate Tool Calling at 350M Parameters
Benchmark · Tool Calling

Liquid AI's LFM2.5-350M reaches 96-98% tool call equivalence after fine-tuning with distil labs across three benchmarks, matching or exceeding a 120B teacher model while staying at 350M parameters.

What Small Language Model Is Best for Fine-Tuning?
Benchmark · Classification · Question Answering · Tool Calling

We benchmarked 15 small language models across 9 tasks to find the best base model for fine-tuning. Qwen3-8B ranks #1 overall, Liquid AI's LFM2 family is the most tunable, and fine-tuned Qwen3-4B matches a 120B+ teacher on 8 of 9 benchmarks.

A 0.6B model outperformed a 120B LLM by 29 points - using dlt, distil labs, and Hugging Face
Case Study · Classification · Information Extraction

How to turn production LLM traces into a deployed specialist model using dlt for trace extraction and distil labs for training, achieving 79% exact match with a 0.6B model that beats a 120B teacher by 29 points.

Full-Stack Production Language Models: Expert Model Optimization Meets Scalable GPU Infrastructure
Guide · Inference

How distil labs and Cerebrium combine expert model optimization with serverless GPU infrastructure to deliver an end-to-end stack for replacing expensive LLM inference with lean, production-grade small-model deployments.

The 10x Inference Tax You Don't Have to Pay
Benchmark · Classification · Question Answering · Tool Calling

Benchmarking fine-tuned small language models (0.6B-8B) against 10 frontier LLMs across 8 datasets shows that task-specific SLMs match or beat frontier models at 10-100x lower inference cost.