Config file

The configuration file controls the training pipeline through five main sections: base, tuning, evaluation, synthgen, and trace_processing. Each section handles a specific aspect of the model training process.

File format

The configuration file supports two formats depending on how you interact with distil labs:

Webapp: Use JSON format (.json file)
API: Use YAML format (.yaml file)

Both formats are functionally equivalent, so choose based on your workflow. Examples in this documentation show YAML; the JSON equivalent uses the same keys and nesting:

# YAML (API)
base:
  task: question-answering
  student_model_name: Llama-3.2-3B-Instruct

// JSON (Webapp)
{
  "base": {
    "task": "question-answering",
    "student_model_name": "Llama-3.2-3B-Instruct"
  }
}
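As a rough illustration of that equivalence, the sketch below expresses the YAML example above as a plain Python dictionary and serializes it with the standard-library `json` module. The names `config` and `to_webapp_json` are illustrative assumptions, not part of any distil labs API.

```python
import json

# Same content as the YAML example above, written as nested dictionaries.
config = {
    "base": {
        "task": "question-answering",
        "student_model_name": "Llama-3.2-3B-Instruct",
    }
}


def to_webapp_json(cfg: dict) -> str:
    """Serialize a config dictionary to indented JSON text (hypothetical helper)."""
    return json.dumps(cfg, indent=2)


if __name__ == "__main__":
    print(to_webapp_json(config))
```

Because both formats carry the same keys and nesting, parsing the JSON text back yields the original dictionary.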

Configuration structure

base:
  # General parameters (task is required)
  task: classification

tuning:
  # Fine-tuning parameters
  num_train_epochs: 32

evaluation:
  # Evaluation parameters
  num_few_shot_examples: 1

synthgen:
  # Synthetic data generation parameters
  generation_target: 10000

trace_processing:
  # Trace processing parameters
  relabel: true
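A minimal sketch of a pre-flight check for this layout, assuming the configuration has already been loaded into a Python dictionary. The section names come from this page; the function `check_structure` and its messages are hypothetical and do not reflect how the platform itself validates files.

```python
# Section names listed on this page; the checking logic is an illustrative assumption.
KNOWN_SECTIONS = {"base", "tuning", "evaluation", "synthgen", "trace_processing"}


def check_structure(cfg: dict) -> list[str]:
    """Return human-readable problems; an empty list means none were found."""
    problems = []

    # Flag misspelled or unexpected top-level sections.
    unknown = sorted(set(cfg) - KNOWN_SECTIONS)
    if unknown:
        problems.append(f"unknown sections: {', '.join(unknown)}")

    # `task` is the only parameter this page marks as required.
    base = cfg.get("base")
    if not isinstance(base, dict) or "task" not in base:
        problems.append("base.task is required")

    return problems
```

Returning a list rather than raising on the first issue lets a caller report every problem in one pass.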

Base configuration

General parameters relevant to the overall task.

Parameter	Type	Default	Description
`task`	`string`	required	Type of NLP task to be solved. See supported task types below.
`student_model_name`	`string`	`Llama-3.2-1B-Instruct`	Base model to use for the student model. This is the model we finetune for your use-case.
`teacher_model_name`	`string`	`openai.gpt-oss-120b`	Teacher model used to generate synthetic data and from which we distil knowledge.
`random_seed`	`integer \| null`	`123`	Random seed used across distillib for reproducible random sampling.
`llm_num_parallel_requests`	`integer`	`4`	Maximum number of LLM requests to send in parallel across the teacher, synthgen, and judge pipelines. Set to 1 to disable parallelism.

Supported task types

Task	Value	Description
Question Answering	`question-answering`	Extract or generate answers from text based on queries
Classification	`classification`	Assign text to categories from a fixed set
Tool Calling	`tool-calling-closed-book`	Select and invoke functions based on user requests
Multi-turn Tool Calling	`multi-turn-tool-calling-closed-book`	Handle multi-step conversations with function calls
Open Book QA (RAG)	`question-answering-open-book`	Answer questions using provided context passages
Closed Book QA	`question-answering-closed-book`	Answer questions using knowledge learned during training
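Since `task` takes one of a fixed set of strings, a typo is easy to catch before submitting a job. The following sketch copies the values from the table above and uses the standard-library `difflib` module to suggest a near match; `check_task` is a hypothetical helper, not a platform function.

```python
import difflib

# Task values copied from the table above.
SUPPORTED_TASKS = [
    "question-answering",
    "classification",
    "tool-calling-closed-book",
    "multi-turn-tool-calling-closed-book",
    "question-answering-open-book",
    "question-answering-closed-book",
]


def check_task(task: str) -> str:
    """Return the task unchanged if supported, otherwise raise with a suggestion."""
    if task in SUPPORTED_TASKS:
        return task
    # Offer the single closest supported value, if any is reasonably similar.
    close = difflib.get_close_matches(task, SUPPORTED_TASKS, n=1)
    hint = f" Did you mean '{close[0]}'?" if close else ""
    raise ValueError(f"Unsupported task '{task}'.{hint}")
```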

Supported student models

Model	Value
Llama 3.2 1B Instruct	`Llama-3.2-1B-Instruct`
Llama 3.2 3B Instruct	`Llama-3.2-3B-Instruct`
Llama 3.1 8B Instruct	`Llama-3.1-8B-Instruct`
SmolLM2 135M	`SmolLM2-135M-Instruct`
SmolLM2 1.7B	`SmolLM2-1.7B-Instruct`
FunctionGemma 270M	`functiongemma-270m-it`
Gemma 3 270M	`gemma-3-270m-it`
Gemma 3 1B	`gemma-3-1b-it`
Gemma 3 4B	`gemma-3-4b-it`
Qwen3 0.6B	`Qwen3-0.6B`
Qwen3 1.7B	`Qwen3-1.7B`
Qwen3 4B	`Qwen3-4B-Instruct-2507`
Qwen3 8B	`Qwen3-8B`
Liquid LFM2 350M	`LFM2-350M`
Liquid LFM2 1.2B	`LFM2-1.2B`
Liquid LFM2 2.6B	`LFM2-2.6B`
Liquid LFM2.5 350M	`LFM2.5-350M`
Liquid LFM2.5 1.2B Instruct	`LFM2.5-1.2B-Instruct`

Supported teacher models

Model	Value
GPT OSS 120B	`openai.gpt-oss-120b`
GPT OSS 120B Thinking	`openai.gpt-oss-120b-thinking`
GPT OSS 20B	`openai.gpt-oss-20b`
GPT OSS 20B Thinking	`openai.gpt-oss-20b-thinking`
DeepSeek V3.1	`deepseek.v3.1`
Qwen3 235B A22B	`Qwen3-235B-A22B-Instruct-2507`
Qwen3 480B A35B Coder	`Qwen3-480B-A35B-Coder`
Qwen2.5 VL 72B	`Qwen2.5-VL-72B-Instruct`
ZAI GLM 5	`zai.glm-5`
ZAI GLM 5 Thinking	`zai.glm-5-thinking`
Moonshot Kimi K2 Thinking	`moonshotai.kimi-k2-thinking`
Moonshot Kimi K2.5	`moonshotai.kimi-k2.5`
MiniMax M2 Thinking	`minimax.minimax-m2-thinking`

Tuning configuration

Parameters controlling the finetuning of the student model.

Parameter	Type	Default	Description
`learning_rate`	`float`	`5e-5`	The initial learning rate for AdamW optimizer.
`learning_rate_scheduler`	`string`	`linear`	The scheduler type to use. Options: `cosine`, `linear`, `constant`.
`weight_decay`	`float`	`0.0`	Weight decay applied to all layers except bias and LayerNorm weights in AdamW optimizer.
`warmup_ratio`	`float`	`0.05`	Ratio of total training steps used for linear warmup from 0 to `learning_rate`.
`bf16`	`boolean`	`true`	Whether to use bfloat16 (16-bit) mixed-precision training instead of 32-bit training.
`use_lora`	`boolean`	`true`	Whether to use LoRA for student training.
`lora_r`	`integer`	`64`	LoRA attention dimension (rank). Only used if `use_lora` is true.
`lora_alpha_multiplier`	`integer`	`1`	Alpha parameter for LoRA scaling is `lora_r * lora_alpha_multiplier`. Only used if `use_lora` is true.
`per_device_train_batch_size`	`integer`	`1`	Batch size per GPU/device for training.
`per_device_eval_batch_size`	`integer`	`1`	Batch size per GPU/device for evaluation.
`num_train_epochs`	`integer`	`4`	Total number of training epochs.
`train_eval_split`	`float`	`0.2`	Fraction of training data used for evaluation. Must be between 0 and 1 (exclusive).
`gradient_accumulation_steps`	`integer`	`1`	Number of steps over which gradients are accumulated before performing an optimizer update. Effectively multiplies the batch size by this factor without increasing memory usage.
`num_few_shot_examples_student`	`integer`	`0`	Number of few-shot examples when running student evaluation and tuning. If above 0, at least one example per class is used for classification tasks.
`memory_optimized_training`	`boolean`	`false`	Enable activation offloading and gradient checkpointing to reduce GPU memory usage at the cost of significantly slower training. Only enable this if training runs out of GPU memory.
`use_qlora`	`boolean`	`false`	Load the base model in 4-bit NF4 (QLoRA) during finetuning, then attach LoRA adapters in higher precision. Reduces base-model VRAM by roughly 3x at the cost of slightly slower training. Only takes effect when `use_lora` is true. Requires bitsandbytes (Linux only).
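Two values are implied by this table rather than set directly: the effective batch size per optimizer update and the LoRA alpha. The sketch below derives both from a `tuning` dictionary, falling back to the defaults listed above. The function name and the `num_devices` argument are assumptions for illustration; the device count depends on the hardware and is not a configuration parameter on this page.

```python
def derived_tuning_values(tuning: dict, num_devices: int = 1) -> dict:
    """Compute quantities implied by the tuning section, using documented defaults."""
    per_device = tuning.get("per_device_train_batch_size", 1)
    accumulation = tuning.get("gradient_accumulation_steps", 1)
    lora_r = tuning.get("lora_r", 64)
    multiplier = tuning.get("lora_alpha_multiplier", 1)
    return {
        # Examples contributing to each optimizer update.
        # num_devices is an assumption about the hardware, not a config key.
        "effective_batch_size": per_device * accumulation * num_devices,
        # LoRA alpha as described in the table: lora_r * lora_alpha_multiplier.
        # Only meaningful when use_lora is true.
        "lora_alpha": lora_r * multiplier,
    }
```

For example, a per-device batch size of 2 with 8 accumulation steps on 2 devices gives an effective batch size of 32, while memory usage stays at the level of a batch of 2 per device.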

RLVR (Reinforcement Learning with Verifiable Rewards)

RLVR is an optional reinforcement learning stage that runs after SFT finetuning. It uses reward signals from an LLM-as-a-judge to further improve model performance. Set `rlvr_dataset_size` to a value greater than 0 to enable it.

Parameter	Type	Default	Description
`rlvr_dataset_size`	`float`	`0.0`	Proportion of the dataset to use for the RLVR split. Must be between 0.0 and 1.0. Default `0.0` means RLVR is disabled.
`rlvr_llm_as_a_judge_model_name`	`string`	`openai.gpt-oss-120b`	Model used to power the LLM-as-a-judge for RLVR reward signals.
`rlvr_per_device_batch_size`	`integer`	`6`	Batch size per GPU/device for RLVR training and evaluation. Must be a multiple of `rlvr_num_generations`.
`rlvr_num_generations`	`integer`	`6`	Number of generations per prompt during RLVR training.
`rlvr_num_train_epochs`	`integer`	`1`	Number of training epochs for RLVR finetuning.
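The constraints in this table can be checked locally before a run. The following is a sketch under stated assumptions: the helpers `check_rlvr` and `rlvr_enabled` are hypothetical, they take whichever dictionary holds the RLVR keys, and the "at least 1" check on `rlvr_num_generations` is an added guard rather than a documented rule.

```python
def check_rlvr(params: dict) -> list[str]:
    """Return problems found in RLVR-related settings (empty list if none)."""
    size = params.get("rlvr_dataset_size", 0.0)
    batch = params.get("rlvr_per_device_batch_size", 6)
    generations = params.get("rlvr_num_generations", 6)

    problems = []
    if not 0.0 <= size <= 1.0:
        problems.append("rlvr_dataset_size must be between 0.0 and 1.0")
    if generations < 1:
        # Assumption: a non-positive generation count is never meaningful.
        problems.append("rlvr_num_generations must be at least 1")
    elif batch % generations != 0:
        problems.append(
            "rlvr_per_device_batch_size must be a multiple of rlvr_num_generations"
        )
    return problems


def rlvr_enabled(params: dict) -> bool:
    """RLVR runs only when a non-zero share of the dataset is assigned to it."""
    return params.get("rlvr_dataset_size", 0.0) > 0
```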

Evaluation configuration

Parameters used in teacher evaluation.

Parameter	Type	Default	Description
`num_few_shot_examples`	`integer`	`1`	Number of few-shot examples when running teacher evaluation. If above 0, at least one example per class is used for classification tasks.
`llm_as_a_judge_model_name`	`string`	`openai.gpt-oss-120b`	Model used to power the LLM-as-a-judge evaluation.
`expand_tool_calling_turns`	`boolean`	`true`	If true, each line in multi-turn tool calling test files is expanded into multiple evaluation lines, each ending at a tool call.

Synthetic generation configuration

Parameters for fine-grained control over synthetic data generation.

Parameter	Type	Default	Description
`generation_target`	`integer`	`10000`	Target number of synthetic examples to generate. For Closed-Book QA, this is calculated as `len(unstructured_data) * generation_per_unstructured_context`.
`generation_in_single_call`	`integer`	`4`	Number of examples to generate per teacher/LLM invocation.
`generation_iteration_size`	`integer`	`128`	Batch size for the generate-validate cycle.
`generation_per_unstructured_context`	`integer \| null`	`null`	Examples to generate per unstructured context. Only used with the `question-answering-closed-book` task. Overrides `generation_target` when set.
`num_positive_exemplars_per_generation`	`integer`	`2`	Number of in-context examples for the class/task being generated.
`num_negative_exemplars_per_generation`	`integer`	`2`	Number of in-context examples for classes not being generated. Only used for classification tasks.
`num_unlabelled_exemplars_per_generation`	`integer`	`2`	Number of unlabelled examples provided during each teacher invocation.
`validation_max_total_length`	`integer`	`10000`	Maximum total length (input + output) of generated examples in characters.
`validation_similarity_threshold`	`float`	`0.95`	Similarity threshold for deduplication. Generated examples whose similarity to seed data exceeds this threshold are removed.
`teacher_temperature`	`float`	`0.7`	Temperature for teacher output. Controls balance between predictability and creativity. Must be between 0.0 and 1.0.
`teacher_max_tokens`	`integer`	`32000`	Maximum number of tokens in the generated response. Kept well below typical model context limits so the reserved output budget does not crowd out large (e.g. multi-image) prompts.
`match_generated_distribution_to_seed`	`boolean`	`false`	Match generated data class distribution to seed data. Only used for classification tasks.
`num_distractor_context_blocks`	`integer`	`0`	Number of distractor context blocks per example. Setting above zero enables RAFT training.
`output_is_json`	`boolean`	`false`	Only generate synthetic data with valid JSON outputs. Only relevant for QA tasks.
`basic_mutators_to_use`	`list[string]`	`["complexity"]`	List of basic mutators to use for data generation. Supported options: `complexity`, `length`, `specificity`.
`mutation_topics`	`list[list[string]] \| list[string]`	`[]`	Selection of topics to sample from to guide the generation process.
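To get a feel for how the sizing parameters interact, the sketch below estimates the generation target and a lower bound on the number of teacher invocations. It is an approximation, not the pipeline's actual scheduling: `plan_generation` and `num_unstructured_contexts` are hypothetical names, the function does not check the task type, and the real number of calls will likely be higher because validation and deduplication discard some generated examples.

```python
import math


def plan_generation(synthgen: dict, num_unstructured_contexts: int = 0) -> dict:
    """Estimate the generation target and the minimum number of teacher calls."""
    per_context = synthgen.get("generation_per_unstructured_context")
    if per_context is not None:
        # Closed-book QA: target = len(unstructured_data) * generation_per_unstructured_context.
        target = num_unstructured_contexts * per_context
    else:
        target = synthgen.get("generation_target", 10000)

    per_call = synthgen.get("generation_in_single_call", 4)
    return {
        "target": target,
        # Lower bound only: assumes every generated example passes validation.
        "min_teacher_calls": math.ceil(target / per_call),
    }
```

With the defaults, 10,000 examples at 4 per call means at least 2,500 teacher invocations.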

Trace processing configuration

Parameters for the trace processing pipeline, which converts production traces into training and testing data.

Parameter	Type	Default	Description
`relabel`	`boolean`	`true`	If true, use a committee of models to relabel trace examples. If false, use the original labels from traces.
`relevance_filtering`	`boolean`	`true`	If true, score each trace with an LLM and drop those below the relevance / coherence thresholds. If false, relevance filtering is skipped entirely and every seed trace flows straight to the next step.
`relevance_filtering_batch_size`	`integer`	`32`	Number of examples scored per batch during relevance filtering.
`min_relevance_score`	`integer`	`4`	Minimum relevance score (1-5) for a trace to pass relevance filtering.
`min_coherence_score`	`integer`	`3`	Minimum coherence score (1-5) for a trace to pass coherence filtering. Lower values allow more corrupted traces through for committee repair.
`num_traces_as_training_base`	`integer`	`200`	Number of traces to use as the seed for generating training examples. Unused traces beyond this count are used as unstructured data.
`num_traces_as_testing_base`	`integer`	`200`	Number of traces to use as the seed for generating testing examples. Unused traces beyond this count are used as unstructured data. Ignored if a test set is provided.
`min_generated_examples`	`integer`	`20`	Minimum number of examples that trace processing must produce. Raises an error if fewer are generated, to prevent training with too few examples.
`max_unstructured`	`integer`	`10000`	Maximum number of unstructured data examples to include.
`observation_format`	`string`	`openai_messages`	Format of trace observations in `traces.jsonl`. Options: `langfuse` (Langfuse observation objects with id, input, output), `openai_messages` (objects with a `messages` array of chat completion messages), `openai_messages_with_images` (OpenAI messages that may include images), `unstructured_with_openai_messages` (unstructured data with OpenAI messages).
`remove_system_prompt_from_traces`	`boolean`	`true`	If true, strip leading system messages from traces (before unstructured export) and from processed examples. Defaults to true because the system prompt is typically captured by the job description, and keeping it in the conversation breaks the single-turn `[user, assistant]` shape expected at the training boundary.
`compress_job_description`	`boolean`	`false`	If true, compress the job description using the teacher model before relevance filtering. Useful when the task description is very long and would overwhelm the filtering LLM.
`teacher_model_name`	`string`	`zai.glm-5`	Teacher model used for relevance filtering and picking the best relabelled answer from the committee.
`relabelling_committee_models`	`list[string]`	`[]`	If the list is non-empty, models in the list are used to produce candidate relabels. Each model generates an output for every example and the trace processing teacher aggregates them into the final relabel. Only used when `relabel` is true.
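The relevance filtering step described above can be pictured as a threshold test on two scores per trace. This is a simplified sketch: the `scores` dictionary shape and the function `passes_filter` are hypothetical, and the actual scoring is done by an LLM inside the pipeline.

```python
def passes_filter(scores: dict, trace_processing: dict) -> bool:
    """Decide whether a scored trace survives relevance filtering.

    `scores` is a hypothetical shape: {"relevance": 1-5, "coherence": 1-5}.
    """
    # With relevance_filtering disabled, every trace flows to the next step.
    if not trace_processing.get("relevance_filtering", True):
        return True

    min_relevance = trace_processing.get("min_relevance_score", 4)
    min_coherence = trace_processing.get("min_coherence_score", 3)
    # A trace must meet both minimums to pass.
    return scores["relevance"] >= min_relevance and scores["coherence"] >= min_coherence
```

Lowering `min_coherence_score` in this picture lets more damaged traces through, which matches the table's note about leaving them for committee repair.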

Example configuration

Minimal configuration

base:
  task: question-answering
  student_model_name: Llama-3.2-3B-Instruct
  teacher_model_name: openai.gpt-oss-120b

Full configuration example

base:
  task: question-answering-open-book
  student_model_name: Qwen3-1.7B
  teacher_model_name: openai.gpt-oss-120b
  random_seed: 42

tuning:
  learning_rate: 1e-4
  learning_rate_scheduler: cosine
  use_lora: true
  lora_r: 32
  num_train_epochs: 3
  train_eval_split: 0.15

evaluation:
  num_few_shot_examples: 2

synthgen:
  generation_target: 5000
  generation_in_single_call: 8
  teacher_temperature: 0.6
  validation_similarity_threshold: 0.9

trace_processing:
  relabel: true
  num_traces_as_training_base: 5000
  num_traces_as_testing_base: 100

Model-specific notes

Reasoning teacher models

Reasoning teacher models (the GPT OSS, DeepSeek, GLM, Kimi, and MiniMax families) require a teacher temperature between 0.5 and 0.7. Configurations that set `synthgen.teacher_temperature` outside this range raise a validation error.
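A local check for this rule might look like the sketch below. The mapping from model families to name prefixes is an assumption inferred from the teacher model values listed on this page, and `check_teacher_temperature` is a hypothetical helper; the platform's own validation may differ.

```python
# Assumption: family membership inferred from the prefixes of documented model values.
REASONING_PREFIXES = (
    "openai.gpt-oss",
    "deepseek.",
    "zai.glm",
    "moonshotai.kimi",
    "minimax.",
)


def check_teacher_temperature(teacher_model_name: str, temperature: float = 0.7) -> None:
    """Raise ValueError if the temperature violates the documented ranges."""
    # General range from the synthgen table.
    if not 0.0 <= temperature <= 1.0:
        raise ValueError("teacher_temperature must be between 0.0 and 1.0")
    # Narrower range for reasoning teacher families.
    is_reasoning = teacher_model_name.startswith(REASONING_PREFIXES)
    if is_reasoning and not 0.5 <= temperature <= 0.7:
        raise ValueError(
            f"{teacher_model_name} requires teacher_temperature between 0.5 and 0.7"
        )
```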

GPT OSS 120B Thinking

The `openai.gpt-oss-120b-thinking` model uses a medium reasoning effort setting by default for enhanced chain-of-thought capabilities.

Tool Calling

Tool calling tasks have specific model compatibility requirements:

Student models: Only Qwen3, Llama 3 family, LFM2/LFM2.5, and FunctionGemma models are supported for `tool-calling-closed-book` and `multi-turn-tool-calling-closed-book` tasks.

Teacher models for multi-turn: Multi-turn tool calling (`multi-turn-tool-calling-closed-book`) requires a teacher from the tested tool-calling allowlist. Common choices include:

openai.gpt-oss-120b / openai.gpt-oss-120b-thinking
openai.gpt-oss-20b
Qwen3-235B-A22B-Instruct-2507
zai.glm-5 / zai.glm-5-thinking
moonshotai.kimi-k2-thinking / moonshotai.kimi-k2.5
minimax.minimax-m2-thinking
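The student-model restriction above can be approximated with a prefix check on the model value. The prefixes are an assumption derived from the supported student model values on this page, and `student_supports_tool_calling` is a hypothetical helper. The teacher allowlist is not reproduced here because this page only lists common choices, not the complete list.

```python
# Assumption: families identified by the prefix of the documented student model values.
TOOL_CALLING_STUDENT_PREFIXES = ("Qwen3", "Llama-3", "LFM2", "functiongemma")

TOOL_CALLING_TASKS = {
    "tool-calling-closed-book",
    "multi-turn-tool-calling-closed-book",
}


def student_supports_tool_calling(student_model_name: str) -> bool:
    """True if the student belongs to a family listed as supported for tool calling."""
    return student_model_name.startswith(TOOL_CALLING_STUDENT_PREFIXES)


def check_tool_calling_student(task: str, student_model_name: str) -> None:
    """Raise ValueError when a tool-calling task is paired with an unsupported student."""
    if task in TOOL_CALLING_TASKS and not student_supports_tool_calling(student_model_name):
        raise ValueError(f"{student_model_name} is not supported for task '{task}'")
```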