Config file
The configuration file controls the training pipeline through four main sections: base, tuning, evaluation, and synthgen. Each section handles a specific aspect of the model training process.
File format
The configuration file supports two formats depending on how you interact with distil labs:
- Webapp: Use JSON format (.json file)
- API: Use YAML format (.yaml file)
Both formats are functionally equivalent—choose based on your workflow. Examples in this documentation show YAML, but the JSON equivalent is straightforward:
```yaml
# YAML (API)
base:
  task: question-answering
  student_model_name: Llama-3.2-3B-Instruct
```

JSON (Webapp):

```json
{
  "base": {
    "task": "question-answering",
    "student_model_name": "Llama-3.2-3B-Instruct"
  }
}
```
Configuration structure
```yaml
base:
  # General parameters (task is required)
  task: classification

tuning:
  # Fine-tuning parameters
  num_train_epochs: 32

evaluation:
  # Evaluation parameters
  num_few_shot_examples: 1

synthgen:
  # Synthetic data generation parameters
  generation_target: 10000
```
Base configuration
General parameters relevant to the overall task.
| Parameter | Type | Default | Description |
|---|---|---|---|
| task | string | required | Type of NLP task to solve. See supported task types below. |
| student_model_name | string | Llama-3.2-1B-Instruct | Base model used as the student. This is the model we finetune for your use case. |
| teacher_model_name | string | Llama-3.3-70B-Instruct | Teacher model used to generate synthetic data and from which we distil knowledge. |
| random_seed | integer or null | 123 | Random seed used across distillib for reproducible random sampling. |
Supported task types
| Task | Value | Description |
|---|---|---|
| Question Answering | question-answering | Extract or generate answers from text based on queries |
| Classification | classification | Assign text to categories from a fixed set |
| Tool Calling | tool-calling-closed-book | Select and invoke functions based on user requests |
| Multi-turn Tool Calling | multi-turn-tool-calling-closed-book | Handle multi-step conversations with function calls |
| Open Book QA (RAG) | question-answering-open-book | Answer questions using provided context passages |
| Closed Book QA | question-answering-closed-book | Answer questions using knowledge learned during training |
Supported student models
| Model | Value |
|---|---|
| Llama 3.2 1B Instruct | Llama-3.2-1B-Instruct |
| Llama 3.2 3B Instruct | Llama-3.2-3B-Instruct |
| Llama 3.1 8B Instruct | Llama-3.1-8B-Instruct |
| SmolLM2 135M | SmolLM2-135M-Instruct |
| SmolLM2 1.7B | SmolLM2-1.7B-Instruct |
| FunctionGemma 270M | functiongemma-270m-it |
| Gemma 3 270M | gemma-3-270m-it |
| Gemma 3 1B | gemma-3-1b-it |
| Gemma 3 4B | gemma-3-4b-it |
| Qwen3 0.6B | Qwen3-0.6B |
| Qwen3 1.7B | Qwen3-1.7B |
| Qwen3 4B | Qwen3-4B-Instruct-2507 |
| Qwen3 8B | Qwen3-8B |
| IBM Granite 3.1 8B | granite-3.1-8b-instruct |
| IBM Granite 3.3 8B | granite-3.3-8b-instruct |
Supported teacher models
| Model | Value |
|---|---|
| DeepSeek R1 | deepseek.r1 |
| DeepSeek V3.1 | deepseek.v3.1 |
| Qwen3 235B A22B | Qwen3-235B-A22B-Instruct-2507 |
| Qwen3 480B A35B Coder | Qwen3-480B-A35B-Coder |
| Qwen2.5 VL 72B | Qwen2.5-VL-72B-Instruct |
| Llama 3.1 405B Instruct | Llama-3.1-405B-Instruct |
| Llama 3.1 8B Instruct | Llama-3.1-8B-Instruct |
| Llama 3.3 70B Instruct | Llama-3.3-70B-Instruct |
| GPT OSS 20B | openai.gpt-oss-20b |
| GPT OSS 120B | openai.gpt-oss-120b |
| GPT OSS 120B Thinking | openai.gpt-oss-120b-thinking |
Tuning configuration
Parameters controlling the finetuning of the student model.
| Parameter | Type | Default | Description |
|---|---|---|---|
| learning_rate | float | 5e-5 | Initial learning rate for the AdamW optimizer. |
| learning_rate_scheduler | string | linear | Learning-rate scheduler type. Options: cosine, linear, constant. |
| weight_decay | float | 0.0 | Weight decay applied by AdamW to all layers except bias and LayerNorm weights. |
| warmup_ratio | float | 0.05 | Ratio of total training steps used for linear warmup from 0 to learning_rate. |
| bf16 | boolean | true | Whether to use bf16 16-bit (mixed) precision training instead of 32-bit training. |
| use_lora | boolean | true | Whether to use LoRA for student training. |
| lora_r | integer | 64 | LoRA attention dimension (rank). Only used if use_lora is true. |
| lora_alpha_multiplier | integer | 1 | The LoRA alpha scaling parameter is lora_r * lora_alpha_multiplier. Only used if use_lora is true. |
| per_device_train_batch_size | integer | 1 | Batch size per GPU/device for training. |
| per_device_eval_batch_size | integer | 1 | Batch size per GPU/device for evaluation. |
| num_train_epochs | integer | 4 | Total number of training epochs. |
| train_eval_split | float | 0.2 | Fraction of training data used for evaluation. Must be between 0 and 1 (exclusive). |
| gradient_accumulation_steps | integer | 1 | Number of update steps over which to accumulate gradients before performing a backward/update pass. Effectively multiplies the batch size by this factor without increasing memory usage. |
| num_few_shot_examples_student | integer | 0 | Number of few-shot examples used during student evaluation and tuning. If above 0, at least one example per class is used for classification tasks. |
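For instance, a tuning section like the following sketch (values are illustrative, not recommendations) keeps per-device memory low while still reaching an effective batch size of 16 via gradient accumulation:

```yaml
tuning:
  per_device_train_batch_size: 2
  gradient_accumulation_steps: 8   # effective batch size: 2 * 8 = 16
  use_lora: true
  lora_r: 64
  lora_alpha_multiplier: 2         # LoRA alpha = 64 * 2 = 128
```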
RLVR (Reinforcement Learning with Verifiable Rewards)
RLVR is an optional reinforcement learning stage that runs after SFT finetuning. It uses reward signals from an LLM-as-a-judge to further improve model performance. Set rlvr_dataset_size to a value greater than 0 to enable it.
| Parameter | Type | Default | Description |
|---|---|---|---|
| rlvr_dataset_size | float | 0.0 | Proportion of the dataset to use for the RLVR split. Must be between 0.0 and 1.0. The default of 0.0 disables RLVR. |
| rlvr_llm_as_a_judge_model_name | string | openai.gpt-oss-20b | Model that powers the LLM-as-a-judge providing RLVR reward signals. |
| rlvr_per_device_batch_size | integer | 6 | Batch size per GPU/device for RLVR training and evaluation. Must be a multiple of rlvr_num_generations. |
| rlvr_num_generations | integer | 6 | Number of generations per prompt during RLVR training. |
| rlvr_num_train_epochs | integer | 1 | Number of training epochs for RLVR finetuning. |
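A minimal sketch of enabling RLVR (the values below are illustrative): note that rlvr_per_device_batch_size must be a multiple of rlvr_num_generations.

```yaml
tuning:
  rlvr_dataset_size: 0.2                 # reserve 20% of the data for the RLVR split
  rlvr_llm_as_a_judge_model_name: openai.gpt-oss-20b
  rlvr_num_generations: 6
  rlvr_per_device_batch_size: 12         # valid: 12 is a multiple of 6
  rlvr_num_train_epochs: 1
```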
Evaluation configuration
Parameters used in teacher evaluation.
| Parameter | Type | Default | Description |
|---|---|---|---|
| num_few_shot_examples | integer | 1 | Number of few-shot examples used during teacher evaluation. If above 0, at least one example per class is used for classification tasks. |
| llm_as_a_judge_model_name | string | openai.gpt-oss-120b | Model that powers the LLM-as-a-judge evaluation. |
| expand_tool_calling_turns | boolean | true | If true, each line in multi-turn tool-calling test files is expanded into multiple evaluation lines, each ending at a tool call. |
| batch_size | integer | 4 | (Deprecated) Batch size for model evaluation. |
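For example, to run teacher evaluation with two few-shot examples while keeping the default judge model explicit (values shown are illustrative):

```yaml
evaluation:
  num_few_shot_examples: 2
  llm_as_a_judge_model_name: openai.gpt-oss-120b
```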
Synthetic generation configuration
Parameters for fine-grained control over synthetic data generation.
| Parameter | Type | Default | Description |
|---|---|---|---|
| generation_target | integer | 10000 | Target number of synthetic examples to generate. For Closed-Book QA, this is calculated as len(unstructured_data) * generation_per_unstructured_context. |
| generation_in_single_call | integer | 4 | Number of examples to generate per teacher/LLM invocation. |
| generation_iteration_size | integer | 128 | Batch size for the generate-validate cycle. |
| generation_per_unstructured_context | integer or null | null | Examples to generate per unstructured context. Only used with the question-answering-closed-book task. Overwrites generation_target when set. |
| num_positive_exemplars_per_generation | integer | 2 | Number of in-context examples for the class/task being generated. |
| num_negative_exemplars_per_generation | integer | 2 | Number of in-context examples for classes not being generated. Only used for classification tasks. |
| num_unlabelled_exemplars_per_generation | integer | 2 | Number of unlabelled examples provided during each teacher invocation. |
| validation_max_total_length | integer | 10000 | Maximum total length (input + output) of generated examples, in characters. |
| validation_similarity_threshold | float | 0.95 | Similarity threshold for deduplication. Generated data with similarity above this threshold to seed data are removed. |
| validation_max_answer_length | integer | 8192 | (Deprecated) Use validation_max_total_length instead. |
| teacher_temperature | float | 0.7 | Temperature for teacher output. Controls the balance between predictability and creativity. Must be between 0.0 and 1.0. |
| teacher_max_tokens | integer or null | null | Maximum tokens in the generated response. |
| match_generated_distribution_to_seed | boolean | false | Match the generated data's class distribution to the seed data. Only used for classification tasks. |
| num_distractor_context_blocks | integer | 0 | Number of distractor context blocks per example. Setting this above zero enables RAFT training. |
| output_is_json | boolean | false | Only generate synthetic data with valid JSON outputs. Only relevant for QA tasks. |
| basic_mutators_to_use | list[string] | ["complexity"] | List of basic mutators to use for data generation. Supported options: complexity, length, specificity. |
| mutation_topics | list[string] or list[list[string]] | [] | Selection of topics to sample from to guide the generation process. |
| parallel_llm_calls | boolean | false | If true, call the LLM in parallel during data generation and evaluation. |
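As an illustration (all values below are hypothetical, not recommendations), the following synthgen section enables RAFT-style training with distractor contexts and turns on additional mutators:

```yaml
synthgen:
  generation_target: 5000
  num_distractor_context_blocks: 3   # above zero enables RAFT training
  basic_mutators_to_use: ["complexity", "length", "specificity"]
  parallel_llm_calls: true
```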
Example configuration
Minimal configuration

```yaml
base:
  task: question-answering
  student_model_name: Llama-3.2-3B-Instruct
  teacher_model_name: openai.gpt-oss-120b
```
Full configuration example
```yaml
base:
  task: question-answering-open-book
  student_model_name: Qwen3-1.7B
  teacher_model_name: openai.gpt-oss-120b
  random_seed: 42

tuning:
  learning_rate: 1e-4
  learning_rate_scheduler: cosine
  use_lora: true
  lora_r: 32
  num_train_epochs: 3
  train_eval_split: 0.15

evaluation:
  num_few_shot_examples: 2

synthgen:
  generation_target: 5000
  generation_in_single_call: 8
  teacher_temperature: 0.6
  validation_similarity_threshold: 0.9
```
Model-specific notes
Section titled “Model-specific notes”DeepSeek R1
When using deepseek.r1 as the teacher model, the recommended temperature range is 0.5 to 0.7. Configurations with temperatures outside this range raise a validation error.
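For example, a configuration pairing deepseek.r1 with an in-range temperature might look like this sketch:

```yaml
base:
  teacher_model_name: deepseek.r1

synthgen:
  teacher_temperature: 0.6   # must stay within 0.5 to 0.7 for deepseek.r1
```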
GPT OSS 120B Thinking
The openai.gpt-oss-120b-thinking model uses a medium reasoning-effort setting by default for enhanced chain-of-thought capabilities.
Tool Calling
Tool calling tasks have specific model compatibility requirements:
- Student models: Only Qwen3 and Llama 3-family models are supported for the tool-calling-closed-book and multi-turn-tool-calling-closed-book tasks.
- Teacher models for multi-turn: Multi-turn tool calling (multi-turn-tool-calling-closed-book) requires one of the following teacher models:
  - Qwen3-235B-A22B-Instruct-2507
  - Llama-3.1-405B-Instruct
  - openai.gpt-oss-120b
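Putting these constraints together, a minimal multi-turn tool-calling setup might look like the following sketch (the specific model choices are illustrative, but drawn from the compatibility lists above):

```yaml
base:
  task: multi-turn-tool-calling-closed-book
  student_model_name: Qwen3-1.7B                      # Qwen3 or Llama 3 family only
  teacher_model_name: Qwen3-235B-A22B-Instruct-2507   # one of the required multi-turn teachers

evaluation:
  expand_tool_calling_turns: true
```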