What are we building?

What if you could train a 1B parameter model to match a frontier model on your task from just 10 examples? At distil labs, we’re building the platform that makes this possible. Our pipeline turns a handful of seed examples, or traces collected from real use cases, into production-ready small language models that match or exceed cloud LLM accuracy while running privately on-prem or at the edge.

We’re looking for an ML Science Intern to join us for 4-6 months and tackle an open research problem at the core of our platform.

About the internship

As a Machine Learning Science Intern, you’ll join the small, high-impact research group that powers our platform. You’ll work side by side with experienced ML scientists and engineers to explore new ideas in knowledge distillation, synthetic data generation and model self-improvement, then turn the best of them into prototypes (or, if you like, production features) and publishable papers. Expect fast iteration cycles, lots of autonomy and a direct line of sight from your code to real customers. The internship runs for 4-6 months of full-time work, which is enough time to explore an idea in depth.

Example areas you might work on:

Synthetic data diversity: how do we ensure generated training data covers the right distribution without drifting off-task? We model controllable properties (length, topic, style) of new data to sample the examples we need, and there’s plenty of room to push this further.
Validation and filtering: what makes a synthetic example “good enough” to train on? We currently use similarity thresholds, schema checks and deduplication, but LLM-as-judge and learned quality models are open questions.
Knowledge distillation at the edge: can we distill into even smaller models (100M-1B parameters) without losing task performance? What about multi-task or multi-modal distillation?
Reasoning and tool use: our tool-calling benchmarks show large gains from distillation. How far can we push small models on agentic tasks?
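
To give a flavour of the validation and filtering area above, here is a minimal sketch of a filter that combines a schema check, a similarity threshold against seed examples and near-duplicate removal. It is an illustration only, not our production pipeline: the field names, the token-overlap (Jaccard) similarity and both thresholds are assumptions chosen to keep the example dependency-free.

```python
import re

# Hypothetical schema: every example needs a non-empty string input and output.
REQUIRED_FIELDS = {"input": str, "output": str}


def tokens(text):
    """Lowercased word tokens as a set."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a, b):
    """Token-overlap similarity in [0, 1]; a stand-in for an embedding similarity."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def passes_schema(example):
    """True if every required field is present, has the right type and is non-empty."""
    return all(
        isinstance(example.get(key), expected) and bool(example[key].strip())
        for key, expected in REQUIRED_FIELDS.items()
    )


def filter_examples(candidates, seeds, min_seed_sim=0.2, max_dup_sim=0.8):
    """Keep candidates that are well-formed, on-task and not near-duplicates.

    min_seed_sim and max_dup_sim are illustrative thresholds, not tuned values.
    """
    kept = []
    for ex in candidates:
        if not passes_schema(ex):
            continue  # schema check failed
        best_seed_sim = max(
            (jaccard(ex["input"], s["input"]) for s in seeds), default=0.0
        )
        if best_seed_sim < min_seed_sim:
            continue  # too far from every seed: likely off-task
        if any(jaccard(ex["input"], k["input"]) > max_dup_sim for k in kept):
            continue  # near-duplicate of an example we already kept
        kept.append(ex)
    return kept
```

Calling filter_examples(candidates, seeds) returns the surviving examples in their original order. A fixed token-overlap threshold is crude; replacing it with an LLM judge or a learned quality model is exactly the kind of open question described above.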

Key responsibilities

Research and prototyping: implement, benchmark and iterate on novel algorithms for transferring knowledge from large models to smaller ones without sacrificing accuracy.
Evaluation: help expand our internal benchmark suite to measure latency, memory footprint and task performance across hardware profiles.
Production testing: work with the engineering team to try successful prototypes on real workloads and learn from the results.
Collaboration: pair with engineers to integrate successful experiments into our Python/PyTorch stack and customer-facing APIs.
Publication: prepare and publish benchmark results and research findings as scientific papers at top-tier machine-learning conferences.
Communication and culture: share learnings in weekly demos, contribute to paper-reading clubs and give feedback on product direction.

What you’ll bring

Currently enrolled in a PhD program in Computer Science, Machine Learning, Robotics, Statistics, Mathematics or a related field.
Solid understanding of machine-learning fundamentals and at least one of deep learning, NLP, reinforcement learning or optimization.
Proficiency in Python and modern ML frameworks (PyTorch, JAX or TensorFlow).
Ability to translate an open-ended research question into a clear experimental plan.

Bonus points for

A first-authored or co-authored workshop, conference or journal paper.
Hands-on experience with model-compression techniques (distillation, quantisation, pruning), synthetic data generation, or model self-improvement.
Experience with LLMs and LLM-centered research.
Contributions to open-source ML projects and an active GitHub profile.

Why should you join?

Front-row research: work on open research problems and see the results land in a product used by defence, robotics and industrial teams.
Mentorship and growth: you’ll be paired with a dedicated mentor and have direct access to the founding science team.
Publication support: if your work yields a publishable result, we’ll sponsor conference travel and fees.
Competitive salary: we pay above-market internship compensation and cover travel for our Berlin offsites.
Remote-first, Europe-centric: work from anywhere in EU time zones; gather in Berlin for planning offsites.

Internship details

Item	Detail
Start date	Flexible; we review applications on a rolling basis
Duration	4-6 months, full-time
Location	Remote within EU time zones; on-site weeks in Berlin each quarter

distil labs is an equal-opportunity employer. We value diversity and do not discriminate on the basis of race, religion, colour, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.

Machine Learning Science Intern