What are we building?
distil labs is a platform that fine-tunes task-specific small language models automatically. Customers use it to swap the general-purpose LLM in their agentic system for a smaller, purpose-built one: same quality on the task, at 50-80% lower cost and with lower latency. Under the hood, we take the production traces collected by the customer, generate synthetic training data from them, train a small model that matches frontier-model quality on the narrow task, and deploy it to an OpenAI-compatible endpoint in the cloud, on-prem, or at the edge.
About the role
We are looking for a Machine Learning Engineer to own every stage of the pipeline: synthetic data generation, distributed training and evaluation, and low-latency inference. You will architect, build, and operate the systems that generate and validate training data, schedule GPU jobs, store artefacts, and serve models to production workloads. As part of a small team, you will have an outsized impact on technical decisions, product direction, and engineering culture.
Responsibilities
- Own and improve our scalable compute fabric (based on Argo Workflows, Kubernetes and similar) for orchestrating data generation, training and evaluation jobs.
- Scale our synthetic data generation pipeline: high-throughput teacher-model inference plus the validation and filtering stages that keep only high-quality training examples.
- Run and optimise distributed fine-tuning workloads (HuggingFace, PyTorch, DDP/FSDP, LoRA) across cloud and on-prem GPU clusters.
- Build a secure, multi-tenant model-serving layer (vLLM, FastAPI) that auto-scales with load while holding low-latency SLAs.
- Implement observability, cost monitoring and alerting across the stack (Prometheus, Grafana, CloudWatch), and define Infrastructure-as-Code practices (Terraform/Pulumi) across AWS, GCP and on-prem.
- Partner with ML scientists to design, implement, benchmark and productionise new algorithms for knowledge distillation and synthetic data generation.
What you’ll bring
- Bachelor’s or Master’s degree in Computer Science, Electrical Engineering or a related field.
- 6+ years of software engineering experience, with 3+ years building ML or data infrastructure at scale.
- Deep proficiency in Python for machine learning. Our stack is built on the HuggingFace ecosystem with PyTorch.
- Hands-on experience operating Kubernetes, Argo Workflows or similar orchestration systems in production.
- Proven track record running ML training jobs with HuggingFace, PyTorch, JAX or TensorFlow, using distributed training tooling such as DDP/FSDP, DeepSpeed, Ray or Kubeflow.
- Expertise in containerisation and IaC (Docker, Terraform, Pulumi) and cloud services (AWS, GCP or Azure).
- Strong grasp of monitoring and observability stacks (Prometheus, Grafana, Datadog) and cost-optimisation strategies for GPU workloads.
- Excellent problem-solving abilities and a bias for automation and reliability engineering.
Bonus points for
- Familiarity with inference optimisation techniques (quantisation, sparsity, compilation) and serving engines (Triton, TensorRT-LLM, vLLM).
- Experience running hybrid or on-prem GPU clusters with high-speed storage and interconnects (NVMe, InfiniBand).
- Experience contributing to academic research, ideally with publications in machine learning or related fields.
- Contributions to open-source ML infrastructure projects.
Why should you join?
- Real ownership: architect the pipeline end to end and choose the best tools for the job.
- Mission with impact: make advanced AI accessible to teams that lack large GPU clusters or ML expertise. Our models already run in production for defence, cybersecurity, edtech and robotics customers.
- Early-stage upside: competitive salary, VSOP/ESOP and a real say in where the company goes.
- Remote-first, Europe-centric: work from anywhere in EU time zones, with regular Berlin offsites.
Why now?
Enterprises are moving from general-purpose cloud LLMs to smaller, faster, private models, and the infrastructure to train and serve those models securely is still being invented. Join us on the ground floor and shape the systems that will power the next generation of AI products.
Who are we?
We are a small team with deep technical expertise: engineers and researchers who built ML systems and infrastructure at places like Amazon, Delivery Hero, Five AI and ING. Meet the whole team on our homepage.
distil labs is an equal-opportunity employer. We value diversity and do not discriminate on the basis of race, religion, colour, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.