Distil labs inference playground

You can use the distil labs inference playground to test your trained model. The playground provides a hosted deployment endpoint that supports OpenAI-compatible inference.
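Because the endpoint is OpenAI-compatible, any client that can POST to a `/v1/chat/completions` route can query it. As a minimal stdlib-only sketch (the endpoint URL, API key, model name, and helper names below are placeholders, not values from the product; substitute what your deployment actually returns):

```python
# Sketch of querying an OpenAI-compatible endpoint with the standard
# library only. URL, API key, and model name are placeholders.
import json
import urllib.request


def build_request(endpoint_url: str, api_key: str, model: str, question: str):
    """Build a POST request for the /v1/chat/completions route."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": question}],
    }
    return urllib.request.Request(
        f"{endpoint_url.rstrip('/')}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def ask(endpoint_url: str, api_key: str, model: str, question: str) -> str:
    """Send the request and return the model's answer text."""
    req = build_request(endpoint_url, api_key, model, question)
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Example (needs an active deployment):
# print(ask("https://your-deployment-endpoint.distillabs.ai",
#           "your-api-key", "your-model-id", "Your question here"))
```

In practice you should prefer the generated client script described below, since it encodes the exact prompt format your model expects.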

The distil CLI is the quickest way to deploy, query, and manage your model on distil-managed remote infrastructure.

Deploy your trained model with a single command:

distil model deploy remote <model-id>

The CLI will provision your deployment and display the endpoint URL, API key, and a client script you can use to query your model.

To output only the client script (useful for piping to a file):

distil model deploy remote --client-script <model-id>

Get the command to invoke your deployed model:

distil model invoke <model-id>

This outputs a ready-to-run command using uv that points to the client script saved in the CLI’s cache. Copy and run it directly:

uv run PATH_TO_CLIENT --question "Your question here"

For question answering models that require context, use the --context flag:

uv run PATH_TO_CLIENT --question "Your question here" --context "Your context here"

When you’re done testing, deactivate your deployment to conserve credits:

distil model deploy remote --deactivate <model-id>

Option            Description
--client-script   Output only the client script for the deployment
--deactivate      Deactivate a remote deployment
--output json     Output results in JSON format

You can also manage deployments programmatically using the REST API.

curl -X POST "https://api.distillabs.ai/trainings/YOUR_TRAINING_ID/deployment" \
  -H "Authorization: Bearer $DISTIL_TOKEN" \
  -H "Content-Type: application/json" \
  -d "{}"

The response includes all the information you need to query your model:

{
  "id": "deployment-uuid",
  "training_id": "your-training-uuid",
  "deployment_status": "active",
  "url": "https://your-deployment-endpoint.distillabs.ai",
  "client_script": "...",
  "secrets": {
    "api_key": "your-api-key"
  }
}
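
The fields of this response can be pulled out programmatically. A small sketch, stdlib only (the helper names and the `model_client.py` file name are my own choices, not part of the API):

```python
# Fetch a deployment record and persist its client script to disk.
# Training ID and token are placeholders.
import json
import urllib.request


def fetch_deployment(training_id: str, token: str) -> dict:
    """GET the deployment record for a training via the REST API."""
    req = urllib.request.Request(
        f"https://api.distillabs.ai/trainings/{training_id}/deployment",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def save_client_script(deployment: dict, path: str = "model_client.py") -> str:
    """Write the client_script field to disk and return the endpoint URL."""
    with open(path, "w") as f:
        f.write(deployment["client_script"])
    return deployment["url"]


# Example (needs valid credentials):
# print(save_client_script(fetch_deployment("YOUR_TRAINING_ID", "YOUR_TOKEN")))
```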

The deployment_status field indicates the current state:

  • building - Deployment is being provisioned
  • active - Ready to accept requests
  • inactive - Deployment has been deactivated
  • credits_exhausted - No credits remaining
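
Since a fresh deployment starts in `building`, a client may want to poll until it reaches `active` before sending requests. A hedged sketch (the `wait_until_active` helper and its defaults are assumptions, not part of the CLI or API; the statuses match the list above):

```python
# Poll a deployment's status until it becomes active or reaches a
# terminal state. Training ID and token are placeholders.
import json
import time
import urllib.request

TERMINAL = {"inactive", "credits_exhausted"}


def wait_until_active(training_id, token, fetch=None,
                      poll_seconds=10, max_polls=60):
    """Return True once the deployment is active, False if it ends up
    in a terminal state or the polling budget runs out.

    `fetch` returns the deployment JSON; it defaults to a GET against
    the REST API and is injectable for testing.
    """
    if fetch is None:
        def fetch():
            req = urllib.request.Request(
                f"https://api.distillabs.ai/trainings/{training_id}/deployment",
                headers={"Authorization": f"Bearer {token}"},
            )
            with urllib.request.urlopen(req) as resp:
                return json.load(resp)
    for _ in range(max_polls):
        status = fetch()["deployment_status"]
        if status == "active":
            return True
        if status in TERMINAL:
            return False
        time.sleep(poll_seconds)
    return False
```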

The client_script field contains example Python code you can use to query your model. It is important that you use the exact prompt format shown in this script when querying your model.

Once your deployment is set up, you can retrieve its details at any time (the response format is the same as shown above).

curl -X GET "https://api.distillabs.ai/trainings/YOUR_TRAINING_ID/deployment" \
  -H "Authorization: Bearer $DISTIL_TOKEN"

Extract the client script from your deployment and save it to a file (you will need jq installed):

curl -s "https://api.distillabs.ai/trainings/YOUR_TRAINING_ID/deployment" \
  -H "Authorization: Bearer $DISTIL_TOKEN" \
  | jq -r '.client_script' > model_client.py

Then run the script with your question and context. You will need the openai Python package available locally.

python model_client.py \
  --question "Your question here" \
  --context "Your context here"

When you’re done testing, deactivate your deployment to conserve credits:

curl -X DELETE "https://api.distillabs.ai/trainings/YOUR_TRAINING_ID/deployment" \
  -H "Authorization: Bearer $DISTIL_TOKEN"

Inference playground deployments require credits. When you run out of credits, you won’t be able to create new deployments, and your existing deployments will be deactivated. All users get $30 of free starting credits; reach out to us at contact@distillabs.ai when you need more.