Distil labs inference playground

You can use the distil labs inference playground to test your trained model. The playground provides a hosted deployment endpoint that supports OpenAI-compatible inference.
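Because the endpoint is OpenAI-compatible, any client that can POST to a `/v1/chat/completions` route can query it. As a minimal stdlib-only sketch (the endpoint URL, API key, model name, and helper names below are placeholders, not values from the product; substitute what your deployment actually returns):

```python
# Sketch of querying an OpenAI-compatible endpoint with the standard
# library only. URL, API key, and model name are placeholders.
import json
import urllib.request


def build_request(endpoint_url: str, api_key: str, model: str, question: str):
    """Build a POST request for the /v1/chat/completions route."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": question}],
    }
    return urllib.request.Request(
        f"{endpoint_url.rstrip('/')}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def ask(endpoint_url: str, api_key: str, model: str, question: str) -> str:
    """Send the request and return the model's answer text."""
    req = build_request(endpoint_url, api_key, model, question)
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Example (needs an active deployment):
# print(ask("https://your-deployment-endpoint.distillabs.ai",
#           "your-api-key", "your-model-id", "Your question here"))
```

In practice you should prefer the generated client script described below, since it encodes the exact prompt format your model expects.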

The distil CLI is the quickest way to deploy, query, and manage your model on distil-managed remote infrastructure.

Deploy your trained model with a single command:

distil model deploy remote <model-id>

The CLI will provision your deployment and display the endpoint URL, API key, and a client script you can use to query your model.

To output only the client script (useful for piping to a file):

distil model deploy remote --client-script <model-id>

Get the command to invoke your deployed model:

distil model invoke <model-id>

This outputs a ready-to-run command using uv that points to the client script saved in the CLI’s cache. Copy and run it directly:

uv run PATH_TO_CLIENT --question "Your question here"

For question answering models that require context, use the --context flag:

uv run PATH_TO_CLIENT --question "Your question here" --context "Your context here"

When you’re done testing, deactivate your deployment to conserve credits:

distil model deploy remote --deactivate <model-id>

Option            Description
--client-script   Output only the client script for the deployment
--deactivate      Deactivate a remote deployment
--output json     Output results in JSON format

You can also manage deployments programmatically using the REST API.

curl -X POST "https://api.distillabs.ai/trainings/YOUR_TRAINING_ID/deployment" \
  -H "Authorization: Bearer $DISTIL_TOKEN" \
  -H "Content-Type: application/json" \
  -d "{}"

The response includes all the information you need to query your model:

{
  "id": "deployment-uuid",
  "training_id": "your-training-uuid",
  "deployment_status": "active",
  "url": "https://your-deployment-endpoint.distillabs.ai",
  "client_script": "...",
  "secrets": {
    "api_key": "your-api-key"
  }
}
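
The fields of this response can be pulled out programmatically. A small sketch, stdlib only (the helper names and the `model_client.py` file name are my own choices, not part of the API):

```python
# Fetch a deployment record and persist its client script to disk.
# Training ID and token are placeholders.
import json
import urllib.request


def fetch_deployment(training_id: str, token: str) -> dict:
    """GET the deployment record for a training via the REST API."""
    req = urllib.request.Request(
        f"https://api.distillabs.ai/trainings/{training_id}/deployment",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def save_client_script(deployment: dict, path: str = "model_client.py") -> str:
    """Write the client_script field to disk and return the endpoint URL."""
    with open(path, "w") as f:
        f.write(deployment["client_script"])
    return deployment["url"]


# Example (needs valid credentials):
# print(save_client_script(fetch_deployment("YOUR_TRAINING_ID", "YOUR_TOKEN")))
```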

The deployment_status field indicates the current state:

  • building - Deployment is being provisioned
  • active - Ready to accept requests
  • inactive - Deployment has been deactivated
  • credits_exhausted - No credits remaining
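
Since a fresh deployment starts in `building`, a client may want to poll until it reaches `active` before sending requests. A hedged sketch (the `wait_until_active` helper and its defaults are assumptions, not part of the CLI or API; the statuses match the list above):

```python
# Poll a deployment's status until it becomes active or reaches a
# terminal state. Training ID and token are placeholders.
import json
import time
import urllib.request

TERMINAL = {"inactive", "credits_exhausted"}


def wait_until_active(training_id, token, fetch=None,
                      poll_seconds=10, max_polls=60):
    """Return True once the deployment is active, False if it ends up
    in a terminal state or the polling budget runs out.

    `fetch` returns the deployment JSON; it defaults to a GET against
    the REST API and is injectable for testing.
    """
    if fetch is None:
        def fetch():
            req = urllib.request.Request(
                f"https://api.distillabs.ai/trainings/{training_id}/deployment",
                headers={"Authorization": f"Bearer {token}"},
            )
            with urllib.request.urlopen(req) as resp:
                return json.load(resp)
    for _ in range(max_polls):
        status = fetch()["deployment_status"]
        if status == "active":
            return True
        if status in TERMINAL:
            return False
        time.sleep(poll_seconds)
    return False
```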

The client_script field contains example Python code you can use to query your model. It is important that you use the exact prompt format shown in this script when querying your model.

Once your deployment is set up, you can retrieve its details at any time (the response format is the same as shown above).

curl -X GET "https://api.distillabs.ai/trainings/YOUR_TRAINING_ID/deployment" \
  -H "Authorization: Bearer $DISTIL_TOKEN"

Extract the client script from your deployment and save it to a file (you will need jq installed):

curl -s "https://api.distillabs.ai/trainings/YOUR_TRAINING_ID/deployment" \
  -H "Authorization: Bearer $DISTIL_TOKEN" \
  | jq -r '.client_script' > model_client.py

Then run the script with your question and context. You will need the openai Python package available locally.

python model_client.py \
  --question "Your question here" \
  --context "Your context here"

When you’re done testing, deactivate your deployment to conserve credits:

curl -X DELETE "https://api.distillabs.ai/trainings/YOUR_TRAINING_ID/deployment" \
  -H "Authorization: Bearer $DISTIL_TOKEN"

Inference playground deployments require credits. When you run out of credits, you won’t be able to create new deployments, and your existing deployments will be deactivated. All users get $30 of free starting credits; reach out to us at contact@distillabs.ai when you need more.