# Distil labs inference playground
You can use the distil labs inference playground to test your trained model. The playground provides a hosted deployment endpoint that supports OpenAI-compatible inference.
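Because the endpoint is OpenAI-compatible, any client that speaks the chat-completions protocol can talk to it. A minimal sketch using only the standard library — the endpoint URL, API key, and model name are placeholders from your deployment output, and the `/v1/chat/completions` path is an assumption based on the OpenAI convention:

```python
import json
import os
import urllib.request

# Placeholders: substitute the URL and API key printed by `distil model deploy remote`.
base_url = os.environ.get("DISTIL_DEPLOYMENT_URL", "https://your-deployment-endpoint.distillabs.ai")
api_key = os.environ.get("DISTIL_API_KEY", "your-api-key")

payload = {
    "model": "your-model-id",  # placeholder model name
    "messages": [{"role": "user", "content": "Your question here"}],
}
request = urllib.request.Request(
    f"{base_url}/v1/chat/completions",  # assumed OpenAI-compatible path
    data=json.dumps(payload).encode("utf-8"),
    headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
)

# Only send the request when a real endpoint has been configured.
if "DISTIL_DEPLOYMENT_URL" in os.environ:
    with urllib.request.urlopen(request) as response:
        print(json.load(response)["choices"][0]["message"]["content"])
```

In practice, prefer the generated client script described below, since it bakes in the exact prompt format your model was trained with.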
## Using the CLI

The distil CLI is the quickest way to deploy, query, and manage your model on distil-managed remote infrastructure.
### Activating a deployment

Deploy your trained model with a single command:

```sh
distil model deploy remote <model-id>
```
The CLI will provision your deployment and display the endpoint URL, API key, and a client script you can use to query your model.
To output only the client script (useful for piping to a file):
```sh
distil model deploy remote --client-script <model-id>
```
### Querying your model

Get the command to invoke your deployed model:

```sh
distil model invoke <model-id>
```

This outputs a ready-to-run command using uv that points to the client script saved in the CLI's cache. Copy and run it directly:

```sh
uv run PATH_TO_CLIENT --question "Your question here"
```
For question-answering models that require context, use the `--context` flag:

```sh
uv run PATH_TO_CLIENT --question "Your question here" --context "Your context here"
```
### Deactivating a deployment

When you're done testing, deactivate your deployment to conserve credits:

```sh
distil model deploy remote --deactivate <model-id>
```
### CLI options reference

| Option | Description |
|---|---|
| `--client-script` | Output only the client script for the deployment |
| `--deactivate` | Deactivate a remote deployment |
| `--output json` | Output results in JSON format |
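With `--output json`, results can be piped into tools like jq. The JSON below is illustrative only — it stands in for the real output of a `distil` command, whose exact schema may differ:

```shell
# Illustrative stand-in for: distil model deploy remote --output json <model-id>
echo '{"deployment_status":"active","url":"https://your-deployment-endpoint.distillabs.ai"}' \
  | jq -r '.url'
```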
## Using the API

You can also manage deployments programmatically using the REST API.
### Activating a deployment

```sh
curl -X POST "https://api.distillabs.ai/trainings/YOUR_TRAINING_ID/deployment" \
  -H "Authorization: Bearer $DISTIL_TOKEN" \
  -H "Content-Type: application/json" \
  -d "{}"
```

```python
import requests

# See Account and Authentication for distil_bearer_token() implementation
auth_header = {"Authorization": f"Bearer {distil_bearer_token()}"}
training_id = "YOUR_TRAINING_ID"

response = requests.post(
    f"https://api.distillabs.ai/trainings/{training_id}/deployment",
    headers={"Content-Type": "application/json", **auth_header},
    json={},
)
print(response.json())
```

The response includes all the information you need to query your model:
```json
{
  "id": "deployment-uuid",
  "training_id": "your-training-uuid",
  "deployment_status": "active",
  "url": "https://your-deployment-endpoint.distillabs.ai",
  "client_script": "...",
  "secrets": {
    "api_key": "your-api-key"
  }
}
```
The `deployment_status` field indicates the current state:

- `building` - Deployment is being provisioned
- `active` - Ready to accept requests
- `inactive` - Deployment has been deactivated
- `credits_exhausted` - No credits remaining
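A client will typically poll until the deployment leaves the `building` state. A minimal sketch — the status values come from the list above, but this polling helper is an assumption, not part of the distil API:

```python
import time

# Terminal states from which the deployment will not become active.
TERMINAL_STATES = {"inactive", "credits_exhausted"}

def wait_until_active(fetch_status, interval=5.0, timeout=300.0):
    """Poll `fetch_status` (a zero-arg callable returning deployment_status)
    until it reports 'active'; give up on a terminal state or timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = fetch_status()
        if status == "active":
            return True
        if status in TERMINAL_STATES:
            return False
        time.sleep(interval)
    return False
```

Pass a callable that performs the GET request shown below and returns `response.json()["deployment_status"]`.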
The `client_script` field contains example Python code you can use to query your model. It is important that you use the exact prompt format shown in this script when querying your model.
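Since the script must be used verbatim, a natural pattern is to write the `client_script` field of the response straight to disk. A sketch, using a stand-in dict in place of the real parsed response:

```python
# Stand-in for `response.json()` from the deployment call; the real
# `client_script` value is the full Python client, not this one-liner.
deployment = {
    "deployment_status": "active",
    "url": "https://your-deployment-endpoint.distillabs.ai",
    "client_script": "print('placeholder client script')",
    "secrets": {"api_key": "your-api-key"},
}

# Persist the bundled client so it can be run with `python model_client.py ...`.
with open("model_client.py", "w") as f:
    f.write(deployment["client_script"])
```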
### Retrieving deployment information

After your deployment is set up, you can also retrieve information about it (the response has the same format as shown above):

```sh
curl -X GET "https://api.distillabs.ai/trainings/YOUR_TRAINING_ID/deployment" \
  -H "Authorization: Bearer $DISTIL_TOKEN"
```

```python
import requests

# See Account and Authentication for distil_bearer_token() implementation
auth_header = {"Authorization": f"Bearer {distil_bearer_token()}"}
training_id = "YOUR_TRAINING_ID"

response = requests.get(
    f"https://api.distillabs.ai/trainings/{training_id}/deployment",
    headers={"Content-Type": "application/json", **auth_header},
)
print(response.json())
```
### Querying your model

Extract the client script from your deployment and save it to a file (you will need jq installed):
```sh
curl -s "https://api.distillabs.ai/trainings/YOUR_TRAINING_ID/deployment" \
  -H "Authorization: Bearer $DISTIL_TOKEN" \
  | jq -r '.client_script' > model_client.py
```
Then run the script with your question and context. You will need the `openai` Python package available locally.
```sh
python model_client.py \
  --question "Your question here" \
  --context "Your context here"
```
### Deactivating a deployment

When you're done testing, deactivate your deployment to conserve credits:

```sh
curl -X DELETE "https://api.distillabs.ai/trainings/YOUR_TRAINING_ID/deployment" \
  -H "Authorization: Bearer $DISTIL_TOKEN"
```

```python
import requests

# See Account and Authentication for distil_bearer_token() implementation
auth_header = {"Authorization": f"Bearer {distil_bearer_token()}"}
training_id = "YOUR_TRAINING_ID"

response = requests.delete(
    f"https://api.distillabs.ai/trainings/{training_id}/deployment",
    headers=auth_header,
)
# Returns 204 No Content on success
```
## Credits

Inference playground deployments require credits. When you run out of credits, you won't be able to create new deployments, and your existing deployments will be deactivated. All users get $30 of free starting credits; reach out to us at contact@distillabs.ai when you need more.