Monitor, scale, and manage the lifecycle of your model deployments on Qwen Cloud.
Deployment list
Go to the Deployments page in the Qwen Cloud console to view all deployments in your workspace. From the list, you can:
- Search by deployment name.
- Filter by status.
Deployment detail
Click a deployment name to open its detail page, which has two tabs: Overview and Monitoring.
Overview
Displays the deployment's configuration and billing information:
- Basic information -- Service name, status, created/updated timestamps, and model name.
- Billing information -- Billing method and payment type.
- Deployment configuration -- Rate limits (requests per minute, RPM, and tokens per minute, TPM), replicas, and model unit specification.
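To make the RPM and TPM settings concrete, the following is a minimal client-side sketch of a sliding-window guard that admits a request only if it fits both limits. It is an illustration, not a description of how Qwen Cloud enforces limits on the server side. The class name `SlidingWindowLimiter`, its parameters, and the 60-second window are all assumptions made for the example.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Hypothetical client-side guard mirroring a deployment's RPM and TPM limits."""

    def __init__(self, rpm_limit, tpm_limit, window_seconds=60.0, clock=time.monotonic):
        self.rpm_limit = rpm_limit          # max requests per window
        self.tpm_limit = tpm_limit          # max tokens per window
        self.window_seconds = window_seconds
        self.clock = clock                  # injectable for testing
        self._events = deque()              # (timestamp, tokens) per admitted request
        self._tokens_in_window = 0

    def _evict_expired(self, now):
        # Drop requests that have aged out of the window.
        while self._events and now - self._events[0][0] >= self.window_seconds:
            _, tokens = self._events.popleft()
            self._tokens_in_window -= tokens

    def try_acquire(self, tokens):
        """Return True and record the request if it fits both limits, else False."""
        now = self.clock()
        self._evict_expired(now)
        if len(self._events) + 1 > self.rpm_limit:
            return False
        if self._tokens_in_window + tokens > self.tpm_limit:
            return False
        self._events.append((now, tokens))
        self._tokens_in_window += tokens
        return True
```

A caller would check `limiter.try_acquire(estimated_tokens)` before sending a request and back off when it returns `False`. The limit values should be copied from the Deployment configuration shown on the Overview tab.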
Monitoring
Provides real-time and historical performance metrics:
- Summary stats -- Total models, total calls, failures, average TTFT (time to first token), and average latency.
- RPM / TPM charts -- Requests per minute and tokens per minute over time.
- TTFT / Latency charts -- Time to first token and end-to-end latency over time.
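As a rough illustration of what the summary stats measure, the sketch below derives total calls, failures, average TTFT, and average latency from client-side call records. The `CallRecord` shape and the `summarize` function are hypothetical. Averaging over successful calls only is an assumption, and the console may aggregate differently.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class CallRecord:
    """One inference call as a client might log it (hypothetical shape)."""
    started_at: float                 # seconds, when the request was sent
    first_token_at: Optional[float]   # seconds, None if no token arrived
    finished_at: float                # seconds, when the response ended or failed
    succeeded: bool


def summarize(records):
    """Compute summary stats; averages cover successful calls only (assumption)."""
    ok = [r for r in records if r.succeeded and r.first_token_at is not None]
    return {
        "total_calls": len(records),
        "failures": sum(1 for r in records if not r.succeeded),
        # TTFT: time from sending the request to receiving the first token.
        "avg_ttft": mean(r.first_token_at - r.started_at for r in ok) if ok else None,
        # Latency: time from sending the request to the end of the response.
        "avg_latency": mean(r.finished_at - r.started_at for r in ok) if ok else None,
    }
```

Comparing figures computed this way with the Monitoring tab can help separate network overhead on the client side from time spent inside the deployment.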
Deployment statuses
| Status | Description |
|---|---|
| Deploying | Resources are being provisioned. The deployment is not yet ready for inference. |
| Active | The deployment is running and accepting inference requests. Billing is active. |
| Stopped | The deployment has been manually stopped. No billing is incurred. |
| Error | The deployment encountered an error during provisioning or runtime. |
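Client code that depends on a deployment often needs to wait for it to reach Active before sending requests. The sketch below models the four statuses from the table and polls until the deployment is ready. The `fetch_status` callable is a placeholder you would supply yourself; no particular Qwen Cloud API or SDK call is assumed. Treating any status other than Deploying or Active as a reason to stop waiting is also an assumption.

```python
import time
from enum import Enum


class DeploymentStatus(Enum):
    DEPLOYING = "Deploying"
    ACTIVE = "Active"
    STOPPED = "Stopped"
    ERROR = "Error"


def accepts_inference(status):
    """Only an Active deployment accepts inference requests."""
    return status is DeploymentStatus.ACTIVE


def wait_until_active(fetch_status, timeout_seconds=600.0, poll_seconds=5.0,
                      sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_status() (hypothetical, caller-supplied) until it returns Active."""
    deadline = clock() + timeout_seconds
    while True:
        status = fetch_status()
        if status is DeploymentStatus.ACTIVE:
            return status
        if status is not DeploymentStatus.DEPLOYING:
            # Assumption: Stopped and Error will not become Active by waiting.
            raise RuntimeError(f"deployment is {status.value}, not provisioning")
        if clock() >= deadline:
            raise TimeoutError("deployment did not become Active in time")
        sleep(poll_seconds)
```

The timeout and polling interval are illustrative defaults. Choose values that match how long provisioning takes for your model unit specification.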
Actions
You can perform the following actions from the deployment list or detail page:
- Try -- Send a test inference request directly from the console.
- Stop -- Pause the deployment. No billing is incurred while the deployment is stopped.
- Start -- Resume a stopped deployment. Billing resumes when the status returns to Active.
- Delete -- Permanently remove the deployment. This action cannot be undone.
When you delete a deployment, the service shuts down immediately and all associated resources are released.