Manage deployments

Monitor, scale, and manage the lifecycle of your model deployments on Qwen Cloud.

Deployment list

Go to the Deployments page in the Qwen Cloud console to view all deployments in your workspace. To narrow the list:
  • Search by deployment name.
  • Filter by status.
Each entry shows the deployment name, model, billing method, status, last updated time, and available actions.

Deployment detail

Click a deployment name to open its detail page, which has two tabs: Overview and Monitoring.

Overview

Displays the deployment's configuration and billing information:
  • Basic information -- Service name, status, created/updated timestamps, and model name.
  • Billing information -- Billing method and payment type.
  • Deployment configuration -- Rate limiting settings (RPM and TPM), replicas, and model unit specification.

Monitoring

Provides real-time and historical performance metrics:
  • Summary stats -- Total models, total calls, failures, average TTFT (time to first token), and average latency.
  • RPM / TPM charts -- Requests per minute and tokens per minute over time.
  • TTFT / Latency charts -- Time to first token and end-to-end latency over time.
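
To make the metric definitions above concrete, here is a minimal Python sketch that computes the same kinds of figures from client-side request records. It is not the console's implementation. The `RequestRecord` type, its fields, and the sample values are hypothetical, and two choices are assumptions: averages are taken over successful calls only, and each request is assigned to the minute in which it started.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class RequestRecord:
    """Hypothetical client-side log entry; not a Qwen Cloud data type."""
    start_s: float        # when the request was sent (seconds)
    first_token_s: float  # when the first token arrived (seconds)
    end_s: float          # when the response completed (seconds)
    tokens: int           # tokens counted for this request
    ok: bool              # whether the call succeeded


def summarize(records):
    """Total calls, failures, and average TTFT / latency over successful calls."""
    ok = [r for r in records if r.ok]
    avg_ttft = sum(r.first_token_s - r.start_s for r in ok) / len(ok) if ok else None
    avg_latency = sum(r.end_s - r.start_s for r in ok) / len(ok) if ok else None
    return {
        "total_calls": len(records),
        "failures": len(records) - len(ok),
        "avg_ttft_s": avg_ttft,
        "avg_latency_s": avg_latency,
    }


def per_minute(records):
    """Map each minute index to (requests, tokens), i.e. (RPM, TPM)."""
    buckets = defaultdict(lambda: [0, 0])
    for r in records:
        minute = int(r.start_s // 60)  # assumption: bucket by start time
        buckets[minute][0] += 1
        buckets[minute][1] += r.tokens
    return {m: tuple(v) for m, v in sorted(buckets.items())}


# Made-up sample data for illustration only.
records = [
    RequestRecord(0.0, 0.2, 1.0, 100, True),
    RequestRecord(10.0, 10.4, 12.0, 300, True),
    RequestRecord(70.0, 70.0, 70.5, 0, False),
]
summary = summarize(records)
print(summary)
print(per_minute(records))
```

TTFT is measured from request start to the first token, while latency runs from request start to the end of the response, so for any streamed response TTFT is at most the latency.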

Deployment statuses

  • Deploying -- Resources are being provisioned. The deployment is not yet ready for inference.
  • Active -- The deployment is running and accepting inference requests. Billing is active.
  • Stopped -- The deployment has been manually stopped. No billing is incurred.
  • Error -- The deployment encountered an error during provisioning or runtime.
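
The four statuses suggest how a client might react before sending traffic. The sketch below is one possible mapping, not an official client. The `Status` enum mirrors the status names above, while `client_action` and its return values are hypothetical names introduced for illustration.

```python
from enum import Enum


class Status(Enum):
    """Status names as listed above."""
    DEPLOYING = "Deploying"
    ACTIVE = "Active"
    STOPPED = "Stopped"
    ERROR = "Error"


def client_action(status: Status) -> str:
    """Suggest what a caller should do for a given status (hypothetical helper)."""
    if status is Status.ACTIVE:
        return "send"         # accepting inference requests
    if status is Status.DEPLOYING:
        return "wait"         # not yet ready for inference
    if status is Status.STOPPED:
        return "start"        # must be started before it can serve requests
    return "investigate"      # Error: check the deployment in the console


# Status strings can be parsed by value.
print(client_action(Status("Deploying")))
```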

Actions

You can perform the following actions from the deployment list or detail page:
  • Try -- Send a test inference request directly from the console.
  • Stop -- Pause the deployment. Billing stops while the deployment is stopped.
  • Start -- Resume a stopped deployment. Billing resumes when the status returns to Active.
  • Delete -- Permanently remove the deployment.
Warning: Deleting a deployment is irreversible. The service shuts down immediately and all associated resources are released.
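
The actions above can be read as transitions between statuses. The following sketch models only what the text states, under a few labeled assumptions: Try requires an Active deployment, Delete is allowed from any status, and Start is modeled by its end state (Active) even though the deployment may pass through an intermediate status first. The function name `apply_action` is hypothetical.

```python
from typing import Optional

ACTIVE, STOPPED = "Active", "Stopped"


def apply_action(status: str, action: str) -> Optional[str]:
    """Return the status after `action`, or None once the deployment is deleted."""
    if action == "Delete":
        return None  # assumption: allowed from any status; cannot be undone
    if action == "Try":
        if status != ACTIVE:
            # assumption: test requests need a running deployment
            raise ValueError("Try requires an Active deployment")
        return status  # a test request does not change the status
    if action == "Stop" and status == ACTIVE:
        return STOPPED  # billing stops while stopped
    if action == "Start" and status == STOPPED:
        return ACTIVE  # end state; billing resumes once Active
    raise ValueError(f"{action!r} is not modeled for status {status!r}")


# Stop, then Start, then Delete.
status = apply_action(ACTIVE, "Stop")
status = apply_action(status, "Start")
print(status)
print(apply_action(status, "Delete"))
```

Combinations the text does not describe, such as Stop on a deployment that is still Deploying, raise an error here rather than guess at the console's behavior.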

Scaling

You can adjust deployment capacity without recreating the service. Open the deployment detail page and modify the capacity settings. Changes take effect within a few minutes.