Deploy custom models on Qwen Cloud to create dedicated inference services for production workloads.
Qwen Cloud deployments provision dedicated inference services for your fine-tuned custom models. Standard Qwen models can be called directly via the chat completions API without a deployment. Deployments are only needed for custom models you have trained.
Go to the Deployments page and click Create Deployment to open the wizard.
Review the cost breakdown — billing method, payment type, throughput, and token pricing — then click Create deployment to submit.
Once submitted, the deployment enters Deploying status. Provisioning typically takes a few minutes. When the status changes to Active, the deployment is ready to receive inference requests.
Use the deployment's model code as the
Prerequisites
- A Qwen Cloud account. Log in to the console.
- A custom model published from a completed fine-tuning job. See Publish checkpoints and Custom Models.
Create a deployment
Go to the Deployments page and click Create Deployment to open the wizard.
1. Basic info
- Deploy name: Enter a name to identify this deployment.
- Select model: Choose a deployable model from the dropdown.
- Model code: Review and optionally customize the model code suffix used for API calls.
2. Configure
- Billing method: Select the billing approach (e.g., Billing by tokens).
- Payment type: Select the payment type (e.g., Pay-as-you-go).
The billing method cannot be changed after deployment. To switch, delete the deployment and create a new one.
3. Review and submit
Review the cost breakdown — billing method, payment type, throughput, and token pricing — then click Create deployment to submit.
Billing begins as soon as the deployment reaches Active status, even if no inference requests have been sent.
After creation
Once submitted, the deployment enters Deploying status. Provisioning typically takes a few minutes. When the status changes to Active, the deployment is ready to receive inference requests.
Calling deployed models
Use the deployment's model code as the model parameter in the chat completions API. Find the model code on the Deployments page below the deployment name.
- OpenAI-compatible (Python)
- curl
Next steps
- Manage deployments -- Monitor, stop, and delete your deployments.