
Deployment overview

Deploy custom models on Qwen Cloud to create dedicated inference services for production workloads.

A deployment provisions a dedicated inference service for a custom model you have fine-tuned. Standard Qwen models need no deployment: call them directly through the chat completions API.

Prerequisites

Create a deployment

Go to the Deployments page and click Create Deployment to open the wizard.

1. Basic info

  • Deploy name: Enter a name to identify this deployment.
  • Select model: Choose a deployable model from the dropdown.
  • Model code: Review and optionally customize the model code suffix used for API calls.

2. Configure

  • Billing method: Select the billing approach (e.g., Billing by tokens).
  • Payment type: Select the payment type (e.g., Pay-as-you-go).
The billing method cannot be changed after deployment. To switch, delete the deployment and create a new one.

3. Review and submit

Review the cost breakdown (billing method, payment type, throughput, and token pricing), then click Create deployment to submit.
Billing begins as soon as the deployment reaches Active status, even if no inference requests have been sent.

After creation

Once submitted, the deployment enters Deploying status. Provisioning typically takes a few minutes. When the status changes to Active, the deployment is ready to receive inference requests.
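If a script needs to wait for provisioning before sending traffic, the Deploying to Active transition can be sketched as a simple polling loop. This page does not describe a status API, so the `get_status` callable below is hypothetical: supply your own function that returns the status string shown on the Deployments page. The timeout and polling interval are illustrative defaults, not documented values.

```python
import time
from typing import Callable


def wait_until_active(
    get_status: Callable[[], str],
    timeout_s: float = 600.0,
    poll_interval_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Poll get_status() until it returns "Active"; raise TimeoutError otherwise.

    get_status is a caller-supplied (hypothetical) function; how the status
    is actually fetched is not covered by this page.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status == "Active":
            return  # The deployment is ready to receive inference requests.
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"Deployment still in status {status!r} after {timeout_s} seconds"
            )
        # Still provisioning (for example "Deploying"): wait, then check again.
        sleep(poll_interval_s)
```

The loop keeps waiting on any status other than Active, so a deployment that never becomes Active surfaces as a `TimeoutError` rather than hanging forever. The `sleep` parameter is injectable only to make the helper easy to test.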

Calling deployed models

Use the deployment's model code as the model parameter in the chat completions API. Find the model code on the Deployments page below the deployment name.
OpenAI-compatible (Python):

```python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

completion = client.chat.completions.create(
    model="your-deployment-model-code",  # Replace with your deployment model code
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain quantum computing in simple terms."},
    ],
)
print(completion.choices[0].message.content)
```
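For environments without the `openai` package, the same call can be sketched as a plain HTTP request using only the standard library. This assumes the endpoint follows the usual OpenAI-compatible layout, where the SDK appends `/chat/completions` to the base URL and authenticates with a bearer token. The helper names `build_chat_request` and `send_chat_request` are illustrative, not part of any SDK.

```python
import json
import os
import urllib.request

BASE_URL = "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"


def build_chat_request(
    model_code: str, user_message: str, api_key: str
) -> urllib.request.Request:
    """Build (but do not send) a chat completions request for a deployment."""
    payload = {
        "model": model_code,  # The deployment's model code
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": user_message},
        ],
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",  # Assumed OpenAI-compatible route
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send_chat_request(request: urllib.request.Request) -> str:
    """Send the request and return the first choice's message text."""
    with urllib.request.urlopen(request, timeout=60) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


if __name__ == "__main__":
    req = build_chat_request(
        "your-deployment-model-code",  # Replace with your deployment model code
        "Explain quantum computing in simple terms.",
        os.environ["DASHSCOPE_API_KEY"],
    )
    print(send_chat_request(req))
```

Separating request construction from sending makes it possible to inspect the URL, headers, and body without incurring a billed inference call.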

Next steps