Deployment overview

Qwen Cloud deployments provision dedicated inference services for your fine-tuned custom models. Standard Qwen models can be called directly through the chat completions API and do not need a deployment; create one only for custom models you have trained.

Prerequisites

A Qwen Cloud account. Log in to the console.
A custom model published from a completed fine-tuning job. See Publish checkpoints and Custom Models.

Create a deployment

Go to the Deployments page and click Create Deployment to open the wizard.

1. Basic info

Deploy name: Enter a name to identify this deployment.
Select model: Choose a deployable model from the dropdown.
Model code: Review and optionally customize the model code suffix used for API calls.

2. Configure

Billing method: Select the billing approach (e.g., Billing by tokens).
Payment type: Select the payment type (e.g., Pay-as-you-go).

The billing method cannot be changed after deployment. To switch, delete the deployment and create a new one.

3. Review and submit

Review the cost breakdown (billing method, payment type, throughput, and token pricing), then click Create deployment to submit.

Billing begins as soon as the deployment reaches Active status, even if no inference requests have been sent.

After creation

Once submitted, the deployment enters Deploying status. Provisioning typically takes a few minutes. When the status changes to Active, the deployment is ready to receive inference requests.
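If you would rather wait for the deployment from a script than watch the console, a generic polling loop is one option. The sketch below is illustrative only: `wait_until_ready` and `deployment_responds` are hypothetical helper names, and the probe assumes that a deployment which is not yet Active rejects chat completion requests, which this page does not state. Each successful probe is a real inference request and may be billed.

```python
import time
from typing import Callable


def wait_until_ready(
    probe: Callable[[], bool],
    timeout_s: float = 600.0,
    interval_s: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call probe() until it returns True or timeout_s of sleeping has passed.

    Only the time spent sleeping is counted, not the time spent inside probe().
    """
    waited = 0.0
    while True:
        if probe():
            return True
        if waited >= timeout_s:
            return False
        sleep(interval_s)
        waited += interval_s


def deployment_responds(model_code: str) -> bool:
    """Hypothetical probe: send a tiny request and report whether it succeeded.

    Assumption: requests to a deployment that is still Deploying fail.
    """
    import os

    from openai import OpenAI  # imported lazily so the loop above has no dependency

    client = OpenAI(
        api_key=os.getenv("DASHSCOPE_API_KEY"),
        base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
    )
    try:
        client.chat.completions.create(
            model=model_code,
            messages=[{"role": "user", "content": "ping"}],
            max_tokens=1,  # keep the probe as cheap as possible
        )
        return True
    except Exception:
        return False
```

A call such as `wait_until_ready(lambda: deployment_responds("your-deployment-model-code"))` would then block until the first successful response or until the timeout. The status shown on the Deployments page remains the authoritative source.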

Calling deployed models

Use the deployment's model code as the model parameter in the chat completions API. Find the model code on the Deployments page below the deployment name.

OpenAI-compatible (Python)

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

completion = client.chat.completions.create(
    model="your-deployment-model-code",  # Replace with your deployment model code
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain quantum computing in simple terms."},
    ],
)
print(completion.choices[0].message.content)

curl

curl "https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions" \
  -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "your-deployment-model-code",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Explain quantum computing in simple terms."}
    ]
  }'
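Where the openai package is not available, the same request can be assembled with the Python standard library. This is a minimal sketch mirroring the curl command above: `build_chat_request` and `send_chat_request` are hypothetical helper names, and the response parsing assumes the OpenAI-compatible response shape that the Python example reads from (`choices[0].message.content`).

```python
import json
import urllib.request

BASE_URL = "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"


def build_chat_request(model_code: str, user_prompt: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a chat completions request for a deployed model."""
    payload = {
        "model": model_code,  # the deployment's model code, not a standard model name
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": user_prompt},
        ],
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send_chat_request(request: urllib.request.Request, timeout_s: float = 60.0) -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

To use it, read the key from the environment (for example `os.environ["DASHSCOPE_API_KEY"]`), pass it to `build_chat_request` along with your deployment model code, and hand the result to `send_chat_request`. Keeping construction separate from sending makes the payload easy to inspect before any billed request is made.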

Next steps

Manage deployments -- Monitor, stop, and delete your deployments.
