import os
import dashscope

dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'

messages = [
    {'role': 'system', 'content': 'You are a helpful assistant.'},
    {'role': 'user', 'content': 'Who are you?'}
]

response = dashscope.Generation.call(
    api_key=os.getenv('DASHSCOPE_API_KEY'),
    model='qwen3.6-plus',
    messages=messages,
    result_format='message'
)
print(response)

Sample response:

{
    "status_code": 200,
    "request_id": "902fee3b-f7f0-9a8c-96a1-6b4ea25af114",
    "code": "",
    "message": "",
    "output": {
        "text": null,
        "finish_reason": null,
        "choices": [
            {
                "finish_reason": "stop",
                "message": {
                    "role": "assistant",
                    "content": "I am a large-scale language model developed by Alibaba Cloud. My name is Qwen.",
                    "tool_calls": null,
                    "reasoning_content": null
                }
            }
        ]
    },
    "usage": {
        "input_tokens": 22,
        "output_tokens": 17,
        "total_tokens": 39,
        "image_tokens": null,
        "video_tokens": null,
        "audio_tokens": null
    }
}

Endpoint
- HTTP (text-only, such as qwen-plus): POST https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/text-generation/generation
- HTTP (multimodal, such as qwen3.6-plus, qwen3-vl-plus): POST https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation
- SDK base_http_api_url: https://dashscope-intl.aliyuncs.com/api/v1
Python:
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'

Java:
// Option 1: Set during instantiation
import com.alibaba.dashscope.protocol.Protocol;
Generation gen = new Generation(Protocol.HTTP.getValue(), "https://dashscope-intl.aliyuncs.com/api/v1");

// Option 2: Set globally
import com.alibaba.dashscope.utils.Constants;
Constants.baseHttpApiUrl = "https://dashscope-intl.aliyuncs.com/api/v1";
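The HTTP endpoint above can also be called without the SDK. The following is a minimal sketch using the standard request layout (model, input.messages, parameters); it assumes the `requests` package is available and `DASHSCOPE_API_KEY` is set, and the `build_request` helper is illustrative, not part of the API:

```python
import os

# Base URL for the international endpoint, as documented above.
BASE = "https://dashscope-intl.aliyuncs.com/api/v1"

def build_request(model, messages, **parameters):
    """Assemble the URL, headers, and JSON body for a text-generation request.

    The body shape mirrors the SDK call: model at the top level, messages
    under "input", and generation options under "parameters".
    """
    url = f"{BASE}/services/aigc/text-generation/generation"
    headers = {
        "Authorization": f"Bearer {os.getenv('DASHSCOPE_API_KEY', '')}",
        "Content-Type": "application/json",
    }
    body = {
        "model": model,
        "input": {"messages": messages},
        "parameters": parameters,
    }
    return url, headers, body

url, headers, body = build_request(
    "qwen-plus",
    [{"role": "user", "content": "Who are you?"}],
    result_format="message",
)
# To actually send it:
# import requests
# resp = requests.post(url, headers=headers, json=body)
```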
Authorizations
Your DashScope API key. See Get API key for details.
Body
application/json

The name of the model to call. Supports Qwen large language models (commercial and open-source), Qwen-Coder, and math models. For a list of models, see Text generation — Qwen.
The input to the model.
The conversation context, provided as an ordered list of messages. Each message is a system, user, assistant, or tool message object.
Sets the role, tone, task objective, or constraints for the model. Usually placed first in the messages array. Do not set for QwQ models. Does not take effect for QVQ models.
Optional generation parameters for text models.
The format of the returned data. Set to message for multi-turn conversations.
Default values: text for most models, except Qwen3-Max, Qwen3-VL, QwQ, and Qwen3 open source models (excluding qwen3-next-80b-a3b-instruct), which default to message.
When the model is Qwen-VL/QVQ, setting text has no effect. For Qwen3-Max, Qwen3-VL, and Qwen3 models in thinking mode, this can only be set to message.
Sampling temperature. Controls output diversity. Higher values produce more diverse output; lower values produce more deterministic output. Range: [0, 2).
Do not modify the default temperature value for QVQ models.
Nucleus sampling threshold. Higher values produce more diverse output. Range: (0, 1.0].
Default values by model:
- Qwen3.5 (non-thinking), Qwen3 (non-thinking), Qwen3-Instruct series, Qwen3-Coder series, qwen-max series, qwen-plus series (non-thinking), qwen-flash series (non-thinking), qwen-turbo series (non-thinking), qwen open source series, qwen-vl-max-2025-08-13, Qwen3-VL (non-thinking): 0.8
- qwen-vl-plus series, qwen-vl-max, qwen-vl-max-latest, qwen-vl-max-2025-04-08, qwen2.5-vl-3b/7b/32b/72b-instruct: 0.001
- QVQ series, qwen-vl-plus-2025-07-10, qwen-vl-plus-2025-08-15: 0.5
- qwen3-max-preview (thinking mode), Qwen3-Omni-Flash series: 1.0
- Qwen3.5 (thinking), Qwen3 (thinking), Qwen3-VL (thinking), Qwen3-Thinking, QwQ series, Qwen3-Omni-Captioner: 0.95
Do not modify the default top_p value for QVQ models.
The size of the candidate token set for sampling. A larger value increases randomness; a smaller value increases determinism. If None or greater than 100, top_k is disabled and only top_p takes effect. Must be >= 0.
Default values by model:
- QVQ series, qwen-vl-plus-2025-07-10, qwen-vl-plus-2025-08-15: 10
- QwQ series: 40
- Other Qwen-VL-Plus series, Qwen-VL-Max models released before August 13 2025, qwen2.5-omni-7b: 1
- Qwen3-Omni-Flash series: 50
- All other models: 20
Do not modify the default top_k value for QVQ models.
Maximum number of tokens to generate. When reached, generation stops and finish_reason is length. Does not limit thinking chain length. Default is the model's maximum output length.
Whether to stream the response. For HTTP streaming, also set the X-DashScope-SSE: enable header. For Java SDK streaming, use the streamCall interface.
When streaming, whether to return only the new delta tokens (true) or the full accumulated text so far (false).
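The two incremental_output modes can be illustrated without calling the API. This is a pure-Python sketch of the client-side difference; the chunk strings are made up:

```python
# With incremental_output=False, each streamed chunk repeats the full text
# generated so far; with incremental_output=True, each chunk carries only
# the newly generated tokens.
chunks_full  = ["I am", "I am Qwen", "I am Qwen."]   # incremental_output=False
chunks_delta = ["I am", " Qwen", "."]                # incremental_output=True

def assemble(chunks, incremental):
    """Rebuild the final text from a stream of chunks."""
    return "".join(chunks) if incremental else chunks[-1]

final_delta = assemble(chunks_delta, incremental=True)
final_full = assemble(chunks_full, incremental=False)
```

Either mode yields the same final text; delta mode simply avoids re-sending the accumulated prefix in every chunk.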
Whether to enable thinking mode. Applies to hybrid thinking models: Qwen3.5, Qwen3, and Qwen3-VL series. When enabled, thinking content is returned in the reasoning_content field.
Maximum length of the thinking chain. Applies to commercial and open source versions of Qwen3.5, Qwen3-VL, and Qwen3. Default is the model's maximum chain-of-thought length.
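When thinking mode is enabled, a choice's message carries both reasoning_content and content, so clients typically separate the two. A minimal sketch, using the field names from the response schema in this document (the sample message dict is illustrative, not a real API response):

```python
def split_thinking(message):
    """Return (reasoning, answer) from an assistant message dict.

    reasoning_content holds the thinking chain (may be absent or null when
    thinking mode is off); content holds the final answer.
    """
    return message.get("reasoning_content") or "", message.get("content") or ""

msg = {
    "role": "assistant",
    "reasoning_content": "The user asks who I am, so I should introduce myself.",
    "content": "I am Qwen.",
}
reasoning, answer = split_thinking(msg)
```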
Whether to enable the code interpreter feature. Only supported by qwen3.5, qwen3-max, qwen3-max-2026-01-23, and qwen3-max-preview in thinking mode.
Penalty for token repetition. A value of 1.0 means no penalty. Higher values reduce repetition. Must be a positive number.
When using the qwen-vl-plus-2025-01-25 model for text extraction, set repetition_penalty to 1.0. Do not modify the default repetition_penalty value for QVQ models.
Controls how much the model avoids repeating content already present in the text. Range: [-2.0, 2.0]. Positive values reduce repetition; negative values increase it.
Default values by model:
- Qwen3.5 (non-thinking), qwen3-max-preview (thinking), Qwen3 (non-thinking), Qwen3-Instruct series, qwen3-0.6b/1.7b/4b (thinking), QVQ series, qwen-max, qwen-max-latest, qwen2.5-vl series, qwen-vl-max series, qwen-vl-plus, Qwen3-VL (non-thinking): 1.5
- qwen-vl-plus-latest, qwen-vl-plus-2025-08-15: 1.2
- qwen-vl-plus-2025-01-25: 1.0
- qwen3-8b/14b/32b/30b-a3b/235b-a22b (thinking), qwen-plus/qwen-plus-latest/qwen-plus-2025-04-28 (thinking), qwen-turbo/qwen-turbo-2025-04-28 (thinking): 0.5
- All other models: 0.0
When using qwen-vl-plus-2025-01-25 for text extraction, set presence_penalty to 1.5. Do not modify the default for QVQ models.
Random seed for reproducible results. Range: [0, 2³¹−1]. With the same seed and parameters, the model returns the same result whenever possible.
Stop sequences. When the generated text contains a specified string or token ID, generation stops immediately. Do not mix strings and token IDs in the same array. Not supported by all models; check model documentation.
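The sampling and stopping controls above are passed together as keyword arguments. The following parameter set is hypothetical (values are illustrative; defaults differ per model, as the tables above show):

```python
# Illustrative parameter set; every value here is an example, not a default.
params = {
    "result_format": "message",
    "temperature": 0.7,        # range [0, 2): higher = more diverse
    "top_p": 0.8,              # range (0, 1.0]: nucleus sampling threshold
    "top_k": 20,               # None or >100 disables top_k entirely
    "max_tokens": 512,         # finish_reason becomes "length" when hit
    "seed": 1234,              # best-effort reproducibility
    "stop": ["Observation:"],  # strings only; never mix strings and token IDs
}
# These would be splatted into the call, e.g.
# dashscope.Generation.call(model='qwen-plus', messages=messages, **params)
```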
An array of tool objects for function calling. When using tools, you must set result_format to message. Not supported by qwen-vl series models. For usage examples, see the Function calling guide.
The type of tool. Currently only function is supported.
The name of the tool function. Can contain letters, numbers, underscores, and hyphens. Maximum 64 characters.
A description of the tool function that helps the model decide when and how to call it.
A JSON Schema object describing the function parameters. Defaults to {}.
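A tool definition following the schema above might look like this. The function name "get_current_weather" and its parameters are made up for illustration:

```python
# Hypothetical tool array: type must be "function"; the name may contain
# letters, numbers, underscores, and hyphens (max 64 chars); "parameters"
# is a JSON Schema object describing the function's arguments.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather for a given city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"},
                },
                "required": ["city"],
            },
        },
    }
]
# Passed as tools=tools; remember result_format must be 'message' when
# using tools.
```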
Defines the tool selection strategy. Thinking mode models do not support forcing a specific tool.
Whether to enable parallel tool calls. Not supported by thinking mode models when forcing a specific tool. See Parallel tool calls.
The format of the returned content. If set to json_object, you must instruct the model to output JSON in the prompt.
The output format type. text: plain text. json_object: standard JSON string. json_schema: JSON matching the provided schema.
Required when type is json_schema. Defines the JSON Schema for structured output. For supported models, see Structured output.
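The three output types above can be sketched as response_format values. The exact nesting of the schema payload is an assumption here (check the Structured output guide); the "person" schema is made up:

```python
# Plain text (the default for most models).
text_format = {"type": "text"}

# Standard JSON output; the prompt itself must also instruct the model
# to produce JSON.
json_format = {"type": "json_object"}

# JSON constrained to a schema. The "json_schema" field layout below is
# an assumption based on common structured-output conventions.
schema_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "person",
        "schema": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "age": {"type": "integer"},
            },
            "required": ["name", "age"],
        },
    },
}
```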
Whether to return log probabilities of the output tokens. Supported by: snapshot models of qwen-plus/qwen-turbo series; qwen3-vl-plus/qwen3-vl-flash series; Qwen3 open source models. See model page for supported models.
Number of most likely candidate tokens to return at each generation step. Valid values: 0–5. Only takes effect when logprobs is true. Supported by the same models as logprobs.
The number of responses to generate. Range: 1–4. Currently only non-thinking mode Qwen3 models are supported. Fixed at 1 when tools is specified. Increases output token consumption.
Whether to enable high-resolution image processing. When enabled, uses a fixed-resolution strategy where max_pixels is ignored. Default: false.
Pixel limits when enabled (true):
- Qwen3.5 series, Qwen3-VL series, qwen-vl-max, qwen-vl-max-latest, qwen-vl-max-0813, qwen-vl-plus, qwen-vl-plus-latest, qwen-vl-plus-0815: fixed at 16777216 pixels (16384 tokens × 32×32 pixels)
- QVQ series and other Qwen2.5-VL series: fixed at 12845056 pixels (16384 tokens × 28×28 pixels)
When false, the pixel limit is determined by max_pixels.
Whether to return the dimensions of the scaled image in the response (image_hw field). When streaming, returned in the last chunk. Applies to Qwen-VL series models.
Response
The status code of the request. 200 indicates success. The Java SDK does not return this field; if a call fails, an exception is thrown containing the status_code.
A unique identifier for this request. In the Java SDK, this is requestId.
The error code. Empty string if the request was successful. Only the Python SDK returns this field.
A human-readable error message. Empty string if the request was successful.
The model's output.
The generated text. Returned when result_format is text.
The reason generation stopped. Returned when result_format is text. Values: null (still generating), stop (natural end or stop condition triggered), length (max tokens reached), tool_calls (tool call triggered).
The output choices. Returned when result_format is message.
The reason generation stopped. Values: null (generating), stop, length, tool_calls.
The assistant's output message.
Always assistant.
The message content. A string for text models; an array for Qwen-VL/Qwen-Audio models. Empty when tool_calls is present.
The deep thinking content. Returned when thinking mode is enabled.
Tool calls the model wants to make. Present when the model triggers a function call.
The ID of the tool call.
The type of tool. Currently only function is supported.
The index of this tool call in the tool_calls array.
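Reading tool calls out of a response can be sketched as follows. Only id, type, and index are documented above; the `function` sub-object with `name` and JSON-encoded `arguments` is an assumption based on the common tool-call layout, and the sample output dict is illustrative:

```python
import json

def extract_tool_calls(output):
    """Yield (name, parsed_arguments) pairs from an output dict's choices."""
    for choice in output.get("choices", []):
        for call in choice["message"].get("tool_calls") or []:
            fn = call.get("function", {})  # assumed sub-object
            yield fn.get("name"), json.loads(fn.get("arguments") or "{}")

sample_output = {
    "choices": [{
        "finish_reason": "tool_calls",
        "message": {
            "role": "assistant",
            "content": "",
            "tool_calls": [{
                "id": "call_0", "type": "function", "index": 0,
                "function": {
                    "name": "get_current_weather",
                    "arguments": "{\"city\": \"Singapore\"}",
                },
            }],
        },
    }]
}
calls = list(extract_tool_calls(sample_output))
```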
Token usage information for this request.
Number of tokens in the user input.
Number of tokens in the model output.
Total tokens (input + output). Returned for plain text input.
Number of tokens in the input image. Returned when the input includes an image.
Number of tokens in the input video. Returned when the input includes a video.
Number of tokens in the input audio. Returned when the input includes audio.
Fine-grained classification of input tokens.
Number of tokens that hit the cache. See Context cache.
Number of tokens used to create the explicit cache.
The cache type. If explicit caching is used, the value is ephemeral; otherwise this field is not returned.
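The usage fields above can be sanity-checked client-side. A minimal sketch, using the token counts from the sample response at the top of this page (the dict itself is illustrative):

```python
def total_tokens(usage):
    """Recompute the total from input and output token counts.

    For plain text input, total_tokens should equal input_tokens +
    output_tokens; modality-specific fields (image/video/audio tokens)
    appear only when that modality is present in the input.
    """
    return usage["input_tokens"] + usage["output_tokens"]

usage = {"input_tokens": 22, "output_tokens": 17, "total_tokens": 39}
computed = total_tokens(usage)
```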