import os
import dashscope

dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'

messages = [
    {'role': 'system', 'content': 'You are a helpful assistant.'},
    {'role': 'user', 'content': 'Who are you?'}
]

response = dashscope.Generation.call(
    api_key=os.getenv('DASHSCOPE_API_KEY'),
    model='qwen3.6-plus',
    messages=messages,
    result_format='message'
)
print(response)

Sample response:

{
    "status_code": 200,
    "request_id": "902fee3b-f7f0-9a8c-96a1-6b4ea25af114",
    "code": "",
    "message": "",
    "output": {
        "text": null,
        "finish_reason": null,
        "choices": [
            {
                "finish_reason": "stop",
                "message": {
                    "role": "assistant",
                    "content": "I am a large-scale language model developed by Alibaba Cloud. My name is Qwen.",
                    "tool_calls": null,
                    "reasoning_content": null
                }
            }
        ]
    },
    "usage": {
        "input_tokens": 22,
        "output_tokens": 17,
        "total_tokens": 39,
        "image_tokens": null,
        "video_tokens": null,
        "audio_tokens": null
    }
}

Endpoint
- HTTP (text-only, such as qwen-plus): POST https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/text-generation/generation
- HTTP (multimodal, such as qwen3.6-plus, qwen3-vl-plus): POST https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation
- SDK base_http_api_url: https://dashscope-intl.aliyuncs.com/api/v1
Python:
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'

Java:
// Option 1: Set during instantiation
import com.alibaba.dashscope.protocol.Protocol;
Generation gen = new Generation(Protocol.HTTP.getValue(), "https://dashscope-intl.aliyuncs.com/api/v1");

// Option 2: Set globally
import com.alibaba.dashscope.utils.Constants;
Constants.baseHttpApiUrl = "https://dashscope-intl.aliyuncs.com/api/v1";
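The HTTP endpoint above can also be called without the SDK. The following is a minimal sketch using the standard request layout (model, input.messages, parameters); it assumes the `requests` package is available and `DASHSCOPE_API_KEY` is set, and the `build_request` helper is illustrative, not part of the API:

```python
import os

# Base URL for the international endpoint, as documented above.
BASE = "https://dashscope-intl.aliyuncs.com/api/v1"

def build_request(model, messages, **parameters):
    """Assemble the URL, headers, and JSON body for a text-generation request.

    The body shape mirrors the SDK call: model at the top level, messages
    under "input", and generation options under "parameters".
    """
    url = f"{BASE}/services/aigc/text-generation/generation"
    headers = {
        "Authorization": f"Bearer {os.getenv('DASHSCOPE_API_KEY', '')}",
        "Content-Type": "application/json",
    }
    body = {
        "model": model,
        "input": {"messages": messages},
        "parameters": parameters,
    }
    return url, headers, body

url, headers, body = build_request(
    "qwen-plus",
    [{"role": "user", "content": "Who are you?"}],
    result_format="message",
)
# To actually send it:
# import requests
# resp = requests.post(url, headers=headers, json=body)
```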
Authorizations
Your DashScope API key. See Get API key for details.
Body
application/json

The name of the model to call. Supports Qwen large language models (commercial and open-source), Qwen-Coder, and math models. For a list of models, see Text generation — Qwen.
The input to the model.
The conversation context, provided as an ordered list of messages. Each message is a system, user, assistant, or tool message object.
Sets the role, tone, task objective, or constraints for the model. Usually placed first in the messages array. Do not set for QwQ models. Does not take effect for QVQ models.
Optional generation parameters for text models.
The format of the returned data. Set to message for multi-turn conversations.
Default values: text for most models, except Qwen3-Max, Qwen3-VL, QwQ, and Qwen3 open source models (excluding qwen3-next-80b-a3b-instruct), which default to message.
When the model is Qwen-VL/QVQ, setting text has no effect. For Qwen3-Max, Qwen3-VL, and Qwen3 models in thinking mode, this can only be set to message.
Sampling temperature. Controls output diversity. Higher values produce more diverse output; lower values produce more deterministic output. Range: [0, 2).
Do not modify the default temperature value for QVQ models.
Nucleus sampling threshold. Higher values produce more diverse output. Range: (0, 1.0].
Default values by model:
- Qwen3.5 (non-thinking), Qwen3 (non-thinking), Qwen3-Instruct series, Qwen3-Coder series, qwen-max series, qwen-plus series (non-thinking), qwen-flash series (non-thinking), qwen-turbo series (non-thinking), qwen open source series, qwen-vl-max-2025-08-13, Qwen3-VL (non-thinking): 0.8
- qwen-vl-plus series, qwen-vl-max, qwen-vl-max-latest, qwen-vl-max-2025-04-08, qwen2.5-vl-3b/7b/32b/72b-instruct: 0.001
- QVQ series, qwen-vl-plus-2025-07-10, qwen-vl-plus-2025-08-15: 0.5
- qwen3-max-preview (thinking mode), Qwen3-Omni-Flash series: 1.0
- Qwen3.5 (thinking), Qwen3 (thinking), Qwen3-VL (thinking), Qwen3-Thinking, QwQ series, Qwen3-Omni-Captioner: 0.95
Do not modify the default top_p value for QVQ models.
The size of the candidate token set for sampling. A larger value increases randomness; a smaller value increases determinism. If None or greater than 100, top_k is disabled and only top_p takes effect. Must be >= 0.
Default values by model:
- QVQ series, qwen-vl-plus-2025-07-10, qwen-vl-plus-2025-08-15: 10
- QwQ series: 40
- Other Qwen-VL-Plus series, Qwen-VL-Max models released before August 13 2025, qwen2.5-omni-7b: 1
- Qwen3-Omni-Flash series: 50
- All other models: 20
Do not modify the default top_k value for QVQ models.
Maximum number of tokens to generate. When reached, generation stops and finish_reason is length. Does not limit thinking chain length. Default is the model's maximum output length.
Whether to stream the response. For HTTP streaming, also set the X-DashScope-SSE: enable header. For Java SDK streaming, use the streamCall interface.
When streaming, whether to return only the new delta tokens (true) or the full accumulated text so far (false).
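The two incremental_output modes can be illustrated without calling the API. This is a pure-Python sketch of the client-side difference; the chunk strings are made up:

```python
# With incremental_output=False, each streamed chunk repeats the full text
# generated so far; with incremental_output=True, each chunk carries only
# the newly generated tokens.
chunks_full  = ["I am", "I am Qwen", "I am Qwen."]   # incremental_output=False
chunks_delta = ["I am", " Qwen", "."]                # incremental_output=True

def assemble(chunks, incremental):
    """Rebuild the final text from a stream of chunks."""
    return "".join(chunks) if incremental else chunks[-1]

final_delta = assemble(chunks_delta, incremental=True)
final_full = assemble(chunks_full, incremental=False)
```

Either mode yields the same final text; delta mode simply avoids re-sending the accumulated prefix in every chunk.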
Whether to enable thinking mode. Applies to hybrid thinking models: Qwen3.5, Qwen3, and Qwen3-VL series. When enabled, thinking content is returned in the reasoning_content field.
Maximum length of the thinking chain. Applies to commercial and open source versions of Qwen3.5, Qwen3-VL, and Qwen3. Default is the model's maximum chain-of-thought length.
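When thinking mode is enabled, a choice's message carries both reasoning_content and content, so clients typically separate the two. A minimal sketch, using the field names from the response schema in this document (the sample message dict is illustrative, not a real API response):

```python
def split_thinking(message):
    """Return (reasoning, answer) from an assistant message dict.

    reasoning_content holds the thinking chain (may be absent or null when
    thinking mode is off); content holds the final answer.
    """
    return message.get("reasoning_content") or "", message.get("content") or ""

msg = {
    "role": "assistant",
    "reasoning_content": "The user asks who I am, so I should introduce myself.",
    "content": "I am Qwen.",
}
reasoning, answer = split_thinking(msg)
```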
Whether to enable the code interpreter feature. Only supported by qwen3.5, qwen3-max, qwen3-max-2026-01-23, and qwen3-max-preview in thinking mode.
Penalty for token repetition. A value of 1.0 means no penalty. Higher values reduce repetition. Must be a positive number.
When using the qwen-vl-plus-2025-01-25 model for text extraction, set repetition_penalty to 1.0. Do not modify the default repetition_penalty value for QVQ models.
Controls how much the model avoids repeating content already present in the text. Range: [-2.0, 2.0]. Positive values reduce repetition; negative values increase it.
Default values by model:
- Qwen3.5 (non-thinking), qwen3-max-preview (thinking), Qwen3 (non-thinking), Qwen3-Instruct series, qwen3-0.6b/1.7b/4b (thinking), QVQ series, qwen-max, qwen-max-latest, qwen2.5-vl series, qwen-vl-max series, qwen-vl-plus, Qwen3-VL (non-thinking): 1.5
- qwen-vl-plus-latest, qwen-vl-plus-2025-08-15: 1.2
- qwen-vl-plus-2025-01-25: 1.0
- qwen3-8b/14b/32b/30b-a3b/235b-a22b (thinking), qwen-plus/qwen-plus-latest/qwen-plus-2025-04-28 (thinking), qwen-turbo/qwen-turbo-2025-04-28 (thinking): 0.5
- All other models: 0.0
When using qwen-vl-plus-2025-01-25 for text extraction, set presence_penalty to 1.5. Do not modify the default for QVQ models.
Random seed for reproducible results. Range: [0, 2³¹−1]. With the same seed and parameters, the model returns the same result whenever possible.
Stop sequences. When the generated text contains a specified string or token ID, generation stops immediately. Do not mix strings and token IDs in the same array. Not supported by all models; check model documentation.
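The sampling and stopping controls above are passed together as keyword arguments. The following parameter set is hypothetical (values are illustrative; defaults differ per model, as the tables above show):

```python
# Illustrative parameter set; every value here is an example, not a default.
params = {
    "result_format": "message",
    "temperature": 0.7,        # range [0, 2): higher = more diverse
    "top_p": 0.8,              # range (0, 1.0]: nucleus sampling threshold
    "top_k": 20,               # None or >100 disables top_k entirely
    "max_tokens": 512,         # finish_reason becomes "length" when hit
    "seed": 1234,              # best-effort reproducibility
    "stop": ["Observation:"],  # strings only; never mix strings and token IDs
}
# These would be splatted into the call, e.g.
# dashscope.Generation.call(model='qwen-plus', messages=messages, **params)
```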
An array of tool objects for function calling. When using tools, you must set result_format to message. Not supported by qwen-vl series models. For usage examples, see the Function calling guide.
The type of tool. Currently only function is supported.
The name of the tool function. Can contain letters, numbers, underscores, and hyphens. Maximum 64 characters.
A description of the tool function that helps the model decide when and how to call it.
A JSON Schema object describing the function parameters. Defaults to {}.
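A tool definition following the schema above might look like this. The function name "get_current_weather" and its parameters are made up for illustration:

```python
# Hypothetical tool array: type must be "function"; the name may contain
# letters, numbers, underscores, and hyphens (max 64 chars); "parameters"
# is a JSON Schema object describing the function's arguments.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather for a given city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"},
                },
                "required": ["city"],
            },
        },
    }
]
# Passed as tools=tools; remember result_format must be 'message' when
# using tools.
```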
Defines the tool selection strategy. Thinking mode models do not support forcing a specific tool.
Whether to enable parallel tool calls. Not supported by thinking mode models when forcing a specific tool. See Parallel tool calls.
The format of the returned content. If set to json_object, you must instruct the model to output JSON in the prompt.
The output format type. text: plain text. json_object: standard JSON string. json_schema: JSON matching the provided schema.
Required when type is json_schema. Defines the JSON Schema for structured output. For supported models, see Structured output.
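The three output types above can be sketched as response_format values. The exact nesting of the schema payload is an assumption here (check the Structured output guide); the "person" schema is made up:

```python
# Plain text (the default for most models).
text_format = {"type": "text"}

# Standard JSON output; the prompt itself must also instruct the model
# to produce JSON.
json_format = {"type": "json_object"}

# JSON constrained to a schema. The "json_schema" field layout below is
# an assumption based on common structured-output conventions.
schema_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "person",
        "schema": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "age": {"type": "integer"},
            },
            "required": ["name", "age"],
        },
    },
}
```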
Whether to return log probabilities of the output tokens. Supported by: snapshot models of qwen-plus/qwen-turbo series; qwen3-vl-plus/qwen3-vl-flash series; Qwen3 open source models. See model page for supported models.
Number of most likely candidate tokens to return at each generation step. Valid values: 0–5. Only takes effect when logprobs is true. Supported by the same models as logprobs.
The number of responses to generate. Range: 1–4. Currently only non-thinking mode Qwen3 models are supported. Fixed at 1 when tools is specified. Increases output token consumption.
Whether to enable high-resolution image processing. When enabled, uses a fixed-resolution strategy where max_pixels is ignored. Default: false.
Pixel limits when enabled (true):
- Qwen3.5 series, Qwen3-VL series, qwen-vl-max, qwen-vl-max-latest, qwen-vl-max-0813, qwen-vl-plus, qwen-vl-plus-latest, qwen-vl-plus-0815: fixed at 16777216 pixels (16384 tokens × 32×32 pixels)
- QVQ series and other Qwen2.5-VL series: fixed at 12845056 pixels (16384 tokens × 28×28 pixels)
When false, the pixel limit is determined by max_pixels.
Whether to return the dimensions of the scaled image in the response (image_hw field). When streaming, returned in the last chunk. Applies to Qwen-VL series models.
Response
The status code of the request. 200 indicates success. The Java SDK does not return this field; if a call fails, an exception is thrown containing the status_code.
A unique identifier for this request. In the Java SDK, this is requestId.
The error code. Empty string if the request was successful. Only the Python SDK returns this field.
A human-readable error message. Empty string if the request was successful.
The model's output.
The generated text. Returned when result_format is text.
The reason generation stopped. Returned when result_format is text. Values: null (still generating), stop (natural end or stop condition triggered), length (max tokens reached), tool_calls (tool call triggered).
The output choices. Returned when result_format is message.
The reason generation stopped. Values: null (generating), stop, length, tool_calls.
The assistant's output message.
Always assistant.
The message content. A string for text models; an array for Qwen-VL/Qwen-Audio models. Empty when tool_calls is present.
The deep thinking content. Returned when thinking mode is enabled.
Tool calls the model wants to make. Present when the model triggers a function call.
The ID of the tool call.
The type of tool. Currently only function is supported.
The index of this tool call in the tool_calls array.
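Reading tool calls out of a response can be sketched as follows. Only id, type, and index are documented above; the `function` sub-object with `name` and JSON-encoded `arguments` is an assumption based on the common tool-call layout, and the sample output dict is illustrative:

```python
import json

def extract_tool_calls(output):
    """Yield (name, parsed_arguments) pairs from an output dict's choices."""
    for choice in output.get("choices", []):
        for call in choice["message"].get("tool_calls") or []:
            fn = call.get("function", {})  # assumed sub-object
            yield fn.get("name"), json.loads(fn.get("arguments") or "{}")

sample_output = {
    "choices": [{
        "finish_reason": "tool_calls",
        "message": {
            "role": "assistant",
            "content": "",
            "tool_calls": [{
                "id": "call_0", "type": "function", "index": 0,
                "function": {
                    "name": "get_current_weather",
                    "arguments": "{\"city\": \"Singapore\"}",
                },
            }],
        },
    }]
}
calls = list(extract_tool_calls(sample_output))
```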
Token usage information for this request.
Number of tokens in the user input.
Number of tokens in the model output.
Total tokens (input + output). Returned for plain text input.
Number of tokens in the input image. Returned when the input includes an image.
Number of tokens in the input video. Returned when the input includes a video.
Number of tokens in the input audio. Returned when the input includes audio.
Fine-grained classification of input tokens.
Number of tokens that hit the cache. See Context cache.
Number of tokens used to create the explicit cache.
The cache type. If explicit caching is used, the value is ephemeral; otherwise this field is not returned.
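The usage fields above can be sanity-checked client-side. A minimal sketch, using the token counts from the sample response at the top of this page (the dict itself is illustrative):

```python
def total_tokens(usage):
    """Recompute the total from input and output token counts.

    For plain text input, total_tokens should equal input_tokens +
    output_tokens; modality-specific fields (image/video/audio tokens)
    appear only when that modality is present in the input.
    """
    return usage["input_tokens"] + usage["output_tokens"]

usage = {"input_tokens": 22, "output_tokens": 17, "total_tokens": 39}
computed = total_tokens(usage)
```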