Authorizations
DashScope API key.
Body
application/json
The name of the model to use. Supported models include Qwen large language models (commercial and open source), Qwen-VL, Qwen-Coder, Qwen-Omni, and Qwen-Math. For specific model names and billing information, see Text Generation - Qwen.
The conversation history for the model, listed in chronological order.
A system message that defines the role, tone, task objectives, or constraints for the model. Place it at the beginning of the messages array. Do not set a system message for the QwQ model. A system message has no effect on the QVQ model.
- System message
- User message
- Assistant message
- Tool message
Enables streaming output mode. When true, the model generates and sends output incrementally. A data block (chunk) is returned as soon as part of the content is generated. You can read these chunks in real time to assemble the full reply. Set this to true to improve the reading experience and reduce the risk of timeouts.
Configuration options for streaming output. This parameter is effective only when stream is set to true.
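As a sketch of how these two parameters fit together, the request body below enables streaming and asks for a final usage chunk. The model name is illustrative, and the body is shown as the raw JSON payload rather than an SDK call:

```python
import json

# Illustrative request body for streaming output; the model name is an assumption.
payload = {
    "model": "qwen-plus",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Introduce yourself."},
    ],
    "stream": True,
    # Takes effect only when stream is true; include_usage appends a final
    # chunk that reports token consumption for the request.
    "stream_options": {"include_usage": True},
}

body = json.dumps(payload)
```

Each returned chunk carries a delta; concatenate the delta contents in order to assemble the full reply.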
Specifies the modalities of the output data. This parameter applies only to Qwen-Omni models. Valid values: ["text","audio"] or ["text"].
The voice and format of the output audio. This parameter applies only to Qwen-Omni models, and you must set the modalities parameter to ["text","audio"].
The sampling temperature controls the diversity of the generated text. Higher values increase diversity, while lower values make the output more deterministic. The value must be greater than or equal to 0 and less than 2. Both temperature and top_p control the diversity of the generated text. Set only one of them. Do not modify the default temperature value for QVQ models.
The probability threshold for nucleus sampling. A higher top_p value produces more diverse text. A lower top_p value produces more deterministic text. Value range: (0, 1.0]. Both temperature and top_p control the diversity of the generated text. Set only one of them. Do not modify the default top_p value for QVQ models.
Specifies the number of candidate tokens to use for sampling during generation. A larger value produces more random output, whereas a smaller value produces more deterministic output. If set to null or a value greater than 100, the top_k strategy is disabled and only top_p takes effect. The value must be an integer greater than or equal to 0.
Default top_k values:
- QVQ series, qwen-vl-plus-2025-07-10, and qwen-vl-plus-2025-08-15: 10
- QwQ series: 40
- Other qwen-vl-plus series, models before qwen-vl-max-2025-08-13, qwen2.5-omni-7b: 1
- Qwen3-Omni-Flash series: 50
- All other models: 20
You must not change the default top_k value for QVQ models.
This is not a standard OpenAI parameter. When using the Python SDK, include it in extra_body: extra_body={"top_k": xxx}.
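The rules above can be sketched as a small helper (illustrative only, not part of any SDK):

```python
def top_k_effective(top_k):
    """Return the top_k value the sampler would use, per the rules above:
    None or a value greater than 100 disables top_k, so only top_p applies."""
    if top_k is None or top_k > 100:
        return None  # top_k strategy disabled
    if not isinstance(top_k, int) or top_k < 0:
        raise ValueError("top_k must be an integer >= 0")
    return top_k
```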
Controls how strongly the model avoids repeating content. Valid values: -2.0 to 2.0. Positive values reduce repetition. Negative values increase it. For creative writing or brainstorming, increase this value. For technical documents or formal text, decrease this value.
Default presence_penalty values:
- Qwen3.5 (non-thinking mode), qwen3-max-preview (thinking mode), Qwen3 (non-thinking mode), Qwen3-Instruct series, qwen3-0.6b/1.7b/4b (thinking mode), QVQ series, qwen-max, qwen-max-latest, qwen2.5-vl series, qwen-vl-max series, qwen-vl-plus, Qwen3-VL (non-thinking): 1.5
- qwen-vl-plus-latest, qwen-vl-plus-2025-08-15: 1.2
- qwen-vl-plus-2025-01-25: 1.0
- qwen3-8b/14b/32b/30b-a3b/235b-a22b (thinking mode), qwen3.6-plus/qwen3.6-plus-2026-04-02, qwen3.5-plus/qwen3.5-plus-latest/2025-04-28 (thinking mode), qwen-turbo/qwen-turbo-2025-04-28 (thinking mode): 0.5
- All other models: 0.0
How it works: When the parameter value is positive, the model penalizes tokens that already appear in the generated text. The penalty does not depend on how many times a token appears. This reduces the likelihood of those tokens reappearing, which decreases repetition and increases lexical diversity.
When using the qwen-vl-plus-2025-01-25 model for text extraction, set presence_penalty to 1.5.
Do not modify the default presence_penalty value for QVQ models.
The format of the response content. Defaults to {"type": "text"}.
Valid values:
- {"type": "text"}: Returns a plain text response.
- {"type": "json_object"}: Returns a JSON string that conforms to standard JSON syntax.
- {"type": "json_schema", "json_schema": {...}}: Returns a JSON string that conforms to a custom schema.
If you specify {"type": "json_object"}, explicitly instruct the model to output JSON in the prompt, such as by adding "Please output in JSON format." Otherwise, an error occurs.
For supported models, see Structured output.
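The JSON-mode requirement above can be sketched as a request body; note that the prompt itself asks for JSON, which is mandatory when response_format is {"type": "json_object"}. The model name is an assumption:

```python
# Illustrative request body for JSON-mode output.
payload = {
    "model": "qwen-plus",  # model name is illustrative
    "messages": [
        # The prompt must explicitly request JSON output, or the request fails.
        {"role": "user", "content": "List three colors. Please output in JSON format."},
    ],
    "response_format": {"type": "json_object"},
}
```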
The maximum number of tokens in the response. Generation stops when this limit is reached, and the finish_reason field is set to length. The default and maximum values correspond to the model's maximum output length. max_tokens does not limit the length of the chain-of-thought.
Specifies whether to increase the maximum pixel limit for input images to the pixel count corresponding to 16,384 tokens. When set to true, a fixed-resolution strategy is used and the max_pixels setting is ignored. If an image exceeds the pixel limit, its total pixel count is scaled down to the limit.
Pixel limits when vl_high_resolution_images is true:
- Qwen3.5 series, Qwen3-VL series, qwen-vl-max, qwen-vl-max-latest, qwen-vl-max-0813, qwen-vl-plus, qwen-vl-plus-latest, qwen-vl-plus-0815: 16,777,216 (each token corresponds to 32×32 pixels, i.e., 16,384×32×32)
- QVQ series and other Qwen2.5-VL series models: 12,845,056 (each token corresponds to 28×28 pixels, i.e., 16,384×28×28)
If vl_high_resolution_images is false, the actual pixel limit is determined by max_pixels.
This is not a standard OpenAI parameter. When using the Python SDK, include it in extra_body: extra_body={"vl_high_resolution_images": xxx}.
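The pixel limits above follow directly from 16,384 tokens times the pixel area each token covers for the two model families:

```python
# Reproducing the pixel limits above: 16,384 tokens multiplied by the
# per-token pixel area of each model family.
TOKENS = 16384
limit_32 = TOKENS * 32 * 32   # Qwen3-VL-class models: 16,777,216 pixels
limit_28 = TOKENS * 28 * 28   # QVQ / Qwen2.5-VL-class models: 12,845,056 pixels
```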
The number of responses to generate. Must be an integer in the range of 1-4. This is useful for scenarios that require multiple candidate responses, such as creative writing or ad copy. Supported only by Qwen3 (non-thinking mode) models. If you pass the tools parameter, set n to 1. Increasing n increases output token consumption but does not affect input token consumption.
Enables the thinking mode for hybrid thinking models. This mode is available for the Qwen3.5, Qwen3, Qwen3-Omni-Flash, and Qwen3-VL models. When enabled, the thinking content is returned in the reasoning_content field.
Default values differ by model. For supported models and their default enable_thinking values, see the Model List.
This is not a standard OpenAI parameter. When using the Python SDK, place it in extra_body: extra_body={"enable_thinking": xxx}.
The maximum number of tokens for the thinking process. Applies to Qwen3.5, Qwen3-VL, and the commercial and open source versions of Qwen3 models. The default value is the model's maximum chain-of-thought length. For more information, see the Model List.
This is not a standard OpenAI parameter. When using the Python SDK, place it in extra_body: extra_body={"thinking_budget": xxx}.
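Since both parameters are non-standard extensions, they travel together in extra_body. A minimal sketch, with an illustrative budget value:

```python
# Illustrative extra_body enabling thinking mode with a token budget;
# both keys are non-standard OpenAI extensions, and 4096 is an example value.
extra_body = {
    "enable_thinking": True,
    "thinking_budget": 4096,  # cap on chain-of-thought tokens
}
```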
Specifies whether to enable the code interpreter feature.
This is not a standard OpenAI parameter. When using the Python SDK, include it in extra_body: extra_body={"enable_code_interpreter": xxx}.
The random number seed. Ensures reproducible results. If you use the same seed value and other parameters remain unchanged, the model returns the same result whenever possible. Valid values: [0, 2^31-1].
Specifies whether to return the log probabilities of the output tokens. Content generated during the thinking phase (reasoning_content) does not include log probabilities.
Supported models:
- Qwen-plus series snapshots (excluding the stable model)
- Qwen-turbo series snapshots (excluding the stable model)
- Qwen3-vl-plus models (including the stable model)
- Qwen3-vl-flash models (including the stable model)
- Qwen3 open source models
The number of most likely candidate tokens to return at each generation step. Valid values: 0 to 5. This parameter applies only if logprobs is set to true.
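The dependency between the two parameters can be sketched as a small helper (illustrative only):

```python
def logprobs_params(logprobs, top_logprobs=None):
    """Build logprob-related request parameters. top_logprobs (0-5)
    applies only when logprobs is true."""
    params = {"logprobs": logprobs}
    if logprobs and top_logprobs is not None:
        if not 0 <= top_logprobs <= 5:
            raise ValueError("top_logprobs must be in [0, 5]")
        params["top_logprobs"] = top_logprobs
    return params
```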
Stop words. If a string or token specified in stop appears in the generated text, generation stops immediately. If stop is an array, do not mix token_id elements and string elements in the same array.
An array of one or more tool objects that the model can call in function calling. When tools is set and the model determines that a tool needs to be called, the response returns tool information in the tool_calls field. For usage examples, see the Function calling guide.
The tool selection policy. auto lets the model select. none disables tool calling. An object with {"type": "function", "function": {"name": "..."}} forces a specific tool. Models in thinking mode do not support forcing a specific tool.
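The shapes above can be sketched as follows; the get_weather function is hypothetical and exists only to illustrate the tool schema:

```python
# Illustrative tools array; get_weather is a hypothetical function.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

tool_choice_auto = "auto"  # let the model decide whether and what to call
tool_choice_none = "none"  # disable tool calling
# Force a specific tool (not supported in thinking mode):
tool_choice_forced = {"type": "function", "function": {"name": "get_weather"}}
```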
Specifies whether to enable parallel tool calling.
Enables web search. Enabling web search may increase token consumption.
This is not a standard OpenAI parameter. When using the Python SDK, include it in extra_body: extra_body={"enable_search": True}.
The web search strategy. Takes effect only when enable_search is true.
Properties:
- forced_search (boolean, default: false): Whether to force web search. true forcibly enables web search. false lets the model decide.
- search_strategy (string, default: turbo): The search scale strategy. turbo balances speed and effectiveness. max uses a more comprehensive strategy with multiple search engines. agent calls search and the LLM multiple times for multi-round retrieval (applicable only to qwen3.6-plus, qwen3.6-plus-2026-04-02, qwen3.5-plus, qwen3.5-plus-2026-02-15, qwen3.5-flash, qwen3.5-flash-2026-02-23, qwen3-max, qwen3-max-2026-01-23, and qwen3-max-2025-09-23). agent_max adds web extraction to the agent strategy (applicable only to qwen3.6-plus, qwen3.6-plus-2026-04-02, qwen3.5-plus, qwen3.5-plus-2026-02-15, qwen3.5-flash, qwen3.5-flash-2026-02-23, and the thinking mode of qwen3-max and qwen3-max-2026-01-23). When agent or agent_max is enabled, only returning search sources (enable_source: true) is supported; all other web search features are unavailable.
- enable_search_extension (boolean, default: false): Enables domain-specific search.
This is not a standard OpenAI parameter. When using the Python SDK, include it in extra_body: extra_body={"search_options": xxx}.
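Putting the two parameters together, a minimal extra_body sketch might look like this; the values simply restate the defaults described above:

```python
# Illustrative extra_body combining enable_search with a search_options
# object; values follow the properties described above.
extra_body = {
    "enable_search": True,
    "search_options": {
        "forced_search": False,      # let the model decide whether to search
        "search_strategy": "turbo",  # balance speed and effectiveness
        "enable_search_extension": False,
    },
}
```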
Response
The unique identifier for this request.
An array of generated content from the model.
The Unix timestamp, in seconds, when the request was created.
The model used for this request.
Always chat.completion.
Currently fixed as null.
Currently fixed as null.
Token consumption details for this request.