Compatibility with OpenAI
This API is OpenAI-compatible, but key differences exist in parameters, features, and behaviors.
Requests process only the parameters listed in this document. Any OpenAI parameters not mentioned are ignored.
Key differences:
- Unsupported parameters: Some parameters, such as background, are not supported (this API is synchronous only).
- Additional parameters: Supports extra parameters beyond OpenAI's spec, such as enable_thinking.
Authorizations
DashScope API key.
Header Parameters
Controls session cache for multi-turn conversations using previous_response_id. When enabled, the server automatically caches conversation context to reduce latency and cost.
- enable: Enables session cache. Cache creation tokens are billed at 125% of the standard input price; cache hits at 10%. The cache is valid for 5 minutes, and the timer resets on each hit. Cache creation requires a minimum of 1024 tokens.
- disable: Disables session cache. Falls back to implicit cache if the model supports it.
Supported models: qwen3-max, qwen3.6-plus, qwen3.5-plus, qwen3.5-flash, qwen-plus, qwen-flash, qwen3-coder-plus, qwen3-coder-flash.
Pass via SDK: default_headers (Python) or defaultHeaders (Node.js).
Body
application/json

The model name. Supported models include qwen3-max, qwen3-max-2026-01-23, qwen3.6-plus, qwen3.6-plus-2026-04-02, qwen3.5-plus, qwen3.5-plus-2026-02-15, qwen3.5-flash, qwen3.5-flash-2026-02-23, qwen3.5-397b-a17b, qwen3.5-122b-a10b, qwen3.5-27b, qwen3.5-35b-a3b, qwen-plus, qwen-flash, qwen3-coder-plus, qwen3-coder-flash.
The model input. Supports a plain text string or a message array arranged in conversational order.
A system instruction inserted at the beginning of the context. When previous_response_id is used, the instructions specified in the previous turn are not carried over to the current context.
The unique ID of the previous response. Response IDs are valid for 7 days. Use this parameter to create a multi-turn conversation: the server automatically retrieves the input and output of that turn and includes them as context. If both an input message array and previous_response_id are provided, the new messages in input are appended to the historical context. Cannot be used together with conversation. For usage examples, see the Multi-turn conversations guide.
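As a sketch, the request bodies for a minimal two-turn exchange might look like the following. The model name and prompts are illustrative, and resp_123 is a hypothetical ID returned by the first turn:

```python
# Turn 1: an ordinary request. The server's response carries an "id" field.
first_turn = {
    "model": "qwen3-max",
    "input": "Recommend a science-fiction novel.",
}

# Turn 2: reference the first turn's id (hypothetical value "resp_123") so the
# server retrieves that turn's input and output and prepends them as context.
second_turn = {
    "model": "qwen3-max",
    "input": "Summarize it in one sentence.",
    "previous_response_id": "resp_123",
}
```

Note that the second request only needs the new message; the server reconstructs the history from the referenced response.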
The conversation to which the current response belongs. Historical items in the conversation are automatically passed as context to the current request. The input and output of the current request are also automatically added to the conversation after the response completes. Cannot be used together with previous_response_id.
Specifies whether to enable streaming output. If this parameter is set to true, the model response data is streamed back to the client in real time.
A list of tools the model can use. Supported tool types: web_search, code_interpreter, web_extractor, web_search_image, image_search, file_search, mcp, function.
Built-in tools use {"type": "<tool_name>"} format. For example: {"type": "web_search"}.
MCP tools use the following format:
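A sketch of the MCP tool object: the field names (server_label, server_url) are borrowed from OpenAI's Responses API and are an assumption here; the label and URL below are placeholders, not a real server.

```python
# Hypothetical MCP tool entry for the "tools" array.
mcp_tool = {
    "type": "mcp",
    "server_label": "my_mcp_server",          # hypothetical label
    "server_url": "https://example.com/mcp",  # placeholder MCP endpoint
}
```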
Function tools use the following format:
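A sketch of a function tool entry, following the common Responses API shape: get_weather is a hypothetical function, and parameters is a standard JSON Schema object describing its arguments.

```python
# Hypothetical function tool entry for the "tools" array.
function_tool = {
    "type": "function",
    "name": "get_weather",  # hypothetical function name
    "description": "Get the current weather for a city.",
    "parameters": {         # JSON Schema for the function's arguments
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}
```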
Controls how the model selects and calls tools. Supports string format and object format.
String format:
- auto: The model automatically decides whether to call a tool.
- none: Prevents the model from calling any tool.
- required: Forces the model to call a tool. Available only when there is a single tool in the tools list.
Object format: Specifies the range of available tools for the model. The model can select and call tools only from the predefined list.
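One plausible shape for the object format, mirroring OpenAI's Responses API convention of naming a specific function tool; the exact field names here are an assumption, and get_weather is a hypothetical tool:

```python
# Object form of tool_choice: restrict the model to one named function tool.
tool_choice = {"type": "function", "name": "get_weather"}
```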
The sampling temperature that controls the diversity of the generated text. A higher temperature results in more diverse text. A lower temperature results in more deterministic text. Value range: [0, 2). Both temperature and top_p control the diversity of the generated text. We recommend that you set only one of them.
The probability threshold for nucleus sampling that controls the diversity of the generated text. A higher top_p value results in more diverse text. A lower top_p value results in more deterministic text. Value range: (0, 1.0]. Both temperature and top_p control the diversity of the generated text. We recommend that you set only one of them.
Specifies whether to enable thinking mode. If set to true, the model thinks before replying. The thinking content is returned through an output item of the reasoning type. Reasoning tokens are counted in output_tokens_details.reasoning_tokens and are billed as reasoning tokens. When thinking mode is enabled, we recommend enabling the built-in tools to achieve the best model performance on complex tasks.
This is not a standard OpenAI parameter. The Python SDK passes it using extra_body={"enable_thinking": True}. The Node.js SDK and curl use enable_thinking: true directly as a top-level parameter.
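A minimal sketch contrasting the two spellings (client construction is omitted; the model name and prompt are illustrative):

```python
# Python SDK call: enable_thinking is not a standard OpenAI parameter, so the
# Python SDK tunnels it through extra_body.
sdk_kwargs = {
    "model": "qwen3.5-plus",
    "input": "How many prime numbers are there below 100?",
    "extra_body": {"enable_thinking": True},
}

# Equivalent raw JSON request body (as used by curl or the Node.js SDK):
# enable_thinking sits at the top level.
http_body = {
    "model": "qwen3.5-plus",
    "input": "How many prime numbers are there below 100?",
    "enable_thinking": True,
}
```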
Response
The unique ID for this response. It is valid for 7 days. Pass it as previous_response_id in a later request to create a multi-turn conversation.
The Unix timestamp (in seconds) at which this response was created.
The object type. The value is response.
The status of the response generation.
The ID of the model that is used to generate the response.
An array of output items generated by the model. The type and order of elements in the array depend on the model's response.
Whether parallel tool calls are enabled.
The tool_choice parameter echoed from the request. Valid values: auto, none, and required.
The tools parameter echoed from the request. The structure is the same as the tools parameter in the request body.
The error object that is returned when the model fails to generate a response. This field is null on success.
The token consumption information for this request.