Authorizations
Qwen Cloud API key passed via x-api-key header. Authorization: Bearer header is also supported.
Body
application/jsonModel name. Supported models:
Qwen Max: qwen3.6-max-preview, qwen3-max, qwen3-max-2026-01-23, qwen3-max-preview
Qwen Plus: qwen3.6-plus, qwen3.6-plus-2026-04-02, qwen3.5-plus, qwen3.5-plus-2026-04-20, qwen3.5-plus-2026-02-15, qwen-plus, qwen-plus-latest, qwen-plus-2025-09-11
Qwen Flash: qwen3.6-flash, qwen3.6-flash-2026-04-16, qwen3.5-flash, qwen3.5-flash-2026-02-23, qwen-flash, qwen-flash-2025-07-28
Qwen Turbo: qwen-turbo, qwen-turbo-latest
Qwen Coder: qwen3-coder-next, qwen3-coder-plus, qwen3-coder-plus-2025-09-23, qwen3-coder-flash
Qwen VL: qwen3-vl-plus, qwen3-vl-flash, qwen-vl-max, qwen-vl-plus
Qwen Open-source: qwen3.6-27b, qwen3.5-397b-a17b, qwen3.5-122b-a10b, qwen3.5-27b, qwen3.5-35b-a3b
Third-party models: deepseek-v4-pro, deepseek-v4-flash, deepseek-v3.2
Maximum number of tokens to generate.
Message array, alternating between user and assistant turns.
System prompt to set the model's role or behavior. Passed as a top-level parameter; the messages array does not accept the system role. A string is equivalent to a single type="text" content block. To use context caching, pass an array of content blocks with cache_control.
Enable streaming output. Default is false.
Controls diversity of generated text, range [0, 2). Higher values produce more random output. This range differs from Anthropic's native [0.0, 1.0] — verify this parameter when migrating from Anthropic.
Probability threshold for nucleus sampling. Both temperature and top_p control diversity — set only one at a time.
Size of the sampling candidate set during generation.
Text sequences that stop generation. The model stops before outputting the sequence. When hit, stop_reason is still end_turn and the matched sequence is not included in the response.
Thinking configuration. When enabled, the model reasons before generating a reply to improve accuracy. The response will include thinking type content blocks.
Controls reasoning intensity. Default is max. Supported models: deepseek-v4-pro, deepseek-v4-flash. Values low or medium are mapped to high; xhigh is mapped to max.
Tool definition array for function calling.
Tool choice strategy. {"type": "auto"}: model decides whether to call tools (default). {"type": "any"}: force calling any tool. {"type": "none"}: disable tool calling. {"type": "tool", "name": "tool_name"}: force calling a specific tool.
Response
Unique message identifier.
Always message.
Always assistant.
Model name used.
Content array. Element types can be text, thinking (returned when thinking is enabled), or tool_use (tool call).
Stop reason: end_turn (normal completion), max_tokens (token limit reached), tool_use (tool call).
Always null.
Token usage statistics. In streaming, the usage in the message_start event only contains input_tokens and output_tokens; all 4 fields appear in the message_delta event.

