
Compatible Chat API

POST
/compatible-mode/v1/chat/completions
import os
from openai import OpenAI

client = OpenAI(
  # If the environment variable is not set, replace the following line with: api_key="sk-xxx"
  api_key=os.getenv("DASHSCOPE_API_KEY"),
  base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

completion = client.chat.completions.create(
  model="qwen3.6-plus",
  messages=[
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Who are you?"},
  ],
  # extra_body={"enable_thinking": False},
)
print(completion.model_dump_json())

Sample response:

{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "I am a large-scale language model developed by Alibaba Cloud. My name is Qwen."
      },
      "finish_reason": "stop",
      "index": 0,
      "logprobs": null
    }
  ],
  "object": "chat.completion",
  "usage": {
    "prompt_tokens": 3019,
    "completion_tokens": 104,
    "total_tokens": 3123,
    "prompt_tokens_details": {
      "cached_tokens": 2048
    }
  },
  "created": 1735120033,
  "system_fingerprint": null,
  "model": "qwen3.6-plus",
  "id": "chatcmpl-6ada9ed2-7f33-9de2-8bb0-78bd4035025a"
}
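Once parsed, the fields above can be read like any JSON object. A minimal sketch, using a trimmed copy of the sample response body:

```python
import json

# A trimmed copy of the sample response body shown above
raw = """
{
  "choices": [
    {
      "message": {"role": "assistant", "content": "I am a large-scale language model developed by Alibaba Cloud. My name is Qwen."},
      "finish_reason": "stop",
      "index": 0
    }
  ],
  "object": "chat.completion",
  "usage": {"prompt_tokens": 3019, "completion_tokens": 104, "total_tokens": 3123},
  "model": "qwen3.6-plus"
}
"""

resp = json.loads(raw)
answer = resp["choices"][0]["message"]["content"]  # the assistant reply
finish = resp["choices"][0]["finish_reason"]       # "stop" when generation completed normally
total = resp["usage"]["total_tokens"]              # prompt_tokens + completion_tokens
print(answer)
```

When using the SDK, the same fields are available as attributes, e.g. `completion.choices[0].message.content`.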

Authorizations

Authorization
string
header
required

DashScope API key.

Body

application/json
model
string
required

The name of the model to use. Supported models include Qwen large language models (commercial and open source), Qwen-VL, Qwen-Coder, Qwen-Omni, and Qwen-Math. For specific model names and billing information, see Text Generation - Qwen.

messages
(System message · object | User message · object | Assistant message · object | Tool message · object)[]
required

The conversation history for the model, listed in chronological order.

A system message that defines the role, tone, task objectives, or constraints for the model. Place it at the beginning of the messages array. Do not set a system message for the QwQ model. A system message has no effect on the QVQ model.

  • System message
  • User message
  • Assistant message
  • Tool message
stream
boolean
default: false

Enables streaming output mode. When true, the model generates and sends output incrementally. A data block (chunk) is returned as soon as part of the content is generated. You can read these chunks in real time to assemble the full reply. Set this to true to improve the reading experience and reduce the risk of timeouts.
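When reading the stream without the SDK, each SSE `data:` line carries one chunk whose `choices[0].delta.content` holds the next fragment of the reply. A minimal sketch of reassembling the full text; the sample payloads are illustrative, not captured output:

```python
import json

def assemble_reply(data_lines):
    """Join the delta.content fragments from a stream of SSE `data:` payloads."""
    parts = []
    for line in data_lines:
        if line.strip() == "[DONE]":  # sentinel that ends the stream
            break
        chunk = json.loads(line)
        choices = chunk.get("choices") or []
        if choices:
            content = choices[0].get("delta", {}).get("content")
            if content:
                parts.append(content)
    return "".join(parts)

# Illustrative chunks, in the order they would arrive over the wire:
sample = [
    '{"choices": [{"delta": {"role": "assistant", "content": ""}}]}',
    '{"choices": [{"delta": {"content": "I am "}}]}',
    '{"choices": [{"delta": {"content": "Qwen."}}]}',
    '{"choices": [{"delta": {}, "finish_reason": "stop"}]}',
    '[DONE]',
]
print(assemble_reply(sample))  # I am Qwen.
```

With the Python SDK the same loop iterates over the object returned by `create(..., stream=True)` instead of raw SSE lines.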

stream_options
object

Configuration options for streaming output. This parameter is effective only when stream is set to true.

modalities
string[]
default: ["text"]

Specifies the modalities of the output data. This parameter applies only to Qwen-Omni models. Valid values: ["text","audio"] or ["text"].

audio
object

The voice and format of the output audio. This parameter applies only to Qwen-Omni models, and you must set the modalities parameter to ["text","audio"].

temperature
number

The sampling temperature controls the diversity of the generated text. Higher values increase diversity, while lower values make the output more deterministic. The value must be greater than or equal to 0 and less than 2. Both temperature and top_p control the diversity of the generated text. Set only one of them. Do not modify the default temperature value for QVQ models.

top_p
number

The probability threshold for nucleus sampling. A higher top_p value produces more diverse text. A lower top_p value produces more deterministic text. Value range: (0, 1.0]. Both temperature and top_p control the diversity of the generated text. Set only one of them. Do not modify the default top_p value for QVQ models.

top_k
integer

Specifies the number of candidate tokens to use for sampling during generation. A larger value produces more random output, whereas a smaller value produces more deterministic output. If set to null or a value greater than 100, the top_k strategy is disabled and only top_p takes effect. The value must be an integer greater than or equal to 0.

Default top_k values:

  • QVQ series, qwen-vl-plus-2025-07-10, and qwen-vl-plus-2025-08-15: 10
  • QwQ series: 40
  • Other qwen-vl-plus series, models before qwen-vl-max-2025-08-13, qwen2.5-omni-7b: 1
  • Qwen3-Omni-Flash series: 50
  • All other models: 20

You must not change the default top_k value for QVQ models.

This is not a standard OpenAI parameter. When using the Python SDK, include it in extra_body: extra_body={"top_k": xxx}.
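Because top_k is not part of the standard OpenAI schema, the Python SDK rejects it as a named argument; extra_body merges it into the request body instead. A sketch of the keyword arguments for such a call (the prompt text is a placeholder):

```python
# Keyword arguments for client.chat.completions.create(**request_kwargs);
# non-standard parameters such as top_k travel inside extra_body and are
# merged into the JSON request body by the SDK.
request_kwargs = {
    "model": "qwen3.6-plus",
    "messages": [{"role": "user", "content": "Name three prime numbers."}],
    "extra_body": {"top_k": 20},
}
```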

presence_penalty
number

Controls how strongly the model avoids repeating content. Valid values: -2.0 to 2.0. Positive values reduce repetition. Negative values increase it. For creative writing or brainstorming, increase this value. For technical documents or formal text, decrease this value.

Default presence_penalty values:

  • Qwen3.5 (non-thinking mode), qwen3-max-preview (thinking mode), Qwen3 (non-thinking mode), Qwen3-Instruct series, qwen3-0.6b/1.7b/4b (thinking mode), QVQ series, qwen-max, qwen-max-latest, qwen2.5-vl series, qwen-vl-max series, qwen-vl-plus, Qwen3-VL (non-thinking): 1.5
  • qwen-vl-plus-latest, qwen-vl-plus-2025-08-15: 1.2
  • qwen-vl-plus-2025-01-25: 1.0
  • qwen3-8b/14b/32b/30b-a3b/235b-a22b (thinking mode), qwen3.6-plus/qwen3.6-plus-2026-04-02, qwen3.5-plus/qwen3.5-plus-latest/qwen3.5-plus-2025-04-28 (thinking mode), qwen-turbo/qwen-turbo-2025-04-28 (thinking mode): 0.5
  • All other models: 0.0

How it works: When the parameter value is positive, the model penalizes tokens that already appear in the generated text. The penalty does not depend on how many times a token appears. This reduces the likelihood of those tokens reappearing, which decreases repetition and increases lexical diversity.

When using the qwen-vl-plus-2025-01-25 model for text extraction, set presence_penalty to 1.5.

Do not modify the default presence_penalty value for QVQ models.

response_format
object

The format of the response content. Defaults to {"type": "text"}.

Valid values:

  • {"type": "text"}: Returns a plain text response.
  • {"type": "json_object"}: Returns a JSON string that conforms to standard JSON syntax.
  • {"type": "json_schema", "json_schema": {...}}: Returns a JSON string that conforms to a custom schema.

If you specify {"type": "json_object"}, explicitly instruct the model to output JSON in the prompt, such as by adding "Please output in JSON format." Otherwise, an error occurs.

For supported models, see Structured output.
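A sketch of a JSON-mode request, with the required JSON instruction in the prompt; the sample reply shown is illustrative, not captured output:

```python
import json

# Request body for json_object mode; note the prompt explicitly asks for
# JSON output, which this mode requires.
payload = {
    "model": "qwen3.6-plus",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant. Please output in JSON format."},
        {"role": "user", "content": "Give me one color as an object with name and hex keys."},
    ],
    "response_format": {"type": "json_object"},
}

# In json_object mode the returned content is a parseable JSON string, e.g.:
sample_content = '{"name": "teal", "hex": "#008080"}'
color = json.loads(sample_content)
print(color["hex"])  # #008080
```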

max_tokens
integer

The maximum number of tokens in the response. Generation stops when this limit is reached, and the finish_reason field is set to length. The default and maximum values correspond to the model's maximum output length. max_tokens does not limit the length of the chain-of-thought.

vl_high_resolution_images
boolean
default: false

Increases the maximum pixel limit for input images to the pixel value corresponding to 16384 tokens. When true, a fixed-resolution strategy is used and the max_pixels setting is ignored. If an image exceeds the pixel limit, its total pixel count is downscaled to meet the limit.

Pixel limits when vl_high_resolution_images is true:

  • Qwen3.5 series, Qwen3-VL series, qwen-vl-max, qwen-vl-max-latest, qwen-vl-max-0813, qwen-vl-plus, qwen-vl-plus-latest, qwen-vl-plus-0815: 16,777,216 (each token corresponds to 32×32 pixels, i.e., 16,384×32×32)
  • QVQ series and other Qwen2.5-VL series models: 12,845,056 (each token corresponds to 28×28 pixels, i.e., 16,384×28×28)

If vl_high_resolution_images is false, the actual pixel limit is determined by max_pixels.

This is not a standard OpenAI parameter. When using the Python SDK, include it in extra_body: extra_body={"vl_high_resolution_images": xxx}.

n
integer
default: 1

The number of responses to generate. Must be an integer in the range of 1-4. This is useful for scenarios that require multiple candidate responses, such as creative writing or ad copy. Supported only by Qwen3 (non-thinking mode) models. If you pass the tools parameter, set n to 1. Increasing n increases output token consumption but does not affect input token consumption.

enable_thinking
boolean

Enables the thinking mode for hybrid thinking models. This mode is available for the Qwen3.5, Qwen3, Qwen3-Omni-Flash, and Qwen3-VL models. When enabled, the thinking content is returned in the reasoning_content field.

Default values differ by model. For supported models and their default enable_thinking values, see the Model List.

This is not a standard OpenAI parameter. When using the Python SDK, place it in extra_body: extra_body={"enable_thinking": xxx}.
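With thinking enabled and streaming on, deltas carry the chain-of-thought in reasoning_content and the final answer in content. A sketch of splitting the two, with chunks shown as already-parsed dicts; the sample deltas are illustrative:

```python
def split_thinking(chunks):
    """Collect reasoning_content (thinking) and content (answer) separately."""
    thinking, answer = [], []
    for chunk in chunks:
        choices = chunk.get("choices") or []
        if not choices:
            continue
        delta = choices[0].get("delta", {})
        if delta.get("reasoning_content"):   # thinking-phase fragment
            thinking.append(delta["reasoning_content"])
        if delta.get("content"):             # final-answer fragment
            answer.append(delta["content"])
    return "".join(thinking), "".join(answer)

# Illustrative deltas as they might arrive:
chunks = [
    {"choices": [{"delta": {"reasoning_content": "The user asks who I am. "}}]},
    {"choices": [{"delta": {"reasoning_content": "A short intro suffices."}}]},
    {"choices": [{"delta": {"content": "I am Qwen."}}]},
]
thinking, answer = split_thinking(chunks)
```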

thinking_budget
integer

The maximum number of tokens for the thinking process. Applies to Qwen3.5, Qwen3-VL, and the commercial and open source versions of Qwen3 models. The default value is the model's maximum chain-of-thought length. For more information, see the Model List.

This is not a standard OpenAI parameter. When using the Python SDK, place it in extra_body: extra_body={"thinking_budget": xxx}.

enable_code_interpreter
boolean
default: false

Specifies whether to enable the code interpreter feature.

This is not a standard OpenAI parameter. When using the Python SDK, include it in extra_body: extra_body={"enable_code_interpreter": xxx}.

seed
integer

The random number seed. Ensures reproducible results. If you use the same seed value and other parameters remain unchanged, the model returns the same result whenever possible. Valid values: [0, 2^31-1].

logprobs
boolean
default: false

Specifies whether to return the log probabilities of the output tokens. Content generated during the thinking phase (reasoning_content) does not include log probabilities.

Supported models:

  • Qwen-plus series snapshots (excluding the stable model)
  • Qwen-turbo series snapshots (excluding the stable model)
  • Qwen3-vl-plus models (including the stable model)
  • Qwen3-vl-flash models (including the stable model)
  • Qwen3 open source models

top_logprobs
integer
default: 0

The number of most likely candidate tokens to return at each generation step. Valid values: 0 to 5. This parameter applies only if logprobs is set to true.

stop
string | array

Stop words. If a string or token specified in stop appears in the generated text, generation stops immediately. If stop is an array, do not use a token_id and a string as elements simultaneously.

tools
object[]

An array of one or more tool objects that the model can call in function calling. When tools is set and the model determines that a tool needs to be called, the response returns tool information in the tool_calls field. For usage examples, see the Function calling guide.

tool_choice
enum<string> | object
default: "auto"

The tool selection policy. auto lets the model select. none disables tool calling. An object with {"type": "function", "function": {"name": "..."}} forces a specific tool. Models in thinking mode do not support forcing a specific tool.
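A sketch of wiring tools to local code: declare the schema, then, when the assistant message comes back with tool_calls, run each call and build tool-role replies to send in the next request. The get_time tool, its registry, and the sample message are hypothetical examples, not part of the API:

```python
import json

# A hypothetical tool schema passed via the tools parameter
tools = [{
    "type": "function",
    "function": {
        "name": "get_time",
        "description": "Get the current time in a given timezone.",
        "parameters": {
            "type": "object",
            "properties": {"tz": {"type": "string"}},
            "required": ["tz"],
        },
    },
}]

def run_tool_calls(message, registry):
    """Execute tool_calls from an assistant message and build tool-role replies."""
    replies = []
    for call in message.get("tool_calls", []):
        fn = registry[call["function"]["name"]]
        args = json.loads(call["function"]["arguments"])  # arguments arrive as a JSON string
        replies.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": str(fn(**args)),
        })
    return replies

# Illustrative assistant message containing one tool call:
message = {
    "role": "assistant",
    "tool_calls": [{
        "id": "call_1",
        "type": "function",
        "function": {"name": "get_time", "arguments": '{"tz": "UTC"}'},
    }],
}
replies = run_tool_calls(message, {"get_time": lambda tz: f"12:00 {tz}"})
```

The tool-role replies are appended to messages, after the assistant message that requested them, for the follow-up request.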

parallel_tool_calls
boolean
default: false

Specifies whether to enable parallel tool calling.

enable_search
boolean
default: false

Enables web search. Enabling web search may increase token consumption.

This is not a standard OpenAI parameter. When using the Python SDK, include it in extra_body: extra_body={"enable_search": True}.

search_options
object

The web search strategy. Takes effect only when enable_search is true.

Properties:

  • forced_search (boolean, default: false): Forces web search. true forcefully enables web search. false lets the model decide.
  • search_strategy (string, default: turbo): The search scale strategy. turbo balances speed and effectiveness. max uses a more comprehensive strategy with multiple search engines. agent calls search and LLM multiple times for multi-round retrieval (applicable only to qwen3.6-plus, qwen3.6-plus-2026-04-02, qwen3.5-plus, qwen3.5-plus-2026-02-15, qwen3.5-flash, qwen3.5-flash-2026-02-23, qwen3-max, qwen3-max-2026-01-23, and qwen3-max-2025-09-23). agent_max adds web extraction to the agent strategy (applicable only to qwen3.6-plus, qwen3.6-plus-2026-04-02, qwen3.5-plus, qwen3.5-plus-2026-02-15, qwen3.5-flash, qwen3.5-flash-2026-02-23, and the thinking mode of qwen3-max and qwen3-max-2026-01-23). When agent or agent_max is enabled, only return search sources (enable_source: true) is supported. All other web search features are unavailable.
  • enable_search_extension (boolean, default: false): Enables domain-specific search.

This is not a standard OpenAI parameter. When using the Python SDK, include it in extra_body: extra_body={"search_options": xxx}.
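Since both web-search parameters are non-standard, they travel together in extra_body with the Python SDK. A sketch of the fragment, assuming a forced search with the max strategy:

```python
# Non-standard web-search parameters, passed via
# client.chat.completions.create(..., extra_body=extra_body)
extra_body = {
    "enable_search": True,
    "search_options": {
        "forced_search": True,     # always search rather than letting the model decide
        "search_strategy": "max",  # broader multi-engine strategy
    },
}
```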

Response

200-application/json
id
string

The unique identifier for this request.

choices
object[]

An array of generated content from the model.

created
integer

The Unix timestamp, in seconds, when the request was created.

model
string

The model used for this request.

object
enum<string>

Always chat.completion.

Available options: chat.completion
service_tier
string | null

Currently fixed as null.

system_fingerprint
string | null

Currently fixed as null.

usage
object

Token consumption details for this request.