
Thinking

Solve complex tasks with step-by-step thinking

Thinking (reasoning) models reason before answering, emitting reasoning_content (Chat Completions / DashScope) or reasoning_summary_text events (Responses API). Each model supports thinking in one of two modes:
  • Hybrid: toggle thinking on or off per request with enable_thinking. Qwen3.5 has it enabled by default; Qwen3, Qwen3-VL, and Qwen3-Omni have it disabled by default.
  • Thinking-only: thinking is always on and cannot be disabled. Applies to QwQ and -thinking variants.

Enable thinking

  • OpenAI Chat Completions
  • OpenAI Responses API
  • DashScope
import os
from openai import OpenAI
client = OpenAI(api_key=os.getenv("DASHSCOPE_API_KEY"), base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1")

completion = client.chat.completions.create(
  model="qwen3.5-plus",
  messages=[{"role": "user", "content": "If 3x + 7 = 22, what is x?"}],
  extra_body={"enable_thinking": True},              # ← enable thinking
  stream=True,
)
for chunk in completion:
  if not chunk.choices:
    continue
  delta = chunk.choices[0].delta
  if hasattr(delta, "reasoning_content") and delta.reasoning_content:
    print(delta.reasoning_content, end="", flush=True)  # ← phase 1: thinking
  if hasattr(delta, "content") and delta.content:
    print(delta.content, end="", flush=True)             # ← phase 2: answer

Control thinking depth

Token budget

Use thinking_budget to cap the number of thinking tokens. If the limit is reached, the model stops thinking and generates its answer immediately. All thinking-capable models from Qwen3 onward support this parameter. Chat Completions and DashScope only — not supported by the Responses API.
  • OpenAI Chat Completions
  • DashScope
extra_body={"enable_thinking": True, "thinking_budget": 500}
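The budget slots into a request exactly like the streaming example above; a minimal sketch, where the small helper is illustrative rather than part of either SDK:

```python
def thinking_kwargs(budget: int) -> dict:
    """extra_body payload: thinking enabled, capped at `budget` tokens."""
    return {"enable_thinking": True, "thinking_budget": budget}

# Pass it where the streaming example above passes extra_body, e.g.:
#   extra_body=thinking_kwargs(500)
# When the cap is hit, the model stops thinking and answers immediately.
```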

Prompt-level control

With enable_thinking: true, append /no_think to a user message to skip thinking for that turn; /think restores it. If both appear, the last instruction wins. Supported by open-source Qwen3 hybrid models, qwen-plus-2025-04-28, and qwen-turbo-2025-04-28.
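A minimal sketch of appending the per-turn directive to a user message (the helper name is illustrative):

```python
def with_directive(user_text: str, think: bool) -> str:
    """Append the per-turn soft switch; the last directive in the message wins."""
    return f"{user_text} {'/think' if think else '/no_think'}"

# Skip thinking for this turn only:
messages = [{"role": "user", "content": with_directive("What is 2 + 2?", think=False)}]
```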

Function calling with thinking mode

When thinking is enabled during function calling, the model reasons about which tools to call and how to use results before responding. The response includes reasoning_content before each tool call. Key points:
  • Pass enable_thinking: true alongside your tools array — no other config needed.
  • In multi-turn tool-call flows, include the assistant's reasoning_content when sending tool results back. Omitting it degrades accuracy.
  • Streaming delivers thinking tokens first, then tool call deltas. See Streaming with tool calls for the parse pattern.
  • thinking_budget works the same as in regular thinking mode.
Thinking mode is most valuable for complex tool orchestration — multi-step reasoning about which tools to call, parameter selection, and result interpretation. For simple single-tool calls, the overhead may not be worth it.
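The second bullet above is the easiest one to get wrong. A sketch of assembling the follow-up turn so the assistant's reasoning_content survives the round trip (field names follow the Chat Completions shape described above; the helper itself is illustrative):

```python
def tool_follow_up(assistant_msg: dict, tool_call_id: str, result: str) -> list:
    """Echo the assistant turn, keeping reasoning_content, then append the tool result."""
    return [
        {
            "role": "assistant",
            "content": assistant_msg.get("content") or "",
            # Omitting this field degrades accuracy in multi-turn tool-call flows:
            "reasoning_content": assistant_msg.get("reasoning_content", ""),
            "tool_calls": assistant_msg["tool_calls"],
        },
        {"role": "tool", "tool_call_id": tool_call_id, "content": result},
    ]
```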

Notes

  • Streaming required for most models: Non-streaming is only supported by Qwen3.5 Plus/Flash, Qwen3 Max, Qwen Plus/Flash/Turbo (commercial), and all open-source models. Streaming is always recommended to avoid timeout risks.
  • No audio output in thinking mode (Qwen3-Omni): Text and image inputs work normally; audio output is not available when thinking is enabled.