
OpenAI responses

Compatible Responses API

POST
/api/v2/apps/protocols/compatible-mode/v1/responses
```python
import os
from openai import OpenAI

client = OpenAI(
    # If the environment variable is not set, replace with: api_key="sk-xxx"
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope-intl.aliyuncs.com/api/v2/apps/protocols/compatible-mode/v1",
)

response = client.responses.create(
    model="qwen3.6-plus",
    input="What can you do?"
)

# Print the model response
print(response.output_text)
```
Example response:

```json
{
  "created_at": 1771165900,
  "id": "f75c28fb-4064-48ed-90da-4d2cc4362xxx",
  "model": "qwen3.6-plus",
  "object": "response",
  "output": [
    {
      "content": [
        {
          "annotations": [],
          "text": "Hello! I am Qwen3.5, a large language model developed by Alibaba Cloud with knowledge up to 2026, designed to assist you with complex reasoning, creative tasks, and multilingual conversations.",
          "type": "output_text"
        }
      ],
      "id": "msg_89ad23e6-f128-4d4c-b7a1-a786e7880xxx",
      "role": "assistant",
      "status": "completed",
      "type": "message"
    }
  ],
  "parallel_tool_calls": false,
  "status": "completed",
  "tool_choice": "auto",
  "tools": [],
  "usage": {
    "input_tokens": 57,
    "input_tokens_details": {
      "cached_tokens": 0
    },
    "output_tokens": 44,
    "output_tokens_details": {
      "reasoning_tokens": 0
    },
    "total_tokens": 101,
    "x_details": [
      {
        "input_tokens": 57,
        "output_tokens": 44,
        "total_tokens": 101,
        "x_billing_type": "response_api"
      }
    ]
  }
}
```
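Given the response shape above, `output_text` can be reconstructed client-side by concatenating the text of every `output_text` content part in `message` output items. This is a minimal sketch over a trimmed-down copy of the sample response:

```python
# Reconstruct output_text from a Responses API payload: walk the output
# array, keep message items, and join the text of output_text parts.
response = {
    "output": [
        {
            "type": "message",
            "role": "assistant",
            "content": [{"type": "output_text", "text": "Hello! I am Qwen."}],
        }
    ]
}

text = "".join(
    part["text"]
    for item in response["output"] if item["type"] == "message"
    for part in item["content"] if part["type"] == "output_text"
)
print(text)  # Hello! I am Qwen.
```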

Compatibility with OpenAI

This API is OpenAI-compatible, but key differences exist in parameters, features, and behaviors. Requests process only the parameters listed in this document. Any OpenAI parameters not mentioned are ignored. Key differences:
  • Unsupported parameters: Some parameters are not supported, such as background (synchronous only).
  • Additional parameters: Supports extra parameters beyond OpenAI's spec, such as enable_thinking.

Authorizations

Authorization
string
header
required

DashScope API key.

Header Parameters

enum<string>

Controls the session cache for multi-turn conversations that use previous_response_id. When enabled, the server automatically caches conversation context to reduce latency and cost.

  • enable: Enables the session cache. Cache creation tokens are billed at 125% of the standard input price; cache hits at 10%. The cache is valid for 5 minutes, and the timer resets on each hit. Cache creation requires a minimum of 1024 tokens.
  • disable: Disables the session cache. Falls back to the implicit cache if the model supports it.

Supported models: qwen3-max, qwen3.6-plus, qwen3.5-plus, qwen3.5-flash, qwen-plus, qwen-flash, qwen3-coder-plus, qwen3-coder-flash.

Pass via the SDK using default_headers (Python) or defaultHeaders (Node.js).

Available options: enable, disable
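The billing rule above can be sketched numerically. The per-1K-token price below is a made-up placeholder, not a real rate; only the 125%/10% multipliers come from the description:

```python
# Numeric sketch of session-cache billing (placeholder prices).
standard_price = 1.0        # cost per 1K input tokens (placeholder)
creation_rate = 1.25        # cache creation billed at 125%
hit_rate = 0.10             # cache hits billed at 10%

# Turn 1: 2000 input tokens, all written to the cache (>= 1024 minimum).
turn1 = 2000 / 1000 * standard_price * creation_rate
# Turn 2: the 2000 cached tokens hit, plus 500 new tokens at standard price.
turn2 = 2000 / 1000 * standard_price * hit_rate + 500 / 1000 * standard_price
print(round(turn1, 2), round(turn2, 2))  # 2.5 0.7
```

The second turn costs far less than re-sending the full context at the standard rate, which is the point of the cache.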

Body

application/json
model
string
required

The model name. Supported models include qwen3-max, qwen3-max-2026-01-23, qwen3.6-plus, qwen3.6-plus-2026-04-02, qwen3.5-plus, qwen3.5-plus-2026-02-15, qwen3.5-flash, qwen3.5-flash-2026-02-23, qwen3.5-397b-a17b, qwen3.5-122b-a10b, qwen3.5-27b, qwen3.5-35b-a3b, qwen-plus, qwen-flash, qwen3-coder-plus, qwen3-coder-flash.

input
string | object[]
required

The model input. Supports a plain text string or a message array arranged in conversational order.
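As a sketch, a message array arranged in conversational order might look like the following; the role/content shape is assumed to follow the usual OpenAI convention and is not spelled out in this document:

```json
[
  {"role": "user", "content": "Hello"},
  {"role": "assistant", "content": "Hi! How can I help?"},
  {"role": "user", "content": "Tell me a joke."}
]
```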

instructions
string

A system instruction inserted at the beginning of the context. When previous_response_id is used, the instructions specified in the previous turn are not carried over to the current context.

previous_response_id
string

The unique ID of the previous response. Response IDs are valid for 7 days. Use this parameter to create a multi-turn conversation: the server automatically retrieves the input and output of that turn and combines them into the current context. If both an input message array and previous_response_id are provided, the new messages in input are appended to the historical context. Cannot be used together with conversation. For usage examples, see the Multi-turn conversations guide.
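The chaining behavior described above can be sketched with an in-memory stand-in for the server. Everything here is illustrative: real ID formats, storage, and replies are server-side, and "echo:" stands in for actual model output.

```python
# Toy model of previous_response_id chaining: each response stores its
# full context, and a request that references a previous response
# inherits that context before appending the new turn.
store = {}

def create_response(input_text, previous_response_id=None):
    context = list(store.get(previous_response_id, []))
    context.append({"role": "user", "content": input_text})
    reply = {"role": "assistant", "content": f"echo: {input_text}"}
    response_id = f"resp_{len(store)}"
    store[response_id] = context + [reply]
    return response_id

first = create_response("My name is Ada.")
second = create_response("What is my name?", previous_response_id=first)
print(len(store[second]))  # 4: two user turns and two replies
```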

conversation
string

The ID of the conversation to which the current response belongs. Historical items in the conversation are automatically passed as context to the current request, and the input and output of the current request are automatically added to the conversation after the response completes. Cannot be used together with previous_response_id.

stream
boolean
default: false

Specifies whether to enable streaming output. If set to true, the model response is streamed back to the client in real time.
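Consuming a stream amounts to accumulating incremental delta events until the stream completes. The event shapes below are illustrative placeholders, not the exact event types this API emits:

```python
# Sketch of stream consumption: concatenate text deltas as they arrive.
events = [
    {"type": "output_text.delta", "delta": "Hel"},
    {"type": "output_text.delta", "delta": "lo!"},
    {"type": "completed"},
]

text = ""
for event in events:
    if event["type"] == "output_text.delta":
        text += event["delta"]
print(text)  # Hello!
```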

tools
object[]

A list of tools the model can use. Supported tool types: web_search, code_interpreter, web_extractor, web_search_image, image_search, file_search, mcp, function.

Built-in tools use {"type": "<tool_name>"} format. For example: {"type": "web_search"}.

MCP tools use the following format:

```json
{
    "type": "mcp",
    "server_protocol": "sse",
    "server_label": "amap-maps",
    "server_description": "AMAP MCP Server...",
    "server_url": "https://dashscope-intl.aliyuncs.com/api/v1/mcps/amap-maps/sse",
    "headers": {
        "Authorization": "Bearer <your-mcp-server-token>"
    }
}
```

Function tools use the following format:

```json
[{
  "type": "function",
  "name": "get_weather",
  "description": "Get weather information for a specified city",
  "parameters": {
    "type": "object",
    "properties": {
      "city": {
        "type": "string",
        "description": "The name of the city"
      }
    },
    "required": ["city"]
  }
}]
```

For usage examples, see the [Function calling guide](/developer-guides/text-generation/function-calling) and the [Web search guide](/developer-guides/text-generation/web-search).
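The client side of a function call can be sketched as follows. The function_call item below is an illustrative stand-in for what the model might return for the get_weather tool defined above; the exact output-item schema is not shown in this document.

```python
import json

# Dispatch a model-issued function call to a local implementation.
def get_weather(city):
    return f"Sunny in {city}"  # stand-in for a real weather lookup

# Hypothetical function_call output item (shape assumed for illustration):
call = {
    "type": "function_call",
    "name": "get_weather",
    "arguments": json.dumps({"city": "Paris"}),
}

registry = {"get_weather": get_weather}
result = registry[call["name"]](**json.loads(call["arguments"]))
print(result)  # Sunny in Paris
```

In a full loop, `result` would be sent back to the model as a tool result so it can compose the final answer.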
tool_choice
enum<string> | object

Controls how the model selects and calls tools. Supports a string format and an object format.

String format:

  • auto: The model automatically decides whether to call a tool.
  • none: Prevents the model from calling any tool.
  • required: Forces the model to call a tool. Available only when there is a single tool in the tools list.

Object format: Specifies the range of available tools for the model. The model can select and call tools only from the predefined list.
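The document does not show the object schema. If it mirrors OpenAI's Responses API, forcing a specific function might look like the following hypothetical fragment (not confirmed by this document):

```json
{
  "type": "function",
  "name": "get_weather"
}
```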

temperature
number

The sampling temperature, which controls the diversity of the generated text. A higher temperature produces more diverse text; a lower temperature produces more deterministic text. Value range: [0, 2). Because temperature and top_p both control diversity, we recommend setting only one of them.

top_p
number

The probability threshold for nucleus sampling, which controls the diversity of the generated text. A higher top_p produces more diverse text; a lower top_p produces more deterministic text. Value range: (0, 1.0]. Because temperature and top_p both control diversity, we recommend setting only one of them.
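To see why a lower top_p is more deterministic, nucleus sampling can be sketched on a toy distribution (integer percentages keep the arithmetic exact; this is a generic illustration, not this API's internal implementation):

```python
# Toy nucleus (top-p) sampling: keep the smallest set of
# highest-probability tokens whose cumulative probability reaches top_p,
# then sample only from that set.
probs = {"the": 50, "a": 30, "cat": 15, "zebra": 5}   # percent
top_p = 80                                            # i.e. top_p = 0.8

kept, total = [], 0
for token, p in sorted(probs.items(), key=lambda kv: -kv[1]):
    kept.append(token)
    total += p
    if total >= top_p:
        break
print(sorted(kept))  # ['a', 'the']
```

With top_p = 0.8 only the two most likely tokens survive; lowering top_p further would leave just "the", making the output fully deterministic.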

enable_thinking
boolean

Specifies whether to enable thinking mode. If set to true, the model thinks before replying. The thinking content is returned in an output item of type reasoning. Reasoning tokens are counted in output_tokens_details.reasoning_tokens and billed as reasoning tokens. When thinking mode is enabled, we recommend also enabling built-in tools for the best performance on complex tasks.

This is not a standard OpenAI parameter. In the Python SDK, pass it through extra_body={"enable_thinking": True}. The Node.js SDK and curl pass enable_thinking: true directly as a top-level parameter.
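A stdlib-only sketch of what extra_body does: the SDK merges its entries into the JSON request body alongside the standard parameters. No SDK or network call is involved here; this only shows the resulting payload shape.

```python
import json

# extra_body entries are merged into the request payload as top-level
# JSON fields, alongside standard OpenAI parameters.
standard = {"model": "qwen3.6-plus", "input": "Prove that 17 is prime."}
extra_body = {"enable_thinking": True}

payload = {**standard, **extra_body}
print(json.dumps(payload))
```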

Response

200-application/json
id
string

The unique ID of this response. It is valid for 7 days and can be passed as previous_response_id to create a multi-turn conversation.

created_at
number

The Unix timestamp, in seconds, at which the response was created.

object
enum<string>

The object type. The value is always response.

Available options: response
status
enum<string>

The status of the response generation.

Available options: completed, failed, in_progress, cancelled, queued, incomplete
model
string

The model used to generate the response.

output
object[]

An array of output items generated by the model. The type and order of items in the array depend on the model's response.

parallel_tool_calls
boolean

Whether parallel tool calls are enabled.

tool_choice
string

The tool_choice value echoed from the request. Valid values are auto, none, and required.

tools
object[]

The tools parameter echoed from the request. The structure matches the tools parameter in the request body.

error
object | null

The error object returned when the model fails to generate a response. This field is null on success.

usage
object

The token consumption information for this request.
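The arithmetic in the sample usage object above can be sanity-checked. This sketch assumes only what the sample shows: total_tokens is the sum of input_tokens and output_tokens.

```python
# Check the usage accounting from the sample response.
usage = {
    "input_tokens": 57,
    "output_tokens": 44,
    "output_tokens_details": {"reasoning_tokens": 0},
    "total_tokens": 101,
}
consistent = usage["input_tokens"] + usage["output_tokens"] == usage["total_tokens"]
print(consistent)  # True
```

When enable_thinking is used, output_tokens_details.reasoning_tokens reports the reasoning share separately.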