GLM

Call GLM models through the OpenAI-compatible API or DashScope SDK on Qwen Cloud.

Quick start

glm-5.1 is the latest GLM model. It supports both thinking and non-thinking modes, selected with the enable_thinking parameter. The following code calls glm-5.1 in thinking mode and streams the reasoning and the final answer separately. Prerequisites: obtain an API key and set it as the DASHSCOPE_API_KEY environment variable. To call the model through an SDK, install the OpenAI or DashScope SDK.
enable_thinking is not a standard OpenAI parameter: with the OpenAI Python SDK, pass it in extra_body; with the Node.js SDK, pass it as a top-level request parameter.
from openai import OpenAI
import os

# Reads the API key from the DASHSCOPE_API_KEY environment variable.
client = OpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

messages = [{"role": "user", "content": "Who are you?"}]
completion = client.chat.completions.create(
    model="glm-5.1",
    messages=messages,
    # enable_thinking is not a standard OpenAI parameter, so it goes in extra_body.
    extra_body={"enable_thinking": True},
    stream=True,
    # Request a final chunk that reports token usage.
    stream_options={"include_usage": True},
)

reasoning_content = ""  # accumulated thinking output
answer_content = ""     # accumulated final answer
is_answering = False    # set once the first answer token arrives
print("\n" + "=" * 20 + " Thinking " + "=" * 20 + "\n")

for chunk in completion:
    # The usage chunk has an empty choices list.
    if not chunk.choices:
        print("\n" + "=" * 20 + " Token Usage " + "=" * 20 + "\n")
        print(chunk.usage)
        continue

    delta = chunk.choices[0].delta

    # Thinking tokens arrive in the non-standard reasoning_content field.
    if getattr(delta, "reasoning_content", None) is not None:
        if not is_answering:
            print(delta.reasoning_content, end="", flush=True)
        reasoning_content += delta.reasoning_content

    # Answer tokens arrive in content; print a header before the first one.
    if getattr(delta, "content", None):
        if not is_answering:
            print("\n" + "=" * 20 + " Response " + "=" * 20 + "\n")
            is_answering = True
        print(delta.content, end="", flush=True)
        answer_content += delta.content
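The OpenAI Python SDK merges extra_body keys into the top level of the JSON request body, so enable_thinking travels on the wire next to model and messages; this is also why the Node.js SDK can accept it as a top-level parameter. The following is a minimal standard-library sketch of the same request without an SDK. It assumes the standard OpenAI-compatible route /chat/completions under the compatible-mode base URL and Bearer authentication; build_chat_request is a hypothetical helper. The live call is non-streaming and uses non-thinking mode to keep parsing simple; the streaming example above remains the documented way to read thinking output.

```python
import json
import os
import urllib.request

BASE_URL = "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"


def build_chat_request(api_key: str, prompt: str,
                       enable_thinking: bool = True) -> urllib.request.Request:
    """Hypothetical helper: build a POST request for the chat endpoint.

    enable_thinking sits at the top level of the JSON body, which is
    where the OpenAI Python SDK places keys passed via extra_body.
    """
    body = {
        "model": "glm-5.1",
        "messages": [{"role": "user", "content": prompt}],
        "enable_thinking": enable_thinking,
    }
    return urllib.request.Request(
        url=BASE_URL + "/chat/completions",  # assumed OpenAI-compatible route
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    api_key = os.getenv("DASHSCOPE_API_KEY")
    if api_key:
        # Non-thinking mode keeps the non-streaming response simple to read.
        req = build_chat_request(api_key, "Who are you?", enable_thinking=False)
        with urllib.request.urlopen(req) as resp:
            message = json.load(resp)["choices"][0]["message"]
        print(message["content"])
```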

Streaming tool calling

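glm-5.1 supports the tool_stream parameter (boolean, default false), which takes effect only when stream is true. When enabled, function calling arguments are streamed incrementally across multiple chunks instead of arriving in a single chunk. Like enable_thinking, tool_stream is not a standard OpenAI parameter, so the Python example passes it in extra_body.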
stream | tool_stream | tool_call behavior
true | true | Arguments returned incrementally across chunks
true | false (default) | Arguments returned in a single chunk
false | true or false | tool_stream has no effect
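Because tool_stream matters only when stream is true, one way to keep requests consistent with the table above is to send it only in that case. A minimal sketch; tool_request_options is a hypothetical helper whose result is meant to be unpacked into client.chat.completions.create(...) alongside model, messages, and tools.

```python
def tool_request_options(stream: bool, tool_stream: bool = False) -> dict:
    """Hypothetical helper: streaming options that follow the stream/tool_stream table."""
    options = {"stream": stream}
    if stream:
        # Usage reporting is only meaningful for streamed responses.
        options["stream_options"] = {"include_usage": True}
        if tool_stream:
            # tool_stream is non-standard, so it goes through extra_body.
            options["extra_body"] = {"tool_stream": True}
    # With stream=False, tool_stream would have no effect, so it is not sent.
    return options
```

For example, client.chat.completions.create(model="glm-5.1", messages=messages, tools=tools, **tool_request_options(True, True)) reproduces the request in the example below. If you also need enable_thinking, merge it into the same extra_body dictionary rather than passing a second one.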
from openai import OpenAI
import os

client = OpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

# A single function the model may call.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get weather information for a city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"}
                },
                "required": ["city"],
            },
        },
    }
]

messages = [{"role": "user", "content": "What's the weather like in Singapore?"}]

completion = client.chat.completions.create(
    model="glm-5.1",
    tools=tools,
    messages=messages,
    # tool_stream is non-standard, so it goes in extra_body; it requires stream=True.
    extra_body={"tool_stream": True},
    stream=True,
    stream_options={"include_usage": True},
)

for chunk in completion:
    if chunk.choices:
        choice = chunk.choices[0]
        delta = choice.delta
        if getattr(delta, "content", None):
            print(f"[content] {delta.content}")
        # With tool_stream enabled, each chunk carries a fragment of the arguments string.
        for tc in getattr(delta, "tool_calls", None) or []:
            fn = tc.function
            name = fn.name if fn else None
            args = fn.arguments if fn else None
            print(f"[tool_call] id={tc.id}, name={name}, args={args}")
        if choice.finish_reason:
            print(f"[finish_reason] {choice.finish_reason}")
    # The final chunk has no choices and carries token usage.
    elif chunk.usage:
        print(f"[usage] {chunk.usage}")
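With tool_stream enabled, each chunk carries only a fragment of the arguments string, so the client has to reassemble the fragments before parsing them as JSON. A minimal sketch, assuming the usual OpenAI streaming convention that fragments of the same call share an index and that the id and function name typically arrive only on the first fragment. merge_tool_call_deltas and fragment are hypothetical helpers; the demo feeds SimpleNamespace stand-ins instead of a live stream.

```python
import json
from types import SimpleNamespace


def merge_tool_call_deltas(deltas):
    """Reassemble streamed tool-call fragments, grouped by their index."""
    calls = {}
    for tc in deltas:
        entry = calls.setdefault(tc.index, {"id": None, "name": None, "arguments": ""})
        if tc.id:
            entry["id"] = tc.id
        fn = getattr(tc, "function", None)
        if fn is not None:
            if fn.name:
                entry["name"] = fn.name
            if fn.arguments:
                # Concatenate argument fragments in arrival order.
                entry["arguments"] += fn.arguments
    return [calls[i] for i in sorted(calls)]


def fragment(index, arguments, call_id=None, name=None):
    # Stand-in for one element of delta.tool_calls in a streamed chunk.
    return SimpleNamespace(index=index, id=call_id,
                           function=SimpleNamespace(name=name, arguments=arguments))


if __name__ == "__main__":
    simulated = [
        fragment(0, "", call_id="call_0", name="get_weather"),
        fragment(0, '{"city": '),
        fragment(0, '"Singapore"}'),
    ]
    for call in merge_tool_call_deltas(simulated):
        print(call["id"], call["name"], json.loads(call["arguments"]))
```

In the streaming loop, you would extend a list with delta.tool_calls on every chunk and call merge_tool_call_deltas once a finish_reason arrives (typically "tool_calls" in the OpenAI format).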

Other features

Model | Multi-turn | Function calling | Web search | Context cache
glm-5.1 | ✓ | ✓ | ✓ (non-thinking mode only) | ✓ (explicit and implicit)
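Multi-turn conversation and function calling combine in the usual OpenAI-compatible message format: append the assistant turn that requested the tools, then one role "tool" message per call matched by tool_call_id, and send the whole list back in the next request. A minimal sketch, assuming glm-5.1 accepts this standard format through the compatible-mode endpoint; get_weather is a hypothetical local stub and append_tool_results a hypothetical helper.

```python
import json


def get_weather(city: str) -> str:
    # Hypothetical stub standing in for a real weather lookup.
    return f"Sunny in {city} (stub data)"


# Maps tool names the model may request to local implementations.
TOOLS = {"get_weather": get_weather}


def append_tool_results(messages, calls):
    """Extend the conversation with the assistant's tool calls and their results.

    Each call is a dict with "id", "name", and "arguments" (a JSON string).
    """
    messages.append({
        "role": "assistant",
        "content": "",  # the tool-call turn carries no text
        "tool_calls": [
            {"id": c["id"], "type": "function",
             "function": {"name": c["name"], "arguments": c["arguments"]}}
            for c in calls
        ],
    })
    for c in calls:
        result = TOOLS[c["name"]](**json.loads(c["arguments"]))
        messages.append({"role": "tool", "tool_call_id": c["id"], "content": result})
    return messages
```

After calling append_tool_results, pass the extended messages (and the same tools) to client.chat.completions.create again to get the model's final answer.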

Parameter defaults

Model | enable_thinking | temperature | top_p | top_k | repetition_penalty
glm-5.1 | true | 1.0 | 0.95 | 20 | 1.0
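These defaults apply when a parameter is omitted. With the OpenAI Python SDK, temperature and top_p are standard arguments, while enable_thinking goes in extra_body; top_k and repetition_penalty are also outside the OpenAI schema, so this sketch routes them through extra_body as well. That placement is an assumption, not something stated above. sampling_kwargs is a hypothetical helper that sends only the parameters you set, leaving the rest at their defaults.

```python
def sampling_kwargs(enable_thinking=None, temperature=None, top_p=None,
                    top_k=None, repetition_penalty=None):
    """Hypothetical helper: keyword arguments for client.chat.completions.create.

    Parameters left as None are not sent, so they keep the model defaults.
    """
    kwargs = {}
    # Standard OpenAI parameters go at the top level.
    if temperature is not None:
        kwargs["temperature"] = temperature
    if top_p is not None:
        kwargs["top_p"] = top_p
    # Non-standard parameters go through extra_body (documented for
    # enable_thinking; assumed for top_k and repetition_penalty).
    extra = {
        name: value
        for name, value in {
            "enable_thinking": enable_thinking,
            "top_k": top_k,
            "repetition_penalty": repetition_penalty,
        }.items()
        if value is not None
    }
    if extra:
        kwargs["extra_body"] = extra
    return kwargs
```

For example, client.chat.completions.create(model="glm-5.1", messages=messages, **sampling_kwargs(enable_thinking=False, temperature=0.7)) would switch to non-thinking mode with a lower temperature while keeping the other defaults.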