Function calling

Connect models to external tools

Function calling enables LLMs to use external tools -- such as APIs, databases, or custom functions -- to answer questions they cannot solve on their own. You define tools, the model decides when to call them, and your application executes the calls.

How it works

Function calling enables LLMs to use external tools through multi-step interactions between your application and the model.
  1. Make the first model call
Your application sends a request to the LLM. The request includes the user's question and a list of tools the model can call.
  2. Receive the model's tool call instruction (tool name and input parameters)
If the model determines that a tool is needed, it returns a JSON-formatted instruction. This instruction tells your application which function to run and what parameters to pass.
If no tool is needed, the model returns a natural-language response.
  3. Run the tool in your application
After receiving the tool instruction, your application runs the tool and retrieves its output.
  4. Make the second model call
After retrieving the tool's output, add it to the model's context (messages), then call the model again.
  5. Receive the final model response
The model combines the tool's output with the user's question to generate a natural-language response. The following diagram illustrates the workflow:
[Diagram: function calling workflow]

Supported models

All general-purpose text generation models support function calling, including third-party models (DeepSeek, Kimi, GLM, MiniMax). Among multimodal models, Qwen3-VL and Qwen3-Omni-Flash also support it. See Models for the full list.

Getting started

This section shows how to use function calling with a weather lookup scenario.
  • OpenAI compatible
  • DashScope
from openai import OpenAI
from datetime import datetime
import json
import os
import random

client = OpenAI(
  api_key=os.getenv("DASHSCOPE_API_KEY"),
  base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)
# Simulate user question
USER_QUESTION = "What's the weather in Singapore?"
# Define tool list
tools = [
  {
    "type": "function",
    "function": {
      "name": "get_current_weather",
      "description": "Useful when you want to check the weather for a specific city.",
      "parameters": {
        "type": "object",
        "properties": {
          "location": {
            "type": "string",
            "description": "City or county, such as Singapore or New York.",
          }
        },
        "required": ["location"],
      },
    },
  },
]


# Simulate weather lookup tool
def get_current_weather(arguments):
  weather_conditions = ["sunny", "cloudy", "rainy"]
  random_weather = random.choice(weather_conditions)
  location = arguments["location"]
  return f"{location} is {random_weather} today."


# Wrap model response function
def get_response(messages):
  completion = client.chat.completions.create(
    model="qwen3.6-plus",
    messages=messages,
    tools=tools,
  )
  return completion


messages = [{"role": "user", "content": USER_QUESTION}]
response = get_response(messages)
assistant_output = response.choices[0].message
if assistant_output.content is None:
  assistant_output.content = ""
messages.append(assistant_output)
# If no tool is needed, output content directly
if assistant_output.tool_calls is None:
  print(f"No weather tool call needed. Direct reply: {assistant_output.content}")
else:
  # Enter tool calling loop
  while assistant_output.tool_calls is not None:
    tool_call = assistant_output.tool_calls[0]
    tool_call_id = tool_call.id
    func_name = tool_call.function.name
    arguments = json.loads(tool_call.function.arguments)
    print(f"Calling tool [{func_name}], parameters: {arguments}")
    # Execute tool
    tool_result = get_current_weather(arguments)
    # Build tool response message
    tool_message = {
      "role": "tool",
      "tool_call_id": tool_call_id,
      "content": tool_result,
    }
    print(f"Tool returned: {tool_message['content']}")
    messages.append(tool_message)
    # Call model again to get summarized natural-language reply
    response = get_response(messages)
    assistant_output = response.choices[0].message
    if assistant_output.content is None:
      assistant_output.content = ""
    messages.append(assistant_output)
  print(f"Final assistant reply: {assistant_output.content}")

Tool schema reference

Each tool is a JSON object with the following structure:
{
  "type": "function",
  "function": {
    "name": "get_current_weather",
    "description": "Useful when you want to check the weather for a specific city.",
    "parameters": {
      "type": "object",
      "properties": {
        "location": {
          "type": "string",
          "description": "City or county, such as Singapore or New York."
        }
      },
      "required": ["location"]
    }
  }
}
  • type: Always "function".
  • function.name: The function name. Must match the function your application executes.
  • function.description: Describes the tool's purpose. The model uses this to decide when to call the tool.
  • function.parameters: A JSON Schema object describing the function's input parameters. Omit if the function takes no input.
  • function.parameters.properties: Each key is a parameter name. Values describe the parameter type and purpose.
  • function.parameters.required: Array of required parameter names.
Write clear, specific descriptions. The model relies on description fields to select the right tool and extract the right parameters.
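Because the model returns arguments as a JSON string, it is worth checking them against the schema's required list before executing the function. A minimal sketch (the validate_arguments helper is illustrative, not part of the SDK):

```python
import json

# The "parameters" schema from the tool definition above
schema = {
    "type": "object",
    "properties": {
        "location": {
            "type": "string",
            "description": "City or county, such as Singapore or New York.",
        }
    },
    "required": ["location"],
}


def validate_arguments(raw_arguments: str, schema: dict) -> dict:
    """Parse the model's JSON argument string and check required fields."""
    args = json.loads(raw_arguments)
    missing = [name for name in schema.get("required", []) if name not in args]
    if missing:
        raise ValueError(f"Missing required parameters: {missing}")
    return args


print(validate_arguments('{"location": "Singapore"}', schema))  # {'location': 'Singapore'}
```

Rejecting malformed arguments early gives you a clean place to re-prompt the model instead of passing bad input to your tool.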

Specify tool calling behavior

Parallel tool calling

By default, the model returns one tool call per response. If the user's request requires multiple independent tool calls -- such as "What's the weather in Beijing and Shanghai?" -- set parallel_tool_calls to true:
completion = client.chat.completions.create(
  model="qwen3.6-plus",
  messages=messages,
  tools=tools,
  parallel_tool_calls=True
)
The tool_calls array then contains multiple entries:
{
  "role": "assistant",
  "tool_calls": [
    {
      "function": { "name": "get_current_weather", "arguments": "{\"location\": \"Beijing\"}" },
      "index": 0, "id": "call_c2d8a3a2...", "type": "function"
    },
    {
      "function": { "name": "get_current_weather", "arguments": "{\"location\": \"Shanghai\"}" },
      "index": 1, "id": "call_dc7f2f67...", "type": "function"
    }
  ]
}
Use parallel tool calling only when tasks are independent. If tasks depend on each other (such as when tool A's input relies on tool B's output), use the while loop from Getting started to call tools serially.
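When the model returns multiple entries, execute each one and append one tool message per tool_call_id. A sketch using plain dicts to simulate the model's response (the IDs are illustrative):

```python
import json
import random


def get_current_weather(arguments):
    # Simulated weather lookup, as in Getting started
    location = arguments["location"]
    return f"{location} is {random.choice(['sunny', 'cloudy', 'rainy'])} today."


# Simulated parallel tool calls as returned by the model
tool_calls = [
    {"id": "call_c2d8a3a2", "function": {"name": "get_current_weather",
                                         "arguments": '{"location": "Beijing"}'}},
    {"id": "call_dc7f2f67", "function": {"name": "get_current_weather",
                                         "arguments": '{"location": "Shanghai"}'}},
]

messages = []
# Execute every call; each result must carry the matching tool_call_id
for call in tool_calls:
    args = json.loads(call["function"]["arguments"])
    result = get_current_weather(args)
    messages.append({"role": "tool", "tool_call_id": call["id"], "content": result})

print([m["tool_call_id"] for m in messages])
```

Append all tool messages before making the second model call, so the model can summarize every result at once.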

Forced tool calling (tool_choice)

The tool_choice parameter controls whether the model calls a tool. The default is "auto" (model decides).
  • Force a specific tool: Set tool_choice to {"type": "function", "function": {"name": "get_current_weather"}}. The model skips tool selection and always calls the specified function.
  • Block all tools: Set tool_choice to "none". The model replies directly without calling any tool.
# Force a specific tool
completion = client.chat.completions.create(
  model="qwen3.6-plus",
  messages=messages,
  tools=tools,
  tool_choice={"type": "function", "function": {"name": "get_current_weather"}},
  extra_body={"enable_thinking": False}
)

# Block all tools
completion = client.chat.completions.create(
  model="qwen3.6-plus",
  messages=messages,
  tools=tools,
  tool_choice="none"
)
Models like qwen3.6-plus enable thinking mode by default. Thinking mode only supports tool_choice set to "auto" or "none". To force a specific tool, disable thinking mode first by setting enable_thinking to false.
Remove the tool_choice parameter when summarizing tool outputs. Otherwise, the API still returns tool call information instead of a natural-language response.

Multi-turn conversations

A user might ask "What's the weather in Beijing?" in round one and "What about Shanghai?" in round two. Without round-one context, the model cannot determine which tool to call. In multi-turn conversations, keep the messages array after each round. Then add the new user message and invoke function calling again. The messages structure looks like this:
[
  "System message -- Strategy guiding the model to call tools",
  "User message -- User's question",
  "Assistant message -- Tool call information returned by the model",
  "Tool message -- Tool output",
  "Assistant message -- Model's summary of tool call information",
  "User message -- User's second question"
]
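Concretely, the round-one history is kept and the follow-up question is appended to the same list. A sketch with a simulated round one (the tool_call id and replies are illustrative):

```python
# Round-one history, kept after the first function calling exchange
messages = [
    {"role": "system", "content": "You are a weather assistant."},
    {"role": "user", "content": "What's the weather in Beijing?"},
    {"role": "assistant", "content": "", "tool_calls": [
        {"id": "call_1", "type": "function",
         "function": {"name": "get_current_weather",
                      "arguments": '{"location": "Beijing"}'}}]},
    {"role": "tool", "tool_call_id": "call_1", "content": "Beijing is sunny today."},
    {"role": "assistant", "content": "It's sunny in Beijing today."},
]

# Round two: append the follow-up to the SAME list, then invoke function calling again
messages.append({"role": "user", "content": "What about Shanghai?"})
print(len(messages))  # 6
```

With the full history present, the model can resolve "What about Shanghai?" to another get_current_weather call.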

Streaming

For general streaming concepts (SSE protocol, how to enable streaming, billing, and token usage), see Streaming output.
When streaming with function calling, tool call information arrives in chunks:
  • Tool function name: returned in the first chunk.
  • Tool call arguments: returned incrementally across subsequent chunks.
You must aggregate the argument deltas before parsing JSON. Add stream=True to your request, then join the chunks:
tool_calls = {}
for response_chunk in stream:
  delta_tool_calls = response_chunk.choices[0].delta.tool_calls
  if delta_tool_calls:
    for tool_call_chunk in delta_tool_calls:
      call_index = tool_call_chunk.index
      tool_call_chunk.function.arguments = tool_call_chunk.function.arguments or ""
      if call_index not in tool_calls:
        tool_calls[call_index] = tool_call_chunk
      else:
        tool_calls[call_index].function.arguments += tool_call_chunk.function.arguments
print(tool_calls[0].model_dump_json())
When building the assistant message for the second model call, replace the tool_calls field with the aggregated content.
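One way to rebuild that assistant message, shown here with plain dicts standing in for the SDK's aggregated chunk objects (the id is illustrative):

```python
# Aggregated results from the streaming loop, as plain dicts
tool_calls = {
    0: {"id": "call_391c8e57", "type": "function",
        "function": {"name": "get_current_weather",
                     "arguments": '{"location": "Singapore"}'}},
}

# Assistant message for the second model call, with the aggregated tool_calls
assistant_message = {
    "role": "assistant",
    "content": "",
    "tool_calls": [tool_calls[i] for i in sorted(tool_calls)],
}
print(assistant_message["tool_calls"][0]["function"]["name"])  # get_current_weather
```

Sorting by index preserves the order of parallel calls so each later tool message lines up with its tool_call_id.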

Responses API

The Responses API uses a different response format for tool calls. Instead of tool_calls on the assistant message, tool calls appear as function_call items in the output array. Step 1 -- Send tools with your request:
Python
from openai import OpenAI
import os

client = OpenAI(
  api_key=os.getenv("DASHSCOPE_API_KEY"),
  base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

tools = [
  {
    "type": "function",
    "name": "get_current_weather",
    "description": "Useful when you want to check the weather for a specific city.",
    "parameters": {
      "type": "object",
      "properties": {
        "location": {
          "type": "string",
          "description": "City or county, such as Singapore or New York.",
        }
      },
      "required": ["location"],
    },
  },
]

response = client.responses.create(
  model="qwen3.6-plus",
  tools=tools,
  input="What's the weather in Singapore?",
)
Step 2 -- Parse the function_call output item. The response's output array contains an item like:
{
  "type": "function_call",
  "id": "fc_12345",
  "call_id": "call_xxx",
  "name": "get_current_weather",
  "arguments": "{\"location\": \"Singapore\"}"
}
Step 3 -- Return the tool result: After executing the function, pass the result back using a function_call_output item:
Python
tool_result = get_current_weather({"location": "Singapore"})
response = client.responses.create(
  model="qwen3.6-plus",
  tools=tools,
  input=[
    {"type": "message", "role": "user", "content": "What's the weather in Singapore?"},
    response.output[0],  # The function_call item
    {
      "type": "function_call_output",
      "call_id": response.output[0].call_id,
      "output": tool_result,
    },
  ],
)
print(response.output_text)
In the Responses API, tool definitions use a flat structure (name and parameters at the top level) rather than the nested function wrapper used in Chat Completions. See Responses API for details.
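If you maintain one tool library for both APIs, a small converter can flatten the Chat Completions shape into the Responses shape. A sketch (to_responses_tool is an illustrative helper, not an SDK function):

```python
def to_responses_tool(chat_tool: dict) -> dict:
    # Lift name/description/parameters out of the nested "function" wrapper
    fn = chat_tool["function"]
    return {
        "type": "function",
        "name": fn["name"],
        "description": fn["description"],
        "parameters": fn["parameters"],
    }


chat_tool = {
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Useful when you want to check the weather for a specific city.",
        "parameters": {"type": "object",
                       "properties": {"location": {"type": "string"}},
                       "required": ["location"]},
    },
}
print(to_responses_tool(chat_tool)["name"])  # get_current_weather
```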

Qwen3-Omni-Flash

During the tool information retrieval phase, Qwen3-Omni-Flash differs from other models in two ways:
  • Streaming output is required: Set stream=True when retrieving tool information.
  • Output text only (recommended): Set modalities=["text"] to avoid unnecessary audio output during tool selection.
See Audio and video file understanding for details on Qwen3-Omni-Flash.
from openai import OpenAI
import os

client = OpenAI(
  api_key=os.getenv("DASHSCOPE_API_KEY"),
  base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)
tools = [
  {
    "type": "function",
    "function": {
      "name": "get_current_weather",
      "description": "Useful when you want to check the weather for a specific city.",
      "parameters": {
        "type": "object",
        "properties": {
          "location": {
            "type": "string",
            "description": "City or county, such as Singapore or New York.",
          }
        },
        "required": ["location"],
      },
    },
  },
]

completion = client.chat.completions.create(
  model="qwen3-omni-flash",
  messages=[{"role": "user", "content": "What's the weather in Singapore?"}],
  modalities=["text"],
  stream=True,
  tools=tools
)

for chunk in completion:
  if chunk.choices:
    delta = chunk.choices[0].delta
    print(delta.tool_calls)
Running the code gives this output:
Output
[ChoiceDeltaToolCall(index=0, id='call_391c8e5787bc4972a388aa', function=ChoiceDeltaToolCallFunction(arguments=None, name='get_current_weather'), type='function')]
[ChoiceDeltaToolCall(index=0, id='call_391c8e5787bc4972a388aa', function=ChoiceDeltaToolCallFunction(arguments=' {"location": "Singapore"}', name=None), type='function')]
None
See the Streaming section above for code to aggregate argument chunks.

Thinking mode

Deep thinking models reason before generating tool calls. Set enable_thinking=True to see the model's reasoning process. The response includes reasoning_content before tool calls:
  1. The model outputs reasoning_content showing its analysis of user intent, tool selection, and parameter planning.
  2. The model then outputs the tool_calls as usual.
When passing the assistant message back in subsequent requests, you must include the reasoning_content field.
With thinking mode enabled, the tool_choice parameter only supports "auto" (default) or "none".
Python
completion = client.chat.completions.create(
  model="qwen3.6-plus",  # Use a thinking-capable model
  messages=messages,
  tools=tools,
  extra_body={"enable_thinking": True},
  stream=True,
)
For complete streaming parse code with reasoning_content handling, see Thinking mode.
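The parsing loop separates reasoning_content deltas from tool_calls deltas. A sketch over simulated chunks (plain dicts with the same fields the SDK objects expose; the call id and contents are illustrative):

```python
# Simulated streaming deltas: reasoning first, then name, then argument fragments
deltas = [
    {"reasoning_content": "User wants weather; call get_current_weather.",
     "tool_calls": None},
    {"reasoning_content": None, "tool_calls": [
        {"index": 0, "id": "call_1",
         "function": {"name": "get_current_weather", "arguments": None}}]},
    {"reasoning_content": None, "tool_calls": [
        {"index": 0, "id": None,
         "function": {"name": None, "arguments": '{"location": "Beijing"}'}}]},
]

reasoning = ""
tool_calls = {}
for delta in deltas:
    if delta["reasoning_content"]:
        reasoning += delta["reasoning_content"]  # Accumulate the thinking text
    for call in delta["tool_calls"] or []:
        i = call["index"]
        if i not in tool_calls:
            tool_calls[i] = {"id": call["id"], "type": "function",
                             "function": {"name": call["function"]["name"],
                                          "arguments": ""}}
        # Argument deltas arrive as fragments; concatenate before parsing JSON
        tool_calls[i]["function"]["arguments"] += call["function"]["arguments"] or ""

print(reasoning)
print(tool_calls[0]["function"]["arguments"])
```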

Best practices

  • Test tool selection accuracy: Build an evaluation dataset mirroring real scenarios. Track tool selection accuracy, parameter extraction accuracy, and end-to-end success rate.
  • Optimize tool descriptions: When the model selects the wrong tool or extracts wrong parameters, refine descriptions and system prompts before upgrading models.
  • Keep candidate tool sets small: Limit tools passed to the model to no more than 20. Use a routing layer (semantic search, keyword filtering, or a lightweight LLM router) to pre-filter large tool libraries.
  • Apply least-privilege principle: Default to read-only tools. Never give the model direct access to dangerous operations (code execution, file deletion, financial transfers).
  • Add human confirmation for write operations: The model can generate action requests, but irreversible actions (sending email, modifying data) should require user confirmation.
  • Set timeouts and provide fallback responses: Assign independent timeouts to each step. On failure, return a clear message such as "Sorry, I cannot retrieve that information right now. Please try again later."
  • Show progress in the UI: Display messages like "Looking up weather for you..." when starting tool execution.
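The timeout-and-fallback practice can be sketched with a per-call executor; run_tool_with_timeout is an illustrative wrapper, not part of any SDK:

```python
import concurrent.futures
import time

FALLBACK = "Sorry, I cannot retrieve that information right now. Please try again later."


def run_tool_with_timeout(tool_fn, arguments, timeout_s):
    # Each tool call gets its own timeout; on timeout or error, return the fallback
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(tool_fn, arguments)
        try:
            return future.result(timeout=timeout_s)
        except Exception:
            return FALLBACK


def slow_tool(arguments):
    time.sleep(0.5)  # Simulates a tool that exceeds its budget
    return "done"


print(run_tool_with_timeout(slow_tool, {}, timeout_s=0.1))
```

Feeding the fallback string back to the model as the tool result lets it apologize gracefully instead of stalling the conversation.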

Pass tool information via system message

We recommend using the tools parameter (shown throughout this guide). If you need direct control over the tool prompt, embed tool definitions in the system message using this template:
Python
import os
from openai import OpenAI
import json

client = OpenAI(
  api_key=os.getenv("DASHSCOPE_API_KEY"),
  base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

tools = [
  {
    "type": "function",
    "function": {
      "name": "get_current_time",
      "description": "Useful when you want to know the current time.",
      "parameters": {}
    }
  },
  {
    "type": "function",
    "function": {
      "name": "get_current_weather",
      "description": "Useful when you want to check the weather for a specific city.",
      "parameters": {
        "type": "object",
        "properties": {
          "location": {
            "type": "string",
            "description": "City or county, such as Singapore or New York."
          }
        },
        "required": ["location"]
      }
    }
  }
]

custom_prompt = "You are a helpful assistant."
tools_content = "\n".join(json.dumps(tool, ensure_ascii=False) for tool in tools)

system_prompt = f"""{custom_prompt}

# Tools

You may call one or more functions to assist with the user query.

You are provided with function signatures within <tools></tools> XML tags:
<tools>
{tools_content}
</tools>

For each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:
<tool_call>
{{"name": <function-name>, "arguments": <args-json-object>}}
</tool_call>"""

messages = [
  {"role": "system", "content": system_prompt},
  {"role": "user", "content": "What time is it?"}
]

completion = client.chat.completions.create(
  model="qwen3.6-plus",
  messages=messages,
)
print(completion.choices[0].message.content)
After running the code, use an XML parser to extract tool call information from between the <tool_call> and </tool_call> tags.
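A lightweight regular expression is usually enough to pull the JSON out of the tags. A sketch (the model_output string is illustrative):

```python
import json
import re

model_output = """<tool_call>
{"name": "get_current_time", "arguments": {}}
</tool_call>"""

# Extract every JSON object wrapped in <tool_call> tags and parse it
calls = [json.loads(m) for m in
         re.findall(r"<tool_call>\s*(.*?)\s*</tool_call>", model_output, re.DOTALL)]
print(calls[0]["name"])  # get_current_time
```

Because the tags come from your own prompt template, also handle the case where the model omits them or emits malformed JSON, and fall back to re-prompting.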

Billing

In addition to tokens in the messages array, tool descriptions also count as input tokens and are billed as part of the prompt.

Error codes

If a call fails, see Error messages.
Function calling | Qwen Cloud