Connect models to external tools
Function calling enables LLMs to use external tools -- such as APIs, databases, or custom functions -- to answer questions they cannot solve on their own. You define tools, the model decides when to call them, and your application executes the calls.
In addition to tokens in the messages array, tool descriptions also count as input tokens and are billed as part of the prompt.
If a call fails, see Error messages.
How it works
Function calling enables LLMs to use external tools through multi-step interactions between your application and the model.
- Make the first model call
- Receive the model's tool call instruction (tool name and input parameters). If no tool is needed, the model returns a natural-language response.
- Run the tool in your application
- Make the second model call
- Receive the final model response
Supported models
All general-purpose text generation models support function calling, including third-party models (DeepSeek, Kimi, GLM, MiniMax). For vision, Qwen3-VL and qwen3-omni-flash also support it. See Models for the full list.
Getting started
This section shows how to use function calling with a weather lookup scenario.
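The original runnable sample is not reproduced here. As a minimal sketch, assuming an OpenAI-compatible request shape, the tool definition and the local execution step might look like this (the `get_current_weather` stub and its return payload are illustrative; a real implementation would query a weather API):

```python
import json

# Tool schema passed to the model via the `tools` parameter.
tools = [{
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the current weather for a given city.",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string",
                             "description": "City name, e.g. Beijing"},
            },
            "required": ["location"],
        },
    },
}]

def get_current_weather(location: str) -> str:
    # Illustrative stub -- a real implementation would query a weather API.
    return json.dumps({"location": location, "weather": "sunny", "temp_c": 25})

# Map tool names to local functions so the model's choice drives dispatch.
TOOL_REGISTRY = {"get_current_weather": get_current_weather}

def execute_tool_calls(tool_calls):
    """Run each tool call the model requested and build tool-role messages."""
    results = []
    for call in tool_calls:
        func = TOOL_REGISTRY[call["function"]["name"]]
        args = json.loads(call["function"]["arguments"])
        results.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": func(**args),
        })
    return results
```

In a real run, you would send `messages` and `tools` to the model, pass the returned `tool_calls` into `execute_tool_calls`, append the resulting tool messages, and call the model again for the final natural-language answer.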
Tool schema reference
Each tool is a JSON object with the following structure:
| Field | Description |
|---|---|
| type | Always `"function"`. |
| function.name | The function name. Must match the function your application executes. |
| function.description | Describes the tool's purpose. The model uses this to decide when to call the tool. |
| function.parameters | A JSON Schema object describing the function's input parameters. Omit if the function takes no input. |
| function.parameters.properties | Each key is a parameter name. Values describe the parameter type and purpose. |
| function.parameters.required | Array of required parameter names. |
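Put together, a definition for the weather tool used in this guide might look like the following (the description text and parameter names are illustrative):

```json
{
  "type": "function",
  "function": {
    "name": "get_current_weather",
    "description": "Get the current weather for a given city.",
    "parameters": {
      "type": "object",
      "properties": {
        "location": {
          "type": "string",
          "description": "City name, e.g. Beijing"
        }
      },
      "required": ["location"]
    }
  }
}
```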
Write clear, specific descriptions. The model relies on the `description` fields to select the right tool and extract the right parameters.

Specify tool calling behavior
Parallel tool calling
By default, the model returns one tool call per response. If the user's request requires multiple independent tool calls -- such as "What's the weather in Beijing and Shanghai?" -- set parallel_tool_calls to true:
The `tool_calls` array then contains multiple entries:
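With parallel calls enabled, the assistant message for the Beijing-and-Shanghai question might carry two entries (call IDs and argument values are illustrative):

```json
{
  "role": "assistant",
  "content": null,
  "tool_calls": [
    {"id": "call_1", "type": "function",
     "function": {"name": "get_current_weather", "arguments": "{\"location\": \"Beijing\"}"}},
    {"id": "call_2", "type": "function",
     "function": {"name": "get_current_weather", "arguments": "{\"location\": \"Shanghai\"}"}}
  ]
}
```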
Use parallel tool calling only when tasks are independent. If tasks depend on each other (such as when tool A's input relies on tool B's output), use the while loop from Getting started to call tools serially.
Forced tool calling (tool_choice)
The tool_choice parameter controls whether the model calls a tool. The default is "auto" (model decides).
- Force a specific tool: Set `tool_choice` to `{"type": "function", "function": {"name": "get_current_weather"}}`. The model skips tool selection and always calls the specified function.
- Block all tools: Set `tool_choice` to `"none"`. The model replies directly without calling any tool.
Models like qwen3.6-plus enable thinking mode by default. Thinking mode only supports `tool_choice` set to `"auto"` or `"none"`. To force a specific tool, disable thinking mode first by setting `enable_thinking` to `false`.
Remove the tool_choice parameter when summarizing tool outputs. Otherwise, the API still returns tool call information instead of a natural-language response.
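As a sketch, the relevant request parameters for forcing or blocking tools might be assembled like this (the model name is a placeholder, and the request itself is not sent here):

```python
# Sketch: request parameters that force or block tool calls.
# "<model-name>" is a placeholder, not an identifier from this guide.
tools = [{"type": "function", "function": {
    "name": "get_current_weather",
    "description": "Get the current weather for a given city.",
    "parameters": {"type": "object",
                   "properties": {"location": {"type": "string"}},
                   "required": ["location"]}}}]

base = {
    "model": "<model-name>",
    "messages": [{"role": "user", "content": "What's the weather in Beijing?"}],
    "tools": tools,
}

# Always call get_current_weather, skipping tool selection.
forced = {**base, "tool_choice": {"type": "function",
                                  "function": {"name": "get_current_weather"}}}
# Reply directly without calling any tool.
blocked = {**base, "tool_choice": "none"}
```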
Multi-turn conversations
A user might ask "What's the weather in Beijing?" in round one and "What about Shanghai?" in round two. Without round-one context, the model cannot determine which tool to call. In multi-turn conversations, keep the messages array after each round. Then add the new user message and invoke function calling again. The messages structure looks like this:
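As a sketch, after a completed first round plus the new round-two question, the array might look like this (call IDs and the tool result payload are illustrative):

```python
messages = [
    # Round one: question, tool call, tool result, final answer.
    {"role": "user", "content": "What's the weather in Beijing?"},
    {"role": "assistant", "content": None, "tool_calls": [
        {"id": "call_1", "type": "function",
         "function": {"name": "get_current_weather",
                      "arguments": "{\"location\": \"Beijing\"}"}}]},
    {"role": "tool", "tool_call_id": "call_1",
     "content": "{\"weather\": \"sunny\", \"temp_c\": 25}"},
    {"role": "assistant", "content": "It's sunny in Beijing, around 25°C."},
    # Round two: append the new question and invoke function calling again.
    {"role": "user", "content": "What about Shanghai?"},
]
```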
Streaming
For general streaming concepts (SSE protocol, how to enable streaming, billing, and token usage), see Streaming output.

When streaming with function calling, tool call information arrives in chunks:

- Tool function name: returned in the first chunk.
- Tool call arguments: returned incrementally across subsequent chunks.

Add `stream=True` to your request, then join the chunks. When building the assistant message for the second model call, replace the `tool_calls` field with the aggregated content.
Responses API
The Responses API uses a different response format for tool calls. Instead of tool_calls on the assistant message, tool calls appear as function_call items in the output array.
Step 1 -- Send tools with your request:
Step 2 -- Parse the function_call output item:

The response `output` array contains a `function_call` item:

Step 3 -- Return the tool result:

After executing the function, pass the result back using a `function_call_output` item:
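As a sketch of the three steps with plain dicts in place of SDK objects (field values are illustrative; the item shapes follow the Responses format described in this section):

```python
import json

def get_current_weather(location: str) -> str:
    # Illustrative stub -- a real implementation would query a weather API.
    return json.dumps({"location": location, "weather": "sunny"})

# Step 1: tool definitions in the Responses API use a flat structure.
tools = [{
    "type": "function",
    "name": "get_current_weather",
    "description": "Get the current weather for a given city.",
    "parameters": {"type": "object",
                   "properties": {"location": {"type": "string"}},
                   "required": ["location"]},
}]

# Step 2: a function_call item as it might appear in the response `output` array.
function_call_item = {
    "type": "function_call",
    "call_id": "call_1",
    "name": "get_current_weather",
    "arguments": "{\"location\": \"Beijing\"}",
}

# Step 3: execute locally and build the function_call_output item to send back.
args = json.loads(function_call_item["arguments"])
output_item = {
    "type": "function_call_output",
    "call_id": function_call_item["call_id"],
    "output": get_current_weather(**args),
}
```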
In the Responses API, tool definitions use a flat structure (`name` and `parameters` at the top level) rather than the nested `function` wrapper used in Chat Completions. See Responses API for details.

Qwen3-Omni-Flash
During the tool information retrieval phase, Qwen3-Omni-Flash differs from other models in two ways:
- Streaming output is required: Set `stream=True` when retrieving tool information.
- Output text only (recommended): Set `modalities=["text"]` to avoid unnecessary audio output during tool selection.
See Audio and video file understanding for details on Qwen3-Omni-Flash.
Thinking mode
Deep thinking models reason before generating tool calls. Set enable_thinking=True to see the model's reasoning process. The response includes reasoning_content before tool calls:
- The model outputs `reasoning_content` showing its analysis of user intent, tool selection, and parameter planning.
- The model then outputs the `tool_calls` as usual.
With thinking mode enabled, the `tool_choice` parameter only supports `"auto"` (default) or `"none"`.
For complete streaming parse code with `reasoning_content` handling, see Thinking mode.
Best practices
- Test tool selection accuracy: Build an evaluation dataset mirroring real scenarios. Track tool selection accuracy, parameter extraction accuracy, and end-to-end success rate.
- Optimize tool descriptions: When the model selects the wrong tool or extracts wrong parameters, refine descriptions and system prompts before upgrading models.
- Keep candidate tool sets small: Limit tools passed to the model to no more than 20. Use a routing layer (semantic search, keyword filtering, or a lightweight LLM router) to pre-filter large tool libraries.
- Apply least-privilege principle: Default to read-only tools. Never give the model direct access to dangerous operations (code execution, file deletion, financial transfers).
- Add human confirmation for write operations: The model can generate action requests, but irreversible actions (sending email, modifying data) should require user confirmation.
- Set timeouts and provide fallback responses: Assign independent timeouts to each step. On failure, return a clear message such as "Sorry, I cannot retrieve that information right now. Please try again later."
- Show progress in the UI: Display messages like "Looking up weather for you..." when starting tool execution.
Pass tool information via system message
We recommend using the `tools` parameter (shown throughout this guide). If you need direct control over the tool prompt, embed tool definitions in the system message using this template:

After running the code, use an XML parser to extract tool call information from between the `<tool_call>` and `</tool_call>` tags.
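A minimal extraction sketch, assuming each call is wrapped in `<tool_call>` tags containing JSON (the exact in-prompt format depends on the template you embed; a regex stands in for a full XML parser here for brevity):

```python
import json
import re

def extract_tool_calls(text: str):
    """Pull JSON tool-call payloads from between <tool_call> tags."""
    pattern = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)
    return [json.loads(payload) for payload in pattern.findall(text)]

# Mocked model reply containing one embedded tool call.
reply = (
    "Let me check that for you.\n"
    "<tool_call>\n"
    '{"name": "get_current_weather", "arguments": {"location": "Beijing"}}\n'
    "</tool_call>"
)
calls = extract_tool_calls(reply)
```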