Continue from a prefix
Partial Mode generates content that continues from a given prefix. It ensures that the model's output connects seamlessly with the prefix.
The model then starts generating text from the specified prefix.
You are billed for both input tokens and output tokens. The prefix is counted as part of the input tokens.
If a call fails, see Error messages.
How it works
To use Partial Mode, configure the messages array. In the last message of the array, set the role to assistant and provide the prefix in the content field. You must also set the "partial": true parameter in that message. The messages format is as follows:
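Filling in the fields named above with an illustrative prompt and prefix (both are placeholders, not required values), the array looks like this:

```json
[
    {"role": "user", "content": "Write a four-line poem about the sea."},
    {"role": "assistant", "content": "The waves", "partial": true}
]
```

The model's output then begins immediately after "The waves" rather than starting a new answer.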
Supported models
- Qwen-Max series qwen3-max, qwen3-max-2026-01-23, qwen3-max-2025-09-23, qwen3-max-preview (non-thinking mode), qwen-max, qwen-max-latest, qwen-max-2025-01-25, and later snapshots
- Qwen-Plus series (non-thinking mode) qwen3.6-plus, qwen3.5-plus, qwen3.5-plus-2026-02-15 and later snapshots, qwen-plus, qwen-plus-latest, qwen-plus-2025-01-25, and later snapshots
- Qwen-Flash series (non-thinking mode) qwen3.5-flash, qwen3.5-flash-2026-02-23 and later snapshots, qwen-flash, qwen-flash-2025-07-28
- Qwen-Coder series qwen3-coder-plus, qwen3-coder-flash, qwen3-coder-480b-a35b-instruct, qwen3-coder-30b-a3b-instruct
- Qwen-VL series
- qwen3-vl-plus series (non-thinking mode): qwen3-vl-plus, qwen3-vl-plus-2025-09-23, and later snapshots
- qwen3-vl-flash series (non-thinking mode): qwen3-vl-flash, qwen3-vl-flash-2025-10-15, and later snapshots
- qwen-vl-max series: qwen-vl-max, qwen-vl-max-latest, qwen-vl-max-2025-04-08, and later snapshots
- qwen-vl-plus series: qwen-vl-plus, qwen-vl-plus-latest, qwen-vl-plus-2025-01-25, and later snapshots
- Qwen-Turbo series (non-thinking mode) qwen-turbo, qwen-turbo-latest, qwen-turbo-2024-11-01, and later snapshots
- Qwen open source series qwen3.5-397b-a17b, qwen3.5-122b-a10b, qwen3.5-27b, qwen3.5-35b-a3b (non-thinking mode), Qwen3 open source models (non-thinking mode), Qwen2.5 text models, Qwen3-VL open source models (non-thinking mode)
Models running in thinking mode do not currently support Partial Mode.
Getting started
Prerequisites
Get an API key and set it as an environment variable. To use the SDK, install it. If you are in a sub-workspace, ensure the super administrator has granted model access to your workspace.
The DashScope Java SDK is not supported.
Sample code
The following example uses qwen3-coder-plus to complete a Python function.
- OpenAI compatible
- DashScope
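As a sketch of the request shape (the helper name `build_partial_request` is illustrative; only the message structure is prescribed by the format above), the prefix goes in a trailing assistant message with `"partial": true`:

```python
# Sketch: assemble an OpenAI-compatible request body that asks
# qwen3-coder-plus to complete a Python function from a prefix.
import json

def build_partial_request(model: str, prompt: str, prefix: str) -> dict:
    return {
        "model": model,
        "messages": [
            {"role": "user", "content": prompt},
            # The last message carries the prefix and "partial": True,
            # so the model continues from it instead of starting fresh.
            {"role": "assistant", "content": prefix, "partial": True},
        ],
    }

body = build_partial_request(
    "qwen3-coder-plus",
    "Write a function that computes Fibonacci numbers.",
    "def fibonacci(",
)
print(json.dumps(body, indent=2))
```

With an API key set, this body can be sent through the OpenAI-compatible chat completions interface (for example via `client.chat.completions.create` in the OpenAI SDK); the returned text then continues from `def fibonacci(`.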
Output may vary by model version. Any valid Fibonacci implementation is acceptable.
Use cases
Pass images or videos
Qwen-VL models support Partial Mode for requests that include image or video data. This is useful for scenarios such as generating product descriptions, creating social media content, writing news articles, and creative copywriting.
- OpenAI compatible
- DashScope
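A minimal sketch of a multimodal Partial Mode request, assuming the standard OpenAI-compatible `image_url` content-part format; the image URL, prompt, and prefix are placeholders:

```python
# Sketch: a Partial Mode request for a Qwen-VL model. The image URL is a
# placeholder; the content-part shape follows the OpenAI-compatible format.
import json

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image_url",
             "image_url": {"url": "https://example.com/product.jpg"}},
            {"type": "text",
             "text": "Write a product description for this item."},
        ],
    },
    # The assistant prefix fixes the opening wording and tone; the model
    # continues from it rather than starting a new answer.
    {"role": "assistant", "content": "Introducing the all-new", "partial": True},
]
print(json.dumps({"model": "qwen3-vl-plus", "messages": messages}, indent=2))
```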
Continue from incomplete output
If the value of the max_tokens parameter is too small, the Large Language Model (LLM) may return incomplete content. You can use Partial Mode to continue generating from that point and ensure that the output is semantically complete.
- OpenAI compatible
- DashScope
A finish_reason of length indicates that the max_tokens limit was reached. A finish_reason of stop indicates that the model finished generating naturally or encountered a stop sequence defined in the stop parameter.
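The check-and-continue flow can be sketched as follows. This is a minimal offline sketch: the helper name and the simulated response dictionary are illustrative, and in practice the finish_reason and text come from the API response.

```python
# Sketch: when a response stops with finish_reason == "length", resend the
# request with the truncated text as a partial assistant prefix so the
# model picks up where it left off.

def continuation_messages(messages, truncated_text):
    """Build follow-up messages for a Partial Mode continuation."""
    return list(messages) + [
        {"role": "assistant", "content": truncated_text, "partial": True}
    ]

# Simulated first response, cut off by a small max_tokens:
first = {"finish_reason": "length",
         "content": "The Fibonacci sequence is defined"}

if first["finish_reason"] == "length":
    follow_up = continuation_messages(
        [{"role": "user", "content": "Explain the Fibonacci sequence."}],
        first["content"],
    )
    # Send `follow_up` in a second request, then concatenate
    # first["content"] with the new output to get the full answer.
```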