
Partial mode

Continue from a prefix

Partial Mode generates content that continues from a given prefix. It ensures that the model's output connects seamlessly with the prefix.

How it works

To use Partial Mode, configure the messages array. In the last message of the array, set the role to assistant and provide the prefix in the content field. You must also set the "partial": true parameter in that message. The messages format is as follows:
[
  {
    "role": "user",
    "content": "Complete this Fibonacci function. Do not add anything else."
  },
  {
    "role": "assistant",
    "content": "def calculate_fibonacci(n):\n    if n <= 1:\n        return n\n    else:\n",
    "partial": true
  }
]
The model then starts generating text from the specified prefix.
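Before sending a request, you can assemble and sanity-check the array locally. The `validate_partial_messages` helper below is illustrative only, not part of any SDK:

```python
import json

def validate_partial_messages(messages):
    """Check that a messages array is shaped for Partial Mode:
    the last message must have role "assistant" and "partial": true."""
    last = messages[-1]
    return last.get("role") == "assistant" and last.get("partial") is True

messages = [
    {"role": "user", "content": "Complete this Fibonacci function. Do not add anything else."},
    {
        "role": "assistant",
        "content": "def calculate_fibonacci(n):\n    if n <= 1:\n        return n\n    else:\n",
        "partial": True,
    },
]
assert validate_partial_messages(messages)
print(json.dumps(messages, indent=2))
```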

Supported models

  • Qwen-Max series qwen3-max, qwen3-max-2026-01-23, qwen3-max-2025-09-23, qwen3-max-preview (non-thinking mode), qwen-max, qwen-max-latest, qwen-max-2025-01-25, and later snapshots
  • Qwen-Plus series (non-thinking mode) qwen3.6-plus, qwen3.5-plus, qwen3.5-plus-2026-02-15 and later snapshots, qwen-plus, qwen-plus-latest, qwen-plus-2025-01-25, and later snapshots
  • Qwen-Flash series (non-thinking mode) qwen3.5-flash, qwen3.5-flash-2026-02-23 and later snapshots, qwen-flash, qwen-flash-2025-07-28
  • Qwen-Coder series qwen3-coder-plus, qwen3-coder-flash, qwen3-coder-480b-a35b-instruct, qwen3-coder-30b-a3b-instruct
  • Qwen-VL series
    • qwen3-vl-plus series (non-thinking mode): qwen3-vl-plus, qwen3-vl-plus-2025-09-23, and later snapshots
    • qwen3-vl-flash series (non-thinking mode): qwen3-vl-flash, qwen3-vl-flash-2025-10-15, and later snapshots
    • qwen-vl-max series: qwen-vl-max, qwen-vl-max-latest, qwen-vl-max-2025-04-08, and later snapshots
    • qwen-vl-plus series: qwen-vl-plus, qwen-vl-plus-latest, qwen-vl-plus-2025-01-25, and later snapshots
  • Qwen-Turbo series (non-thinking mode) qwen-turbo, qwen-turbo-latest, qwen-turbo-2024-11-01, and later snapshots
  • Qwen open source series qwen3.5-397b-a17b, qwen3.5-122b-a10b, qwen3.5-27b, qwen3.5-35b-a3b (non-thinking mode), Qwen3 open source models (non-thinking mode), Qwen2.5 text models, Qwen3-VL open source models (non-thinking mode)
Models running in thinking mode currently do not support prefix continuation.

Getting started

Prerequisites

Obtain an API key and set it as an environment variable. To call the API through an SDK, install the SDK first. If you are in a sub-workspace, make sure the super administrator has granted model access to your workspace.
Partial Mode is not supported by the DashScope Java SDK.
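Assuming a bash-compatible shell and the Python samples below, setup might look like this (the key shown is a placeholder, not a real credential):

```shell
# Make the API key available to the samples below (replace with your own key)
export DASHSCOPE_API_KEY="sk-..."

# Install the OpenAI-compatible Python SDK used in the samples
pip install -U openai
```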

Sample code

The following example uses qwen3-coder-plus to complete a Python function.
  • OpenAI compatible
  • DashScope
import os
from openai import OpenAI

# 1. Initialize the client
client = OpenAI(
  api_key=os.getenv("DASHSCOPE_API_KEY"),
  base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)
# 2. Define the code prefix to complete
prefix = """def calculate_fibonacci(n):
  if n <= 1:
    return n
  else:
"""

# 3. Make a Partial Mode request
# Note: The last message in the messages array must have role "assistant" and include "partial": True
completion = client.chat.completions.create(
  model="qwen3-coder-plus",
  messages=[
    {"role": "user", "content": "Complete this Fibonacci function. Do not add anything else."},
    {"role": "assistant", "content": prefix, "partial": True},
  ],
)

# 4. Manually join the prefix and the model's generated content
generated_code = completion.choices[0].message.content
complete_code = prefix + generated_code

print(complete_code)
Response
Output may vary by model version. Any valid Fibonacci implementation is acceptable.
def calculate_fibonacci(n):
  if n <= 1:
    return n
  else:
    return calculate_fibonacci(n-1) + calculate_fibonacci(n-2)
{
  "choices": [
    {
      "message": {
        "content": "        return calculate_fibonacci(n-1) + calculate_fibonacci(n-2)",
        "role": "assistant"
      },
      "finish_reason": "stop",
      "index": 0,
      "logprobs": null
    }
  ],
  "object": "chat.completion",
  "usage": {
    "prompt_tokens": 48,
    "completion_tokens": 19,
    "total_tokens": 67,
    "prompt_tokens_details": {
      "cache_type": "implicit",
      "cached_tokens": 0
    }
  },
  "created": 1756800231,
  "system_fingerprint": null,
  "model": "qwen3-coder-plus",
  "id": "chatcmpl-d103b1cf-4bda-942f-92d6-d7ecabfeeccb"
}
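The fields used in step 4 come straight out of this response. As a local sanity check, a trimmed copy of the sample JSON above can be parsed without calling the API:

```python
import json

# Trimmed copy of the sample response shown above
raw = """
{
  "choices": [
    {
      "message": {
        "content": "        return calculate_fibonacci(n-1) + calculate_fibonacci(n-2)",
        "role": "assistant"
      },
      "finish_reason": "stop",
      "index": 0
    }
  ],
  "usage": {"prompt_tokens": 48, "completion_tokens": 19, "total_tokens": 67}
}
"""
resp = json.loads(raw)
generated = resp["choices"][0]["message"]["content"]
# The generated text does NOT repeat the prefix, so you must join them yourself
assert not generated.startswith("def calculate_fibonacci")
print(resp["usage"]["total_tokens"])  # 67
```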

Use cases

Pass images or videos

Qwen-VL models support Partial Mode for requests that include image or video data. This is useful for scenarios such as generating product descriptions, creating social media content, writing news articles, and creative copywriting.
  • OpenAI compatible
  • DashScope
import os
from openai import OpenAI

client = OpenAI(
  api_key=os.getenv("DASHSCOPE_API_KEY"),
  base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
)

completion = client.chat.completions.create(
  model="qwen3-vl-plus",
  messages=[
    {
      "role": "user",
      "content": [
        {
          "type": "image_url",
          "image_url": {
            "url": "https://img.alicdn.com/imgextra/i3/O1CN01zFX2Bs1Q0f9pESgPC_!!6000000001914-2-tps-450-450.png"
          },
        },
        {"type": "text", "text": "I want to post this on social media. Help me write a caption."},
      ],
    },
    {
      "role": "assistant",
      "content": "Today I discovered a hidden-gem café",
      "partial": True,
    },
  ],
)
print(completion.choices[0].message.content)
Response
— the tiramisu here is pure bliss! Every bite delivers perfect harmony between coffee and cream. Pure joy! #FoodShare #Tiramisu #CoffeeTime

Hope you like this caption! Let me know if you need any changes.
{
  "choices": [
    {
      "message": {
        "content": "— the tiramisu here is pure bliss! Every bite delivers perfect harmony between coffee and cream. Pure joy! #FoodShare #Tiramisu #CoffeeTime\n\nHope you like this caption! Let me know if you need any changes.",
        "role": "assistant"
      },
      "finish_reason": "stop",
      "index": 0,
      "logprobs": null
    }
  ],
  "object": "chat.completion",
  "usage": {
    "prompt_tokens": 282,
    "completion_tokens": 56,
    "total_tokens": 338,
    "prompt_tokens_details": {
      "cached_tokens": 0
    }
  },
  "created": 1756802933,
  "system_fingerprint": null,
  "model": "qwen3-vl-plus",
  "id": "chatcmpl-5780cbb7-ebae-9c63-b098-f8cc49e321f0"
}
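If your image is a local file rather than a public URL, OpenAI-compatible endpoints generally also accept a Base64 data URL in the same image_url field. The helper below builds such a content part; it is a generic sketch, not a DashScope-specific API, and endpoint size limits still apply:

```python
import base64

def image_content_from_bytes(data: bytes, mime: str = "image/png") -> dict:
    """Build an OpenAI-style image_url content part from raw image bytes,
    using a Base64 data URL. Hypothetical helper for illustration."""
    b64 = base64.b64encode(data).decode("ascii")
    return {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{b64}"}}

# Example: wrap a few raw bytes (in practice, read the file with open(..., "rb"))
part = image_content_from_bytes(b"\x89PNG\r\n")
print(part["image_url"]["url"][:30])
```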

Continue from incomplete output

If max_tokens is set too low, the model may return truncated output. You can use Partial Mode to continue generating from the truncation point so that the final output is semantically complete.
  • OpenAI compatible
  • DashScope
import os
from openai import OpenAI

client = OpenAI(
  api_key=os.getenv("DASHSCOPE_API_KEY"),
  base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

def chat_completion(messages, max_tokens=None):
  response = client.chat.completions.create(
    model="qwen3.6-plus",
    messages=messages,
    max_tokens=max_tokens
  )
  print(f"### Reason generation stopped: {response.choices[0].finish_reason}")

  return response.choices[0].message.content

# Example usage
messages = [{"role": "user", "content": "Write a short sci-fi story"}]

# First call with max_tokens set to 40
first_content = chat_completion(messages, max_tokens=40)
print(first_content)
# Add the first response as an assistant message and set partial=True
messages.append({"role": "assistant", "content": first_content, "partial": True})

# Second call
second_content = chat_completion(messages)
print("### Complete content:")
print(first_content + second_content)
Response
A length finish reason indicates that the max_tokens limit was reached. A stop reason indicates that the model finished generating text naturally or encountered a stop word defined in the stop parameter.
### Reason generation stopped: length
**"The End of Memory"**

In the distant future, Earth is no longer fit for human life. The atmosphere is polluted, oceans are dry, and cities lie in ruins. Humans migrated to a habitable planet named "Eden," with blue skies, fresh air, and endless resources.

However, Eden is not a true paradise. It holds no human history, no past, and no memory.

...
**"If we forget who we are, are we still human?"**

— End —
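The two-call pattern above generalizes to a loop: keep continuing while finish_reason is "length". In this sketch, create_fn stands in for one API call and its signature, like the round cap, is an assumption of the example:

```python
def complete_until_stop(create_fn, messages, max_rounds=5):
    """Repeatedly continue truncated output with Partial Mode.

    create_fn(messages) must perform one chat-completion call and return
    (finish_reason, content). Hypothetical signature for this sketch.
    """
    parts = []
    msgs = list(messages)
    for _ in range(max_rounds):
        finish_reason, content = create_fn(msgs)
        parts.append(content)
        if finish_reason != "length":
            break
        # Feed everything generated so far back as a Partial Mode prefix
        msgs = list(messages) + [
            {"role": "assistant", "content": "".join(parts), "partial": True}
        ]
    return "".join(parts)
```

The cap on rounds guards against a model that keeps hitting the token limit indefinitely.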

Billing

You are billed for both input tokens and output tokens. The prefix is counted as part of the input tokens.
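Using the usage block from the first example on this page (48 prompt tokens, 19 completion tokens), the billed total is the sum of the two, and the prefix you supply is already inside prompt_tokens:

```python
# usage values copied from the sample response earlier on this page
usage = {"prompt_tokens": 48, "completion_tokens": 19, "total_tokens": 67}

# The assistant-message prefix is billed as input,
# so it is already counted inside prompt_tokens.
billed_total = usage["prompt_tokens"] + usage["completion_tokens"]
print(billed_total)  # 67
```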

Error codes

If a call fails, see Error messages.