
Partial mode

Continue from a prefix

Partial Mode generates content that continues from a given prefix. It ensures that the model's output connects seamlessly with the prefix.

How it works

To use Partial Mode, configure the messages array. In the last message of the array, set the role to assistant and provide the prefix in the content field. You must also set the "partial": true parameter in that message. The messages format is as follows:
[
  {
    "role": "user",
    "content": "Complete this Fibonacci function. Do not add anything else."
  },
  {
    "role": "assistant",
    "content": "def calculate_fibonacci(n):\n    if n <= 1:\n        return n\n    else:\n",
    "partial": true
  }
]
The model then starts generating text from the specified prefix.
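Before sending a request, you can assemble and sanity-check the array locally. The `validate_partial_messages` helper below is illustrative only, not part of any SDK:

```python
import json

def validate_partial_messages(messages):
    """Check that a messages array is shaped for Partial Mode:
    the last message must have role "assistant" and "partial": true."""
    last = messages[-1]
    return last.get("role") == "assistant" and last.get("partial") is True

messages = [
    {"role": "user", "content": "Complete this Fibonacci function. Do not add anything else."},
    {
        "role": "assistant",
        "content": "def calculate_fibonacci(n):\n    if n <= 1:\n        return n\n    else:\n",
        "partial": True,
    },
]
assert validate_partial_messages(messages)
print(json.dumps(messages, indent=2))
```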

Supported models

  • Qwen-Max series qwen3-max, qwen3-max-2026-01-23, qwen3-max-2025-09-23, qwen3-max-preview (non-thinking mode), qwen-max, qwen-max-latest, qwen-max-2025-01-25, and later snapshots
  • Qwen-Plus series (non-thinking mode) qwen3.6-plus, qwen3.5-plus, qwen3.5-plus-2026-02-15 and later snapshots, qwen-plus, qwen-plus-latest, qwen-plus-2025-01-25, and later snapshots
  • Qwen-Flash series (non-thinking mode) qwen3.5-flash, qwen3.5-flash-2026-02-23 and later snapshots, qwen-flash, qwen-flash-2025-07-28
  • Qwen-Coder series qwen3-coder-plus, qwen3-coder-flash, qwen3-coder-480b-a35b-instruct, qwen3-coder-30b-a3b-instruct
  • Qwen-VL series
    • qwen3-vl-plus series (non-thinking mode): qwen3-vl-plus, qwen3-vl-plus-2025-09-23, and later snapshots
    • qwen3-vl-flash series (non-thinking mode): qwen3-vl-flash, qwen3-vl-flash-2025-10-15, and later snapshots
    • qwen-vl-max series: qwen-vl-max, qwen-vl-max-latest, qwen-vl-max-2025-04-08, and later snapshots
    • qwen-vl-plus series: qwen-vl-plus, qwen-vl-plus-latest, qwen-vl-plus-2025-01-25, and later snapshots
  • Qwen-Turbo series (non-thinking mode) qwen-turbo, qwen-turbo-latest, qwen-turbo-2024-11-01, and later snapshots
  • Qwen open source series qwen3.5-397b-a17b, qwen3.5-122b-a10b, qwen3.5-27b, qwen3.5-35b-a3b (non-thinking mode), Qwen3 open source models (non-thinking mode), Qwen2.5 text models, Qwen3-VL open source models (non-thinking mode)
Models running in thinking mode currently do not support prefix continuation.

Getting started

Prerequisites

Obtain an API key and set it as an environment variable. To call the API through an SDK, install the SDK first. If you are in a sub-workspace, make sure the super administrator has granted model access to your workspace.
Partial Mode is not supported by the DashScope Java SDK.
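Assuming a bash-compatible shell and the Python samples below, setup might look like this (the key shown is a placeholder, not a real credential):

```shell
# Make the API key available to the samples below (replace with your own key)
export DASHSCOPE_API_KEY="sk-..."

# Install the OpenAI-compatible Python SDK used in the samples
pip install -U openai
```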

Sample code

The following example uses qwen3-coder-plus to complete a Python function.
  • OpenAI compatible
  • DashScope
import os
from openai import OpenAI

# 1. Initialize the client
client = OpenAI(
  api_key=os.getenv("DASHSCOPE_API_KEY"),
  base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)
# 2. Define the code prefix to complete
prefix = """def calculate_fibonacci(n):
  if n <= 1:
    return n
  else:
"""

# 3. Make a Partial Mode request
# Note: The last message in the messages array must have role "assistant" and include "partial": True
completion = client.chat.completions.create(
  model="qwen3-coder-plus",
  messages=[
    {"role": "user", "content": "Complete this Fibonacci function. Do not add anything else."},
    {"role": "assistant", "content": prefix, "partial": True},
  ],
)

# 4. Manually join the prefix and the model's generated content
generated_code = completion.choices[0].message.content
complete_code = prefix + generated_code

print(complete_code)
Response
Output may vary by model version. Any valid Fibonacci implementation is acceptable.
def calculate_fibonacci(n):
  if n <= 1:
    return n
  else:
    return calculate_fibonacci(n-1) + calculate_fibonacci(n-2)
{
  "choices": [
    {
      "message": {
        "content": "        return calculate_fibonacci(n-1) + calculate_fibonacci(n-2)",
        "role": "assistant"
      },
      "finish_reason": "stop",
      "index": 0,
      "logprobs": null
    }
  ],
  "object": "chat.completion",
  "usage": {
    "prompt_tokens": 48,
    "completion_tokens": 19,
    "total_tokens": 67,
    "prompt_tokens_details": {
      "cache_type": "implicit",
      "cached_tokens": 0
    }
  },
  "created": 1756800231,
  "system_fingerprint": null,
  "model": "qwen3-coder-plus",
  "id": "chatcmpl-d103b1cf-4bda-942f-92d6-d7ecabfeeccb"
}
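The fields used in step 4 come straight out of this response. As a local sanity check, a trimmed copy of the sample JSON above can be parsed without calling the API:

```python
import json

# Trimmed copy of the sample response shown above
raw = """
{
  "choices": [
    {
      "message": {
        "content": "        return calculate_fibonacci(n-1) + calculate_fibonacci(n-2)",
        "role": "assistant"
      },
      "finish_reason": "stop",
      "index": 0
    }
  ],
  "usage": {"prompt_tokens": 48, "completion_tokens": 19, "total_tokens": 67}
}
"""
resp = json.loads(raw)
generated = resp["choices"][0]["message"]["content"]
# The generated text does NOT repeat the prefix, so you must join them yourself
assert not generated.startswith("def calculate_fibonacci")
print(resp["usage"]["total_tokens"])  # 67
```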

Use cases

Pass images or videos

Qwen-VL models support Partial Mode for requests that include image or video data. This is useful for scenarios such as generating product descriptions, creating social media content, writing news articles, and creative copywriting.
  • OpenAI compatible
  • DashScope
import os
from openai import OpenAI

client = OpenAI(
  api_key=os.getenv("DASHSCOPE_API_KEY"),
  base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
)

completion = client.chat.completions.create(
  model="qwen3-vl-plus",
  messages=[
    {
      "role": "user",
      "content": [
        {
          "type": "image_url",
          "image_url": {
            "url": "https://img.alicdn.com/imgextra/i3/O1CN01zFX2Bs1Q0f9pESgPC_!!6000000001914-2-tps-450-450.png"
          },
        },
        {"type": "text", "text": "I want to post this on social media. Help me write a caption."},
      ],
    },
    {
      "role": "assistant",
      "content": "Today I discovered a hidden-gem café",
      "partial": True,
    },
  ],
)
print(completion.choices[0].message.content)
Response
— the tiramisu here is pure bliss! Every bite delivers perfect harmony between coffee and cream. Pure joy! #FoodShare #Tiramisu #CoffeeTime

Hope you like this caption! Let me know if you need any changes.
{
  "choices": [
    {
      "message": {
        "content": "— the tiramisu here is pure bliss! Every bite delivers perfect harmony between coffee and cream. Pure joy! #FoodShare #Tiramisu #CoffeeTime\n\nHope you like this caption! Let me know if you need any changes.",
        "role": "assistant"
      },
      "finish_reason": "stop",
      "index": 0,
      "logprobs": null
    }
  ],
  "object": "chat.completion",
  "usage": {
    "prompt_tokens": 282,
    "completion_tokens": 56,
    "total_tokens": 338,
    "prompt_tokens_details": {
      "cached_tokens": 0
    }
  },
  "created": 1756802933,
  "system_fingerprint": null,
  "model": "qwen3-vl-plus",
  "id": "chatcmpl-5780cbb7-ebae-9c63-b098-f8cc49e321f0"
}
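If your image is a local file rather than a public URL, OpenAI-compatible endpoints generally also accept a Base64 data URL in the same image_url field. The helper below builds such a content part; it is a generic sketch, not a DashScope-specific API, and endpoint size limits still apply:

```python
import base64

def image_content_from_bytes(data: bytes, mime: str = "image/png") -> dict:
    """Build an OpenAI-style image_url content part from raw image bytes,
    using a Base64 data URL. Hypothetical helper for illustration."""
    b64 = base64.b64encode(data).decode("ascii")
    return {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{b64}"}}

# Example: wrap a few raw bytes (in practice, read the file with open(..., "rb"))
part = image_content_from_bytes(b"\x89PNG\r\n")
print(part["image_url"]["url"][:30])
```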

Continue from incomplete output

If max_tokens is set too low, the model may return truncated output. You can use Partial Mode to continue generating from the truncation point so that the final output is semantically complete.
  • OpenAI compatible
  • DashScope
import os
from openai import OpenAI

client = OpenAI(
  api_key=os.getenv("DASHSCOPE_API_KEY"),
  base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

def chat_completion(messages, max_tokens=None):
  response = client.chat.completions.create(
    model="qwen3.6-plus",
    messages=messages,
    max_tokens=max_tokens
  )
  print(f"### Reason generation stopped: {response.choices[0].finish_reason}")

  return response.choices[0].message.content

# Example usage
messages = [{"role": "user", "content": "Write a short sci-fi story"}]

# First call with max_tokens set to 40
first_content = chat_completion(messages, max_tokens=40)
print(first_content)
# Add the first response as an assistant message and set partial=True
messages.append({"role": "assistant", "content": first_content, "partial": True})

# Second call
second_content = chat_completion(messages)
print("### Complete content:")
print(first_content + second_content)
Response
A length finish reason indicates that the max_tokens limit was reached. A stop reason indicates that the model finished generating text naturally or encountered a stop word defined in the stop parameter.
### Reason generation stopped: length
**"The End of Memory"**

In the distant future, Earth is no longer fit for human life. The atmosphere is polluted, oceans are dry, and cities lie in ruins. Humans migrated to a habitable planet named "Eden," with blue skies, fresh air, and endless resources.

However, Eden is not a true paradise. It holds no human history, no past, and no memory.

...
**"If we forget who we are, are we still human?"**

— End —
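The two-call pattern above generalizes to a loop: keep continuing while finish_reason is "length". In this sketch, create_fn stands in for one API call and its signature, like the round cap, is an assumption of the example:

```python
def complete_until_stop(create_fn, messages, max_rounds=5):
    """Repeatedly continue truncated output with Partial Mode.

    create_fn(messages) must perform one chat-completion call and return
    (finish_reason, content). Hypothetical signature for this sketch.
    """
    parts = []
    msgs = list(messages)
    for _ in range(max_rounds):
        finish_reason, content = create_fn(msgs)
        parts.append(content)
        if finish_reason != "length":
            break
        # Feed everything generated so far back as a Partial Mode prefix
        msgs = list(messages) + [
            {"role": "assistant", "content": "".join(parts), "partial": True}
        ]
    return "".join(parts)
```

The cap on rounds guards against a model that keeps hitting the token limit indefinitely.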

Billing

You are billed for both input tokens and output tokens. The prefix is counted as part of the input tokens.
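Using the usage block from the first example on this page (48 prompt tokens, 19 completion tokens), the billed total is the sum of the two, and the prefix you supply is already inside prompt_tokens:

```python
# usage values copied from the sample response earlier on this page
usage = {"prompt_tokens": 48, "completion_tokens": 19, "total_tokens": 67}

# The assistant-message prefix is billed as input,
# so it is already counted inside prompt_tokens.
billed_total = usage["prompt_tokens"] + usage["completion_tokens"]
print(billed_total)  # 67
```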

Error codes

If a call fails, see Error messages.