Generate text

Make your first text generation call

Text generation models take natural language input and generate text for tasks such as question answering, writing, summarization, translation, and structured output generation.

Request structure

Text generation requests are typically sent as a messages array. Each message includes a role and content.
  • System message: Provides high-level instructions or sets the model's behavior.
  • User message: Contains the user's input or task.
  • Assistant message: Contains the model's response.
A typical request includes a user message. The system message is optional, but recommended when you want more consistent, controllable output.
[
  {"role": "system", "content": "You are a helpful assistant. Answer clearly and concisely."},
  {"role": "user", "content": "Summarize the benefits of solar energy in three bullet points."}
]
The model returns its reply as an assistant message.
{
  "role": "assistant",
  "content": "- Reduces reliance on fossil fuels.\n- Lowers long-term electricity costs.\n- Produces electricity with minimal operating emissions."
}
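The role/content structure above can be assembled programmatically before the call. A minimal sketch (the `build_messages` helper is ours for illustration, not part of any SDK):

```python
def build_messages(user_text, system_text=None):
    """Assemble a messages array in the role/content format shown above.

    The system message is optional; include it when you want more
    consistent behavior.
    """
    messages = []
    if system_text is not None:
        messages.append({"role": "system", "content": system_text})
    messages.append({"role": "user", "content": user_text})
    return messages

msgs = build_messages(
    "Summarize the benefits of solar energy in three bullet points.",
    system_text="You are a helpful assistant. Answer clearly and concisely.",
)
```

The resulting list can be passed directly as the `messages` parameter of a chat request.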

Make your first call

Before you begin, get an API key, set it as an environment variable, and, if needed, install the OpenAI or DashScope SDK. Choose the API style that matches your stack:
  • Start with OpenAI Compatible -- Responses API for new integrations.
  • Use OpenAI Compatible -- Chat Completions API if you are migrating existing OpenAI-compatible code.
  • Use DashScope if you prefer the native SDK.
For usage notes, code examples, and migration guidance, see OpenAI compatible - Responses.
import os
from openai import OpenAI

try:
  client = OpenAI(
    # If you have not set an environment variable, replace the next line with your API key: api_key="sk-xxx",
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope-intl.aliyuncs.com/api/v2/apps/protocols/compatible-mode/v1",
  )

  response = client.responses.create(
    model="qwen3.6-plus",
    input="Summarize the benefits of solar energy in three bullet points."
  )

  print(response)
except Exception as e:
  print(f"Error message: {e}")
Response
The response includes these main fields:
  • id: The response ID.
  • output: The output list. Includes reasoning and message.
    The reasoning field appears only when deep thinking is enabled (for example, it is enabled by default in the Qwen3.5 and Qwen3.6 series).
  • usage: Token usage statistics.
Example text output:
- Reduces reliance on fossil fuels.
- Lowers long-term electricity costs.
- Produces electricity with minimal operating emissions.
{
  "created_at": 1772249518,
  "id": "7ad48c6b-3cc4-904f-9284-5f419c6c5xxx",
  "model": "qwen3.6-plus",
  "object": "response",
  "output": [
    {
      "id": "msg_94805179-2801-45da-ac1c-a87e8ea20xxx",
      "summary": [
        {
          "text": "The user wants a concise answer in exactly three bullet points. Focus on the most broadly useful benefits of solar energy: reduced reliance on fossil fuels, long-term cost savings, and lower operating emissions. Keep the wording simple and direct.\n",
          "type": "summary_text"
        }
      ],
      "type": "reasoning"
    },
    {
      "content": [
        {
          "annotations": [],
          "text": "- Reduces reliance on fossil fuels.\n- Lowers long-term electricity costs.\n- Produces electricity with minimal operating emissions.",
          "type": "output_text"
        }
      ],
      "id": "msg_35be06c6-ca4d-4f2b-9677-7897e488dxxx",
      "role": "assistant",
      "status": "completed",
      "type": "message"
    }
  ],
  "parallel_tool_calls": false,
  "status": "completed",
  "tool_choice": "auto",
  "tools": [],
  "usage": {
    "input_tokens": 54,
    "input_tokens_details": {
      "cached_tokens": 0
    },
    "output_tokens": 662,
    "output_tokens_details": {
      "reasoning_tokens": 447
    },
    "total_tokens": 716,
    "x_details": [
      {
        "input_tokens": 54,
        "output_tokens": 662,
        "output_tokens_details": {
          "reasoning_tokens": 447
        },
        "total_tokens": 716,
        "x_billing_type": "response_api"
      }
    ]
  }
}
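Because output can contain both reasoning and message items, pulling out the final answer takes a small amount of filtering. The following sketch walks the raw JSON shown above; the field names follow that sample response, and the helper name is ours, not part of any SDK:

```python
def extract_output_text(response_dict):
    """Collect output_text parts from message items, skipping reasoning items."""
    parts = []
    for item in response_dict.get("output", []):
        if item.get("type") != "message":
            continue  # skip "reasoning" and any other non-message items
        for block in item.get("content", []):
            if block.get("type") == "output_text":
                parts.append(block["text"])
    return "\n".join(parts)

# Trimmed-down version of the sample response above
sample = {
    "output": [
        {"type": "reasoning",
         "summary": [{"type": "summary_text", "text": "..."}]},
        {"type": "message", "role": "assistant", "status": "completed",
         "content": [{"type": "output_text",
                      "text": "- Reduces reliance on fossil fuels."}]},
    ]
}
print(extract_output_text(sample))  # - Reduces reliance on fossil fuels.
```

The same filtering applies whether you parse the JSON yourself or read the parsed response object returned by the SDK.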

Handle requests asynchronously

Once a basic synchronous request is working, asynchronous calls can improve throughput for high-concurrency workloads.
The following example uses the OpenAI Compatible -- Chat Completions API; a DashScope variant is also available.
import os
import asyncio
from openai import AsyncOpenAI
import platform

# Create an asynchronous client instance
client = AsyncOpenAI(
  # If you have not set an environment variable, replace the line below with: api_key="sk-xxx",
  api_key=os.getenv("DASHSCOPE_API_KEY"),
  base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
)

# Define an asynchronous task list
async def task(question):
  print(f"Send question: {question}")
  response = await client.chat.completions.create(
    messages=[
      {"role": "user", "content": question}
    ],
    model="qwen3.6-plus",
  )
  print(f"Model response: {response.choices[0].message.content}")

# Main asynchronous function
async def main():
  questions = [
    "Summarize the benefits of solar energy in three bullet points.",
    "Write a subject line for a product launch email.",
    "Translate \"Welcome to our platform\" into Spanish."
  ]
  tasks = [task(q) for q in questions]
  await asyncio.gather(*tasks)

if __name__ == '__main__':
  # Set event loop policy
  if platform.system() == 'Windows':
    asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())
  # Run the main coroutine
  asyncio.run(main(), debug=False)

Response
Because the call is asynchronous, the order of the responses may differ from the example.
Send question: Summarize the benefits of solar energy in three bullet points.
Send question: Write a subject line for a product launch email.
Send question: Translate "Welcome to our platform" into Spanish.
Model response: - Reduces reliance on fossil fuels.
- Lowers long-term electricity costs.
- Produces electricity with minimal operating emissions.
Model response: Meet our newest product launch
Model response: Bienvenido a nuestra plataforma.

Going live

Build better context

Feeding raw data directly to a large language model increases cost and, because of context-length limits, reduces output quality. Context engineering improves quality and efficiency by dynamically loading only the precise knowledge the task needs. Core techniques:
  • Prompt engineering: Design and refine text instructions (prompts) to guide the model toward the desired outputs. For more information, see Text-to-text prompt guide.
  • Retrieval-augmented generation (RAG): Use this technique when the model must answer questions using external knowledge bases, such as product documentation or technical manuals.
  • Tool calling: Allows the model to fetch real-time data, such as weather or traffic, or perform actions, such as calling an API or sending an email.
  • Memory mechanisms: Provide the model with short-term and long-term memory to understand conversation history.
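Tool calling ultimately comes down to mapping the tool names the model emits back to local functions. A minimal dispatcher sketch (the tool name, its signature, and the simulated call are made up for illustration; the real tool schema and call format are defined in the API reference):

```python
import json

# Hypothetical local tool; in a real integration this would call a live service.
def get_weather(city):
    return {"city": city, "forecast": "sunny"}

TOOLS = {"get_weather": get_weather}

def dispatch_tool_call(name, arguments_json):
    """Look up the tool the model asked for and run it with its JSON arguments."""
    if name not in TOOLS:
        raise ValueError(f"Unknown tool: {name}")
    args = json.loads(arguments_json)
    return TOOLS[name](**args)

# Simulate a tool call as the model would emit it (name plus JSON arguments).
result = dispatch_tool_call("get_weather", '{"city": "Singapore"}')
print(result)  # {'city': 'Singapore', 'forecast': 'sunny'}
```

In a full loop, the tool result is appended to the conversation as a tool message and the model is called again to produce the final answer.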

Tune generation behavior

The temperature and top_p parameters control the diversity of the generated text. Higher values increase diversity, and lower values increase predictability. To assess the effects of these parameters, adjust only one at a time.
  • temperature: A value in the range of [0, 2) that adjusts randomness.
  • top_p: A value in the range of (0, 1.0] that filters responses by a probability threshold.
The following examples show how different settings affect the output. The prompt is: "Write a three-sentence story starring a cat and a sunbeam."
  1. High diversity (for example, temperature=0.9): Use this setting for creative writing, brainstorming, or marketing copy where novelty and imagination are important.
Sunlight sliced through the window, and the ginger cat crept toward the glowing square, its fur instantly gilded like molten honey.
It tapped the light with a paw, sinking into warmth as if stepping into a sunlit pool, and the golden tide flowed up its spine.
The afternoon grew heavy—the cat curled in liquid gold, hearing time melt softly in its purr.
  2. High predictability (for example, temperature=0.1): Use this setting for factual Q&A, code generation, or legal text where accuracy and consistency are critical.
An old cat napped on the windowsill, counting sunbeams.
The sunlight hopped across its mottled back like pages turning in an old photo album.
Dust rose and settled, whispering: you were young once, and I burned bright.
temperature:
  • A higher temperature flattens the token probability distribution. This makes high-probability tokens less likely and low-probability tokens more likely, which causes the model to be more random when it chooses the next token.
  • A lower temperature sharpens the token probability distribution. This makes high-probability tokens even more likely and low-probability tokens less likely, which causes the model to favor high-probability tokens.
top_p: Top-p sampling selects from the smallest set of top tokens whose cumulative probability exceeds a specified threshold, such as 0.8. The model sorts all possible next tokens by probability, accumulates the probabilities from highest to lowest until the sum reaches the threshold, and then randomly selects one token from this set.
  • A higher top_p value considers more tokens, which increases diversity.
  • A lower top_p value considers fewer tokens, which increases focus and predictability.
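Both knobs are easy to reproduce on a toy distribution to build intuition. The sketch below shows temperature scaling of raw token scores followed by top-p filtering; it illustrates the mechanism, not the provider's actual sampler:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Divide logits by temperature before softmax: <1 sharpens, >1 flattens."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_candidates(tokens, probs, top_p):
    """Keep the smallest set of highest-probability tokens whose mass reaches top_p."""
    ranked = sorted(zip(tokens, probs), key=lambda pair: pair[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append(token)
        cumulative += p
        if cumulative >= top_p:
            break
    return kept

# Toy next-token scores
logits = [2.0, 1.0, 0.2, -1.0]
tokens = ["sun", "cat", "nap", "rug"]

sharp = softmax_with_temperature(logits, 0.1)  # near one-hot on "sun"
flat = softmax_with_temperature(logits, 2.0)   # much closer to uniform
print(top_p_candidates(tokens, softmax_with_temperature(logits, 1.0), 0.8))
# → ['sun', 'cat']
```

Lowering temperature concentrates mass on "sun"; lowering top_p shrinks the candidate set the model samples from.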
# Recommended parameter settings for different scenarios
SCENARIO_CONFIGS = {
  # Creative writing
  "creative_writing": {
    "temperature": 0.9,
    "top_p": 0.95
  },
  # Code generation
  "code_generation": {
    "temperature": 0.2,
    "top_p": 0.8
  },
  # Factual Q&A
  "factual_qa": {
    "temperature": 0.1,
    "top_p": 0.7
  },
  # Translation
  "translation": {
    "temperature": 0.3,
    "top_p": 0.8
  }
}

# OpenAI usage example
# completion = client.chat.completions.create(
#     model="qwen3.6-plus",
#     messages=[{"role": "user", "content": "Write a poem about the moon"}],
#     **SCENARIO_CONFIGS["creative_writing"]
# )
# DashScope usage example
# response = Generation.call(
#     # If you have not set an environment variable, replace the line below with: api_key = "sk-xxx",
#     api_key=os.getenv("DASHSCOPE_API_KEY"),
#     model="qwen-plus",
#     messages=[{"role": "user", "content": "Write a Python function that checks if input n is prime. Output only code."}],
#     result_format="message",
#     **SCENARIO_CONFIGS["code_generation"]
# )

Explore more text generation features

For complex scenarios:
  • Multi-turn conversations: Use this feature for follow-up questions or information gathering that requires continuous dialogue.
  • Streaming output: Use this feature for chatbots or real-time code generation to improve the user experience and avoid timeouts caused by long responses.
  • Deep thinking: Use this feature for complex reasoning or policy analysis that requires high-quality, structured answers.
  • Structured output: Use this feature when you need the model to reply in a stable JSON format for programmatic use or data parsing.
  • Partial mode: Use this feature for code completion or long-form writing where the model continues from existing text.

Reference

For a complete list of model invocation parameters, see OpenAI Compatible API Reference and DashScope API Reference.

FAQ

Why can't the Qwen API analyze web links?
The Qwen API cannot directly access or parse web links. Use Function calling, or combine the API with a web scraping library such as Python's Beautiful Soup, to read webpage content.

Why do Qwen web app and API responses differ?
The Qwen web app layers additional engineering on top of the Qwen API, enabling features such as webpage parsing, web search, image generation, and PPT creation. These capabilities are not part of the core large language model API. You can replicate them using Function calling.

Can the model directly generate Word, Excel, PDF, or PPT files?
No. Qwen text generation models output only plain text. Convert the text to your desired format using code or third-party libraries.