Qwen-TTS API reference

POST /api/v1/services/aigc/multimodal-generation/generation
# Install the latest version of the DashScope SDK
import os
import dashscope

dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'

text = "Let me recommend a T-shirt to everyone. This one is really super nice. The color is very elegant, and it's also a perfect item to match. Everyone can buy it without hesitation. It's truly beautiful and very forgiving on the figure. No matter what body type you have, it will look great. I recommend everyone to place an order."
# SpeechSynthesizer interface usage: dashscope.audio.qwen_tts.SpeechSynthesizer.call(...)
response = dashscope.MultiModalConversation.call(
  # To use the instruction control feature, replace the model with qwen3-tts-instruct-flash
  model="qwen3-tts-flash",
  # If the environment variable is not configured, replace the following line with your API key: api_key="sk-xxx"
  api_key=os.getenv("DASHSCOPE_API_KEY"),
  text=text,
  voice="Cherry",
  # To use the instruction control feature, uncomment the following line and replace the model with qwen3-tts-instruct-flash
  # instructions='Fast speech rate, with a clear rising intonation, suitable for introducing fashion products.',
  # optimize_instructions=True
)
print(response)

Response example:
{
  "status_code": 200,
  "request_id": "5c63c65c-cad8-4bf4-959d-xxxxxxxxxxxx",
  "code": "",
  "message": "",
  "output": {
    "text": null,
    "choices": null,
    "finish_reason": "stop",
    "audio": {
      "url": "https://example.oss.aliyuncs.com/audio-result.wav?Expires=1766113409&OSSAccessKeyId=LTAIxxxx&Signature=xxxx",
      "data": "",
      "id": "audio_5c63c65c-cad8-4bf4-959d-xxxxxxxxxxxx",
      "expires_at": 1766113409
    }
  },
  "usage": {
    "input_tokens": 0,
    "output_tokens": 0,
    "total_tokens": 1121,
    "characters": 195,
    "input_tokens_details": {
      "text_tokens": 76
    },
    "output_tokens_details": {
      "audio_tokens": 1045,
      "text_tokens": 0
    }
  }
}
The DashScope Python SDK uses MultiModalConversation instead of SpeechSynthesizer. Usage and parameters are identical.
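
The response carries a temporary download link in output.audio.url together with an expires_at timestamp. A minimal sketch of extracting that link and saving the audio might look like the following. It assumes the response has been converted to a plain dict shaped like the sample response, and the helper names audio_url_from_response and save_audio are hypothetical, not part of the SDK.

```python
import time
import urllib.request
from typing import Optional


def audio_url_from_response(resp: dict, now: Optional[float] = None) -> str:
    """Return output.audio.url, or raise if the request failed or the link expired."""
    if resp.get("status_code") != 200:
        raise RuntimeError(f"request failed: {resp.get('code')} {resp.get('message')}")
    audio = (resp.get("output") or {}).get("audio") or {}
    url = audio.get("url")
    if not url:
        raise ValueError("response contains no audio URL")
    expires_at = audio.get("expires_at")  # Unix timestamp in the sample response
    current = time.time() if now is None else now
    if expires_at is not None and current >= expires_at:
        raise ValueError("audio URL has expired")
    return url


def save_audio(resp: dict, path: str = "output.wav") -> str:
    """Download the synthesized audio to path (performs a network request)."""
    urllib.request.urlretrieve(audio_url_from_response(resp), path)
    return path
```

Because the link is signed and time-limited, downloading the file promptly is likely safer than storing the URL for later use.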

Authorizations

Authorization
string
header
required

DashScope API key. Get your API key from the Qwen Cloud console.

Header Parameters

X-DashScope-SSE
enum<string>

Set this header to enable to turn on streaming output over HTTP. In the Python SDK, use the stream parameter instead; in the Java SDK, use the streamCall interface.

enable
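
For the Python SDK, streaming is requested with the stream parameter rather than the header. The sketch below shows one way streamed chunks could be reassembled into a single byte string. It assumes each chunk has the same output.audio structure as the non-streaming response and that output.audio.data holds Base64-encoded audio bytes, which this page does not state. collect_audio is a hypothetical helper, and the chunks are modeled as plain dicts.

```python
import base64
from typing import Iterable


def collect_audio(chunks: Iterable[dict]) -> bytes:
    """Concatenate the decoded audio payload of every streamed chunk."""
    buffer = bytearray()
    for chunk in chunks:
        audio = (chunk.get("output") or {}).get("audio") or {}
        data = audio.get("data")
        if data:  # skip chunks whose data field is empty or missing
            buffer.extend(base64.b64decode(data))  # assumption: Base64 payload
    return bytes(buffer)
```

With the SDK, the iterable would presumably be the object returned by dashscope.MultiModalConversation.call(..., stream=True).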

Body

application/json

model
string
required

The model name.

input
object
required

The input for speech synthesis.
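
For callers that do not use the SDK, the request can be assembled by hand. The following is a sketch under stated assumptions: the URL joins the base address and path shown on this page, the Authorization header uses the Bearer scheme, the streaming header is named X-DashScope-SSE, and the input object takes text and voice fields mirroring the SDK arguments. None of these details is spelled out in this section, and build_request is a hypothetical helper.

```python
import json
import urllib.request

BASE_URL = "https://dashscope-intl.aliyuncs.com/api/v1"
ENDPOINT = "/services/aigc/multimodal-generation/generation"


def build_request(api_key: str, text: str, voice: str = "Cherry",
                  model: str = "qwen3-tts-flash",
                  stream: bool = False) -> urllib.request.Request:
    """Assemble (but do not send) a POST request for speech synthesis."""
    headers = {
        "Authorization": f"Bearer {api_key}",  # assumption: Bearer scheme
        "Content-Type": "application/json",
    }
    if stream:
        headers["X-DashScope-SSE"] = "enable"  # assumption: header name
    # assumption: input fields mirror the SDK's text and voice arguments
    body = {"model": model, "input": {"text": text, "voice": voice}}
    return urllib.request.Request(
        BASE_URL + ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

Passing the result to urllib.request.urlopen would send the request; keeping construction separate makes the headers and body easy to inspect first.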

Response

200-application/json

status_code
integer

HTTP status code. Examples: 200 (success), 400 (client error), 401 (unauthorized), 404 (not found), 500 (server error).

200

request_id
string

The unique ID of the request. Use it to locate and troubleshoot issues.

5c63c65c-cad8-4bf4-959d-xxxxxxxxxxxx

code
string

The error code returned when the request fails; empty on success. See Error codes.

message
string

The error message returned when the request fails; empty on success. See Error codes.
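
The status_code, code, message, and request_id fields can be folded into a single check before the output is read. This is a minimal sketch that treats the response as a plain dict; SynthesisError and ensure_success are hypothetical names, not SDK types.

```python
class SynthesisError(Exception):
    """Raised when a response reports a status other than 200."""

    def __init__(self, status_code, code, message, request_id):
        super().__init__(f"[{status_code}] {code}: {message} (request_id={request_id})")
        self.status_code = status_code
        self.code = code
        self.request_id = request_id


def ensure_success(resp: dict) -> dict:
    """Return resp unchanged on success; raise SynthesisError otherwise."""
    if resp.get("status_code") != 200:
        raise SynthesisError(
            resp.get("status_code"),
            resp.get("code"),
            resp.get("message"),
            resp.get("request_id"),
        )
    return resp
```

Including request_id in the exception text keeps the identifier at hand when troubleshooting.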

output
object

The model's output.

usage
object

Token or character consumption information. Qwen-TTS returns token consumption. Qwen3-TTS-Flash returns character consumption.
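
Since the usage object may report either characters or tokens depending on the model, client code may want to normalize it. The sketch below assumes the field names shown in the sample response (characters and total_tokens) and prefers the character count when it is present. reported_consumption is a hypothetical helper, and which figure is actually billed is not something this sketch can confirm.

```python
from typing import Tuple


def reported_consumption(usage: dict) -> Tuple[str, int]:
    """Return (unit, amount), preferring characters over tokens when available."""
    if usage.get("characters") is not None:
        return ("characters", usage["characters"])
    return ("tokens", usage.get("total_tokens", 0))
```

For the sample response, which includes characters: 195, this returns the character count.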