Natural voices with Qwen3
Supported models
Use an API key when calling the following models:
- Qwen3-TTS-Instruct-Flash: qwen3-tts-instruct-flash (stable version, currently equivalent to qwen3-tts-instruct-flash-2026-01-26), qwen3-tts-instruct-flash-2026-01-26 (latest snapshot)
- Qwen3-TTS-VD: qwen3-tts-vd-2026-01-26 (latest snapshot)
- Qwen3-TTS-VC: qwen3-tts-vc-2026-01-22 (latest snapshot)
- Qwen3-TTS-Flash: qwen3-tts-flash (stable version, currently equivalent to qwen3-tts-flash-2025-11-27), qwen3-tts-flash-2025-11-27, qwen3-tts-flash-2025-09-18
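The stable names above are aliases that currently point to dated snapshots. The following is a minimal sketch for pinning requests to a snapshot, so that a later repointing of an alias does not change your output unexpectedly. The helper names resolve_model and is_pinned and the STABLE_ALIASES mapping are hypothetical. The mapping is copied from this list and reflects the documentation only at the time of writing.

```python
# Stable aliases and the snapshots they currently point to, copied from the
# "Supported models" list. Aliases may be repointed to newer snapshots later,
# so this mapping is only a point-in-time copy of the documentation.
STABLE_ALIASES = {
    "qwen3-tts-instruct-flash": "qwen3-tts-instruct-flash-2026-01-26",
    "qwen3-tts-flash": "qwen3-tts-flash-2025-11-27",
}


def resolve_model(name: str) -> str:
    """Return the snapshot a model name currently maps to.

    Snapshot names, and models without a stable alias such as
    qwen3-tts-vd-2026-01-26, are returned unchanged.
    """
    return STABLE_ALIASES.get(name, name)


def is_pinned(name: str) -> bool:
    """True if the name is a dated snapshot rather than a moving alias."""
    return name not in STABLE_ALIASES
```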
Model selection
| Scenario | Model | Reason |
|---|---|---|
| Voice customization for branding, exclusive voices, or expanding system voices (based on text description) | qwen3-tts-vd-2026-01-26 | Supports voice design to create custom voices from text descriptions without audio samples. Ideal for designing brand-specific voices from scratch. |
| Voice customization for branding, exclusive voices, or expanding system voices (based on audio samples) | qwen3-tts-vc-2026-01-22 | Supports voice cloning to replicate voices from audio samples and create lifelike brand voiceprints with high fidelity. |
| Emotional content production (audiobooks, radio dramas, game/animation dubbing) | qwen3-tts-instruct-flash | Supports instruction control to precisely adjust pitch, speaking rate, emotion, and character personality using natural language. Ideal for scenarios requiring rich expressiveness. |
| Mobile navigation or notification announcements | qwen3-tts-flash | Simple per-character billing. Suitable for short-text, high-frequency scenarios. |
| E-learning course narration | qwen3-tts-flash | Supports multiple languages and dialects for regional teaching needs. |
| Batch audiobook production | qwen3-tts-flash | Cost-effective with rich voice options for expressive content. |
Getting started
Prerequisites
- Get an API key and set it as an environment variable.
- If you plan to use an SDK, install it first. The Java SDK requires version 2.21.9 or later, and the Python SDK requires version 1.24.6 or later.
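You can confirm both prerequisites from Python before making any calls. The following is a minimal sketch. It assumes the API key is stored in DASHSCOPE_API_KEY (the variable name conventionally read by the DashScope SDK) and that the SDK is installed as the dashscope package. The helper names are hypothetical.

```python
import os
from importlib import metadata

MIN_PYTHON_SDK = (1, 24, 6)  # minimum DashScope Python SDK version stated above


def parse_version(text):
    """Parse a dotted release string such as '1.24.6' into a tuple of ints.

    Only leading numeric components are kept, so '1.25.0rc1' -> (1, 25, 0).
    """
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if len(digits) < len(piece):  # stop after a component like '0rc1'
            break
    return tuple(parts)


def sdk_is_recent_enough(installed, minimum=MIN_PYTHON_SDK):
    """Compare versions component-wise, padding the shorter one with zeros."""
    have = parse_version(installed)
    need = tuple(minimum)
    width = max(len(have), len(need))
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need


def check_prerequisites():
    """Return a list of problems found in the local environment (empty if none)."""
    problems = []
    if not os.getenv("DASHSCOPE_API_KEY"):
        problems.append("DASHSCOPE_API_KEY is not set")
    try:
        installed = metadata.version("dashscope")
    except metadata.PackageNotFoundError:
        problems.append("dashscope is not installed")
    else:
        if not sdk_is_recent_enough(installed):
            problems.append(f"dashscope {installed} is older than 1.24.6")
    return problems
```

Calling check_prerequisites() at startup and printing any returned problems gives an earlier, clearer failure than an authentication or attribute error from the first request.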
In the DashScope Python SDK, the SpeechSynthesizer interface has been replaced by MultiModalConversation. To use the new interface, replace the name. All other parameters are fully compatible.
Use system voice
Use a system voice for speech synthesis.
Non-streaming output
Use the returned url to retrieve the synthesized audio. The URL is valid for 24 hours.
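The following sketch shows the non-streaming flow in Python with MultiModalConversation. Because the URL expires after 24 hours, the sketch downloads the file immediately. It assumes the API key is in DASHSCOPE_API_KEY, that the audio URL is returned under output["audio"]["url"], and that your account uses the SDK's default endpoint. Depending on your region, you may also need to set dashscope.base_http_api_url. The helper names build_request and synthesize_to_file are hypothetical.

```python
import os
import urllib.request
from http import HTTPStatus


def build_request(text, voice="Cherry", model="qwen3-tts-flash"):
    """Assemble keyword arguments for a non-streaming synthesis call."""
    return {"model": model, "text": text, "voice": voice, "stream": False}


def synthesize_to_file(text, path, **options):
    """Synthesize text and save the WAV file. Requires network access and an API key."""
    import dashscope  # imported lazily so build_request works without the SDK

    response = dashscope.MultiModalConversation.call(
        api_key=os.getenv("DASHSCOPE_API_KEY"), **build_request(text, **options)
    )
    if response.status_code != HTTPStatus.OK:
        raise RuntimeError(f"synthesis failed: {response.code} {response.message}")
    # Assumption: the audio URL is at output["audio"]["url"]. It is valid for
    # 24 hours, so download it right away instead of storing the URL.
    audio_url = response.output["audio"]["url"]
    urllib.request.urlretrieve(audio_url, path)
    return path
```

For example, synthesize_to_file("Hello there", "hello.wav", voice="Ethan") writes hello.wav to the working directory.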
You must import the Gson dependency for Java. If you use Maven or Gradle, add the dependency as follows:
- Maven
- Gradle
Add the following content to pom.xml:
Use cloned voice
Voice cloning does not provide preview audio. Apply the cloned voice to speech synthesis to evaluate the result.
These examples adapt the non-streaming output code, replacing the voice parameter with a cloned voice.
- Key principle: The model used for voice cloning (target_model) must match the model used for speech synthesis (model). Otherwise, synthesis fails.
- This example uses the local audio file voice.mp3 for voice cloning. Replace this path when running the code.
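Before the cloning request, the sample in voice.mp3 has to be read and encoded. The following is a minimal sketch. It assumes the cloning API accepts audio as a Base64 data URI of the form data:<mime>;base64,<payload>. Check the Voice cloning (Qwen) API reference for the exact request field. The helper audio_data_uri is hypothetical.

```python
import base64
import mimetypes
from pathlib import Path


def audio_data_uri(path):
    """Encode a local audio file as a data URI: data:<mime>;base64,<payload>.

    The MIME type is guessed from the file extension (voice.mp3 -> audio/mpeg).
    """
    mime, _ = mimetypes.guess_type(str(path))
    if mime is None or not mime.startswith("audio/"):
        raise ValueError(f"cannot infer an audio MIME type for {path}")
    payload = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{payload}"
```

The extension check runs before the file is read, so a wrong path such as notes.txt fails quickly with a clear error.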
- Maven
- Gradle
Add the following content to your pom.xml:
When using a custom voice generated by voice cloning for speech synthesis, set the voice as follows:
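Synthesis with a custom voice fails when target_model and model differ, so a client can check this before sending the request. The following is a minimal Python sketch. It assumes the custom voice is passed as the voice argument of MultiModalConversation.call, using the voice name returned when the voice was created, and that the API key is in DASHSCOPE_API_KEY. The names ModelMismatchError, check_custom_voice, and synthesize_with_custom_voice are hypothetical.

```python
import os


class ModelMismatchError(ValueError):
    """Raised when the synthesis model differs from the voice's target_model."""


def check_custom_voice(target_model, model):
    """Enforce the key principle: target_model must equal the synthesis model."""
    if target_model != model:
        raise ModelMismatchError(
            f"voice was created for {target_model!r} but synthesis uses {model!r}"
        )


def synthesize_with_custom_voice(text, voice, target_model, model):
    """Run non-streaming synthesis with a custom voice. Requires network access."""
    check_custom_voice(target_model, model)
    import dashscope  # lazy import: the check above runs without the SDK

    return dashscope.MultiModalConversation.call(
        api_key=os.getenv("DASHSCOPE_API_KEY"),
        model=model,
        text=text,
        voice=voice,  # the voice name returned when the custom voice was created
        stream=False,
    )
```

Failing locally saves a round trip and gives a clearer error than the server-side synthesis failure.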
Use designed voice
Voice design returns preview audio. Listen to the preview to confirm it meets your expectations before using it for synthesis to reduce costs.
Step 1: Generate a custom voice and preview the result
If you are satisfied with the result, proceed to the next step. Otherwise, generate it again.
You need to import the Gson dependency for Java. If you are using Maven or Gradle, add the dependency as follows:
- Maven
- Gradle
Add the following content to pom.xml:
When using a custom voice generated by voice design for speech synthesis, you must set the voice as follows:
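The generate, listen, and regenerate cycle can be expressed as a small capped loop. The following is a minimal sketch. The design_fn and approve_fn callables are hypothetical placeholders. Connect design_fn to the Voice design (Qwen) API so it returns the voice name and the preview audio bytes, and connect approve_fn to however you listen to and judge the preview, such as a manual prompt.

```python
from typing import Callable, Optional, Tuple

# design_fn(prompt) -> (voice_name, preview_audio_bytes)   [hypothetical]
# approve_fn(preview_audio_bytes) -> bool                  [hypothetical]
DesignFn = Callable[[str], Tuple[str, bytes]]
ApproveFn = Callable[[bytes], bool]


def design_until_approved(prompt: str, design_fn: DesignFn,
                          approve_fn: ApproveFn,
                          max_attempts: int = 3) -> Optional[str]:
    """Regenerate a designed voice until its preview is approved.

    Returns the approved voice name, or None if no attempt was approved.
    The loop is capped because every attempt is another design request.
    """
    for _ in range(max_attempts):
        voice, preview = design_fn(prompt)
        if approve_fn(preview):
            return voice
    return None
```

Only the approved voice name moves on to synthesis, which matches the advice to confirm the preview before spending on synthesis.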
Step 2: Use the custom voice for speech synthesis
Use the custom voice generated in the previous step for non-streaming speech synthesis. This example adapts the non-streaming output code, replacing the voice parameter with the custom voice generated by voice design. For streaming synthesis, see Getting started.
Key principle: The model used for voice design (target_model) must be the same as the model used for subsequent speech synthesis (model). Otherwise, the synthesis will fail.
Instruction control
Control pitch, speed, emotion, and timbre using natural language instructions instead of audio parameters.
Supported models: Qwen3-TTS-Instruct-Flash series only.
Usage: Specify instructions in the instructions parameter. Example: "Fast-paced with rising intonation, suitable for fashion products."
Supported languages: Chinese and English only.
Length limit: Maximum 1600 tokens.
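The following Python sketch sends an instruction with a request. It assumes the SDK forwards an instructions keyword argument to the API under the parameter name given above, and that the API key is in DASHSCOPE_API_KEY. The helper names are hypothetical. The sketch does not enforce the 1600-token limit, because token counts depend on the model's tokenizer.

```python
import os


def build_instruct_request(text, instructions, voice="Cherry",
                           model="qwen3-tts-instruct-flash"):
    """Assemble arguments for instruction-controlled synthesis."""
    # Instruction control is limited to the Qwen3-TTS-Instruct-Flash series.
    if not model.startswith("qwen3-tts-instruct-flash"):
        raise ValueError("instruction control requires a Qwen3-TTS-Instruct-Flash model")
    if not instructions.strip():
        raise ValueError("instructions must not be empty")
    return {"model": model, "text": text, "voice": voice,
            "instructions": instructions, "stream": False}


def synthesize_with_instructions(text, instructions, **options):
    """Send an instruction-controlled request. Requires network access and an API key."""
    import dashscope  # lazy import so the builder works without the SDK

    return dashscope.MultiModalConversation.call(
        api_key=os.getenv("DASHSCOPE_API_KEY"),
        **build_instruct_request(text, instructions, **options),
    )
```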
Scenarios:
- Audiobook and radio drama voice-overs
- Advertising and promotional video voice-overs
- Game role and animation voice-overs
- Emotional intelligent voice assistants
- Documentary and news broadcasting
Tips for writing effective instructions:
- Be specific: Use descriptive words such as "deep," "crisp," or "fast-paced." Avoid vague words such as "nice" or "normal."
- Be multi-dimensional: Combine multiple dimensions such as pitch, speed, and emotion. Single-dimension descriptions such as "high-pitched" are too broad.
- Be objective: Focus on physical and perceptual features, not personal preferences. Use "high-pitched and energetic" instead of "my favorite sound."
- Be original: Describe sound qualities instead of requesting imitation of specific people. The model does not support direct imitation.
- Be concise: Ensure every word serves a purpose. Avoid repetitive synonyms or meaningless intensifiers.
Common dimensions for describing a voice:
| Dimension | Examples |
|---|---|
| Pitch | High, medium, low, high-pitched, low-pitched |
| Speed | Fast, medium, slow, fast-paced, slow-paced |
| Emotion | Cheerful, calm, gentle, serious, lively, composed, soothing |
| Characteristics | Magnetic, crisp, hoarse, mellow, sweet, deep, powerful |
| Usage | News broadcast, ad voice-over, audiobook, animation role, voice assistant, documentary narration |
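The "be multi-dimensional" guideline can be applied when building instructions programmatically. The following sketch joins per-dimension descriptions from the table into a single instruction and rejects single-dimension input. The function compose_instruction is hypothetical and only assembles a string. The model does not require this format.

```python
DIMENSIONS = ("pitch", "speed", "emotion", "characteristics", "usage")


def compose_instruction(**features):
    """Join per-dimension descriptions into one instruction string.

    Raises ValueError for unknown dimensions and for input that covers
    fewer than two dimensions ("be multi-dimensional").
    """
    unknown = set(features) - set(DIMENSIONS)
    if unknown:
        raise ValueError(f"unknown dimensions: {sorted(unknown)}")
    # Keep the table's dimension order and skip empty descriptions.
    parts = [features[d].strip() for d in DIMENSIONS if features.get(d, "").strip()]
    if len(parts) < 2:
        raise ValueError("combine at least two dimensions, such as pitch and emotion")
    sentence = ", ".join(parts)
    return sentence[0].upper() + sentence[1:] + "."
```

For example, compose_instruction(pitch="high-pitched", speed="medium speed", emotion="full of energy", usage="suitable for ad voice-overs") returns "High-pitched, medium speed, full of energy, suitable for ad voice-overs."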
Example instructions:
- Standard broadcast style: Clear and precise articulation, well-rounded pronunciation.
- Progressive emotional effect: Volume rapidly increases from normal conversation to a shout, with a straightforward personality and easily excited, expressive emotions.
- Special emotional state: A sobbing tone causes slightly slurred and hoarse pronunciation, with noticeable tension in the crying voice.
- Ad voice-over style: High-pitched, medium speed, full of energy and appeal, suitable for ad voice-overs.
- Gentle and soothing style: Slow-paced, with a gentle and sweet pitch, and a soothing, warm tone, like a caring friend.
Voice customization
Qwen3-TTS supports both voice cloning (Qwen3-TTS-VC) and voice design (Qwen3-TTS-VD). See Voice cloning (Qwen) and Voice design (Qwen) for the API reference.
API reference
Model comparison
| Features | Qwen3-TTS-Instruct-Flash | Qwen3-TTS-VD | Qwen3-TTS-VC | Qwen3-TTS-Flash | Qwen-TTS |
|---|---|---|---|---|---|
| Supported languages | Varies by voice: Chinese (Mandarin), English, Spanish, Russian, Italian, French, Korean, Japanese, German, Portuguese | Chinese (Mandarin), English, Spanish, Russian, Italian, French, Korean, Japanese, German, Portuguese | Chinese (Mandarin), English, Spanish, Russian, Italian, French, Korean, Japanese, German, Portuguese | Varies by voice: Chinese (Mandarin, Shanghainese, Beijing dialect, Sichuan dialect, Nanjing dialect, Shaanxi dialect, Southern Min, Tianjin dialect), Cantonese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Varies by voice: Chinese (Mandarin, Shanghainese, Beijing dialect, Sichuan dialect), English |
| Audio format | wav: for non-streaming output; pcm: for streaming output, Base64-encoded | wav: for non-streaming output; pcm: for streaming output, Base64-encoded | wav: for non-streaming output; pcm: for streaming output, Base64-encoded | wav: for non-streaming output; pcm: for streaming output, Base64-encoded | wav: for non-streaming output; pcm: for streaming output, Base64-encoded |
| Audio sampling rate | 24 kHz | 24 kHz | 24 kHz | 24 kHz | 24 kHz |
| Voice cloning | Not supported | Not supported | Supported | Not supported | Not supported |
| Voice design | Not supported | Supported | Not supported | Not supported | Not supported |
| SSML | Not supported | Not supported | Not supported | Not supported | Not supported |
| LaTeX | Not supported | Not supported | Not supported | Not supported | Not supported |
| Volume control | Supported | Supported | Supported | Not supported | Not supported |
| Speech rate control | Supported | Supported | Supported | Not supported | Not supported |
| Pitch control | Supported | Supported | Supported | Not supported | Not supported |
| Bitrate control | Not supported | Not supported | Not supported | Not supported | Not supported |
| Timestamp | Not supported | Not supported | Not supported | Not supported | Not supported |
| Instruction control (Instruct) | Supported | Not supported | Not supported | Not supported | Not supported |
| Streaming input | Not supported | Not supported | Not supported | Not supported | Not supported |
| Streaming output | Supported | Supported | Supported | Supported | Supported |
| Rate limits | RPM: 180 | RPM: 180 | RPM: 180 | RPM varies by model: qwen3-tts-flash, qwen3-tts-flash-2025-11-27: 180; qwen3-tts-flash-2025-09-18: 10 | RPM: 10; TPM, including input and output tokens: 100,000 |
| Connection type | Java/Python SDK, WebSocket API | Java/Python SDK, WebSocket API | Java/Python SDK, WebSocket API | Java/Python SDK, WebSocket API | Java/Python SDK, WebSocket API |
| Pricing | $0.115 per 10K characters | $0.115 per 10K characters | $0.115 per 10K characters | $0.1 per 10K characters | N/A |
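Streaming output arrives as Base64-encoded PCM at 24 kHz. To get a playable file, decode the chunks and wrap them in a WAV header. The following is a minimal sketch using Python's standard wave module. It assumes 16-bit mono PCM (the table gives only the sampling rate) and that each streamed chunk carries its Base64 audio at output["audio"]["data"] when MultiModalConversation.call is invoked with stream=True. The helper names are hypothetical.

```python
import base64
import io
import os
import wave

SAMPLE_RATE = 24_000  # 24 kHz, per the comparison table
SAMPLE_WIDTH = 2      # assumption: 16-bit PCM
CHANNELS = 1          # assumption: mono


def pcm_chunks_to_wav(b64_chunks):
    """Decode Base64 PCM chunks and return them as a complete WAV file (bytes)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH)
        wav.setframerate(SAMPLE_RATE)
        for chunk in b64_chunks:
            if chunk:  # skip chunks without audio (None or empty string)
                wav.writeframes(base64.b64decode(chunk))
    return buffer.getvalue()


def stream_chunks(text, voice="Cherry", model="qwen3-tts-flash"):
    """Hypothetical helper: yield Base64 PCM chunks from a streaming call (needs network)."""
    import dashscope  # lazy import so the WAV helper works without the SDK

    responses = dashscope.MultiModalConversation.call(
        api_key=os.getenv("DASHSCOPE_API_KEY"),
        model=model, text=text, voice=voice, stream=True,
    )
    for response in responses:
        # Assumption: each chunk carries Base64 PCM under output["audio"]["data"].
        audio = (response.output or {}).get("audio") or {}
        yield audio.get("data")
```

For example, pcm_chunks_to_wav(stream_chunks("Hello")) returns bytes that you can write to a .wav file. For live playback, feed the decoded chunks to an audio device as they arrive instead of buffering them.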
System voices
Supported voices vary by model. Set the voice request parameter to the value in the voice parameter column in the voice list.
| voice parameter | Details | Supported languages | Supported models |
|---|---|---|---|
| Cherry | Voice name: Cherry. A sunny, positive, friendly, and natural young woman (female) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Instruct-Flash, Qwen3-TTS-Flash, Qwen-TTS |
| Serena | Voice name: Serena. A gentle young woman (female) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Instruct-Flash, Qwen3-TTS-Flash, Qwen-TTS |
| Ethan | Voice name: Ethan. Standard Mandarin with a slight northern accent. Sunny, warm, energetic, and vibrant (male) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Instruct-Flash, Qwen3-TTS-Flash, Qwen-TTS |
| Chelsie | Voice name: Chelsie. A two-dimensional virtual girlfriend (female) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Instruct-Flash, Qwen3-TTS-Flash, Qwen-TTS |
| Momo | Voice name: Momo. Playful and mischievous, cheering you up (female) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Instruct-Flash, Qwen3-TTS-Flash |
| Vivian | Voice name: Vivian. Confident, cute, and slightly feisty (female) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Instruct-Flash, Qwen3-TTS-Flash |
| Moon | Voice name: Moon. A bold and handsome man named Yuebai (male) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Instruct-Flash, Qwen3-TTS-Flash |
| Maia | Voice name: Maia. A blend of intellect and gentleness (female) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Instruct-Flash, Qwen3-TTS-Flash |
| Kai | Voice name: Kai. A soothing audio spa for your ears (male) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Instruct-Flash, Qwen3-TTS-Flash |
| Nofish | Voice name: Nofish. A designer who cannot pronounce retroflex sounds (male) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Instruct-Flash, Qwen3-TTS-Flash |
| Bella | Voice name: Bella. A little girl who drinks but never throws punches when drunk (female) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Instruct-Flash, Qwen3-TTS-Flash |
| Jennifer | Voice name: Jennifer. A premium, cinematic-quality American English female voice (female) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Flash |
| Ryan | Voice name: Ryan. Full of rhythm, bursting with dramatic flair, balancing authenticity and tension (male) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Flash |
| Katerina | Voice name: Katerina. A mature-woman voice with rich, memorable rhythm (female) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Flash |
| Aiden | Voice name: Aiden. An American English young man skilled in cooking (male) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Flash |
| Eldric Sage | Voice name: Eldric Sage. A calm and wise elder (male) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Instruct-Flash, Qwen3-TTS-Flash |
| Mia | Voice name: Mia. Gentle as spring water, obedient as fresh snow (female) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Instruct-Flash, Qwen3-TTS-Flash |
| Mochi | Voice name: Mochi. A clever, quick-witted young adult (male) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Instruct-Flash, Qwen3-TTS-Flash |
| Bellona | Voice name: Bellona. A powerful, clear voice that brings characters to life | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Instruct-Flash, Qwen3-TTS-Flash |
| Vincent | Voice name: Vincent. A uniquely raspy, smoky voice (male) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Instruct-Flash, Qwen3-TTS-Flash |
| Bunny | Voice name: Bunny. A little girl overflowing with "cuteness" (female) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Instruct-Flash, Qwen3-TTS-Flash |
| Neil | Voice name: Neil. A flat baseline intonation with precise, clear pronunciation (male) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Instruct-Flash, Qwen3-TTS-Flash |
| Elias | Voice name: Elias. Maintains academic rigor while using storytelling techniques (female) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Instruct-Flash, Qwen3-TTS-Flash |
| Arthur | Voice name: Arthur. A simple, earthy voice steeped in time (male) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Instruct-Flash, Qwen3-TTS-Flash |
| Nini | Voice name: Nini. A soft, clingy voice like sweet rice cakes (female) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Instruct-Flash, Qwen3-TTS-Flash |
| Ebona | Voice name: Ebona. A whisper like a rusty key slowly turning in the darkest corner (female) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Instruct-Flash, Qwen3-TTS-Flash |
| Seren | Voice name: Seren. A gentle, soothing voice to help you fall asleep faster (female) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Instruct-Flash, Qwen3-TTS-Flash |
| Pip | Voice name: Pip. A playful, mischievous boy full of childlike wonder (male) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Instruct-Flash, Qwen3-TTS-Flash |
| Stella | Voice name: Stella. A cloyingly sweet, dazed teenage-girl voice (female) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Instruct-Flash, Qwen3-TTS-Flash |
| Bodega | Voice name: Bodega. A passionate Spanish man (male) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Flash |
| Sonrisa | Voice name: Sonrisa. A cheerful, outgoing Latin American woman (female) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Flash |
| Alek | Voice name: Alek. Cold like the Russian spirit, yet warm like wool coat lining (male) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Flash |
| Dolce | Voice name: Dolce. A laid-back Italian man (male) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Flash |
| Sohee | Voice name: Sohee. A warm, cheerful, emotionally expressive Korean unnie (female) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Flash |
| Ono Anna | Voice name: Ono Anna. A clever, spirited childhood friend (female) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Flash |
| Lenn | Voice name: Lenn. Rational at heart, rebellious in detail | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Flash |
| Emilien | Voice name: Emilien. A romantic French big brother (male) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Flash |
| Andre | Voice name: Andre. A magnetic, natural, and steady male voice | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Flash |
| Radio Gol | Voice name: Radio Gol. Football poet (male) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Flash |
| Jada | Voice name: Shanghai - Jada. A fast-paced, energetic Shanghai auntie (female) | Shanghainese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Flash, Qwen-TTS |
| Dylan | Voice name: Beijing - Dylan. A young man raised in Beijing's hutongs (male) | Beijing dialect, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Flash, Qwen-TTS |
| Li | Voice name: Nanjing - Li. A patient yoga teacher (male) | Nanjing dialect, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Flash |
| Marcus | Voice name: Shaanxi - Marcus. The authentic Shaanxi flavor (male) | Shaanxi dialect, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Flash |
| Roy | Voice name: Southern Min - Roy. A humorous, straightforward, lively Taiwanese guy (male) | Southern Min, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Flash |
| Peter | Voice name: Tianjin - Peter. Tianjin-style crosstalk, professional foil (male) | Tianjin dialect, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Flash |
| Sunny | Voice name: Sichuan - Sunny. A Sichuan girl sweet enough to melt your heart (female) | Sichuan dialect, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Flash, Qwen-TTS |
| Eric | Voice name: Sichuan - Eric. A Sichuanese man from Chengdu (male) | Sichuan dialect, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Flash |
| Rocky | Voice name: Cantonese - Rocky. A humorous, witty live chatter (male) | Cantonese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Flash |
| Kiki | Voice name: Cantonese - Kiki. A sweet Hong Kong girl best friend (female) | Cantonese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Flash |
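Because supported voices vary by model, a client can check a voice against the target model before sending a request. The following is a minimal sketch that uses a small excerpt of the table above. Extend VOICE_MODELS from the full table. The prefix "qwen-tts" for Qwen-TTS model names is an assumption, since this page does not list those model IDs, and differences between snapshots of the same family are not modeled. The helper names are hypothetical.

```python
# Excerpt of the system voice table: voice parameter -> supported model families.
VOICE_MODELS = {
    "Cherry": {"Qwen3-TTS-Instruct-Flash", "Qwen3-TTS-Flash", "Qwen-TTS"},
    "Momo": {"Qwen3-TTS-Instruct-Flash", "Qwen3-TTS-Flash"},
    "Jennifer": {"Qwen3-TTS-Flash"},
    "Dylan": {"Qwen3-TTS-Flash", "Qwen-TTS"},
    "Rocky": {"Qwen3-TTS-Flash"},
}


def model_family(model):
    """Map a model name such as qwen3-tts-flash-2025-11-27 to its family name."""
    if model.startswith("qwen3-tts-instruct-flash"):
        return "Qwen3-TTS-Instruct-Flash"
    if model.startswith("qwen3-tts-flash"):
        return "Qwen3-TTS-Flash"
    if model.startswith("qwen-tts"):  # assumption: Qwen-TTS model ID prefix
        return "Qwen-TTS"
    # VD and VC models use custom voices, not the system voices listed here.
    raise ValueError(f"no system voices listed for {model!r}")


def voice_supported(voice, model):
    """True if the table lists the voice for the model's family."""
    return model_family(model) in VOICE_MODELS.get(voice, set())
```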
FAQ
Q: How long is the audio file URL valid?
The audio file URL expires after 24 hours.
Learn more
- Real-time speech synthesis (CosyVoice & Qwen-TTS-Realtime): Real-time streaming speech synthesis with WebSocket
- Voice list