Create custom voices from text descriptions for use with Qwen TTS models.
Voice design generates custom voices from text descriptions. After creating a voice, use the returned voice name with Qwen TTS or Realtime streaming TTS.
The target_model in voice design must match the model in synthesis. Mismatched models cause failures.
How it works
- Write a voice description (voice_prompt) and preview text (preview_text).
- Send a Create voice request with your target_model.
- The API returns a voice name and Base64-encoded preview audio. Decode the Base64 string to get the audio file (WAV format).
- Listen to the preview. If satisfied, use the voice name for synthesis. Otherwise, create a new voice.
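The steps above can be sketched in Python roughly as follows. This is a minimal sketch, not the official client: the payload nesting (`model`, `input`, `action`) and both helper names are assumptions made for illustration. Only `voice_prompt`, `preview_text`, `target_model`, and the Base64-encoded WAV preview come from this page, so check the Voice design API reference for the actual request format.

```python
import base64
from pathlib import Path


def build_create_voice_request(voice_prompt: str, preview_text: str, target_model: str) -> dict:
    """Assemble a Create voice request body.

    The nesting under "input" and the "action" field are assumptions;
    only the three parameter names come from the documentation.
    """
    return {
        "model": "qwen-voice-design",  # fixed design model value
        "input": {
            "action": "create",  # hypothetical field
            "target_model": target_model,
            "voice_prompt": voice_prompt,
            "preview_text": preview_text,
        },
    }


def save_preview_audio(preview_b64: str, path: str) -> int:
    """Decode the Base64 preview string and write it to disk as a WAV file.

    Returns the number of bytes written.
    """
    audio_bytes = base64.b64decode(preview_b64)
    Path(path).write_bytes(audio_bytes)
    return len(audio_bytes)
```

In practice, the string passed to `save_preview_audio` would be the preview audio field of the Create voice response, and the saved file can be opened in any audio player for review.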
Supported models
Voice design uses two models: a design model and a target synthesis model.
| Model | Value | Use with |
|---|---|---|
| Voice design model | qwen-voice-design | All voice design operations (fixed value) |
| Real-time synthesis target | qwen3-tts-vd-realtime-2026-01-15 | Realtime streaming TTS |
| Real-time synthesis target (earlier version) | qwen3-tts-vd-realtime-2025-12-16 | Realtime streaming TTS |
| Non-real-time synthesis target | qwen3-tts-vd-2026-01-26 | Qwen TTS |
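Because a voice is tied to the target model chosen at design time, a small client-side check can catch a mismatch before a synthesis call is made. The sketch below copies the model values from the table above; the function name `check_synthesis_model` is hypothetical, and the check is an illustration rather than part of any SDK.

```python
# Target models from the table above, mapped to the service they are used with.
TARGET_MODELS = {
    "qwen3-tts-vd-realtime-2026-01-15": "Realtime streaming TTS",
    "qwen3-tts-vd-realtime-2025-12-16": "Realtime streaming TTS",
    "qwen3-tts-vd-2026-01-26": "Qwen TTS",
}


def check_synthesis_model(target_model: str, synthesis_model: str) -> str:
    """Return the service to use for a designed voice.

    Raises ValueError if the target model is unknown or if the model used
    for synthesis differs from the one used when the voice was designed.
    """
    if target_model not in TARGET_MODELS:
        raise ValueError(f"Unknown target model: {target_model}")
    if synthesis_model != target_model:
        raise ValueError(
            f"Voice was designed for {target_model}, "
            f"but synthesis uses {synthesis_model}"
        )
    return TARGET_MODELS[target_model]
```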
Voice design models (qwen3-tts-vd-*) only support custom-designed voices. They do not support system voices (Chelsie, Serena, Ethan, Cherry).
Supported languages
| Code | Language |
|---|---|
| zh | Chinese |
| en | English |
| de | German |
| it | Italian |
| pt | Portuguese |
| es | Spanish |
| ja | Japanese |
| ko | Korean |
| fr | French |
| ru | Russian |
voice_prompt supports Chinese and English only. The language parameter must match the preview_text language.
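A language code can be validated locally before sending a request. The sketch below uses the codes from the table above; `validate_language` is a hypothetical helper, and accepting input with stray whitespace or uppercase letters is an assumption of this sketch, not documented API behavior. It does not verify that the code actually matches the language of preview_text.

```python
# Language codes from the Supported languages table.
SUPPORTED_LANGUAGES = {
    "zh": "Chinese",
    "en": "English",
    "de": "German",
    "it": "Italian",
    "pt": "Portuguese",
    "es": "Spanish",
    "ja": "Japanese",
    "ko": "Korean",
    "fr": "French",
    "ru": "Russian",
}


def validate_language(code: str) -> str:
    """Return the normalized language code, or raise ValueError if unsupported."""
    normalized = code.strip().lower()
    if normalized not in SUPPORTED_LANGUAGES:
        supported = ", ".join(sorted(SUPPORTED_LANGUAGES))
        raise ValueError(f"Unsupported language {code!r}; expected one of: {supported}")
    return normalized
```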
Write effective voice descriptions
A voice description (voice_prompt) tells the model what voice to generate. Combine gender, age, tone, and use case to define a distinctive voice.
Constraints
- Max length: 2,048 characters.
- Languages: Chinese and English only.
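The length constraint can be checked on the client before a request is sent. This is a minimal sketch: `validate_voice_prompt` is a hypothetical helper, counting characters with Python's `len` is an assumption about how the service counts them, and rejecting empty input is an extra guard not stated on this page. The language constraint is not checked here.

```python
MAX_VOICE_PROMPT_CHARS = 2048  # documented maximum length


def validate_voice_prompt(voice_prompt: str) -> str:
    """Return the trimmed description, or raise ValueError if it is unusable."""
    text = voice_prompt.strip()
    if not text:
        # Assumption: an empty description is never useful.
        raise ValueError("voice_prompt must not be empty")
    if len(text) > MAX_VOICE_PROMPT_CHARS:
        raise ValueError(
            f"voice_prompt has {len(text)} characters; "
            f"the maximum is {MAX_VOICE_PROMPT_CHARS}"
        )
    return text
```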
Description dimensions
| Dimension | Examples |
|---|---|
| Gender | Male, female, neutral |
| Age | Child (5--12), teenager (13--18), young adult (19--35), middle-aged (36--55), elderly (55+) |
| Pitch | High, medium, low, high-pitched, low-pitched |
| Pace | Fast, medium, slow, fast-paced, slow-paced |
| Emotion | Cheerful, calm, gentle, serious, lively, composed, soothing |
| Characteristics | Magnetic, crisp, hoarse, mellow, sweet, rich, powerful |
| Use case | News broadcast, ad voice-over, audiobook, animation character, voice assistant, documentary narration |
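One way to keep descriptions consistent is to assemble them from the dimensions in the table above. The sketch below is only an illustration: the `VoiceSpec` class and its sentence template are hypothetical, and the service accepts free-form text rather than requiring this structure.

```python
from dataclasses import dataclass


@dataclass
class VoiceSpec:
    """A subset of the description dimensions, one field per dimension."""

    gender: str    # e.g. "male", "female", "neutral"
    age: str       # e.g. "young", "middle-aged"
    pace: str      # e.g. "fast", "slow"
    tone: str      # emotion and characteristics, e.g. "deep, magnetic"
    use_case: str  # e.g. "audiobook narration"

    def to_prompt(self) -> str:
        """Render the fields as a single English description sentence."""
        return (
            f"A {self.age} {self.gender} voice with a {self.pace} pace "
            f"and a {self.tone} tone, suitable for {self.use_case}."
        )


spec = VoiceSpec(
    gender="male",
    age="middle-aged",
    pace="slow",
    tone="deep, magnetic",
    use_case="news or documentary narration",
)
print(spec.to_prompt())
```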
Tips
- Be specific. Use concrete qualities like "deep," "crisp," or "fast-paced." Avoid vague terms like "nice" or "normal."
- Use multiple dimensions. Combine gender, age, emotion, and use case. "Female voice" alone is too broad.
- Be objective. Focus on physical and perceptual features. Write "high-pitched and energetic" instead of "my favorite voice."
- Be original. Describe voice qualities directly. Celebrity imitation is not supported and involves copyright risks.
- Be concise. Every word should serve a purpose. Avoid synonyms and meaningless intensifiers.
Examples
Good descriptions:
- "A young, lively female voice with a fast pace and noticeable upward inflection, suitable for fashion product introductions."
- "A calm, middle-aged male voice with a slow pace and deep, magnetic tone, suitable for news or documentary narration."
- "A cute child's voice, around 8 years old, with a slightly childish tone, suitable for animation character voice-overs."
Descriptions to avoid:
| Description | Issue | Improvement |
|---|---|---|
| "A nice voice" | Too vague | "A young female voice with a clear vocal line and gentle tone." |
| "A voice like a certain celebrity" | Celebrity imitation not supported | "A mature, magnetic male voice with a calm pace." |
| "A very, very, very nice female voice" | Redundant repetition | "A female voice, 20--24 years old, with a light tone and sweet quality." |
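Two of the issues in the table, vague wording and redundant repetition, are mechanical enough to flag automatically. The following is a rough heuristic sketch, not a check the service performs: `lint_voice_prompt` is a hypothetical helper, the vague-term list contains only the two words named in the tips ("nice" and "normal"), and it works on English descriptions only.

```python
import re
from typing import List

# Vague terms named in the tips; extend the set as needed.
VAGUE_TERMS = {"nice", "normal"}

# A word followed one or more times by the same word, ignoring punctuation.
REPEATED_WORD = re.compile(r"\b(\w+)(?:\W+\1\b)+", flags=re.IGNORECASE)


def lint_voice_prompt(voice_prompt: str) -> List[str]:
    """Return a list of issues found in an English voice description."""
    issues = []
    words = re.findall(r"[a-z']+", voice_prompt.lower())
    for term in sorted(VAGUE_TERMS.intersection(words)):
        issues.append(f"vague term: {term}")
    for match in REPEATED_WORD.finditer(voice_prompt):
        issues.append(f"repeated word: {match.group(1).lower()}")
    return issues
```

An empty result does not mean a description is good. It only means neither of these two patterns was found, so listening to the preview remains the real test.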
Error codes
If a call fails, see Error messages.
Common voice design errors:
| HTTP status | Error code | Cause | Resolution |
|---|---|---|---|
| 400 | BadRequest.VoiceNotFound | The specified voice does not exist (in voice design or synthesis operations) | Verify the voice name with List voices or Query a voice. If the voice does not exist, create a new voice with Create voice. |
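Client code can map the error in the table to its resolution. This is a sketch under assumptions: it takes the HTTP status and error code as plain arguments because the error response layout is not described on this page, and `resolution_hint` is a hypothetical helper name.

```python
# (HTTP status, error code) -> resolution text, taken from the table above.
VOICE_DESIGN_ERRORS = {
    (400, "BadRequest.VoiceNotFound"): (
        "Verify the voice name with List voices or Query a voice. "
        "If the voice does not exist, create a new voice with Create voice."
    ),
}


def resolution_hint(http_status: int, error_code: str) -> str:
    """Return the documented resolution, or point to the general error list."""
    return VOICE_DESIGN_ERRORS.get(
        (http_status, error_code),
        "See Error messages.",
    )
```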
Next steps
- Voice design API reference -- API parameters and response format
- Realtime streaming TTS -- Use custom voices for real-time synthesis
- Qwen TTS -- Use custom voices for non-streaming synthesis
- Get an API key -- Set up authentication