Choose a model for speech synthesis, voice cloning, and voice design.
You have 18 text-to-speech models across four families. Two questions narrow the field.
- Do you need a custom voice, or will a built-in voice work?
- Do you need real-time streaming, or is non-streaming acceptable?
Standard TTS or custom voice?
Standard TTS
Built-in voices, no setup required. Choose a model, pick a voice, and start synthesizing.
| Model | Family | Key strength |
|---|---|---|
| qwen3-tts-flash | Qwen3-TTS | Low latency, good quality |
| qwen3-tts-flash-2025-11-27 | Qwen3-TTS | Low latency, good quality (snapshot) |
| qwen3-tts-flash-2025-09-18 | Qwen3-TTS | Low latency, good quality (snapshot) |
| qwen3-tts-flash-realtime | Qwen3-TTS | Realtime streaming, low latency |
| qwen3-tts-flash-realtime-2025-11-27 | Qwen3-TTS | Realtime streaming, low latency (snapshot) |
| qwen3-tts-flash-realtime-2025-09-18 | Qwen3-TTS | Realtime streaming, low latency (snapshot) |
| qwen3-tts-instruct-flash | Qwen3-TTS | Instruction control (speed, emotion, style) |
| qwen3-tts-instruct-flash-2026-01-26 | Qwen3-TTS | Instruction control (snapshot) |
| qwen3-tts-instruct-flash-realtime | Qwen3-TTS | Realtime streaming with instruction control |
| qwen3-tts-instruct-flash-realtime-2026-01-22 | Qwen3-TTS | Realtime streaming with instruction control (snapshot) |
| cosyvoice-v3-plus | CosyVoice | High quality, rich voice library |
| cosyvoice-v3-flash | CosyVoice | Fast synthesis |
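The dated names in the table are labeled as snapshots of an undated name. As a minimal sketch, the two can be told apart by name alone, assuming a trailing `-YYYY-MM-DD` is the only marker of a snapshot (an inference from the table, not a documented rule). The helpers `split_snapshot` and `latest_snapshot` are hypothetical and not part of any SDK.

```python
import re
from datetime import date
from typing import List, Optional, Tuple

# Assumption: a snapshot ID ends in -YYYY-MM-DD, as the dated rows in the table do.
_SNAPSHOT_SUFFIX = re.compile(r"^(?P<base>.+)-(?P<y>\d{4})-(?P<m>\d{2})-(?P<d>\d{2})$")


def split_snapshot(model_id: str) -> Tuple[str, Optional[date]]:
    """Return (base name, snapshot date), or (model_id, None) for an undated name."""
    match = _SNAPSHOT_SUFFIX.match(model_id)
    if match is None:
        return model_id, None
    return match["base"], date(int(match["y"]), int(match["m"]), int(match["d"]))


def latest_snapshot(base: str, model_ids: List[str]) -> Optional[str]:
    """Return the most recently dated snapshot of `base`, or None if there is none."""
    dated = []
    for model_id in model_ids:
        name, stamp = split_snapshot(model_id)
        if name == base and stamp is not None:
            dated.append((stamp, model_id))
    return max(dated)[1] if dated else None


ids = [
    "qwen3-tts-flash",
    "qwen3-tts-flash-2025-11-27",
    "qwen3-tts-flash-2025-09-18",
    "qwen3-tts-flash-realtime-2025-11-27",
]
print(latest_snapshot("qwen3-tts-flash", ids))  # qwen3-tts-flash-2025-11-27
```

Pinning a dated name is the usual way to keep behavior reproducible. Whether the undated name tracks the newest snapshot is not stated in this guide, so treat that as something to confirm.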
Custom voice
Create a unique voice by cloning from audio samples or designing one from a text description.
| Model | Family | Key strength |
|---|---|---|
| qwen3-tts-vc-2026-01-22 | Voice Cloning | Clone a voice from audio samples |
| qwen3-tts-vc-realtime-2026-01-15 | Voice Cloning | Real-time voice cloning |
| qwen3-tts-vc-realtime-2025-11-27 | Voice Cloning | Real-time voice cloning |
| qwen3-tts-vd-2026-01-26 | Voice Design | Design a voice from a text description |
| qwen3-tts-vd-realtime-2026-01-15 | Voice Design | Real-time voice design |
| qwen3-tts-vd-realtime-2025-12-16 | Voice Design | Real-time voice design |
Cloning vs. design: Voice cloning reproduces a specific voice from audio samples. Voice design creates a new voice from a text description of the desired characteristics (such as "a warm, low-pitched female voice"). Use cloning when you have a target voice; use design when you want to create one from scratch.
Controlling how the voice sounds
Three approaches, ranked by flexibility:
- Instruction control (qwen3-tts-instruct-flash, qwen3-tts-instruct-flash-realtime): Describe the desired delivery in natural language. Control speed, emotion, and style per request. Most flexible.
- Voice design (qwen3-tts-vd-*): Generate a custom voice from a text description. Good for creating a brand voice without audio samples.
- Voice cloning (qwen3-tts-vc-*): Reproduce an existing voice from audio samples. Best when you need to match a specific person's voice.
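The `*` in the names above is shorthand for a group of model IDs. The sketch below expands such a pattern against the custom-voice IDs listed in this guide, using shell-style matching from Python's standard library. The `expand` helper is hypothetical, and the ID list is copied from the custom-voice table rather than fetched from any service.

```python
from fnmatch import fnmatchcase
from typing import List

# Model IDs copied from the custom-voice table in this guide.
CUSTOM_VOICE_MODELS = [
    "qwen3-tts-vc-2026-01-22",
    "qwen3-tts-vc-realtime-2026-01-15",
    "qwen3-tts-vc-realtime-2025-11-27",
    "qwen3-tts-vd-2026-01-26",
    "qwen3-tts-vd-realtime-2026-01-15",
    "qwen3-tts-vd-realtime-2025-12-16",
]


def expand(pattern: str, model_ids: List[str] = CUSTOM_VOICE_MODELS) -> List[str]:
    """Return the model IDs matching a shell-style pattern, in table order."""
    return [model_id for model_id in model_ids if fnmatchcase(model_id, pattern)]


print(expand("qwen3-tts-vd-*"))
# ['qwen3-tts-vd-2026-01-26', 'qwen3-tts-vd-realtime-2026-01-15', 'qwen3-tts-vd-realtime-2025-12-16']
```

A narrower pattern such as `qwen3-tts-vc-realtime-*` selects only the real-time cloning models.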
Full comparison
| Model | Family | Streaming | Custom voice | Instruction control |
|---|---|---|---|---|
| qwen3-tts-flash | Qwen3-TTS | ✓ | ✗ | ✗ |
| qwen3-tts-flash-2025-11-27 | Qwen3-TTS | ✓ | ✗ | ✗ |
| qwen3-tts-flash-2025-09-18 | Qwen3-TTS | ✓ | ✗ | ✗ |
| qwen3-tts-flash-realtime | Qwen3-TTS | ✓ | ✗ | ✗ |
| qwen3-tts-flash-realtime-2025-11-27 | Qwen3-TTS | ✓ | ✗ | ✗ |
| qwen3-tts-flash-realtime-2025-09-18 | Qwen3-TTS | ✓ | ✗ | ✗ |
| qwen3-tts-instruct-flash | Qwen3-TTS | ✓ | ✗ | ✓ |
| qwen3-tts-instruct-flash-2026-01-26 | Qwen3-TTS | ✓ | ✗ | ✓ |
| qwen3-tts-instruct-flash-realtime | Qwen3-TTS | ✓ | ✗ | ✓ |
| qwen3-tts-instruct-flash-realtime-2026-01-22 | Qwen3-TTS | ✓ | ✗ | ✓ |
| qwen3-tts-vc-2026-01-22 | Voice Cloning | ✗ | ✓ | ✗ |
| qwen3-tts-vc-realtime-2026-01-15 | Voice Cloning | ✓ | ✓ | ✗ |
| qwen3-tts-vc-realtime-2025-11-27 | Voice Cloning | ✓ | ✓ | ✗ |
| qwen3-tts-vd-2026-01-26 | Voice Design | ✗ | ✓ | ✗ |
| qwen3-tts-vd-realtime-2026-01-15 | Voice Design | ✓ | ✓ | ✗ |
| qwen3-tts-vd-realtime-2025-12-16 | Voice Design | ✓ | ✓ | ✗ |
| cosyvoice-v3-plus | CosyVoice | ✓ | ✗ | ✗ |
| cosyvoice-v3-flash | CosyVoice | ✓ | ✗ | ✗ |
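The comparison above can also be queried as data. The following is a minimal sketch with the rows transcribed from the table. `Model` and `shortlist` are hypothetical names used for illustration only, and the flags reflect this table rather than a live capability lookup.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Model:
    name: str
    family: str
    streaming: bool
    custom_voice: bool
    instruction_control: bool


# Rows transcribed from the full comparison table (✓ = True, ✗ = False).
MODELS = [
    Model("qwen3-tts-flash", "Qwen3-TTS", True, False, False),
    Model("qwen3-tts-flash-2025-11-27", "Qwen3-TTS", True, False, False),
    Model("qwen3-tts-flash-2025-09-18", "Qwen3-TTS", True, False, False),
    Model("qwen3-tts-flash-realtime", "Qwen3-TTS", True, False, False),
    Model("qwen3-tts-flash-realtime-2025-11-27", "Qwen3-TTS", True, False, False),
    Model("qwen3-tts-flash-realtime-2025-09-18", "Qwen3-TTS", True, False, False),
    Model("qwen3-tts-instruct-flash", "Qwen3-TTS", True, False, True),
    Model("qwen3-tts-instruct-flash-2026-01-26", "Qwen3-TTS", True, False, True),
    Model("qwen3-tts-instruct-flash-realtime", "Qwen3-TTS", True, False, True),
    Model("qwen3-tts-instruct-flash-realtime-2026-01-22", "Qwen3-TTS", True, False, True),
    Model("qwen3-tts-vc-2026-01-22", "Voice Cloning", False, True, False),
    Model("qwen3-tts-vc-realtime-2026-01-15", "Voice Cloning", True, True, False),
    Model("qwen3-tts-vc-realtime-2025-11-27", "Voice Cloning", True, True, False),
    Model("qwen3-tts-vd-2026-01-26", "Voice Design", False, True, False),
    Model("qwen3-tts-vd-realtime-2026-01-15", "Voice Design", True, True, False),
    Model("qwen3-tts-vd-realtime-2025-12-16", "Voice Design", True, True, False),
    Model("cosyvoice-v3-plus", "CosyVoice", True, False, False),
    Model("cosyvoice-v3-flash", "CosyVoice", True, False, False),
]


def shortlist(
    streaming: Optional[bool] = None,
    custom_voice: Optional[bool] = None,
    instruction_control: Optional[bool] = None,
) -> List[str]:
    """Return model names matching every flag that is not None, in table order."""
    wanted = {
        "streaming": streaming,
        "custom_voice": custom_voice,
        "instruction_control": instruction_control,
    }
    return [
        model.name
        for model in MODELS
        if all(value is None or getattr(model, key) == value for key, value in wanted.items())
    ]


print(shortlist(custom_voice=True, streaming=False))
# ['qwen3-tts-vc-2026-01-22', 'qwen3-tts-vd-2026-01-26']
```

Leaving a flag as `None` means "no preference", so `shortlist(instruction_control=True)` returns the four instruct models regardless of the other columns.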