Text-to-speech models

Choose a model for speech synthesis, voice cloning, and voice design.

Two questions narrow the field: do you need a custom voice or will a built-in voice work, and do you need real-time streaming?

Built-in or custom voice?

Built-in voices

Pick a voice from the library and start synthesizing immediately.
  • CosyVoice — rich voice library, high quality, no setup beyond picking a voice
  • Qwen3-TTS — low-latency streaming; add -instruct for natural-language control over speed, emotion, and style

Custom voice

Need a voice that doesn't exist in the library?
  • Voice Cloning — reproduce a specific person's voice from audio samples. Use when you have a target voice to match.
  • Voice Design — create a new voice from a text description (e.g., "a warm, low-pitched female voice"). Use when you want a brand voice without audio samples.
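The built-in versus custom decision above can be written down as a small helper. This is only a sketch of the selection logic as this page describes it: the function name `pick_family` and its parameters are hypothetical, and the returned strings are the family names used on this page, not API identifiers.

```python
def pick_family(custom_voice: bool,
                has_audio_samples: bool = False,
                needs_style_control: bool = False) -> str:
    """Map the questions on this page to a model family (illustrative only)."""
    if custom_voice:
        # Cloning needs audio of the target voice; design starts from a text description.
        return "Voice Cloning" if has_audio_samples else "Voice Design"
    if needs_style_control:
        # Natural-language control over speed, emotion, and style.
        return "Qwen3-TTS (instruct variant)"
    # Either built-in family works when a library voice is enough.
    return "CosyVoice or Qwen3-TTS"


if __name__ == "__main__":
    print(pick_family(custom_voice=True, has_audio_samples=True))
    print(pick_family(custom_voice=False, needs_style_control=True))
```

Treating "has audio samples" as the deciding factor between cloning and design is an assumption drawn from the two bullet descriptions; a real project may weigh other factors such as latency or language coverage.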

Controlling how the voice sounds

Three approaches, ordered from most to least flexible:
  1. Instruction control (qwen3-tts-instruct-flash, qwen3-tts-instruct-flash-realtime) — Describe the desired delivery in natural language. Control speed, emotion, and style per request. Most flexible.
  2. Voice design (qwen3-tts-vd-*) — Generate a custom voice from a text description. Good for creating a brand voice without audio samples.
  3. Voice cloning (qwen3-tts-vc-*) — Reproduce an existing voice from audio samples. Best when you need to match a specific person's voice.

Model | Family | Streaming | Custom voice | Instruction control
cosyvoice-v3-plus | CosyVoice
qwen3-tts-flash | Qwen3-TTS
qwen3-tts-flash-realtime | Qwen3-TTS
qwen3-tts-instruct-flash | Qwen3-TTS
qwen3-tts-vc-realtime-2026-01-15 | Voice Cloning
qwen3-tts-vd-realtime-2026-01-15 | Voice Design
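The wildcard patterns used in the list of control approaches (`qwen3-tts-vd-*`, `qwen3-tts-vc-*`) can be checked against a model name with shell-style matching. The sketch below uses Python's standard `fnmatch` module. The `qwen3-tts-instruct-flash*` pattern is an assumption that generalizes the two instruct model names listed on this page, and `control_approach` is a hypothetical helper, not part of any SDK.

```python
from fnmatch import fnmatchcase
from typing import Optional

# (approach, pattern) pairs; the instruct pattern is an assumed generalization.
CONTROL_PATTERNS = [
    ("instruction control", "qwen3-tts-instruct-flash*"),
    ("voice design", "qwen3-tts-vd-*"),
    ("voice cloning", "qwen3-tts-vc-*"),
]


def control_approach(model: str) -> Optional[str]:
    """Return the control approach a model name falls under, or None."""
    for approach, pattern in CONTROL_PATTERNS:
        if fnmatchcase(model, pattern):
            return approach
    return None


if __name__ == "__main__":
    for name in ("qwen3-tts-vd-realtime-2026-01-15", "qwen3-tts-flash"):
        print(name, "->", control_approach(name))
```

Models that match none of the patterns, such as `cosyvoice-v3-plus` or `qwen3-tts-flash`, return `None`, which here simply means they use built-in voices without instruction control.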

All models

Model | Streaming | Custom voice | Instruction control
cosyvoice-v3-plus
cosyvoice-v3-flash

Model | Streaming | Custom voice | Instruction control
qwen3-tts-flash
qwen3-tts-flash-realtime
qwen3-tts-instruct-flash
qwen3-tts-instruct-flash-realtime

Model | Streaming | Custom voice | Instruction control
qwen3-tts-vc-2026-01-22
qwen3-tts-vc-realtime-2026-01-15
qwen3-tts-vd-2026-01-26
qwen3-tts-vd-realtime-2026-01-15

Previous generation models. We recommend the latest versions above for new projects.

Model | Family | Streaming | Custom voice | Instruction control
qwen3-tts-flash-2025-11-27 | Qwen3-TTS
qwen3-tts-flash-2025-09-18 | Qwen3-TTS
qwen3-tts-flash-realtime-2025-11-27 | Qwen3-TTS
qwen3-tts-flash-realtime-2025-09-18 | Qwen3-TTS
qwen3-tts-instruct-flash-2026-01-26 | Qwen3-TTS
qwen3-tts-instruct-flash-realtime-2026-01-22 | Qwen3-TTS
qwen3-tts-vc-realtime-2025-11-27 | Voice Cloning
qwen3-tts-vd-realtime-2025-12-16 | Voice Design
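
Many of the names above end in what looks like a `YYYY-MM-DD` date. Assuming that suffix is a snapshot date (this page does not say so explicitly), a name can be split into a base name and a date, which makes it easy to find the newest dated entry per base. The helpers `split_snapshot` and `newest_snapshots` are hypothetical names used only for this sketch.

```python
import re
from datetime import date
from typing import Dict, Iterable, Optional, Tuple

# Assumed convention: "<base>-YYYY-MM-DD" at the end of the model name.
_SNAPSHOT = re.compile(r"^(?P<base>.+)-(?P<y>\d{4})-(?P<m>\d{2})-(?P<d>\d{2})$")


def split_snapshot(model: str) -> Tuple[str, Optional[date]]:
    """Split a model name into (base, date); date is None if there is no suffix."""
    match = _SNAPSHOT.match(model)
    if match is None:
        return model, None
    stamp = date(int(match["y"]), int(match["m"]), int(match["d"]))
    return match["base"], stamp


def newest_snapshots(models: Iterable[str]) -> Dict[str, str]:
    """For each base name, keep the dated model name with the latest date."""
    newest: Dict[str, Tuple[date, str]] = {}
    for name in models:
        base, stamp = split_snapshot(name)
        if stamp is None:
            continue  # undated names are skipped
        if base not in newest or stamp > newest[base][0]:
            newest[base] = (stamp, name)
    return {base: name for base, (_, name) in newest.items()}


if __name__ == "__main__":
    print(split_snapshot("qwen3-tts-flash-realtime-2025-11-27"))
    print(newest_snapshots(["qwen3-tts-flash-2025-11-27", "qwen3-tts-flash-2025-09-18"]))
```

Pinning a dated name keeps behavior stable across releases, while comparing dates this way helps spot when a pinned name has been superseded.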

Learn more