Text-to-speech models

Two questions narrow the field: do you need a custom voice or will a built-in voice work, and do you need real-time streaming?

Built-in or custom voice?

If a built-in voice works, pick one from the library and start synthesizing immediately.

CosyVoice — rich voice library, high quality, no setup beyond picking a voice
Qwen3-TTS — low-latency streaming; use an -instruct model (for example, qwen3-tts-instruct-flash) for natural-language control over speed, emotion, and style

Need a voice that doesn't exist in the library?

Voice Cloning — reproduce a specific person's voice from audio samples. Use when you have a target voice to match.
Voice Design — create a new voice from a text description (e.g., "a warm, low-pitched female voice"). Use when you want a brand voice without audio samples.
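
The decision above can be sketched as a small helper. This is an illustration only: the function name `choose_voice_source` and its parameters are hypothetical and not part of any SDK, and the returned strings are simply the option names used in this section.

```python
def choose_voice_source(needs_custom_voice: bool, has_audio_samples: bool = False) -> str:
    """Return the option from this section that fits the stated needs.

    Hypothetical helper for illustration; it calls no API.
    """
    if not needs_custom_voice:
        # A library voice is enough (CosyVoice or Qwen3-TTS built-in voices).
        return "Built-in voice"
    if has_audio_samples:
        # A target voice exists and can be reproduced from recordings.
        return "Voice Cloning"
    # No recordings available: describe the voice in text instead.
    return "Voice Design"


if __name__ == "__main__":
    print(choose_voice_source(needs_custom_voice=True, has_audio_samples=False))
```

Keeping the choice in one function makes the rule easy to audit: cloning is only selected when samples exist, and design is the fallback for a custom voice without samples.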

Three approaches to customizing how the speech sounds, ranked from most to least flexible:

Instruction control (cosyvoice-v3-flash, qwen3-tts-instruct-flash, qwen3-tts-instruct-flash-realtime) — Describe the desired delivery in natural language. Control speed, emotion, and style per request. Most flexible. For details, see Real-time speech synthesis > Instruction control.
Voice design (qwen3-tts-vd-*) — Generate a custom voice from a text description. Good for creating a brand voice without audio samples.
Voice cloning (qwen3-tts-vc-*) — Reproduce an existing voice from audio samples. Best when you need to match a specific person's voice.

Model	Family	Streaming	Custom voice	Instruction control
`cosyvoice-v3-plus`	CosyVoice	✓	—	—
`qwen3-tts-flash`	Qwen3-TTS	✓	—	—
`qwen3-tts-flash-realtime`	Qwen3-TTS	✓	—	—
`qwen3-tts-instruct-flash`	Qwen3-TTS	✓	—	✓
`qwen3-tts-vc-realtime-2026-01-15`	Voice Cloning	✓	✓	—
`qwen3-tts-vd-realtime-2026-01-15`	Voice Design	✓	✓	—
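
One way to use the table above programmatically is to encode its rows as data and filter by the capabilities you need. The sketch below copies the six rows as shown; the `TtsModel` class and the `find_models` function are hypothetical names introduced for illustration, not part of any SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TtsModel:
    name: str
    family: str
    streaming: bool
    custom_voice: bool
    instruction_control: bool


# Rows copied from the table above (True for a check mark, False otherwise).
MODELS = [
    TtsModel("cosyvoice-v3-plus", "CosyVoice", True, False, False),
    TtsModel("qwen3-tts-flash", "Qwen3-TTS", True, False, False),
    TtsModel("qwen3-tts-flash-realtime", "Qwen3-TTS", True, False, False),
    TtsModel("qwen3-tts-instruct-flash", "Qwen3-TTS", True, False, True),
    TtsModel("qwen3-tts-vc-realtime-2026-01-15", "Voice Cloning", True, True, False),
    TtsModel("qwen3-tts-vd-realtime-2026-01-15", "Voice Design", True, True, False),
]


def find_models(*, streaming: bool = False, custom_voice: bool = False,
                instruction_control: bool = False) -> list:
    """Return names of models that have every capability set to True.

    A capability left as False means "don't care", not "must be absent".
    """
    required = {
        "streaming": streaming,
        "custom_voice": custom_voice,
        "instruction_control": instruction_control,
    }
    return [
        m.name
        for m in MODELS
        if all(getattr(m, cap) for cap, needed in required.items() if needed)
    ]


if __name__ == "__main__":
    print(find_models(streaming=True, custom_voice=True))
```

Asking for a custom voice returns the two Voice Cloning and Voice Design rows, while asking for both a custom voice and instruction control returns an empty list, since no row in the table offers both.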

CosyVoice

Model	Streaming	Custom voice	Instruction control
`cosyvoice-v3-plus`	✓	—	—
`cosyvoice-v3-flash`	✓	—	—

Qwen3-TTS

Model	Streaming	Custom voice	Instruction control
`qwen3-tts-flash`	✓	—	—
`qwen3-tts-flash-realtime`	✓	—	—
`qwen3-tts-instruct-flash`	✓	—	✓
`qwen3-tts-instruct-flash-realtime`	✓	—	✓

Voice Cloning & Design

Model	Streaming	Custom voice	Instruction control
`qwen3-tts-vc-2026-01-22`	✗	✓	—
`qwen3-tts-vc-realtime-2026-01-15`	✓	✓	—
`qwen3-tts-vd-2026-01-26`	✗	✓	—
`qwen3-tts-vd-realtime-2026-01-15`	✓	✓	—

Legacy

These are previous-generation models. For new projects, we recommend the latest versions listed above.

Model	Family	Streaming	Custom voice	Instruction control
`qwen3-tts-flash-2025-11-27`	Qwen3-TTS	✓	—	—
`qwen3-tts-flash-2025-09-18`	Qwen3-TTS	✓	—	—
`qwen3-tts-flash-realtime-2025-11-27`	Qwen3-TTS	✓	—	—
`qwen3-tts-flash-realtime-2025-09-18`	Qwen3-TTS	✓	—	—
`qwen3-tts-instruct-flash-2026-01-26`	Qwen3-TTS	✓	—	✓
`qwen3-tts-instruct-flash-realtime-2026-01-22`	Qwen3-TTS	✓	—	✓
`qwen3-tts-vc-realtime-2025-11-27`	Voice Cloning	✓	✓	—
`qwen3-tts-vd-realtime-2025-12-16`	Voice Design	✓	✓	—
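
Many model names in these tables end in a `YYYY-MM-DD` suffix. Assuming that suffix is a snapshot date (the tables do not say so explicitly), a short sketch can separate the base name from the date and find the newest dated snapshot for each base. The helper names `split_snapshot` and `newest_snapshot` are hypothetical.

```python
import re
from datetime import date

# Assumption: a trailing -YYYY-MM-DD marks a dated snapshot of the base model.
_SNAPSHOT = re.compile(r"^(?P<base>.+)-(?P<y>\d{4})-(?P<m>\d{2})-(?P<d>\d{2})$")


def split_snapshot(model_id):
    """Return (base_name, snapshot_date); the date is None for undated names."""
    match = _SNAPSHOT.match(model_id)
    if match is None:
        return model_id, None
    snapshot = date(int(match["y"]), int(match["m"]), int(match["d"]))
    return match["base"], snapshot


def newest_snapshot(model_ids):
    """Map each base name to its most recent dated model ID.

    Undated names are skipped because they carry no date to compare.
    """
    newest = {}
    for model_id in model_ids:
        base, snapshot = split_snapshot(model_id)
        if snapshot is None:
            continue
        if base not in newest or snapshot > newest[base][0]:
            newest[base] = (snapshot, model_id)
    return {base: model_id for base, (_, model_id) in newest.items()}


if __name__ == "__main__":
    ids = [
        "qwen3-tts-vc-realtime-2025-11-27",
        "qwen3-tts-vc-realtime-2026-01-15",
        "qwen3-tts-flash",
    ]
    print(newest_snapshot(ids))
```

For the voice cloning realtime models, this picks the 2026-01-15 snapshot over the 2025-11-27 one, which matches how the tables place them in the current and legacy lists.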