Text-to-Speech

Speech synthesis

Natural voices with Qwen3

Supported models

Use an API key when calling the following models:
  • Qwen3-TTS-Instruct-Flash: qwen3-tts-instruct-flash (stable version, currently equivalent to qwen3-tts-instruct-flash-2026-01-26), qwen3-tts-instruct-flash-2026-01-26 (latest snapshot)
  • Qwen3-TTS-VD: qwen3-tts-vd-2026-01-26 (latest snapshot)
  • Qwen3-TTS-VC: qwen3-tts-vc-2026-01-22 (latest snapshot)
  • Qwen3-TTS-Flash: qwen3-tts-flash (stable version, currently equivalent to qwen3-tts-flash-2025-11-27), qwen3-tts-flash-2025-11-27, qwen3-tts-flash-2025-09-18
For more information, see Model list.

Model selection

| Scenario | Model | Reason |
| --- | --- | --- |
| Voice customization for branding, exclusive voices, or expanding system voices (from a text description) | qwen3-tts-vd-2026-01-26 | Supports voice design, which creates custom voices from text descriptions without audio samples. Ideal for designing brand-specific voices from scratch. |
| Voice customization for branding, exclusive voices, or expanding system voices (from audio samples) | qwen3-tts-vc-2026-01-22 | Supports voice cloning, which replicates voices from audio samples to create lifelike, high-fidelity brand voiceprints. |
| Emotional content production (audiobooks, radio dramas, game and animation dubbing) | qwen3-tts-instruct-flash | Supports instruction control to precisely adjust pitch, speaking rate, emotion, and character personality with natural language. Ideal for scenarios that require rich expressiveness. |
| Mobile navigation or notification announcements | qwen3-tts-flash | Simple per-character billing. Suitable for short-text, high-frequency scenarios. |
| E-learning course narration | qwen3-tts-flash | Supports multiple languages and dialects for regional teaching needs. |
| Batch audiobook production | qwen3-tts-flash | Cost-effective, with rich voice options for expressive content. |
For more details, see Model comparison.
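The selection table above can be read as a small decision procedure. The sketch below encodes it in Python; the Requirements class and pick_model function are hypothetical illustrations, not part of the DashScope SDK. It also rejects combining a custom voice with instruction control, because the model comparison table lists instruction control as supported only by Qwen3-TTS-Instruct-Flash.

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    """Hypothetical summary of what a workload needs (illustrative only)."""
    clone_from_audio: bool = False          # replicate a voice from an audio sample
    design_from_text: bool = False          # create a voice from a text description
    natural_language_control: bool = False  # steer pitch, speed, and emotion with instructions


def pick_model(req: Requirements) -> str:
    """Return a model ID following the model selection table."""
    if req.clone_from_audio and req.design_from_text:
        raise ValueError("Choose either voice cloning or voice design, not both")
    if (req.clone_from_audio or req.design_from_text) and req.natural_language_control:
        # Only Qwen3-TTS-Instruct-Flash supports instruction control (see Model comparison)
        raise ValueError("Instruction control is not available with custom voices")
    if req.clone_from_audio:
        return "qwen3-tts-vc-2026-01-22"
    if req.design_from_text:
        return "qwen3-tts-vd-2026-01-26"
    if req.natural_language_control:
        return "qwen3-tts-instruct-flash"
    # Short-text, high-frequency, multilingual, or batch workloads with system voices
    return "qwen3-tts-flash"


print(pick_model(Requirements(natural_language_control=True)))  # qwen3-tts-instruct-flash
```

Snapshot IDs change over time, so treat the returned strings as examples and check the model list before hard-coding them.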

Getting started

Prerequisites
In the DashScope Python SDK, the SpeechSynthesizer interface has been replaced by MultiModalConversation. To migrate, replace the interface name; all other parameters remain fully compatible.

Use system voice

Use a system voice for speech synthesis.

Non-streaming output

Use the returned url to retrieve the synthesized audio. The URL is valid for 24 hours.

For Java, you must import the Gson dependency. If you use Maven or Gradle, add the dependency to your build file. For Maven, add the following content to pom.xml:
<!-- https://mvnrepository.com/artifact/com.google.code.gson/gson -->
<dependency>
  <groupId>com.google.code.gson</groupId>
  <artifactId>gson</artifactId>
  <version>2.13.1</version>
</dependency>

Python example:
import os
import dashscope

dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'

text = "Today is a wonderful day to build something people love!"
# To use the SpeechSynthesizer interface: dashscope.audio.qwen_tts.SpeechSynthesizer.call(...)
response = dashscope.MultiModalConversation.call(
  # Replace the model with qwen3-tts-instruct-flash to use instruction control.
  model="qwen3-tts-flash",
  # If you have not configured an environment variable, replace the following line with your API key: api_key = "sk-xxx"
  api_key=os.getenv("DASHSCOPE_API_KEY"),
  text=text,
  voice="Cherry",
  language_type="English", # We recommend that this matches the text language to ensure correct pronunciation and natural intonation.
  # To use instruction control, uncomment the following lines and replace the model with qwen3-tts-instruct-flash.
  # instructions='Speak quickly with a noticeable rising intonation, suitable for introducing fashion products.',
  # optimize_instructions=True,
  stream=False
)
print(response)
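The URL in a non-streaming response expires after 24 hours, so you may want to save the audio locally right away. The following sketch uses only the Python standard library. The helper name download_audio is hypothetical, and the field path response.output.audio.url is an assumption; print the response first to confirm where the URL appears.

```python
import shutil
import urllib.request
from pathlib import Path


def download_audio(url: str, dest: str, timeout: float = 60.0) -> Path:
    """Download the synthesized audio at `url` to the local file `dest`."""
    dest_path = Path(dest)
    # Create the parent directory if it does not exist yet
    dest_path.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url, timeout=timeout) as src, open(dest_path, "wb") as out:
        # Copy the body to disk in chunks instead of reading it all into memory
        shutil.copyfileobj(src, out)
    return dest_path


# Usage with the non-streaming example above (field path assumed):
# download_audio(response.output.audio.url, "output.wav")
```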
Streaming output

Audio data is streamed as Base64-encoded PCM. The last packet contains the URL of the complete audio file.
# coding=utf-8
#
# Installation instructions for pyaudio:
# APPLE Mac OS X
#   brew install portaudio
#   pip install pyaudio
# Debian/Ubuntu
#   sudo apt-get install python3-pyaudio
#   or
#   pip install pyaudio
# CentOS
#   sudo yum install -y portaudio portaudio-devel && pip install pyaudio
# Microsoft Windows
#   python -m pip install pyaudio
#
# This example also requires numpy: pip install numpy

import os
import dashscope
import pyaudio
import time
import base64
import numpy as np

dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'

p = pyaudio.PyAudio()
# Create an audio stream
stream = p.open(format=pyaudio.paInt16,
        channels=1,
        rate=24000,
        output=True)


text = "Today is a wonderful day to build something people love!"
response = dashscope.MultiModalConversation.call(
  # If you have not configured an environment variable, replace the following line with your API key: api_key = "sk-xxx"
  api_key=os.getenv("DASHSCOPE_API_KEY"),
  # Replace the model with qwen3-tts-instruct-flash to use instruction control.
  model="qwen3-tts-flash",
  text=text,
  voice="Cherry",
  language_type="English", # We recommend that this matches the text language to ensure correct pronunciation and natural intonation.
  # To use instruction control, uncomment the following lines and replace the model with qwen3-tts-instruct-flash.
  # instructions='Speak quickly with a noticeable rising intonation, suitable for introducing fashion products.',
  # optimize_instructions=True,
  stream=True
)

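try:
  for chunk in response:
    if chunk.output is not None:
      audio = chunk.output.audio
      if audio.data is not None:
        # Streaming chunks carry Base64-encoded raw PCM (16-bit, mono, 24 kHz), not WAV
        pcm_bytes = base64.b64decode(audio.data)
        audio_np = np.frombuffer(pcm_bytes, dtype=np.int16)
        # Play the audio data directly
        stream.write(audio_np.tobytes())
      if chunk.output.finish_reason == "stop":
        # The last packet carries the URL of the complete audio file and its expiry time
        print(f"URL: {audio.url}, expires at: {audio.expires_at}")
  # Give the output device time to play the remaining buffered audio
  time.sleep(0.8)
finally:
  # Clean up resources even if playback fails
  stream.stop_stream()
  stream.close()
  p.terminate()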
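The streaming example plays each chunk as it arrives. To also keep a local copy, you can write the decoded chunks into a WAV container with the standard wave module. This sketch assumes the same chunk format that the example's PyAudio stream is opened with (16-bit, mono, 24 kHz PCM); the helper name save_pcm_chunks_as_wav is hypothetical.

```python
import base64
import wave
from typing import Iterable

SAMPLE_RATE = 24000  # 24 kHz, per the model comparison table
SAMPLE_WIDTH = 2     # bytes per sample: 16-bit, matching pyaudio.paInt16
CHANNELS = 1         # mono, matching channels=1 in the example


def save_pcm_chunks_as_wav(b64_chunks: Iterable[str], path: str) -> int:
    """Decode Base64 PCM chunks, write them to a WAV file, and return the frame count."""
    frames = 0
    with wave.open(path, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH)
        wav.setframerate(SAMPLE_RATE)
        for chunk in b64_chunks:
            pcm = base64.b64decode(chunk)
            wav.writeframes(pcm)
            frames += len(pcm) // (SAMPLE_WIDTH * CHANNELS)
    # The WAV header sizes are finalized when the file is closed
    return frames
```

Iterate the streaming response only once: inside the playback loop, append each non-empty audio.data string to a list, then pass that list to the function after the loop ends.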

Use cloned voice

Voice cloning does not provide preview audio. Apply the cloned voice to speech synthesis to evaluate the result. These examples adapt the non-streaming output code, replacing the voice parameter with a cloned voice.
  • Key principle: The model used for voice cloning (target_model) must match the model used for speech synthesis (model). Otherwise, synthesis fails.
  • This example uses the local audio file voice.mp3 for voice cloning. Replace this path when running the code.
For Java, add the Gson dependency. If you use Maven or Gradle, add the dependency to your build file. For Maven, add the following content to your pom.xml:
<!-- https://mvnrepository.com/artifact/com.google.code.gson/gson -->
<dependency>
  <groupId>com.google.code.gson</groupId>
  <artifactId>gson</artifactId>
  <version>2.13.1</version>
</dependency>
In Java, when you use a custom voice generated by voice cloning for speech synthesis, set the voice as follows:
MultiModalConversationParam param = MultiModalConversationParam.builder()
        .parameter("voice", "your_voice") // Replace the voice parameter with the custom voice generated by cloning
        .build();

Python example:
import os
import requests
import base64
import pathlib
import dashscope

# ======= Constant configuration =======
DEFAULT_TARGET_MODEL = "qwen3-tts-vc-2026-01-22"  # Use the same model for voice cloning and speech synthesis
DEFAULT_PREFERRED_NAME = "guanyu"
DEFAULT_AUDIO_MIME_TYPE = "audio/mpeg"
VOICE_FILE_PATH = "voice.mp3"  # Relative path to the local audio file used for voice cloning


def create_voice(file_path: str,
                 target_model: str = DEFAULT_TARGET_MODEL,
                 preferred_name: str = DEFAULT_PREFERRED_NAME,
                 audio_mime_type: str = DEFAULT_AUDIO_MIME_TYPE) -> str:
  """
  Create a voice and return the voice parameter.
  """
  # If you haven't configured an environment variable, replace the following line with: api_key = "sk-xxx"
  api_key = os.getenv("DASHSCOPE_API_KEY")

  file_path_obj = pathlib.Path(file_path)
  if not file_path_obj.exists():
    raise FileNotFoundError(f"Audio file does not exist: {file_path}")

  base64_str = base64.b64encode(file_path_obj.read_bytes()).decode()
  data_uri = f"data:{audio_mime_type};base64,{base64_str}"

  url = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization"
  payload = {
    "model": "qwen-voice-enrollment", # Do not change this value
    "input": {
      "action": "create",
      "target_model": target_model,
      "preferred_name": preferred_name,
      "audio": {"data": data_uri}
    }
  }
  headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json"
  }

  # Set a timeout so that a stalled upload does not hang indefinitely
  resp = requests.post(url, json=payload, headers=headers, timeout=60)
  if resp.status_code != 200:
    raise RuntimeError(f"Failed to create voice: {resp.status_code}, {resp.text}")

  try:
    return resp.json()["output"]["voice"]
  except (KeyError, ValueError) as e:
    raise RuntimeError(f"Failed to parse voice response: {e}") from e


if __name__ == '__main__':
  dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'

  # Create the cloned voice first so that its name can be logged and reused
  voice = create_voice(VOICE_FILE_PATH)
  print(f"Cloned voice: {voice}")

  text = "How's the weather today?"
  # SpeechSynthesizer interface usage: dashscope.audio.qwen_tts.SpeechSynthesizer.call(...)
  response = dashscope.MultiModalConversation.call(
    model=DEFAULT_TARGET_MODEL,  # Must match the target_model used to create the voice
    # If you haven't configured an environment variable, replace the following line with: api_key = "sk-xxx"
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    text=text,
    voice=voice,  # The custom voice generated by cloning
    stream=False
  )
  print(response)
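The cloning example enrolls a new voice every time it runs. To reuse one voice across runs, one option is to cache the returned voice name in a local JSON file. In this sketch, get_or_create_voice and the cache file name are hypothetical; including the target model in the cache key keeps each cached voice tied to the model it must be used with.

```python
import json
from pathlib import Path
from typing import Callable


def get_or_create_voice(cache_file: str, key: str, create: Callable[[], str]) -> str:
    """Return the cached voice name for `key`, calling `create` only on a cache miss."""
    path = Path(cache_file)
    cache = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    if key not in cache:
        # Cache miss: enroll a new voice and persist its name
        cache[key] = create()
        path.write_text(json.dumps(cache, indent=2), encoding="utf-8")
    return cache[key]


# Usage with the cloning example above:
# voice = get_or_create_voice(
#     "voices.json",
#     f"{DEFAULT_TARGET_MODEL}:{VOICE_FILE_PATH}",
#     lambda: create_voice(VOICE_FILE_PATH),
# )
```

This sketch is not safe for concurrent writers; use a database or file lock if several processes share the cache.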

Use designed voice

Voice design returns preview audio. To avoid paying for synthesis with an unsuitable voice, listen to the preview and confirm that it meets your expectations before you use the voice.
Step 1: Generate a custom voice and preview the result

If you are satisfied with the result, proceed to the next step. Otherwise, generate the voice again.

For Java, you need to import the Gson dependency. If you use Maven or Gradle, add the dependency to your build file. For Maven, add the following content to pom.xml:
<!-- https://mvnrepository.com/artifact/com.google.code.gson/gson -->
<dependency>
  <groupId>com.google.code.gson</groupId>
  <artifactId>gson</artifactId>
  <version>2.13.1</version>
</dependency>
In Java, when you use a custom voice generated by voice design for speech synthesis, you must set the voice as follows:
MultiModalConversationParam param = MultiModalConversationParam.builder()
        .parameter("voice", "your_voice") // Replace the voice parameter with the custom voice generated by voice design
        .build();

Python example:
import requests
import base64
import os

def create_voice_and_play():
  # If the environment variable is not set, replace the following line with your API key: api_key = "sk-xxx"
  api_key = os.getenv("DASHSCOPE_API_KEY")

  if not api_key:
    print("Error: DASHSCOPE_API_KEY environment variable not found. Please set the API key first.")
    return None, None, None

  # Prepare request data
  headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json"
  }

  data = {
    "model": "qwen-voice-design",
    "input": {
      "action": "create",
      "target_model": "qwen3-tts-vd-2026-01-26",
      "voice_prompt": "A composed middle-aged male announcer with a deep, rich and magnetic voice, a steady speaking speed and clear articulation, is suitable for news broadcasting or documentary commentary.",
      "preview_text": "Dear listeners, hello everyone. Welcome to the evening news.",
      "preferred_name": "announcer",
      "language": "en"
    },
    "parameters": {
      "sample_rate": 24000,
      "response_format": "wav"
    }
  }

  url = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization"

  try:
    # Send the request
    response = requests.post(
      url,
      headers=headers,
      json=data,
      timeout=60  # Add a timeout setting
    )

    if response.status_code == 200:
      result = response.json()

      # Get the voice name
      voice_name = result["output"]["voice"]
      print(f"Voice name: {voice_name}")

      # Get the preview audio data
      base64_audio = result["output"]["preview_audio"]["data"]

      # Decode the Base64 audio data
      audio_bytes = base64.b64decode(base64_audio)

      # Save the audio file locally
      filename = f"{voice_name}_preview.wav"

      # Write the audio data to a local file
      with open(filename, 'wb') as f:
        f.write(audio_bytes)

      print(f"Audio saved to local file: {filename}")
      print(f"File path: {os.path.abspath(filename)}")

      return voice_name, audio_bytes, filename
    else:
      print(f"Request failed with status code: {response.status_code}")
      print(f"Response content: {response.text}")
      return None, None, None

  except requests.exceptions.RequestException as e:
    print(f"A network request error occurred: {e}")
    return None, None, None
  except KeyError as e:
    print(f"Response data format error, missing required field: {e}")
    print(f"Response content: {response.text if 'response' in locals() else 'No response'}")
    return None, None, None
  except Exception as e:
    print(f"An unknown error occurred: {e}")
    return None, None, None

if __name__ == "__main__":
  print("Starting to create voice...")
  voice_name, audio_data, saved_filename = create_voice_and_play()

  if voice_name:
    print(f"\nSuccessfully created voice '{voice_name}'")
    print(f"Audio file saved as: '{saved_filename}'")
    print(f"File size: {os.path.getsize(saved_filename)} bytes")
  else:
    print("\nVoice creation failed")
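Before listening, you can sanity-check the saved preview file. This sketch assumes the preview is a standard PCM WAV file, as requested with "response_format": "wav" and "sample_rate": 24000 in the example; the helper name inspect_preview is hypothetical.

```python
import wave


def inspect_preview(path: str, expected_rate: int = 24000) -> dict:
    """Read basic properties of a WAV preview and check its sample rate."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        info = {
            "sample_rate": rate,
            "channels": wav.getnchannels(),
            "duration_s": wav.getnframes() / rate,
        }
    if rate != expected_rate:
        raise ValueError(f"Expected {expected_rate} Hz, got {rate} Hz")
    return info


# Usage after the example above has saved the preview:
# print(inspect_preview(saved_filename))
```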
Step 2: Use the custom voice for speech synthesis

Use the custom voice generated in the previous step for non-streaming speech synthesis. This example adapts the non-streaming output code, replacing the voice parameter with the custom voice generated by voice design. For streaming synthesis, see Getting started.

Key principle: The model used for voice design (target_model) must be the same as the model used for subsequent speech synthesis (model). Otherwise, synthesis fails.
import os
import dashscope


if __name__ == '__main__':
  dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'

  text = "How is the weather today?"
  # How to use the SpeechSynthesizer interface: dashscope.audio.qwen_tts.SpeechSynthesizer.call(...)
  response = dashscope.MultiModalConversation.call(
    model="qwen3-tts-vd-2026-01-26",
    # If the environment variable is not set, replace the following line with your API key: api_key = "sk-xxx"
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    text=text,
    voice="myvoice", # Replace with the voice name returned by voice design in the previous step
    stream=False
  )
  print(response)
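Because synthesis fails when the model differs from the voice's target_model, it can help to keep the two values together. The CustomVoice class below is a hypothetical sketch, not an SDK type: it builds the keyword arguments for a non-streaming call and refuses a mismatched model before any request is sent.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CustomVoice:
    """A voice returned by voice cloning or voice design, plus its target_model."""
    name: str
    target_model: str

    def synthesis_kwargs(self, text: str, model: Optional[str] = None) -> dict:
        """Build keyword arguments for a non-streaming synthesis call."""
        model = model or self.target_model
        if model != self.target_model:
            # Synthesis fails when the model differs from the voice's target_model
            raise ValueError(
                f"Voice {self.name!r} was created for {self.target_model}, not {model}"
            )
        return {"model": model, "text": text, "voice": self.name, "stream": False}


# Usage (api_key handling as in the example above):
# voice = CustomVoice(name=voice_name, target_model="qwen3-tts-vd-2026-01-26")
# response = dashscope.MultiModalConversation.call(
#     api_key=os.getenv("DASHSCOPE_API_KEY"), **voice.synthesis_kwargs("How is the weather today?"))
```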

Instruction control

Control pitch, speed, emotion, and timbre with natural language instructions instead of audio parameters.
  • Supported models: Qwen3-TTS-Instruct-Flash series only.
  • Usage: Specify the instructions in the instructions parameter. Example: "Fast-paced with rising intonation, suitable for fashion products."
  • Supported languages: Chinese and English only.
  • Length limit: Maximum 1600 tokens.
Scenarios:
  • Audiobook and radio drama voice-overs
  • Advertising and promotional video voice-overs
  • Game role and animation voice-overs
  • Emotional intelligent voice assistants
  • Documentary and news broadcasting
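Since only the Qwen3-TTS-Instruct-Flash series accepts instructions, a small guard can catch misconfigured calls early. The build_tts_kwargs helper below is a hypothetical sketch that reuses the parameter names from the Getting started examples (instructions, optimize_instructions). It does not enforce the 1600-token limit, because counting tokens requires the model's tokenizer.

```python
from typing import Optional


def build_tts_kwargs(model: str, text: str, voice: str, language_type: str,
                     instructions: Optional[str] = None,
                     optimize_instructions: bool = False) -> dict:
    """Build keyword arguments for dashscope.MultiModalConversation.call."""
    kwargs = {
        "model": model,
        "text": text,
        "voice": voice,
        "language_type": language_type,
        "stream": False,
    }
    if instructions is not None:
        # Instruction control is available only in the Qwen3-TTS-Instruct-Flash series
        if not model.startswith("qwen3-tts-instruct-flash"):
            raise ValueError(f"Model {model!r} does not support the instructions parameter")
        if not instructions.strip():
            raise ValueError("instructions must not be empty")
        kwargs["instructions"] = instructions
        kwargs["optimize_instructions"] = optimize_instructions
    return kwargs
```

For example, pass `**build_tts_kwargs("qwen3-tts-instruct-flash", text, "Cherry", "English", instructions="Speak quickly with a noticeable rising intonation.", optimize_instructions=True)` together with api_key to MultiModalConversation.call.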
Writing high-quality sound descriptions

Core principles
  1. Be specific: Use descriptive words such as "deep," "crisp," or "fast-paced." Avoid vague words such as "nice" or "normal."
  2. Be multi-dimensional: Combine multiple dimensions such as pitch, speed, and emotion. Single-dimension descriptions such as "high-pitched" are too broad.
  3. Be objective: Focus on physical and perceptual features, not personal preferences. Use "high-pitched and energetic" instead of "my favorite sound."
  4. Be original: Describe sound qualities instead of requesting imitation of specific people. The model does not support direct imitation.
  5. Be concise: Ensure every word serves a purpose. Avoid repetitive synonyms or meaningless intensifiers.
Dimension description reference: You can combine multiple dimensions to create richer audio effects.
| Dimension | Examples |
| --- | --- |
| Pitch | High, medium, low, high-pitched, low-pitched |
| Speed | Fast, medium, slow, fast-paced, slow-paced |
| Emotion | Cheerful, calm, gentle, serious, lively, composed, soothing |
| Characteristics | Magnetic, crisp, hoarse, mellow, sweet, deep, powerful |
| Usage | News broadcast, ad voice-over, audiobook, animation role, voice assistant, documentary narration |
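One way to apply the "be multi-dimensional" principle consistently is to assemble instructions from the dimensions in the table. The compose_instruction function below is a hypothetical helper; it joins whichever dimensions you supply and rejects single-dimension descriptions as too broad.

```python
from typing import Optional


def compose_instruction(pitch: Optional[str] = None, speed: Optional[str] = None,
                        emotion: Optional[str] = None, characteristics: Optional[str] = None,
                        usage: Optional[str] = None) -> str:
    """Join dimension descriptions into a single instruction sentence."""
    parts = [p for p in (pitch, speed, emotion, characteristics) if p]
    if usage:
        parts.append(f"suitable for {usage}")
    if len(parts) < 2:
        raise ValueError("Combine at least two dimensions; one dimension is too broad")
    text = ", ".join(parts)
    # Capitalize the first letter and end with a period
    return text[0].upper() + text[1:] + "."


print(compose_instruction(pitch="high-pitched", speed="medium speed",
                          emotion="full of energy and appeal", usage="ad voice-overs"))
# High-pitched, medium speed, full of energy and appeal, suitable for ad voice-overs.
```

The printed result reproduces the ad voice-over example below. Word choice still matters more than structure, so review generated instructions against the core principles.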
Examples
  • Standard broadcast style: Clear and precise articulation, well-rounded pronunciation.
  • Progressive emotional effect: Volume rapidly increases from normal conversation to a shout, with a straightforward personality and easily excited, expressive emotions.
  • Special emotional state: A sobbing tone causes slightly slurred and hoarse pronunciation, with noticeable tension in the crying voice.
  • Ad voice-over style: High-pitched, medium speed, full of energy and appeal, suitable for ad voice-overs.
  • Gentle and soothing style: Slow-paced, with a gentle and sweet pitch, and a soothing, warm tone, like a caring friend.

Voice customization

Qwen3-TTS supports both voice cloning (Qwen3-TTS-VC) and voice design (Qwen3-TTS-VD). See Voice cloning (Qwen) and Voice design (Qwen) for the API reference.

API reference

Model comparison

| Feature | Qwen3-TTS-Instruct-Flash | Qwen3-TTS-VD | Qwen3-TTS-VC | Qwen3-TTS-Flash | Qwen-TTS |
| --- | --- | --- | --- | --- | --- |
| Supported languages | Varies by voice: Chinese (Mandarin), English, Spanish, Russian, Italian, French, Korean, Japanese, German, Portuguese | Chinese (Mandarin), English, Spanish, Russian, Italian, French, Korean, Japanese, German, Portuguese | Chinese (Mandarin), English, Spanish, Russian, Italian, French, Korean, Japanese, German, Portuguese | Varies by voice: Chinese (Mandarin, Shanghainese, Beijing dialect, Sichuan dialect, Nanjing dialect, Shaanxi dialect, Southern Min, Tianjin dialect), Cantonese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Varies by voice: Chinese (Mandarin, Shanghainese, Beijing dialect, Sichuan dialect), English |
| Audio format | wav (non-streaming output); Base64-encoded pcm (streaming output) | wav (non-streaming output); Base64-encoded pcm (streaming output) | wav (non-streaming output); Base64-encoded pcm (streaming output) | wav (non-streaming output); Base64-encoded pcm (streaming output) | wav (non-streaming output); Base64-encoded pcm (streaming output) |
| Audio sampling rate | 24 kHz | 24 kHz | 24 kHz | 24 kHz | 24 kHz |
| Voice cloning | Not supported | Not supported | Supported | Not supported | Not supported |
| Voice design | Not supported | Supported | Not supported | Not supported | Not supported |
| SSML | Not supported | Not supported | Not supported | Not supported | Not supported |
| LaTeX | Not supported | Not supported | Not supported | Not supported | Not supported |
| Volume control | Supported | Supported | Supported | Not supported | Not supported |
| Speech rate control | Supported | Supported | Supported | Not supported | Not supported |
| Pitch control | Supported | Supported | Supported | Not supported | Not supported |
| Bitrate control | Not supported | Not supported | Not supported | Not supported | Not supported |
| Timestamp | Not supported | Not supported | Not supported | Not supported | Not supported |
| Instruction control (Instruct) | Supported | Not supported | Not supported | Not supported | Not supported |
| Streaming input | Not supported | Not supported | Not supported | Not supported | Not supported |
| Streaming output | Supported | Supported | Supported | Supported | Supported |
| Rate limits | RPM: 180 | RPM: 180 | RPM: 180 | RPM varies by model: 180 for qwen3-tts-flash and qwen3-tts-flash-2025-11-27; 10 for qwen3-tts-flash-2025-09-18 | RPM: 10; TPM (input and output tokens): 100,000 |
| Connection type | Java/Python SDK, WebSocket API | Java/Python SDK, WebSocket API | Java/Python SDK, WebSocket API | Java/Python SDK, WebSocket API | Java/Python SDK, WebSocket API |
| Pricing | $0.115 per 10K characters | $0.115 per 10K characters | $0.115 per 10K characters | $0.1 per 10K characters | N/A |
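The service enforces the RPM limits above. For batch jobs, a client-side limiter can reduce how often requests are rejected for exceeding them. The RpmLimiter class below is a hypothetical single-threaded sketch using a sliding 60-second window; the clock and sleep functions are injectable so the behavior can be tested without waiting.

```python
import time
from collections import deque
from typing import Callable, Deque


class RpmLimiter:
    """Allow at most `rpm` calls in any 60-second window (single-threaded use)."""

    def __init__(self, rpm: int, clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        if rpm <= 0:
            raise ValueError("rpm must be positive")
        self.rpm = rpm
        self.clock = clock
        self.sleep = sleep
        self.calls: Deque[float] = deque()

    def acquire(self) -> None:
        """Block until another request may be sent, then record it."""
        while True:
            now = self.clock()
            # Drop timestamps that have left the 60-second window
            while self.calls and now - self.calls[0] >= 60.0:
                self.calls.popleft()
            if len(self.calls) < self.rpm:
                self.calls.append(now)
                return
            # Wait until the oldest call in the window expires
            self.sleep(60.0 - (now - self.calls[0]))


# Usage: limiter = RpmLimiter(180), then call limiter.acquire() before each request.
```

Use a lock or a shared rate-limiting service if several threads or processes send requests with the same API key.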

System voices

Supported voices vary by model. Set the voice request parameter to the value in the voice parameter column of the following table.
| voice parameter | Details | Supported languages | Supported models |
| --- | --- | --- | --- |
| Cherry | Voice name: Cherry. A sunny, positive, friendly, and natural young woman (female) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Instruct-Flash, Qwen3-TTS-Flash, Qwen-TTS |
| Serena | Voice name: Serena. A gentle young woman (female) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Instruct-Flash, Qwen3-TTS-Flash, Qwen-TTS |
| Ethan | Voice name: Ethan. Standard Mandarin with a slight northern accent. Sunny, warm, energetic, and vibrant (male) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Instruct-Flash, Qwen3-TTS-Flash, Qwen-TTS |
| Chelsie | Voice name: Chelsie. An anime-style virtual girlfriend (female) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Instruct-Flash, Qwen3-TTS-Flash, Qwen-TTS |
| Momo | Voice name: Momo. Playful and mischievous, cheering you up (female) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Instruct-Flash, Qwen3-TTS-Flash |
| Vivian | Voice name: Vivian. Confident, cute, and slightly feisty (female) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Instruct-Flash, Qwen3-TTS-Flash |
| Moon | Voice name: Moon. A bold and handsome man named Yuebai (male) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Instruct-Flash, Qwen3-TTS-Flash |
| Maia | Voice name: Maia. A blend of intellect and gentleness (female) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Instruct-Flash, Qwen3-TTS-Flash |
| Kai | Voice name: Kai. A soothing audio spa for your ears (male) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Instruct-Flash, Qwen3-TTS-Flash |
| Nofish | Voice name: Nofish. A designer who cannot pronounce retroflex sounds (male) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Instruct-Flash, Qwen3-TTS-Flash |
| Bella | Voice name: Bella. A little girl who drinks but never throws punches when drunk (female) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Instruct-Flash, Qwen3-TTS-Flash |
| Jennifer | Voice name: Jennifer. A premium, cinematic-quality American English female voice (female) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Flash |
| Ryan | Voice name: Ryan. Full of rhythm, bursting with dramatic flair, balancing authenticity and tension (male) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Flash |
| Katerina | Voice name: Katerina. A mature-woman voice with rich, memorable rhythm (female) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Flash |
| Aiden | Voice name: Aiden. An American English young man skilled in cooking (male) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Flash |
| Eldric Sage | Voice name: Eldric Sage. A calm and wise elder (male) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Instruct-Flash, Qwen3-TTS-Flash |
| Mia | Voice name: Mia. Gentle as spring water, obedient as fresh snow (female) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Instruct-Flash, Qwen3-TTS-Flash |
| Mochi | Voice name: Mochi. A clever, quick-witted young adult (male) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Instruct-Flash, Qwen3-TTS-Flash |
| Bellona | Voice name: Bellona. A powerful, clear voice that brings characters to life | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Instruct-Flash, Qwen3-TTS-Flash |
| Vincent | Voice name: Vincent. A uniquely raspy, smoky voice (male) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Instruct-Flash, Qwen3-TTS-Flash |
| Bunny | Voice name: Bunny. A little girl overflowing with "cuteness" (female) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Instruct-Flash, Qwen3-TTS-Flash |
| Neil | Voice name: Neil. A flat baseline intonation with precise, clear pronunciation (male) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Instruct-Flash, Qwen3-TTS-Flash |
| Elias | Voice name: Elias. Maintains academic rigor while using storytelling techniques (female) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Instruct-Flash, Qwen3-TTS-Flash |
| Arthur | Voice name: Arthur. A simple, earthy voice steeped in time (male) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Instruct-Flash, Qwen3-TTS-Flash |
| Nini | Voice name: Nini. A soft, clingy voice like sweet rice cakes (female) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Instruct-Flash, Qwen3-TTS-Flash |
| Ebona | Voice name: Ebona. A whisper like a rusty key slowly turning in the darkest corner (female) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Instruct-Flash, Qwen3-TTS-Flash |
| Seren | Voice name: Seren. A gentle, soothing voice to help you fall asleep faster (female) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Instruct-Flash, Qwen3-TTS-Flash |
| Pip | Voice name: Pip. A playful, mischievous boy full of childlike wonder (male) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Instruct-Flash, Qwen3-TTS-Flash |
| Stella | Voice name: Stella. A cloyingly sweet, dazed teenage-girl voice (female) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Instruct-Flash, Qwen3-TTS-Flash |
| Bodega | Voice name: Bodega. A passionate Spanish man (male) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Flash |
| Sonrisa | Voice name: Sonrisa. A cheerful, outgoing Latin American woman (female) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Flash |
| Alek | Voice name: Alek. Cold like the Russian spirit, yet warm like wool coat lining (male) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Flash |
| Dolce | Voice name: Dolce. A laid-back Italian man (male) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Flash |
| Sohee | Voice name: Sohee. A warm, cheerful, emotionally expressive Korean unnie (female) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Flash |
| Ono Anna | Voice name: Ono Anna. A clever, spirited childhood friend (female) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Flash |
| Lenn | Voice name: Lenn. Rational at heart, rebellious in detail | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Flash |
| Emilien | Voice name: Emilien. A romantic French big brother (male) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Flash |
| Andre | Voice name: Andre. A magnetic, natural, and steady male voice | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Flash |
| Radio Gol | Voice name: Radio Gol. Football poet (male) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Flash |
| Jada | Voice name: Shanghai - Jada. A fast-paced, energetic Shanghai auntie (female) | Shanghainese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Flash, Qwen-TTS |
| Dylan | Voice name: Beijing - Dylan. A young man raised in Beijing's hutongs (male) | Beijing dialect, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Flash, Qwen-TTS |
| Li | Voice name: Nanjing - Li. A patient yoga teacher (male) | Nanjing dialect, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Flash |
| Marcus | Voice name: Shaanxi - Marcus. The authentic Shaanxi flavor (male) | Shaanxi dialect, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Flash |
| Roy | Voice name: Southern Min - Roy. A humorous, straightforward, lively Taiwanese guy (male) | Southern Min, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Flash |
| Peter | Voice name: Tianjin - Peter. Tianjin-style crosstalk, professional foil (male) | Tianjin dialect, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Flash |
| Sunny | Voice name: Sichuan - Sunny. A Sichuan girl sweet enough to melt your heart (female) | Sichuan dialect, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Flash, Qwen-TTS |
| Eric | Voice name: Sichuan - Eric. A Sichuanese man from Chengdu (male) | Sichuan dialect, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Flash |
| Rocky | Voice name: Cantonese - Rocky. A humorous, witty live chatter (male) | Cantonese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Flash |
| Kiki | Voice name: Cantonese - Kiki. A sweet Hong Kong girl best friend (female) | Cantonese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Qwen3-TTS-Flash |
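Because voice availability differs by model, checking the voice and model pair before calling the API can turn a failed request into a clear local error. The sketch below copies a few rows of the table above into a lookup; the prefix-to-series mapping (for example, treating any ID that starts with qwen3-tts-flash as Qwen3-TTS-Flash) is an assumption about model ID naming, and both helper names are hypothetical.

```python
# Subset of the system voice table: voice parameter -> supported model series
VOICE_MODELS = {
    "Cherry": {"Qwen3-TTS-Instruct-Flash", "Qwen3-TTS-Flash", "Qwen-TTS"},
    "Momo": {"Qwen3-TTS-Instruct-Flash", "Qwen3-TTS-Flash"},
    "Jennifer": {"Qwen3-TTS-Flash"},
    "Dylan": {"Qwen3-TTS-Flash", "Qwen-TTS"},
    "Kiki": {"Qwen3-TTS-Flash"},
}

# Assumed mapping from model ID prefixes to the series names used in the table
SERIES_BY_PREFIX = [
    ("qwen3-tts-instruct-flash", "Qwen3-TTS-Instruct-Flash"),
    ("qwen3-tts-flash", "Qwen3-TTS-Flash"),
    ("qwen-tts", "Qwen-TTS"),
]


def model_series(model_id: str) -> str:
    """Map a model ID such as qwen3-tts-flash-2025-11-27 to its series name."""
    for prefix, series in SERIES_BY_PREFIX:
        if model_id.startswith(prefix):
            return series
    # Raised for VC and VD models, which use custom voices, and for unknown IDs
    raise ValueError(f"No system voices listed for model {model_id!r}")


def voice_supported(voice: str, model_id: str) -> bool:
    """Return True if the system voice is listed for the model's series."""
    if voice not in VOICE_MODELS:
        raise KeyError(f"Voice {voice!r} is not in this lookup table")
    return model_series(model_id) in VOICE_MODELS[voice]
```

Extend VOICE_MODELS with the remaining rows you need; the table above is the source of truth.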

FAQ

Q: How long is the audio file URL valid?
A: The audio file URL expires after 24 hours.
