Voice design | Qwen Cloud

The voice design API creates a custom voice from a text description and returns a voice name. Use that name with Qwen TTS or Realtime streaming TTS. For an overview of how voice design works, supported models and languages, and tips for writing effective voice descriptions, see the Voice design guide.

The target_model in voice design must match the model in synthesis. Mismatched models cause failures.

Prerequisites

Get an API key and set it as an environment variable.
Install the DashScope SDK (SDK examples only).

API reference

All operations use the same endpoint and authentication. Set the action parameter to choose the operation.

Common request details

Endpoint

POST https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization

Request headers

Header	Type	Required	Description
Authorization	string	Yes	`Bearer $DASHSCOPE_API_KEY`
Content-Type	string	Yes	`application/json`

Use the same account for voice design and synthesis.
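
Because every operation shares one endpoint and one set of headers, the request can be assembled in a single place. The following is a minimal sketch under that assumption; `build_request` is a hypothetical helper name (not part of the DashScope SDK), and it only builds the request pieces without sending anything.

```python
import json
import os

ENDPOINT = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization"


def build_request(action, api_key=None, parameters=None, **input_fields):
    """Return (url, headers, body) for one voice design operation.

    Hypothetical helper for illustration; not an SDK function.
    """
    # Fall back to the environment variable named in Prerequisites.
    api_key = api_key or os.getenv("DASHSCOPE_API_KEY")
    if not api_key:
        raise ValueError("DASHSCOPE_API_KEY is not set")

    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    # The action sits inside "input", next to the operation-specific fields.
    body = {
        "model": "qwen-voice-design",
        "input": {"action": action, **input_fields},
    }
    if parameters:
        body["parameters"] = parameters
    return ENDPOINT, headers, body


if __name__ == "__main__":
    # "sk-example" is a placeholder, not a real key.
    url, headers, body = build_request(
        "list", api_key="sk-example", page_size=10, page_index=0
    )
    print(url)
    print(json.dumps(body, indent=2))
```

The returned values can be passed to an HTTP client, for example `requests.post(url, headers=headers, json=body, timeout=60)`.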

Create a voice

Creates a custom voice from a text description and returns preview audio.

Request syntax

{
  "model": "qwen-voice-design",
  "input": {
    "action": "create",
    "target_model": "<target-synthesis-model>",
    "voice_prompt": "<voice-description>",
    "preview_text": "<text-for-preview-audio>",
    "preferred_name": "<keyword-for-voice-name>",
    "language": "<language-code>"
  },
  "parameters": {
    "sample_rate": 24000,
    "response_format": "wav"
  }
}

model is the voice design model (always qwen-voice-design). target_model is the synthesis model that drives the created voice. Do not confuse them.

Request parameters

Parameter	Type	Default	Required	Description
model	string	--	Yes	Voice design model. Fixed to `qwen-voice-design`.
action	string	--	Yes	Operation type. Fixed to `create`.
target_model	string	--	Yes	Synthesis model for the voice. Must match the model in subsequent synthesis calls. Values: `qwen3-tts-vd-realtime-2026-01-15`, `qwen3-tts-vd-realtime-2025-12-16` (real-time), `qwen3-tts-vd-2026-01-26` (non-real-time).
voice_prompt	string	--	Yes	Voice description. Max 2,048 characters. Chinese and English only. See Write effective voice descriptions.
preview_text	string	--	Yes	Text for the preview audio. Max 1,024 characters. Must be in a supported language.
preferred_name	string	--	No	Keyword for the voice name (alphanumeric and underscores, max 16 characters). Appears in the generated voice name. Example: `announcer` produces `qwen-tts-vd-announcer-voice-20251201102800-a1b2`.
language	string	`zh`	No	Language code for the generated voice. Must match the `preview_text` language. Valid values: `zh`, `en`, `de`, `it`, `pt`, `es`, `ja`, `ko`, `fr`, `ru`.
sample_rate	int	24000	No	Sample rate in Hz for the preview audio. Valid values: `8000`, `16000`, `24000`, `48000`.
response_format	string	`wav`	No	Audio format for the preview. Valid values: `pcm`, `wav`, `mp3`, `opus`.
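
The limits in the table can be checked on the client before a billed `create` call is sent. The sketch below is one possible way to do that; `validate_create_input` is a hypothetical helper, it assumes "alphanumeric" means ASCII letters and digits, and it counts characters with Python's `len`, which may differ from how the service counts them.

```python
import re

VALID_LANGUAGES = {"zh", "en", "de", "it", "pt", "es", "ja", "ko", "fr", "ru"}
VALID_SAMPLE_RATES = {8000, 16000, 24000, 48000}
VALID_FORMATS = {"pcm", "wav", "mp3", "opus"}
# Assumption: "alphanumeric" is read as ASCII letters and digits.
NAME_PATTERN = re.compile(r"[A-Za-z0-9_]{1,16}")


def validate_create_input(voice_prompt, preview_text, preferred_name=None,
                          language="zh", sample_rate=24000, response_format="wav"):
    """Return a list of problems; an empty list means no problem was found."""
    problems = []
    if not voice_prompt or len(voice_prompt) > 2048:
        problems.append("voice_prompt must be 1 to 2,048 characters")
    if not preview_text or len(preview_text) > 1024:
        problems.append("preview_text must be 1 to 1,024 characters")
    if preferred_name is not None and not NAME_PATTERN.fullmatch(preferred_name):
        problems.append("preferred_name must use letters, digits, or underscores, max 16 characters")
    if language not in VALID_LANGUAGES:
        problems.append(f"language {language!r} is not a listed value")
    if sample_rate not in VALID_SAMPLE_RATES:
        problems.append(f"sample_rate {sample_rate!r} is not a listed value")
    if response_format not in VALID_FORMATS:
        problems.append(f"response_format {response_format!r} is not a listed value")
    return problems


if __name__ == "__main__":
    issues = validate_create_input(
        voice_prompt="A calm narrator with a warm tone.",
        preview_text="Welcome to the evening news.",
        preferred_name="news-anchor",  # the hyphen is rejected
        language="en",
    )
    for issue in issues:
        print(issue)
```

This check cannot confirm that `language` matches the language of `preview_text`; that rule is still enforced only by the service.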

Response example

{
  "output": {
    "preview_audio": {
      "data": "{base64_encoded_audio}",
      "sample_rate": 24000,
      "response_format": "wav"
    },
    "target_model": "qwen3-tts-vd-realtime-2026-01-15",
    "voice": "qwen-tts-vd-announcer-voice-20251201102800-a1b2"
  },
  "usage": {
    "count": 1
  },
  "request_id": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
}

Response parameters

Parameter	Type	Description
voice	string	Generated voice name. Pass this as the `voice` parameter in the synthesis API.
preview_audio.data	string	Base64-encoded preview audio.
preview_audio.sample_rate	int	Sample rate of the preview audio (matches request or defaults to 24000).
preview_audio.response_format	string	Format of the preview audio (matches request or defaults to `wav`).
target_model	string	Synthesis model bound to this voice.
usage.count	int	Voice creations billed. Always `1` for a successful creation ($0.2 per count).
request_id	string	Request ID for troubleshooting.
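
The preview audio arrives as Base64 text, so it has to be decoded before it can be played or stored. The sketch below shows one way to do this with a parsed response; `save_preview` is a hypothetical helper, and using the returned `response_format` as the file extension is an assumption made here for convenience.

```python
import base64
import tempfile
from pathlib import Path


def save_preview(result, directory="."):
    """Decode preview_audio.data from a create response and write it to disk."""
    output = result["output"]
    preview = output["preview_audio"]
    audio_bytes = base64.b64decode(preview["data"])

    # Assumption: the format name doubles as the file extension.
    # A "pcm" preview has no header, so a player also needs the sample rate.
    extension = preview["response_format"]
    path = Path(directory) / f'{output["voice"]}_preview.{extension}'
    path.write_bytes(audio_bytes)
    return path


if __name__ == "__main__":
    # Stand-in response with placeholder bytes instead of real audio.
    fake_result = {
        "output": {
            "voice": "qwen-tts-vd-announcer-voice-20251201102800-a1b2",
            "preview_audio": {
                "data": base64.b64encode(b"not real audio").decode("ascii"),
                "sample_rate": 24000,
                "response_format": "wav",
            },
        }
    }
    with tempfile.TemporaryDirectory() as tmp:
        saved = save_preview(fake_result, tmp)
        print(saved.name, saved.stat().st_size, "bytes")
```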

List voices

Returns a paginated list of voices under your account.

Request syntax

{
  "model": "qwen-voice-design",
  "input": {
    "action": "list",
    "page_size": 10,
    "page_index": 0
  }
}

Request parameters

Parameter	Type	Default	Required	Description
model	string	--	Yes	Fixed to `qwen-voice-design`.
action	string	--	Yes	Fixed to `list`.
page_index	integer	0	No	Page number. Range: 0 to 200.
page_size	integer	10	No	Results per page. Must be greater than 0.

Response example

{
  "output": {
    "page_index": 0,
    "page_size": 2,
    "total_count": 26,
    "voice_list": [
      {
        "gmt_create": "2025-12-10 17:04:54",
        "gmt_modified": "2025-12-10 17:04:54",
        "language": "zh",
        "preview_text": "Dear listeners, hello everyone. Welcome to today's program.",
        "target_model": "qwen3-tts-vd-realtime-2026-01-15",
        "voice": "qwen-tts-vd-announcer-voice-20251210170454-a1b2",
        "voice_prompt": "A composed middle-aged male announcer with a deep, rich and magnetic voice, a steady speaking speed and clear articulation, suitable for news broadcasting or documentary commentary."
      }
    ]
  },
  "usage": {},
  "request_id": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
}

Response parameters

Parameter	Type	Description
page_index	integer	Current page number.
page_size	integer	Entries per page.
total_count	integer	Total number of voices.
voice_list[].voice	string	Voice name.
voice_list[].target_model	string	Synthesis model bound to this voice.
voice_list[].language	string	Language code.
voice_list[].voice_prompt	string	Voice description.
voice_list[].preview_text	string	Preview text.
voice_list[].gmt_create	string	Creation timestamp.
voice_list[].gmt_modified	string	Last modified timestamp.
request_id	string	Request ID.
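
Since `total_count` can exceed `page_size`, reading every voice means requesting successive pages. The sketch below illustrates one such loop; `iter_all_voices` and the `fetch_page` callback are hypothetical names, and it assumes pages are zero-based and that the callback returns the `output` object of a list response.

```python
MAX_PAGE_INDEX = 200  # upper bound of page_index in the request table


def iter_all_voices(fetch_page, page_size=10):
    """Yield every voice entry by walking pages until total_count is reached.

    fetch_page(page_index, page_size) is supplied by the caller and must
    return the "output" object of a list response.
    """
    page_index = 0
    seen = 0
    while True:
        output = fetch_page(page_index, page_size)
        voices = output.get("voice_list", [])
        for voice in voices:
            yield voice
        seen += len(voices)
        # Stop when all entries are read or a page comes back empty.
        if not voices or seen >= output.get("total_count", 0):
            break
        if page_index >= MAX_PAGE_INDEX:
            break
        page_index += 1


if __name__ == "__main__":
    # In-memory stand-in for the API so the loop can run offline.
    fake_voices = [{"voice": f"voice-{i}"} for i in range(5)]

    def fake_fetch(page_index, page_size):
        start = page_index * page_size
        return {
            "page_index": page_index,
            "page_size": page_size,
            "total_count": len(fake_voices),
            "voice_list": fake_voices[start:start + page_size],
        }

    for entry in iter_all_voices(fake_fetch, page_size=2):
        print(entry["voice"])
```

In real use, `fetch_page` would send the `list` request and return `response.json()["output"]`.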

Query a voice

Returns details about a specific voice.

Request syntax

{
  "model": "qwen-voice-design",
  "input": {
    "action": "query",
    "voice": "<voice-name>"
  }
}

Request parameters

Parameter	Type	Default	Required	Description
model	string	--	Yes	Fixed to `qwen-voice-design`.
action	string	--	Yes	Fixed to `query`.
voice	string	--	Yes	Voice name to query.

Response example (voice found)

{
  "output": {
    "gmt_create": "2025-12-10 14:54:09",
    "gmt_modified": "2025-12-10 17:47:48",
    "language": "zh",
    "preview_text": "Dear listeners, hello everyone.",
    "target_model": "qwen3-tts-vd-realtime-2026-01-15",
    "voice": "qwen-tts-vd-announcer-voice-20251210145409-a1b2",
    "voice_prompt": "A composed middle-aged male announcer with a deep, rich and magnetic voice, a steady speaking speed and clear articulation, suitable for news broadcasting or documentary commentary."
  },
  "usage": {},
  "request_id": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
}

Response example (voice not found)

If the voice does not exist, the API returns HTTP 400 with the error code VoiceNotFound:

{
  "request_id": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
  "code": "VoiceNotFound",
  "message": "Voice not found: qwen-tts-vd-announcer-voice-xxxx"
}

Response parameters

Parameter	Type	Description
voice	string	Voice name.
target_model	string	Synthesis model bound to this voice.
language	string	Language code.
voice_prompt	string	Voice description.
preview_text	string	Preview text.
gmt_create	string	Creation time.
gmt_modified	string	Last modification time.
request_id	string	Request ID.
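
A caller usually wants to treat "voice not found" differently from other failures. The sketch below shows one way to separate the two cases from an already parsed response; `parse_query_response` and `VoiceDesignError` are hypothetical names introduced for illustration.

```python
class VoiceDesignError(Exception):
    """Raised for error responses other than VoiceNotFound."""


def parse_query_response(status_code, payload):
    """Return the voice details, or None when the voice does not exist."""
    if status_code == 200:
        return payload["output"]
    # A missing voice is documented as HTTP 400 with code VoiceNotFound.
    if payload.get("code") == "VoiceNotFound":
        return None
    raise VoiceDesignError(
        f'{status_code} {payload.get("code")}: {payload.get("message")}'
    )


if __name__ == "__main__":
    found = parse_query_response(200, {
        "output": {
            "voice": "qwen-tts-vd-announcer-voice-20251210145409-a1b2",
            "target_model": "qwen3-tts-vd-realtime-2026-01-15",
        },
        "request_id": "example",
    })
    print("found:", found["voice"])

    missing = parse_query_response(400, {
        "request_id": "example",
        "code": "VoiceNotFound",
        "message": "Voice not found: qwen-tts-vd-announcer-voice-xxxx",
    })
    print("missing:", missing)
```

With `requests`, the two arguments would be `response.status_code` and `response.json()`.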

Delete a voice

Deletes a voice and releases its quota.

Request syntax

{
  "model": "qwen-voice-design",
  "input": {
    "action": "delete",
    "voice": "<voice-name>"
  }
}

Request parameters

Parameter	Type	Default	Required	Description
model	string	--	Yes	Fixed to `qwen-voice-design`.
action	string	--	Yes	Fixed to `delete`.
voice	string	--	Yes	Voice name to delete.

Response example

{
  "output": {
    "voice": "qwen-tts-vd-announcer-voice-20251210145409-a1b2"
  },
  "usage": {},
  "request_id": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
}

Response parameters

Parameter	Type	Description
voice	string	Deleted voice name.
request_id	string	Request ID.
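
Because deleting a voice releases quota, one possible housekeeping step is to pick out old voices from a list response before deleting them. The sketch below only selects candidates and deletes nothing; `voices_created_before` is a hypothetical helper, and it assumes `gmt_create` always uses the format shown in the examples and that the cutoff is expressed in the same (unstated) time zone.

```python
from datetime import datetime

# Matches the gmt_create values in the response examples.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def voices_created_before(voice_list, cutoff):
    """Return the names of voices whose gmt_create is earlier than cutoff."""
    stale = []
    for entry in voice_list:
        created = datetime.strptime(entry["gmt_create"], TIMESTAMP_FORMAT)
        if created < cutoff:
            stale.append(entry["voice"])
    return stale


if __name__ == "__main__":
    # Entries shaped like voice_list items; the names are placeholders.
    sample_list = [
        {"voice": "voice-old", "gmt_create": "2025-12-10 17:04:54"},
        {"voice": "voice-new", "gmt_create": "2026-01-05 09:00:00"},
    ]
    for name in voices_created_before(sample_list, datetime(2026, 1, 1)):
        print("candidate for deletion:", name)
```

Each returned name could then be sent in a `delete` request as the `voice` field.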

Sample code

Create a voice and preview

The examples below appear in this order: cURL, Python, Java.

curl -X POST https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
  "model": "qwen-voice-design",
  "input": {
    "action": "create",
    "target_model": "qwen3-tts-vd-realtime-2026-01-15",
    "voice_prompt": "A composed middle-aged male announcer with a deep, rich and magnetic voice, a steady speaking speed and clear articulation, suitable for news broadcasting or documentary commentary.",
    "preview_text": "Dear listeners, hello everyone. Welcome to the evening news.",
    "preferred_name": "announcer",
    "language": "en"
  },
  "parameters": {
    "sample_rate": 24000,
    "response_format": "wav"
  }
}'

import requests
import base64
import os

def create_voice():
  """Create a custom voice and save the preview audio."""
  # Load API key from environment variable
  api_key = os.getenv("DASHSCOPE_API_KEY")
  if not api_key:
    print("Error: DASHSCOPE_API_KEY not set.")
    return None, None

  data = {
    "model": "qwen-voice-design",
    "input": {
      "action": "create",
      "target_model": "qwen3-tts-vd-realtime-2026-01-15",
      "voice_prompt": "A composed middle-aged male announcer with a deep, rich "
                           "and magnetic voice, a steady speaking speed and clear "
                           "articulation, suitable for news broadcasting or "
                           "documentary commentary.",
      "preview_text": "Dear listeners, hello everyone. Welcome to the evening news.",
      "preferred_name": "announcer",
      "language": "en"
    },
    "parameters": {
      "sample_rate": 24000,
      "response_format": "wav"
    }
  }

  response = requests.post(
    "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization",
    headers={
      "Authorization": f"Bearer {api_key}",
      "Content-Type": "application/json"
    },
    json=data,
    timeout=60
  )

  if response.status_code == 200:
    result = response.json()
    voice_name = result["output"]["voice"]
    audio_bytes = base64.b64decode(result["output"]["preview_audio"]["data"])

    # Save preview audio
    filename = f"{voice_name}_preview.wav"
    with open(filename, "wb") as f:
      f.write(audio_bytes)

    print(f"Voice created: {voice_name}")
    print(f"Preview saved to: {filename}")
    return voice_name, filename
  else:
    print(f"Request failed ({response.status_code}): {response.text}")
    return None, None

if __name__ == "__main__":
  create_voice()

Add the Gson dependency to your project:

<!-- https://mvnrepository.com/artifact/com.google.code.gson/gson -->
<dependency>
  <groupId>com.google.code.gson</groupId>
  <artifactId>gson</artifactId>
  <version>2.13.1</version>
</dependency>

import com.google.gson.JsonObject;
import com.google.gson.JsonParser;
import java.io.*;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Base64;

public class Main {
  public static void main(String[] args) {
    new Main().createVoice();
  }

  public void createVoice() {
    // Load API key from environment variable
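    String apiKey = System.getenv("DASHSCOPE_API_KEY");
    if (apiKey == null || apiKey.isEmpty()) {
      System.err.println("Error: DASHSCOPE_API_KEY not set.");
      return;
    }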

    String jsonBody = "{\n" +
        "    \"model\": \"qwen-voice-design\",\n" +
        "    \"input\": {\n" +
        "        \"action\": \"create\",\n" +
        "        \"target_model\": \"qwen3-tts-vd-realtime-2026-01-15\",\n" +
        "        \"voice_prompt\": \"A composed middle-aged male announcer with a deep, rich and magnetic voice, a steady speaking speed and clear articulation, suitable for news broadcasting or documentary commentary.\",\n" +
        "        \"preview_text\": \"Dear listeners, hello everyone. Welcome to the evening news.\",\n" +
        "        \"preferred_name\": \"announcer\",\n" +
        "        \"language\": \"en\"\n" +
        "    },\n" +
        "    \"parameters\": {\n" +
        "        \"sample_rate\": 24000,\n" +
        "        \"response_format\": \"wav\"\n" +
        "    }\n" +
        "}";

    HttpURLConnection connection = null;
    try {
      URL url = new URL("https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization");
      connection = (HttpURLConnection) url.openConnection();
      connection.setRequestMethod("POST");
      connection.setRequestProperty("Authorization", "Bearer " + apiKey);
      connection.setRequestProperty("Content-Type", "application/json");
      connection.setDoOutput(true);

      // Send request body
      try (OutputStream os = connection.getOutputStream()) {
        os.write(jsonBody.getBytes("UTF-8"));
        os.flush();
      }

      int responseCode = connection.getResponseCode();
      if (responseCode == HttpURLConnection.HTTP_OK) {
        StringBuilder response = new StringBuilder();
        try (BufferedReader br = new BufferedReader(
            new InputStreamReader(connection.getInputStream(), "UTF-8"))) {
          String line;
          while ((line = br.readLine()) != null) {
            response.append(line.trim());
          }
        }

        // Parse response and save preview audio
        JsonObject jsonResponse = JsonParser.parseString(response.toString()).getAsJsonObject();
        JsonObject output = jsonResponse.getAsJsonObject("output");
        String voiceName = output.get("voice").getAsString();
        String base64Audio = output.getAsJsonObject("preview_audio").get("data").getAsString();

        byte[] audioBytes = Base64.getDecoder().decode(base64Audio);
        String filename = voiceName + "_preview.wav";
        try (FileOutputStream fos = new FileOutputStream(filename)) {
          fos.write(audioBytes);
        }

        System.out.println("Voice created: " + voiceName);
        System.out.println("Preview saved to: " + filename);
      } else {
        StringBuilder error = new StringBuilder();
        try (BufferedReader br = new BufferedReader(
            new InputStreamReader(connection.getErrorStream(), "UTF-8"))) {
          String line;
          while ((line = br.readLine()) != null) {
            error.append(line.trim());
          }
        }
        System.out.println("Request failed (" + responseCode + "): " + error);
      }
    } catch (Exception e) {
      System.err.println("Error: " + e.getMessage());
      e.printStackTrace();
    } finally {
      if (connection != null) connection.disconnect();
    }
  }
}

Use a custom voice for synthesis

After creating a voice, pass the returned voice name to the synthesis API. The model must match the target_model from voice design.
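
A mismatch between the two models can be caught before a synthesis session is opened by comparing against the `target_model` field returned by the query operation. The sketch below shows that check; `assert_model_matches` is a hypothetical helper, and `voice_info` is assumed to be the `output` object of a query response.

```python
def assert_model_matches(voice_info, synthesis_model):
    """Raise ValueError if the voice was designed for a different model."""
    bound_model = voice_info["target_model"]
    if bound_model != synthesis_model:
        raise ValueError(
            f"Voice {voice_info['voice']!r} is bound to {bound_model!r}, "
            f"but synthesis is configured for {synthesis_model!r}"
        )


if __name__ == "__main__":
    voice_info = {
        "voice": "qwen-tts-vd-announcer-voice-20251210145409-a1b2",
        "target_model": "qwen3-tts-vd-realtime-2026-01-15",
    }
    # Same model: passes silently.
    assert_model_matches(voice_info, "qwen3-tts-vd-realtime-2026-01-15")
    # Different model: rejected before any synthesis request is made.
    try:
        assert_model_matches(voice_info, "qwen3-tts-vd-2026-01-26")
    except ValueError as exc:
        print(exc)
```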

Bidirectional streaming (real-time)

Use with qwen3-tts-vd-realtime-2026-01-15. See Realtime streaming TTS for details.

The examples below appear in this order: Python, Java.

# pyaudio installation:
#   macOS:   brew install portaudio && pip install pyaudio
#   Ubuntu:  sudo apt-get install python3-pyaudio  (or pip install pyaudio)
#   CentOS:  sudo yum install -y portaudio portaudio-devel && pip install pyaudio
#   Windows: python -m pip install pyaudio

import pyaudio
import os
import base64
import threading
import time
import dashscope
from dashscope.audio.qwen_tts_realtime import QwenTtsRealtime, QwenTtsRealtimeCallback, AudioFormat

TEXT_TO_SYNTHESIZE = [
  "Right? I really like this kind of supermarket,",
  "especially during the New Year.",
  "Going to the supermarket",
  "just makes me feel",
  "super, super happy!",
  "I want to buy so many things!"
]

def init_dashscope_api_key():
  """Load the API key from environment variable."""
  dashscope.api_key = os.getenv("DASHSCOPE_API_KEY")

class MyCallback(QwenTtsRealtimeCallback):
  """Callback for streaming TTS playback."""
  def __init__(self):
    self.complete_event = threading.Event()
    self._player = pyaudio.PyAudio()
    self._stream = self._player.open(
      format=pyaudio.paInt16, channels=1, rate=24000, output=True
    )

  def on_open(self) -> None:
    print("[TTS] Connection established")

  def on_close(self, close_status_code, close_msg) -> None:
    self._stream.stop_stream()
    self._stream.close()
    self._player.terminate()
    print(f"[TTS] Connection closed, code={close_status_code}, msg={close_msg}")

  def on_event(self, response: dict) -> None:
    event_type = response.get("type", "")
    if event_type == "session.created":
      print(f'[TTS] Session started: {response["session"]["id"]}')
    elif event_type == "response.audio.delta":
      audio_data = base64.b64decode(response["delta"])
      self._stream.write(audio_data)
    elif event_type == "response.done":
      print(f"[TTS] Response complete, ID: {qwen_tts_realtime.get_last_response_id()}")
    elif event_type == "session.finished":
      print("[TTS] Session finished")
      self.complete_event.set()

  def wait_for_finished(self):
    self.complete_event.wait()

if __name__ == "__main__":
  init_dashscope_api_key()

  callback = MyCallback()
  qwen_tts_realtime = QwenTtsRealtime(
    model="qwen3-tts-vd-realtime-2026-01-15",
    callback=callback,
    url="wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime"
  )
  qwen_tts_realtime.connect()

  qwen_tts_realtime.update_session(
    voice="<your-voice-name>",  # Replace with your voice design voice name
    response_format=AudioFormat.PCM_24000HZ_MONO_16BIT,
    mode="server_commit"
  )

  for text_chunk in TEXT_TO_SYNTHESIZE:
    print(f"[Sending text]: {text_chunk}")
    qwen_tts_realtime.append_text(text_chunk)
    time.sleep(0.1)

  qwen_tts_realtime.finish()
  callback.wait_for_finished()

  print(f"[Metric] session_id={qwen_tts_realtime.get_session_id()}, "
          f"first_audio_delay={qwen_tts_realtime.get_first_audio_delay()}s")

import com.alibaba.dashscope.audio.qwen_tts_realtime.*;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.google.gson.JsonObject;

import javax.sound.sampled.*;
import java.util.Base64;
import java.util.Queue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;

public class Main {
  private static String[] textToSynthesize = {
      "Right? I really like this kind of supermarket,",
      "especially during the New Year.",
      "Going to the supermarket",
      "just makes me feel",
      "super, super happy!",
      "I want to buy so many things!"
  };

  // Real-time PCM audio player
  public static class RealtimePcmPlayer {
    private int sampleRate;
    private SourceDataLine line;
    private Thread decoderThread;
    private Thread playerThread;
    private AtomicBoolean stopped = new AtomicBoolean(false);
    private Queue<String> b64AudioBuffer = new ConcurrentLinkedQueue<>();
    private Queue<byte[]> rawAudioBuffer = new ConcurrentLinkedQueue<>();

    public RealtimePcmPlayer(int sampleRate) throws LineUnavailableException {
      this.sampleRate = sampleRate;
      AudioFormat audioFormat = new AudioFormat(this.sampleRate, 16, 1, true, false);
      DataLine.Info info = new DataLine.Info(SourceDataLine.class, audioFormat);
      line = (SourceDataLine) AudioSystem.getLine(info);
      line.open(audioFormat);
      line.start();

      decoderThread = new Thread(() -> {
        while (!stopped.get()) {
          String b64Audio = b64AudioBuffer.poll();
          if (b64Audio != null) {
            rawAudioBuffer.add(Base64.getDecoder().decode(b64Audio));
          } else {
            try { Thread.sleep(100); } catch (InterruptedException e) { throw new RuntimeException(e); }
          }
        }
      });

      playerThread = new Thread(() -> {
        while (!stopped.get()) {
          byte[] rawAudio = rawAudioBuffer.poll();
          if (rawAudio != null) {
            int bytesWritten = 0;
            while (bytesWritten < rawAudio.length) {
              bytesWritten += line.write(rawAudio, bytesWritten, rawAudio.length - bytesWritten);
            }
            int audioLength = rawAudio.length / (this.sampleRate * 2 / 1000);
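            // Clamp to zero: Thread.sleep rejects negative values, which chunks shorter than 10 ms would produce
            try { Thread.sleep(Math.max(0, audioLength - 10)); } catch (InterruptedException e) { throw new RuntimeException(e); }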
          } else {
            try { Thread.sleep(100); } catch (InterruptedException e) { throw new RuntimeException(e); }
          }
        }
      });

      decoderThread.start();
      playerThread.start();
    }

    public void write(String b64Audio) { b64AudioBuffer.add(b64Audio); }

    public void waitForComplete() throws InterruptedException {
      while (!b64AudioBuffer.isEmpty() || !rawAudioBuffer.isEmpty()) { Thread.sleep(100); }
      line.drain();
    }

    public void shutdown() throws InterruptedException {
      stopped.set(true);
      decoderThread.join();
      playerThread.join();
      if (line != null && line.isRunning()) { line.drain(); line.close(); }
    }
  }

  public static void main(String[] args) throws Exception {
    QwenTtsRealtimeParam param = QwenTtsRealtimeParam.builder()
        .model("qwen3-tts-vd-realtime-2026-01-15")
        .url("wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime")
        .apikey(System.getenv("DASHSCOPE_API_KEY"))
        .build();

    AtomicReference<CountDownLatch> completeLatch = new AtomicReference<>(new CountDownLatch(1));
    RealtimePcmPlayer audioPlayer = new RealtimePcmPlayer(24000);

    QwenTtsRealtime qwenTtsRealtime = new QwenTtsRealtime(param, new QwenTtsRealtimeCallback() {
      @Override
      public void onOpen() { }

      @Override
      public void onEvent(JsonObject message) {
        String type = message.get("type").getAsString();
        switch (type) {
          case "response.audio.delta":
            audioPlayer.write(message.get("delta").getAsString());
            break;
          case "session.finished":
            completeLatch.get().countDown();
            break;
        }
      }

      @Override
      public void onClose(int code, String reason) { }
    });

    try {
      qwenTtsRealtime.connect();
    } catch (NoApiKeyException e) {
      throw new RuntimeException(e);
    }

    QwenTtsRealtimeConfig config = QwenTtsRealtimeConfig.builder()
        .voice("<your-voice-name>")  // Replace with your voice design voice name
        .responseFormat(QwenTtsRealtimeAudioFormat.PCM_24000HZ_MONO_16BIT)
        .mode("server_commit")
        .build();
    qwenTtsRealtime.updateSession(config);

    for (String text : textToSynthesize) {
      qwenTtsRealtime.appendText(text);
      Thread.sleep(100);
    }
    qwenTtsRealtime.finish();
    completeLatch.get().await();

    audioPlayer.waitForComplete();
    audioPlayer.shutdown();
    System.exit(0);
  }
}

Non-streaming and unidirectional streaming

Use with qwen3-tts-vd-2026-01-26. Pass the returned voice name to the synthesis API with the matching model. See Qwen TTS for details and code examples.

Query voices

The examples below appear in this order: cURL, Python, Java.

# Query a specific voice
curl -X POST https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
  "model": "qwen-voice-design",
  "input": {
    "action": "query",
    "voice": "<your-voice-name>"
  }
}'

# List all voices (paginated)
curl -X POST https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
  "model": "qwen-voice-design",
  "input": {
    "action": "list",
    "page_size": 10,
    "page_index": 0
  }
}'

import requests
import os

def query_voice(voice_name):
  """Get details for a specific voice."""
  api_key = os.getenv("DASHSCOPE_API_KEY")

  response = requests.post(
    "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization",
    headers={
      "Authorization": f"Bearer {api_key}",
      "Content-Type": "application/json"
    },
    json={
      "model": "qwen-voice-design",
      "input": {
        "action": "query",
        "voice": voice_name
      }
    },
    timeout=60
  )

  if response.status_code == 200:
    result = response.json()
    print(f"Voice: {result['output']['voice']}")
    print(f"Model: {result['output']['target_model']}")
    print(f"Created: {result['output']['gmt_create']}")
    return result
  else:
    error = response.json()
    if error.get("code") == "VoiceNotFound":
      print(f"Voice not found: {voice_name}")
    else:
      print(f"Request failed ({response.status_code}): {response.text}")
    return None

def list_voices(page_index=0, page_size=10):
  """List all voices with pagination."""
  api_key = os.getenv("DASHSCOPE_API_KEY")

  response = requests.post(
    "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization",
    headers={
      "Authorization": f"Bearer {api_key}",
      "Content-Type": "application/json"
    },
    json={
      "model": "qwen-voice-design",
      "input": {
        "action": "list",
        "page_size": page_size,
        "page_index": page_index
      }
    },
    timeout=60
  )

  if response.status_code == 200:
    result = response.json()
    total = result["output"]["total_count"]
    voices = result["output"]["voice_list"]
    print(f"Total voices: {total}")
    for v in voices:
      print(f"  - {v['voice']} ({v['language']}, {v['target_model']})")
    return result
  else:
    print(f"Request failed ({response.status_code}): {response.text}")
    return None

if __name__ == "__main__":
  list_voices()

Query a specific voice:

import com.google.gson.JsonObject;
import com.google.gson.JsonParser;
import java.io.*;
import java.net.HttpURLConnection;
import java.net.URL;

public class Main {

  public static void main(String[] args) {
    Main example = new Main();
    String voiceName = "<your-voice-name>";  // Replace with the actual voice name
    System.out.println("Querying voice: " + voiceName);
    example.queryVoice(voiceName);
  }

  public void queryVoice(String voiceName) {
    String apiKey = System.getenv("DASHSCOPE_API_KEY");

    String jsonBody = "{\n" +
        "    \"model\": \"qwen-voice-design\",\n" +
        "    \"input\": {\n" +
        "        \"action\": \"query\",\n" +
        "        \"voice\": \"" + voiceName + "\"\n" +
        "    }\n" +
        "}";

    HttpURLConnection connection = null;
    try {
      URL url = new URL("https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization");
      connection = (HttpURLConnection) url.openConnection();
      connection.setRequestMethod("POST");
      connection.setRequestProperty("Authorization", "Bearer " + apiKey);
      connection.setRequestProperty("Content-Type", "application/json");
      connection.setDoOutput(true);
      connection.setDoInput(true);

      try (OutputStream os = connection.getOutputStream()) {
        byte[] input = jsonBody.getBytes("UTF-8");
        os.write(input, 0, input.length);
        os.flush();
      }

      int responseCode = connection.getResponseCode();
      if (responseCode == HttpURLConnection.HTTP_OK) {
        StringBuilder response = new StringBuilder();
        try (BufferedReader br = new BufferedReader(
            new InputStreamReader(connection.getInputStream(), "UTF-8"))) {
          String responseLine;
          while ((responseLine = br.readLine()) != null) {
            response.append(responseLine.trim());
          }
        }

        JsonObject jsonResponse = JsonParser.parseString(response.toString()).getAsJsonObject();

        if (jsonResponse.has("code") && "VoiceNotFound".equals(jsonResponse.get("code").getAsString())) {
          String errorMessage = jsonResponse.has("message") ?
              jsonResponse.get("message").getAsString() : "Voice not found";
          System.out.println("Voice not found: " + voiceName);
          System.out.println("Error message: " + errorMessage);
          return;
        }

        JsonObject outputObj = jsonResponse.getAsJsonObject("output");
        System.out.println("Successfully queried voice information:");
        System.out.println("  Voice Name: " + outputObj.get("voice").getAsString());
        System.out.println("  Creation Time: " + outputObj.get("gmt_create").getAsString());
        System.out.println("  Modification Time: " + outputObj.get("gmt_modified").getAsString());
        System.out.println("  Language: " + outputObj.get("language").getAsString());
        System.out.println("  Preview Text: " + outputObj.get("preview_text").getAsString());
        System.out.println("  Model: " + outputObj.get("target_model").getAsString());
        System.out.println("  Voice Description: " + outputObj.get("voice_prompt").getAsString());
      } else {
        StringBuilder errorResponse = new StringBuilder();
        try (BufferedReader br = new BufferedReader(
            new InputStreamReader(connection.getErrorStream(), "UTF-8"))) {
          String responseLine;
          while ((responseLine = br.readLine()) != null) {
            errorResponse.append(responseLine.trim());
          }
        }
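        // The API reference documents VoiceNotFound as an HTTP 400 error, so check for it here
        if (errorResponse.toString().contains("VoiceNotFound")) {
          System.out.println("Voice not found: " + voiceName);
        }
        System.out.println("Request failed with status code: " + responseCode);
        System.out.println("Error response: " + errorResponse.toString());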
      }
    } catch (Exception e) {
      System.err.println("An error occurred during the request: " + e.getMessage());
      e.printStackTrace();
    } finally {
      if (connection != null) {
        connection.disconnect();
      }
    }
  }
}

List all voices (paginated):

import com.google.gson.Gson;
import com.google.gson.JsonArray;
import com.google.gson.JsonObject;
import java.io.*;
import java.net.HttpURLConnection;
import java.net.URL;

public class Main {
  public static void main(String[] args) {
    String apiKey = System.getenv("DASHSCOPE_API_KEY");
    String apiUrl = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization";

    String jsonPayload =
        "{"
            + "\"model\": \"qwen-voice-design\","
            + "\"input\": {"
            +     "\"action\": \"list\","
            +     "\"page_size\": 10,"
            +     "\"page_index\": 0"
            + "}"
            + "}";

    try {
      HttpURLConnection con = (HttpURLConnection) new URL(apiUrl).openConnection();
      con.setRequestMethod("POST");
      con.setRequestProperty("Authorization", "Bearer " + apiKey);
      con.setRequestProperty("Content-Type", "application/json");
      con.setDoOutput(true);

      try (OutputStream os = con.getOutputStream()) {
        os.write(jsonPayload.getBytes("UTF-8"));
      }

      int status = con.getResponseCode();
      BufferedReader br = new BufferedReader(new InputStreamReader(
          status >= 200 && status < 300 ? con.getInputStream() : con.getErrorStream(), "UTF-8"));

      StringBuilder response = new StringBuilder();
      String line;
      while ((line = br.readLine()) != null) {
        response.append(line);
      }
      br.close();

      System.out.println("HTTP Status Code: " + status);

      if (status == 200) {
        Gson gson = new Gson();
        JsonObject jsonObj = gson.fromJson(response.toString(), JsonObject.class);
        JsonArray voiceList = jsonObj.getAsJsonObject("output").getAsJsonArray("voice_list");

        System.out.println("\nQueried voice list:");
        for (int i = 0; i < voiceList.size(); i++) {
          JsonObject voiceItem = voiceList.get(i).getAsJsonObject();
          String voice = voiceItem.get("voice").getAsString();
          String gmtCreate = voiceItem.get("gmt_create").getAsString();
          String targetModel = voiceItem.get("target_model").getAsString();
          System.out.printf("- Voice: %s  Creation Time: %s  Model: %s\n",
              voice, gmtCreate, targetModel);
        }
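      } else {
        // Surface the error body instead of dropping it silently
        System.out.println("Error response: " + response);
      }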
    } catch (Exception e) {
      e.printStackTrace();
    }
  }
}

Delete a voice

The examples below appear in this order: cURL, Python, Java.

curl -X POST https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
  "model": "qwen-voice-design",
  "input": {
    "action": "delete",
    "voice": "<your-voice-name>"
  }
}'

import requests
import os

def delete_voice(voice_name):
  """Delete a voice and release the quota."""
  api_key = os.getenv("DASHSCOPE_API_KEY")

  response = requests.post(
    "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization",
    headers={
      "Authorization": f"Bearer {api_key}",
      "Content-Type": "application/json"
    },
    json={
      "model": "qwen-voice-design",
      "input": {
        "action": "delete",
        "voice": voice_name
      }
    },
    timeout=30  # Avoid hanging indefinitely on network problems
  )

  if response.status_code == 200:
    print(f"Deleted: {voice_name}")
    return True
  else:
    print(f"Request failed ({response.status_code}): {response.text}")
    return False

if __name__ == "__main__":
  delete_voice("<your-voice-name>")

import com.google.gson.JsonObject;
import com.google.gson.JsonParser;
import java.io.*;
import java.net.HttpURLConnection;
import java.net.URL;

public class Main {

  public static void main(String[] args) {
    Main example = new Main();
    String voiceName = "<your-voice-name>";  // Replace with the actual voice name
    System.out.println("Deleting voice: " + voiceName);
    example.deleteVoice(voiceName);
  }

  public void deleteVoice(String voiceName) {
    String apiKey = System.getenv("DASHSCOPE_API_KEY");

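    // Build the body with Gson so special characters in the voice name are escaped.
    JsonObject input = new JsonObject();
    input.addProperty("action", "delete");
    input.addProperty("voice", voiceName);

    JsonObject body = new JsonObject();
    body.addProperty("model", "qwen-voice-design");
    body.add("input", input);

    String jsonBody = body.toString();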

    HttpURLConnection connection = null;
    try {
      URL url = new URL("https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization");
      connection = (HttpURLConnection) url.openConnection();
      connection.setRequestMethod("POST");
      connection.setRequestProperty("Authorization", "Bearer " + apiKey);
      connection.setRequestProperty("Content-Type", "application/json");
      connection.setDoOutput(true);
      connection.setDoInput(true);

      try (OutputStream os = connection.getOutputStream()) {
        byte[] input = jsonBody.getBytes("UTF-8");
        os.write(input, 0, input.length);
        os.flush();
      }

      int responseCode = connection.getResponseCode();
      if (responseCode == HttpURLConnection.HTTP_OK) {
        StringBuilder response = new StringBuilder();
        try (BufferedReader br = new BufferedReader(
            new InputStreamReader(connection.getInputStream(), "UTF-8"))) {
          String responseLine;
          while ((responseLine = br.readLine()) != null) {
            response.append(responseLine.trim());
          }
        }

        JsonObject jsonResponse = JsonParser.parseString(response.toString()).getAsJsonObject();

        if (jsonResponse.has("code") && jsonResponse.get("code").getAsString().contains("VoiceNotFound")) {
          String errorMessage = jsonResponse.has("message") ?
              jsonResponse.get("message").getAsString() : "Voice not found";
          System.out.println("Voice does not exist: " + voiceName);
          System.out.println("Error message: " + errorMessage);
        } else if (jsonResponse.has("usage")) {
          System.out.println("Voice deleted successfully: " + voiceName);
          String requestId = jsonResponse.has("request_id") ?
              jsonResponse.get("request_id").getAsString() : "N/A";
          System.out.println("Request ID: " + requestId);
        } else {
          System.out.println("Unexpected response format: " + response.toString());
        }
      } else {
        StringBuilder errorResponse = new StringBuilder();
        try (BufferedReader br = new BufferedReader(
            new InputStreamReader(connection.getErrorStream(), "UTF-8"))) {
          String responseLine;
          while ((responseLine = br.readLine()) != null) {
            errorResponse.append(responseLine.trim());
          }
        }
        System.out.println("Request failed with status code: " + responseCode);
        System.out.println("Error response: " + errorResponse.toString());
      }
    } catch (Exception e) {
      System.err.println("An error occurred during the request: " + e.getMessage());
      e.printStackTrace();
    } finally {
      if (connection != null) {
        connection.disconnect();
      }
    }
  }
}

Voice quota and cleanup

Account limit: Each account can hold up to 1,000 voices. To see the current count, check the total_count field in the List voices response.
Automatic cleanup: Voices that have not been used for synthesis in the past year are deleted automatically.
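
The quota check described above can be sketched as a small helper. This is a minimal illustration, not part of the official SDK: it assumes you have already read total_count from a List voices response and passed it in as an integer. The function names and the headroom threshold are hypothetical and chosen only for this example.

```python
# Account limit stated in this section.
VOICE_LIMIT = 1000


def remaining_voice_quota(total_count: int, limit: int = VOICE_LIMIT) -> int:
    """Return how many more voices can be created before reaching the limit.

    total_count is assumed to be the value of the total_count field
    taken from a List voices response.
    """
    if total_count < 0:
        raise ValueError("total_count must be non-negative")
    # Clamp at zero in case the reported count exceeds the limit.
    return max(limit - total_count, 0)


def needs_cleanup(total_count: int, headroom: int = 50,
                  limit: int = VOICE_LIMIT) -> bool:
    """Return True when fewer than `headroom` slots remain.

    The headroom value is a hypothetical threshold for this sketch.
    """
    return remaining_voice_quota(total_count, limit) < headroom
```

When needs_cleanup returns True, one option is to delete voices you no longer use with the delete action shown in Delete a voice before creating new ones.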
