The following example shows how to create a CosyVoice voice from a text description and use it for speech synthesis.
CosyVoice Voice Design is available only in the Beijing region (v3.5 series and v3 series).
Step 1: Create a voice from a description
Call the API with two parameters: voice_prompt for the voice description, and preview_text for the text read aloud in the preview audio.
curl -X POST 'https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization' \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
    "model": "voice-enrollment",
    "input": {
        "action": "create_voice",
        "target_model": "cosyvoice-v3.5-plus",
        "voice_prompt": "A composed middle-aged male announcer with a deep, rich and magnetic voice, a steady speaking speed and clear articulation, is suitable for news broadcasting or documentary commentary.",
        "preview_text": "Dear listeners, hello everyone. Welcome to the evening news.",
        "prefix": "announcer"
    },
    "parameters": {
        "sample_rate": 24000,
        "response_format": "wav"
    }
}'
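The same request can be sketched in Python using only the standard library. The URL, field names, and values are copied from the cURL command above. The helper names build_create_voice_payload and create_voice are hypothetical, and reading the ID from output.voice_id is an assumption about the response layout; this page only states that a voice_id value is returned.

```python
import json
import os
import urllib.request

# Endpoint copied from the cURL example above.
CUSTOMIZATION_URL = (
    "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization"
)


def build_create_voice_payload(voice_prompt, preview_text, prefix,
                               target_model="cosyvoice-v3.5-plus"):
    """Build the JSON body for the create_voice action (hypothetical helper)."""
    return {
        "model": "voice-enrollment",
        "input": {
            "action": "create_voice",
            "target_model": target_model,
            "voice_prompt": voice_prompt,
            "preview_text": preview_text,
            "prefix": prefix,
        },
        "parameters": {"sample_rate": 24000, "response_format": "wav"},
    }


def create_voice(payload, api_key):
    """POST the payload and return the parsed JSON response (hypothetical helper)."""
    request = urllib.request.Request(
        CUSTOMIZATION_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    # urlopen raises urllib.error.HTTPError for non-2xx responses.
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.loads(response.read().decode("utf-8"))


# Only call the service when an API key is configured.
if __name__ == "__main__" and os.environ.get("DASHSCOPE_API_KEY"):
    payload = build_create_voice_payload(
        voice_prompt="A composed middle-aged male announcer with a deep, rich voice.",
        preview_text="Dear listeners, hello everyone. Welcome to the evening news.",
        prefix="announcer",
    )
    result = create_voice(payload, os.environ["DASHSCOPE_API_KEY"])
    # Assumption: the voice ID sits under output.voice_id in the response.
    print((result.get("output") or {}).get("voice_id"))
```

Keeping payload construction separate from the network call makes it possible to inspect or log the request body before sending it.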
Step 2: Synthesize speech with the designed voice
In the following request, use the voice_id value returned in the previous step.
# coding=utf-8
import dashscope
from dashscope.audio.tts_v2 import *
import os

# Get an API key: https://www.alibabacloud.com/help/en/model-studio/get-api-key
# If you have not configured the environment variable, replace the following line with your API key: dashscope.api_key = "sk-xxx"
dashscope.api_key = os.environ.get('DASHSCOPE_API_KEY')
dashscope.base_websocket_api_url = 'wss://dashscope-intl.aliyuncs.com/api-ws/v1/inference'

# Use the same model for voice design and speech synthesis
model = "cosyvoice-v3.5-plus"
# Replace the voice parameter with the custom voice generated by voice design
voice = "voice_id"

# Instantiate SpeechSynthesizer, passing the model, voice, and other request parameters in the constructor
synthesizer = SpeechSynthesizer(model=model, voice=voice)
# Send text for synthesis and get binary audio
audio = synthesizer.call("What is the weather like today?")
# Establishing the WebSocket connection is required when sending text for the first time, so the first-package latency includes the connection setup time
print('[Metric] requestId: {}, first-package latency: {} ms'.format(
    synthesizer.get_last_request_id(),
    synthesizer.get_first_package_delay()))

# Save the audio to a local file
with open('output.mp3', 'wb') as f:
    f.write(audio)
Use the returned voice name with the matching synthesis model. The model in synthesis must match the target_model used during voice creation.
cURL
curl -X POST 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation' \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
    "model": "qwen3-tts-vd-2026-01-26",
    "input": {
        "text": "Welcome to our audiobook. Let me take you on a journey through the wonders of nature.",
        "voice": "VOICE_NAME"
    }
}'
Replace VOICE_NAME with the voice name returned from the create step. The response contains an output.audio.url field with a download link (valid for 24 hours).
Python
import requests
import os

voice_name = "VOICE_NAME"  # <-- from the create step

response = requests.post(
    "https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation",
    headers={
        "Authorization": f"Bearer {os.getenv('DASHSCOPE_API_KEY')}",
        "Content-Type": "application/json"
    },
    json={
        "model": "qwen3-tts-vd-2026-01-26",
        "input": {
            "text": "Welcome to our audiobook. "
                    "Let me take you on a journey through the wonders of nature.",
            "voice": voice_name
        }
    },
    timeout=60
)

result = response.json()
audio_url = result["output"]["audio"]["url"]
print(f"Audio URL: {audio_url}")
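The Python request above prints the audio URL but does not save the file. Because the link is valid for only 24 hours, it can be useful to download the audio right away. The following is a minimal sketch using only the standard library. The helper names extract_audio_url and download_audio are hypothetical, and the default file name synthesis_output.wav is borrowed from the Java example, which assumes WAV output.

```python
import shutil
import urllib.request


def extract_audio_url(result):
    """Return output.audio.url from a parsed response, or None if it is missing."""
    output = result.get("output") or {}
    audio = output.get("audio") or {}
    return audio.get("url")


def download_audio(audio_url, path="synthesis_output.wav"):
    """Stream the audio at audio_url to a local file and return the file path."""
    with urllib.request.urlopen(audio_url, timeout=60) as response, \
            open(path, "wb") as out:
        shutil.copyfileobj(response, out)
    return path
```

With the result dictionary from the request above, a call such as download_audio(extract_audio_url(result)) would save the file locally. Checking for None first avoids a KeyError when the service returns an error body instead of audio.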
Java
import com.google.gson.Gson;
import com.google.gson.JsonObject;
import java.io.*;
import java.net.HttpURLConnection;
import java.net.URL;

public class VoiceDesignSynthesize {
    public static void main(String[] args) {
        String apiKey = System.getenv("DASHSCOPE_API_KEY");
        String voiceName = "VOICE_NAME"; // <-- from the create step
        try {
            String body = "{"
                    + "\"model\": \"qwen3-tts-vd-2026-01-26\","
                    + "\"input\": {"
                    + "\"text\": \"Welcome to our audiobook. "
                    + "Let me take you on a journey through the wonders of nature.\","
                    + "\"voice\": \"" + voiceName + "\""
                    + "}"
                    + "}";

            HttpURLConnection conn = (HttpURLConnection) new URL(
                    "https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation"
            ).openConnection();
            conn.setRequestMethod("POST");
            conn.setRequestProperty("Authorization", "Bearer " + apiKey);
            conn.setRequestProperty("Content-Type", "application/json");
            conn.setDoOutput(true);

            try (OutputStream os = conn.getOutputStream()) {
                os.write(body.getBytes("UTF-8"));
            }

            int status = conn.getResponseCode();
            InputStream is = (status >= 200 && status < 300)
                    ? conn.getInputStream()
                    : conn.getErrorStream();

            StringBuilder sb = new StringBuilder();
            try (BufferedReader br = new BufferedReader(new InputStreamReader(is, "UTF-8"))) {
                String line;
                while ((line = br.readLine()) != null) {
                    sb.append(line);
                }
            }

            if (status == 200) {
                Gson gson = new Gson();
                JsonObject result = gson.fromJson(sb.toString(), JsonObject.class);
                String audioUrl = result.getAsJsonObject("output")
                        .getAsJsonObject("audio").get("url").getAsString();
                System.out.println("Audio URL: " + audioUrl);

                // Download the audio file
                try (InputStream in = new URL(audioUrl).openStream();
                     FileOutputStream out = new FileOutputStream("synthesis_output.wav")) {
                    byte[] buffer = new byte[4096];
                    int bytesRead;
                    while ((bytesRead = in.read(buffer)) != -1) {
                        out.write(buffer, 0, bytesRead);
                    }
                }
                System.out.println("Audio saved: synthesis_output.wav");
            } else {
                System.err.println("Error " + status + ": " + sb.toString());
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
Total voice limit: Each Qwen Cloud account has a separate limit of 1,000 custom voices for CosyVoice and 1,000 for Qwen-TTS. The two quotas are counted independently.
Automatic cleanup: If a voice isn't used in any speech synthesis request for one year, the system automatically deletes it.
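To avoid losing a voice to automatic cleanup, an application could track when each voice was last used and flag those approaching the one-year mark. The sketch below is local bookkeeping only, not a service API. The helper names are hypothetical, and treating "one year" as 365 days is an assumption, since this page does not define the exact cutoff.

```python
from datetime import datetime, timedelta, timezone

# Assumption: "one year" is approximated as 365 days.
RETENTION = timedelta(days=365)


def estimated_cleanup_date(last_used):
    """Return the estimated date on which an unused voice would be deleted."""
    return last_used + RETENTION


def is_at_risk(last_used, now, warn_days=30):
    """Return True if the voice is within warn_days of its estimated cleanup date."""
    return estimated_cleanup_date(last_used) - now <= timedelta(days=warn_days)


if __name__ == "__main__":
    # Hypothetical record: the application's own timestamp of the last synthesis call.
    last_used = datetime(2025, 1, 1, tzinfo=timezone.utc)
    print(estimated_cleanup_date(last_used).date())
    print(is_at_risk(last_used, datetime.now(timezone.utc)))
```

Running any synthesis request with a flagged voice should reset the timer, since cleanup is based on the last use in a speech synthesis request.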