Create a custom voice from a text description and return preview audio.

model is the design model (always qwen-voice-design). target_model is the synthesis model that drives the created voice. The target_model must match the model in subsequent synthesis calls; mismatched models cause failures.

cURL
curl --request POST \
  --url 'https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization' \
  --header 'Authorization: Bearer <YOUR_API_KEY>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "model": "qwen-voice-design",
  "input": {
    "action": "create",
    "target_model": "qwen3-tts-vd-realtime-2026-01-15",
    "voice_prompt": "<string>",
    "preview_text": "<string>",
    "preferred_name": "<string>",
    "language": "zh"
  },
  "parameters": {
    "sample_rate": 8000,
    "response_format": "pcm"
  }
}
'

{
  "output": {
    "voice": "qwen-tts-vd-announcer-voice-20251201102800-a1b2",
    "preview_audio": {
      "data": "{base64_encoded_audio}",
      "sample_rate": 24000,
      "response_format": "wav"
    },
    "target_model": "qwen3-tts-vd-realtime-2026-01-15"
  },
  "usage": {
    "count": 1
  },
  "request_id": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
}

model: Voice design model. Fixed to qwen-voice-design.
input.action: Operation type. Fixed to create.
input.target_model: Synthesis model for the voice. Must match the model in subsequent synthesis calls. Values: qwen3-tts-vd-realtime-2026-01-15 and qwen3-tts-vd-realtime-2025-12-16 (real-time), qwen3-tts-vd-2026-01-26 (non-real-time).
input.voice_prompt: Voice description. Max 2,048 characters. Chinese and English only. See Write effective voice descriptions.
input.preview_text: Text for the preview audio. Max 1,024 characters. Must be in a supported language.
input.preferred_name: Keyword for the voice name (alphanumeric and underscores, max 16 characters). Appears in the generated voice name. Example: announcer produces qwen-tts-vd-announcer-voice-20251201102800-a1b2.
input.language: Language code for the generated voice. Must match the preview_text language.
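The field limits listed above can be checked on the client before the request is sent. The following is a minimal Python sketch, not an official SDK. The endpoint, model names, and field names are copied from the cURL example and the descriptions above; the helper name build_create_request, the non-empty checks, the ASCII reading of "alphanumeric", and the treatment of parameters as optional are assumptions made for illustration.

```python
import json
import re
import urllib.request

# Endpoint and model names are copied from the cURL example and field list.
URL = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization"
TARGET_MODELS = {
    "qwen3-tts-vd-realtime-2026-01-15",
    "qwen3-tts-vd-realtime-2025-12-16",
    "qwen3-tts-vd-2026-01-26",
}


def build_create_request(api_key, target_model, voice_prompt, preview_text,
                         preferred_name, language, parameters=None):
    """Hypothetical helper: validate the inputs and build (not send) the request."""
    if target_model not in TARGET_MODELS:
        raise ValueError(f"unsupported target_model: {target_model}")
    # Assumption: empty strings are rejected; the source only states the maximums.
    if not 0 < len(voice_prompt) <= 2048:
        raise ValueError("voice_prompt must be 1 to 2,048 characters")
    if not 0 < len(preview_text) <= 1024:
        raise ValueError("preview_text must be 1 to 1,024 characters")
    # Assumption: "alphanumeric and underscores" means ASCII [A-Za-z0-9_].
    if not re.fullmatch(r"[A-Za-z0-9_]{1,16}", preferred_name):
        raise ValueError("preferred_name must be 1 to 16 letters, digits, or underscores")

    payload = {
        "model": "qwen-voice-design",  # fixed design model
        "input": {
            "action": "create",  # fixed operation type
            "target_model": target_model,
            "voice_prompt": voice_prompt,
            "preview_text": preview_text,
            "preferred_name": preferred_name,
            "language": language,
        },
    }
    # Assumption: "parameters" may be omitted; include it only when provided.
    if parameters is not None:
        payload["parameters"] = parameters

    return urllib.request.Request(
        URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

The returned object can be passed to urllib.request.urlopen to send the request. Validating locally avoids spending a round trip on input that the service would be expected to reject.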
output.voice: Generated voice name. Pass this as the voice parameter in the synthesis API.
output.target_model: Synthesis model bound to this voice.
usage.count: Voice creations billed. Always 1 for a successful creation ($0.2 per count).
request_id: Request ID for troubleshooting.
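As a rough illustration of consuming the response, the Python sketch below extracts the voice name, decodes the preview audio, and checks the model-matching rule before synthesis. It assumes the response body has already been parsed into a dict shaped like the example response, and that preview_audio.data is standard base64. The helper names extract_voice and check_synthesis_model are hypothetical.

```python
import base64


def extract_voice(response):
    """Hypothetical helper: collect the voice name, bound model, and preview audio."""
    output = response["output"]
    preview = output["preview_audio"]
    # Assumption: preview_audio.data is standard (RFC 4648) base64.
    audio_bytes = base64.b64decode(preview["data"])
    return {
        "voice": output["voice"],
        "target_model": output["target_model"],
        "audio": audio_bytes,
        "format": preview["response_format"],
        "sample_rate": preview["sample_rate"],
    }


def check_synthesis_model(voice_info, synthesis_model):
    """Raise early if the synthesis model differs from the voice's target_model."""
    if synthesis_model != voice_info["target_model"]:
        raise ValueError(
            f"voice {voice_info['voice']} is bound to {voice_info['target_model']}, "
            f"not {synthesis_model}"
        )
```

The decoded bytes can be written to a file such as preview.wav for listening. Calling check_synthesis_model before each synthesis request turns a mismatched model into a local error instead of a failed API call.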