curl -X POST https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization \
  -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "voice-enrollment",
    "input": {
      "action": "create_voice",
      "target_model": "cosyvoice-v3-plus",
      "voice_prompt": "A composed middle-aged male announcer with a deep, rich voice",
      "preview_text": "Dear listeners, hello everyone",
      "prefix": "announcer",
      "language_hints": ["en"]
    },
    "parameters": {
      "sample_rate": 24000,
      "response_format": "wav"
    }
  }'

{
  "output": {
    "voice_id": "cosyvoice-v3-plus-vd-announcer-xxxxxx",
    "preview_audio": {
      "data": "{base64_encoded_audio}",
      "sample_rate": 24000,
      "response_format": "wav"
    },
    "target_model": "cosyvoice-v3-plus"
  },
  "usage": {
    "count": 1
  },
  "request_id": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
}

Body
application/json

model: Fixed to voice-enrollment (shared with voice cloning).
input.action: Fixed to create_voice (same as voice cloning).
input.target_model: Speech synthesis model for the designed voice. Must match the model used in subsequent synthesis calls.
input.voice_prompt: Voice description. Chinese and English only. Max 500 characters.
input.preview_text: Text for the preview audio. Chinese and English only. Max 200 characters.
input.prefix: Voice name prefix. Alphanumeric only, max 10 characters. Generated voice name format: {target_model}-vd-{prefix}-{unique_id}.
input.language_hints: Language tendency of the generated voice. Must match the preview_text language. Only the first element is used. Supported: zh, en. Example: ["en"]

Response
output.voice_id: Generated voice ID. Pass as the voice parameter in CosyVoice synthesis calls.
output.target_model: Speech synthesis model bound to this voice.
request_id: Request ID for troubleshooting.
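
The request constraints and the Base64-encoded preview audio described above can be handled on the client side before and after the call. The following is a minimal Python sketch, not an official SDK: the function names build_create_voice_payload and decode_preview_audio are hypothetical, "alphanumeric" is assumed to mean ASCII letters and digits, character limits are assumed to count Unicode code points, and the parameters values simply mirror the example request.

```python
import base64
import re

# Limits copied from the field descriptions above.
MAX_VOICE_PROMPT_CHARS = 500
MAX_PREVIEW_TEXT_CHARS = 200
SUPPORTED_LANGUAGES = {"zh", "en"}


def build_create_voice_payload(target_model, voice_prompt, preview_text, prefix, language):
    """Build the request body, raising ValueError if a documented limit is violated."""
    if len(voice_prompt) > MAX_VOICE_PROMPT_CHARS:
        raise ValueError("voice_prompt exceeds 500 characters")
    if len(preview_text) > MAX_PREVIEW_TEXT_CHARS:
        raise ValueError("preview_text exceeds 200 characters")
    # Assumption: "alphanumeric" means ASCII letters and digits, 1 to 10 of them.
    if not re.fullmatch(r"[A-Za-z0-9]{1,10}", prefix):
        raise ValueError("prefix must be alphanumeric, max 10 characters")
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError("language must be 'zh' or 'en'")
    return {
        "model": "voice-enrollment",   # fixed value
        "input": {
            "action": "create_voice",  # fixed value
            "target_model": target_model,
            "voice_prompt": voice_prompt,
            "preview_text": preview_text,
            "prefix": prefix,
            "language_hints": [language],  # only the first element is used
        },
        # Values mirror the example request; other values are not covered here.
        "parameters": {"sample_rate": 24000, "response_format": "wav"},
    }


def decode_preview_audio(response):
    """Return the raw preview audio bytes from a parsed JSON response."""
    return base64.b64decode(response["output"]["preview_audio"]["data"])
```

The dictionary returned by build_create_voice_payload can be serialized with json.dumps and sent as the POST body with any HTTP client. The bytes returned by decode_preview_audio can be written to a file opened in binary mode (for example preview.wav) to listen to the preview before using the voice_id in synthesis calls.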