curl -X POST https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization \
  -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "voice-enrollment",
    "input": {
      "action": "create_voice",
      "target_model": "cosyvoice-v3-plus",
      "prefix": "myvoice",
      "url": "https://your-audio-url.wav",
      "language_hints": ["zh"]
    }
}'

{
  "output": {
    "voice_id": "cosyvoice-v3-plus-myvoice-xxxxxx"
  },
  "usage": {
    "count": 1
  },
  "request_id": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
}

Body
application/json

model: Fixed to voice-enrollment.
input.action: Fixed to create_voice.
input.target_model: Speech synthesis model for the cloned voice. Must match the model used in subsequent synthesis calls. Values: cosyvoice-v3-plus, cosyvoice-v3-flash.
input.prefix: Voice name prefix. Alphanumeric only, max 10 characters. Generated voice name format: {target_model}-{prefix}-{unique_id}.
input.url: Audio file URL for cloning. Must be publicly accessible.
input.language_hints: Helps the model identify the audio language for more accurate cloning. Only the first element is used. If the specified language does not match the audio, the system auto-detects the language. Supported by cosyvoice-v3-plus: zh, en, fr, de, ja, ko, ru. cosyvoice-v3-flash additionally supports pt, es, it, th, id, vi.
["zh"]

Max audio duration (seconds) used for cloning after preprocessing. Longer duration yields better results. Supported by cosyvoice-v3-plus and v3-flash only.
Enable audio preprocessing (noise reduction, enhancement, volume normalization). Recommended for noisy audio; disable for clean audio to preserve voice fidelity. Supported by cosyvoice-v3-plus and v3-flash only.
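The request-body rules above can be checked on the client before the endpoint is called. The following is a minimal Python sketch, not part of any DashScope SDK: `build_create_voice_payload` and `SUPPORTED_HINTS` are hypothetical names, "alphanumeric" is assumed to mean ASCII letters and digits, and a minimum prefix length of 1 is assumed. It only builds the JSON body shown in the curl example and does not send it.

```python
import json
import re

# Language hints per target model, copied from the language_hints description.
_BASE_HINTS = {"zh", "en", "fr", "de", "ja", "ko", "ru"}
SUPPORTED_HINTS = {
    "cosyvoice-v3-plus": _BASE_HINTS,
    "cosyvoice-v3-flash": _BASE_HINTS | {"pt", "es", "it", "th", "id", "vi"},
}

# Assumption: "alphanumeric" means ASCII letters and digits, 1 to 10 characters.
_PREFIX_RE = re.compile(r"[A-Za-z0-9]{1,10}")


def build_create_voice_payload(target_model, prefix, url, language_hint=None):
    """Build a create_voice request body (hypothetical helper, not an SDK call)."""
    if target_model not in SUPPORTED_HINTS:
        raise ValueError(f"unsupported target_model: {target_model!r}")
    if not _PREFIX_RE.fullmatch(prefix):
        raise ValueError("prefix must be 1-10 alphanumeric characters")

    payload_input = {
        "action": "create_voice",
        "target_model": target_model,
        "prefix": prefix,
        "url": url,
    }
    if language_hint is not None:
        # Client-side choice: reject hints the target model does not list.
        if language_hint not in SUPPORTED_HINTS[target_model]:
            raise ValueError(
                f"{language_hint!r} is not listed for {target_model}"
            )
        # Only the first element is used, so send a single-item list.
        payload_input["language_hints"] = [language_hint]

    return {"model": "voice-enrollment", "input": payload_input}


payload = build_create_voice_payload(
    "cosyvoice-v3-plus", "myvoice", "https://your-audio-url.wav", "zh"
)
print(json.dumps(payload, indent=2))
```

Rejecting an unlisted hint is stricter than the service, which falls back to auto-detection when the hint does not match the audio; dropping the hint instead would be an equally reasonable design.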
Response
output.voice_id: Generated voice ID. Pass it as the voice parameter in CosyVoice synthesis calls.
request_id: Request ID for troubleshooting.
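As an illustration of handling the response fields described above, the sketch below parses a response body and pulls out the voice ID. `extract_voice_id` is a hypothetical helper, and the sample text is the placeholder response from the example, not real service output. How error responses are shaped is not covered here, so the helper simply fails if `output.voice_id` is absent and includes the request ID in the message for troubleshooting.

```python
import json


def extract_voice_id(response_text):
    """Return output.voice_id from a create_voice response body (hypothetical helper)."""
    body = json.loads(response_text)
    voice_id = (body.get("output") or {}).get("voice_id")
    if not voice_id:
        # Surface the request ID so the failed call can be traced.
        raise ValueError(
            f"no voice_id in response (request_id={body.get('request_id')!r})"
        )
    return voice_id


# Placeholder response copied from the example above.
sample_response = """
{
  "output": {"voice_id": "cosyvoice-v3-plus-myvoice-xxxxxx"},
  "usage": {"count": 1},
  "request_id": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
}
"""

voice_id = extract_voice_id(sample_response)
print(voice_id)  # cosyvoice-v3-plus-myvoice-xxxxxx
```

The returned string is the value to pass as the voice parameter in later synthesis calls, with the same model that was given as target_model.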