curl -X POST https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization \
  -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen-voice-enrollment",
    "input": {
      "action": "create",
      "target_model": "qwen3-tts-vc-realtime-2026-01-15",
      "preferred_name": "guanyu",
      "audio": {
        "data": "https://your-audio-url.wav"
      }
    }
  }'

{
  "output": {
    "voice": "qwen-omni-vc-guanyu-voice-20250812105009984-838b",
    "target_model": "qwen3.5-omni-plus-realtime",
    "fallback_mode": false,
    "fallback_reason": "<string>"
  },
  "usage": {
    "count": 1
  },
  "request_id": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
}

Body
application/json

model: Fixed to qwen-voice-enrollment.
input.action: Fixed to create.
input.target_model: Target model for the cloned voice. Must match the model used in subsequent API calls.
input.audio: Audio data for cloning.
input.audio.data: Either a Base64 data URL or an audio URL. Data URL (Base64): data:<mediatype>;base64,<data>, where <mediatype> is audio/wav, audio/mpeg, or audio/mp4; keep the encoded data under 10 MB. Audio URL: a publicly accessible URL (no authentication required).
input.preferred_name: Voice name keyword (digits, letters, and underscores; max 16 characters). It appears in the generated voice name. Example: guanyu produces qwen-tts-vc-guanyu-voice-20250812105009984-838b.
Text matching the audio content. The server validates the match and returns Audio.PreprocessError if the text differs significantly from the audio.
Audio language code. If specified, it must match the language of the audio.
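The body rules above (name format, data URL shape, size limit) can be checked on the client before sending the request. The following is a minimal Python sketch, not part of the API itself: the helper names, the file-extension mapping, the ASCII-only reading of "letters", and the reading of "10 MB" as 10 * 1024 * 1024 bytes are all assumptions made for illustration.

```python
import base64
import re
from pathlib import Path

# Rule from the field description: digits, letters, underscores, max 16 characters.
# Restricting "letters" to ASCII is an assumption of this sketch.
NAME_PATTERN = re.compile(r"[A-Za-z0-9_]{1,16}")

# Allowed media types come from the field description; the mapping from
# file extensions to those types is an assumption.
MEDIA_TYPES = {
    ".wav": "audio/wav",
    ".mp3": "audio/mpeg",
    ".m4a": "audio/mp4",
    ".mp4": "audio/mp4",
}

# "Under 10 MB" is interpreted here as fewer than 10 * 1024 * 1024 bytes (assumption).
MAX_ENCODED_BYTES = 10 * 1024 * 1024


def is_valid_preferred_name(name: str) -> bool:
    """Return True if the name satisfies the documented keyword rule."""
    return NAME_PATTERN.fullmatch(name) is not None


def audio_bytes_to_data_url(raw: bytes, media_type: str) -> str:
    """Encode raw audio bytes as data:<mediatype>;base64,<data>."""
    encoded = base64.b64encode(raw).decode("ascii")
    if len(encoded) >= MAX_ENCODED_BYTES:
        raise ValueError("encoded audio must stay under 10 MB")
    return f"data:{media_type};base64,{encoded}"


def file_to_data_url(path: str) -> str:
    """Read a local audio file and return it as a Base64 data URL."""
    p = Path(path)
    media_type = MEDIA_TYPES.get(p.suffix.lower())
    if media_type is None:
        raise ValueError(f"unsupported audio extension: {p.suffix}")
    return audio_bytes_to_data_url(p.read_bytes(), media_type)


def build_create_body(target_model: str, preferred_name: str, audio_data: str) -> dict:
    """Assemble the JSON body; audio_data is a data URL or a public audio URL."""
    if not is_valid_preferred_name(preferred_name):
        raise ValueError("preferred_name allows digits, letters, underscores, max 16 characters")
    return {
        "model": "qwen-voice-enrollment",
        "input": {
            "action": "create",
            "target_model": target_model,
            "preferred_name": preferred_name,
            "audio": {"data": audio_data},
        },
    }
```

Serializing the returned dictionary with json.dumps gives a string that can be passed to the -d option of the curl command. Base64 encoding grows the data by roughly one third, so a source file of about 7.5 MB already approaches the limit as interpreted here.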
Response

output.voice: Generated voice name. Pass it as the voice parameter in subsequent Qwen TTS or Realtime Multimodal API calls.
output.target_model: Target model bound to this voice.
output.fallback_mode: true if the voice was created in degraded mode because of poor audio quality or a text mismatch.
output.fallback_reason: Reason for degradation. Possible values include no_merged_segments and no_valid_asr_segments. Only returned when fallback_mode is true.
request_id: Request ID for troubleshooting.
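A caller will usually want to keep the generated voice name and notice when the voice was created in degraded mode. The sketch below shows one way to read the response fields described above; the function name and the shape of the returned dictionary are hypothetical, and the code only assumes the JSON layout shown in the example response.

```python
import json


def summarize_enrollment(response_text: str) -> dict:
    """Extract the fields a caller typically needs from the JSON response."""
    payload = json.loads(response_text)
    output = payload["output"]
    degraded = bool(output.get("fallback_mode", False))
    return {
        "voice": output["voice"],
        "target_model": output["target_model"],
        "degraded": degraded,
        # fallback_reason is only meaningful when fallback_mode is true,
        # so it is ignored otherwise.
        "reason": output.get("fallback_reason") if degraded else None,
        "request_id": payload.get("request_id"),
    }
```

When degraded is true, re-recording cleaner audio or correcting the transcript before enrolling again is a reasonable response; the request_id is worth logging in either case for troubleshooting.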