Create a cloned voice

POST

/services/audio/tts/customization

cURL

curl -X POST https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
  "model": "qwen-voice-enrollment",
  "input": {
    "action": "create",
    "target_model": "qwen3-tts-vc-realtime-2026-01-15",
    "preferred_name": "guanyu",
    "audio": {
      "data": "https://your-audio-url.wav"
    }
  }
}'

{
  "output": {
    "voice": "qwen-omni-vc-guanyu-voice-20250812105009984-838b",
    "target_model": "qwen3.5-omni-plus-realtime",
    "fallback_mode": false,
    "fallback_reason": "<string>"
  },
  "usage": {
    "count": 1
  },
  "request_id": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
}
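
The same request can be assembled from Python. The following is a minimal sketch using only the standard library, not an official SDK; the helper names `build_request` and `create_voice` are hypothetical, and the endpoint, headers, and body fields are copied from the cURL example above.

```python
import json
import urllib.request

# Endpoint taken from the cURL example above.
ENDPOINT = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization"


def build_request(api_key, target_model, preferred_name, audio_data):
    """Build (but do not send) the voice-creation POST request.

    Hypothetical helper: it only mirrors the JSON body of the cURL example.
    """
    payload = {
        "model": "qwen-voice-enrollment",  # fixed value
        "input": {
            "action": "create",            # fixed value
            "target_model": target_model,
            "preferred_name": preferred_name,
            "audio": {"data": audio_data},  # audio URL or Base64 Data URL
        },
    }
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def create_voice(request, timeout=60):
    """Send the request and return the decoded JSON response (needs network)."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)
```

To use it, read the key from the `DASHSCOPE_API_KEY` environment variable, pass the result of `build_request(...)` to `create_voice`, and read `output.voice` from the returned dictionary.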

Authorizations

Authorization

string

header

required

DashScope API key, sent as a Bearer token. Get one from the API key page.

Body

application/json

model

enum<string>

required

Fixed to qwen-voice-enrollment.

Available options: qwen-voice-enrollment

Example: qwen-voice-enrollment

input

object

required

input.action

enum<string>

required

Fixed to create.

Available options: create

Example: create

input.target_model

enum<string>

required

Target model for the cloned voice. Must match the model used in subsequent API calls that reference this voice.

Available options: qwen3.5-omni-plus-realtime, qwen3.5-omni-flash-realtime, qwen3-tts-vc-realtime-2026-01-15, qwen3-tts-vc-realtime-2025-11-27, qwen3-tts-vc-2026-01-22

Example: qwen3.5-omni-plus-realtime

input.audio

object

required

Audio data for cloning.

input.audio.data

string

required

Either a Data URL or an audio URL. Data URL (Base64): data:<mediatype>;base64,<data>, where <mediatype> is audio/wav, audio/mpeg, or audio/mp4; keep the encoded data under 10 MB. Audio URL: a publicly accessible URL (no authentication required).

Example: https://your-audio-url.wav
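
When the audio is a local file rather than a public URL, the Data URL form can be produced as in the sketch below. The helper name `to_data_url` is hypothetical, and the limit is an assumption: "10 MB" is read conservatively as 10,000,000 characters of encoded data, since the exact unit is not stated here.

```python
import base64

# Assumption: "under 10 MB" interpreted conservatively as decimal megabytes.
MAX_ENCODED_LENGTH = 10_000_000
ALLOWED_MEDIA_TYPES = {"audio/wav", "audio/mpeg", "audio/mp4"}


def to_data_url(audio_bytes: bytes, mediatype: str = "audio/wav") -> str:
    """Encode raw audio bytes as data:<mediatype>;base64,<data>."""
    if mediatype not in ALLOWED_MEDIA_TYPES:
        raise ValueError(f"unsupported media type: {mediatype}")
    encoded = base64.b64encode(audio_bytes).decode("ascii")
    if len(encoded) >= MAX_ENCODED_LENGTH:
        raise ValueError("encoded audio must stay under 10 MB")
    return f"data:{mediatype};base64,{encoded}"
```

Base64 grows the payload by roughly one third, so a source file of about 7.5 MB already approaches the limit once encoded.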

input.preferred_name

string

Voice name keyword: letters, digits, and underscores only, at most 16 characters. The keyword appears in the generated voice name. Example: guanyu produces a name such as qwen-tts-vc-guanyu-voice-20250812105009984-838b.

Example: guanyu

Required range: length <= 16, pattern: ^[a-zA-Z0-9_]+$
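
The length and pattern constraints can be checked on the client before sending the request. This is a small sketch with a hypothetical helper name, `is_valid_preferred_name`; it applies only the two rules stated above and does not replace server-side validation.

```python
import re

# Pattern and length limit as stated for the voice name keyword.
NAME_PATTERN = re.compile(r"^[a-zA-Z0-9_]+$")
MAX_NAME_LENGTH = 16


def is_valid_preferred_name(name: str) -> bool:
    """Return True if name has at most 16 characters and matches the pattern."""
    # fullmatch avoids accepting a trailing newline, which "$" alone would allow.
    return len(name) <= MAX_NAME_LENGTH and NAME_PATTERN.fullmatch(name) is not None
```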

string

Text matching the audio content. The server checks the text against the audio and returns Audio.PreprocessError if they differ significantly.

Example: Hello, this is a sample text for voice cloning.

enum<string>

Audio language code. Must match the audio language if specified.

Available options: zh, en, de, it, pt, es, ja, ko, fr, ru

Example: zh

Response

200 - application/json

output

object

output.voice

string

Generated voice name. Pass as the voice parameter in subsequent Qwen TTS or Realtime Multimodal API calls.

Example: qwen-omni-vc-guanyu-voice-20250812105009984-838b

output.target_model

string

Target model bound to this voice.

Example: qwen3.5-omni-plus-realtime

output.fallback_mode

boolean

true if the voice was created in degraded mode due to poor audio quality or text mismatch.

Example: false

output.fallback_reason

string

Reason for degradation. Possible values include no_merged_segments and no_valid_asr_segments. Only returned when fallback_mode is true.
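
A caller will usually want to record the voice name and notice a degraded result. The sketch below shows one way to read these fields from the decoded response; the helper name `summarize_enrollment` and the shape of its return value are illustrative assumptions, not part of the API.

```python
def summarize_enrollment(response: dict) -> dict:
    """Extract the voice name and degradation status from a decoded response."""
    output = response.get("output", {})
    degraded = bool(output.get("fallback_mode", False))
    return {
        "voice": output.get("voice"),
        "target_model": output.get("target_model"),
        "degraded": degraded,
        # fallback_reason is only meaningful when fallback_mode is true.
        "reason": output.get("fallback_reason") if degraded else None,
        "request_id": response.get("request_id"),
    }
```

If `degraded` is true, re-recording the sample with cleaner audio or a closer text transcript and creating the voice again is a reasonable next step.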

usage

object

usage.count

integer

Number of billed voice creation operations. Always 1.

Example: 1

request_id

string

Request ID for troubleshooting.

Example: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx