Clone voice from audio

Clone a voice from 10-20 seconds of audio. The API returns a voice identifier immediately, with no training step. For an overview of how voice cloning works, model selection, and end-to-end examples, see the Voice cloning guide.

Prerequisites

API reference

All three operations (create, list, delete) share the same base URL and headers.

Base URL
POST https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization
Common request headers
| Header | Type | Required | Description |
| --- | --- | --- | --- |
| Authorization | string | Yes | Bearer $DASHSCOPE_API_KEY |
| Content-Type | string | Yes | application/json |
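
Because every operation posts JSON to the same URL with the same two headers, one small helper can serve all three. The following is a minimal Python sketch using only the standard library. The names `build_request` and `call_api` are illustrative and not part of any official SDK, and the API key is assumed to be stored in the `DASHSCOPE_API_KEY` environment variable.

```python
import json
import os
import urllib.request

BASE_URL = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization"


def build_request(payload: dict, api_key: str) -> urllib.request.Request:
    """Build a POST request carrying the two common headers (hypothetical helper)."""
    return urllib.request.Request(
        BASE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def call_api(payload: dict) -> dict:
    """Send the request and decode the JSON response body."""
    # Assumption: the key is exported as DASHSCOPE_API_KEY.
    request = build_request(payload, os.environ["DASHSCOPE_API_KEY"])
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Keeping request construction separate from sending makes it possible to inspect the headers and body without touching the network.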

Create voice

Upload audio to create a cloned voice.

Request body

The model parameter is always qwen-voice-enrollment. The target_model must match the speech synthesis model you use -- otherwise synthesis fails.
{
  "model": "qwen-voice-enrollment",
  "input": {
    "action": "create",
    "target_model": "qwen3-tts-vc-realtime-2026-01-15",
    "preferred_name": "guanyu",
    "audio": {
      "data": "https://xxx.wav"
    },
    "text": "Optional. Text matching the audio content.",
    "language": "Optional. Language code, e.g. zh."
  }
}

Request parameters

| Parameter | Type | Default | Required | Description |
| --- | --- | --- | --- | --- |
| model | string | - | Yes | Voice cloning model. Fixed as qwen-voice-enrollment. |
| action | string | - | Yes | Operation type. Fixed as create. |
| target_model | string | - | Yes | Speech synthesis model for the cloned voice. Supported: qwen3-tts-vc-realtime-2026-01-15, qwen3-tts-vc-realtime-2025-11-27, qwen3-tts-vc-2026-01-22. Must match the model in your synthesis calls. |
| preferred_name | string | - | Yes | Voice name (up to 16 characters: digits, letters, underscores). Appears in the generated voice name. Example: guanyu produces qwen-tts-vc-guanyu-voice-20250812105009984-838b. |
| audio.data | string | - | Yes | Audio for cloning, in one of two formats. Data URL: data:<mediatype>;base64,<data>, where <mediatype> is audio/wav, audio/mpeg, or audio/mp4; keep the encoded data under 10 MB. Audio URL: a publicly accessible URL (no authentication required). |
| text | string | - | No | Text matching the audio content. The server validates the match and returns Audio.PreprocessError if the two differ significantly. |
| language | string | - | No | Audio language. Supported: zh, en, de, it, pt, es, ja, ko, fr, ru. Must match the audio if specified. |
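
The constraints on preferred_name and audio.data can be checked on the client before the request is sent. The sketch below encodes local audio bytes as a Data URL and assembles the create request body. The helper names `to_data_url` and `build_create_payload` are hypothetical. Two points are assumptions: "letters" is read as ASCII letters, and "10 MB" is read as 10 × 1024 × 1024 characters of encoded data.

```python
import base64
import re
from typing import Optional

# Assumption: the documented "under 10 MB" limit is interpreted as 10 MiB.
MAX_ENCODED_CHARS = 10 * 1024 * 1024
# Assumption: "letters" means ASCII letters.
NAME_PATTERN = re.compile(r"[A-Za-z0-9_]{1,16}")
MEDIA_TYPES = {"audio/wav", "audio/mpeg", "audio/mp4"}


def to_data_url(audio_bytes: bytes, media_type: str = "audio/wav") -> str:
    """Encode raw audio bytes as data:<mediatype>;base64,<data>."""
    if media_type not in MEDIA_TYPES:
        raise ValueError(f"unsupported media type: {media_type}")
    encoded = base64.b64encode(audio_bytes).decode("ascii")
    if len(encoded) >= MAX_ENCODED_CHARS:
        raise ValueError("encoded audio must stay under 10 MB")
    return f"data:{media_type};base64,{encoded}"


def build_create_payload(
    target_model: str,
    preferred_name: str,
    audio: str,
    text: Optional[str] = None,
    language: Optional[str] = None,
) -> dict:
    """Assemble the create request body; `audio` is a Data URL or a public URL."""
    if not NAME_PATTERN.fullmatch(preferred_name):
        raise ValueError("preferred_name: up to 16 digits, letters, or underscores")
    request_input = {
        "action": "create",
        "target_model": target_model,
        "preferred_name": preferred_name,
        "audio": {"data": audio},
    }
    # Optional fields are sent only when provided.
    if text is not None:
        request_input["text"] = text
    if language is not None:
        request_input["language"] = language
    return {"model": "qwen-voice-enrollment", "input": request_input}
```

Rejecting an invalid name or an oversized file locally avoids a round trip that the server would presumably refuse anyway.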

Response

| Parameter | Type | Description |
| --- | --- | --- |
| voice | string | Generated voice name. Pass this as the voice parameter in synthesis calls. |
| target_model | string | Speech synthesis model bound to this voice. |
| request_id | string | Unique request identifier. |
| count | integer | Billed voice creation operations. Always 1 for create requests. Cost: count × $0.01. |
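
This reference lists the response fields but does not show how they are nested in the JSON body. As a hedged sketch, the helper below looks a field up at any depth and applies the stated pricing rule (count × $0.01). `find_field` and `summarize_create` are hypothetical names, and the depth-first lookup is a workaround for the unknown nesting, not a documented access pattern.

```python
from decimal import Decimal

PRICE_PER_CREATE_USD = Decimal("0.01")  # from the pricing rule: count x $0.01


def find_field(node, key):
    """Return the first non-None value stored under `key` at any depth, else None."""
    if isinstance(node, dict):
        if node.get(key) is not None:
            return node[key]
        children = list(node.values())
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = find_field(child, key)
        if found is not None:
            return found
    return None


def summarize_create(response: dict) -> dict:
    """Extract the voice name and compute the billed cost in USD."""
    count = int(find_field(response, "count") or 0)
    return {
        "voice": find_field(response, "voice"),
        "cost_usd": PRICE_PER_CREATE_USD * count,
    }
```

Using `Decimal` keeps the cost exact; once the real response layout is known, direct key access would be simpler and safer than a recursive search.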

Sample code

  • cURL
curl -X POST https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
  "model": "qwen-voice-enrollment",
  "input": {
    "action": "create",
    "target_model": "qwen3-tts-vc-realtime-2026-01-15",
    "preferred_name": "guanyu",
    "audio": {
      "data": "https://xxx.wav"
    }
  }
}'

List voices

List your cloned voices with pagination.

Request body

The model parameter is always qwen-voice-enrollment. Do not modify this value.
{
  "model": "qwen-voice-enrollment",
  "input": {
    "action": "list",
    "page_size": 2,
    "page_index": 0
  }
}

Request parameters

| Parameter | Type | Default | Required | Description |
| --- | --- | --- | --- | --- |
| model | string | - | Yes | Voice cloning model. Fixed as qwen-voice-enrollment. |
| action | string | - | Yes | Operation type. Fixed as list. |
| page_index | integer | 0 | No | Page number, starting from 0. Range: 0 to 1000000. |
| page_size | integer | 10 | No | Results per page. Range: 0 to 1000000. |
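
To walk through every voice, increase page_index from 0 until a page comes back short. The sketch below shows one way to do that. `fetch_page` is a hypothetical callable you supply: it should send the list request for the given page and return that page's voice entries as a list, since the response nesting is not shown in this reference. The stopping rule (a page with fewer than page_size entries is the last one) is an assumption.

```python
from typing import Callable, Iterator, List

MAX_PAGE_INDEX = 1000000  # upper bound of the documented page_index range


def build_list_payload(page_index: int = 0, page_size: int = 10) -> dict:
    """Assemble the list request body for one page."""
    return {
        "model": "qwen-voice-enrollment",
        "input": {"action": "list", "page_size": page_size, "page_index": page_index},
    }


def iter_voices(
    fetch_page: Callable[[int, int], List[dict]], page_size: int = 10
) -> Iterator[dict]:
    """Yield voice entries page by page until a short page is returned."""
    if page_size < 1:
        raise ValueError("page_size must be at least 1 to make progress")
    page_index = 0
    while page_index <= MAX_PAGE_INDEX:
        voices = fetch_page(page_index, page_size)
        yield from voices
        # Assumption: fewer entries than requested means this was the last page.
        if len(voices) < page_size:
            break
        page_index += 1
```

Injecting `fetch_page` keeps the paging logic independent of the HTTP client and of the exact response layout.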

Response

| Parameter | Type | Description |
| --- | --- | --- |
| voice | string | Voice name. Pass this as the voice parameter in synthesis calls. |
| gmt_create | string | Voice creation timestamp. |
| target_model | string | Speech synthesis model bound to this voice. |
| request_id | string | Unique request identifier. |
| count | integer | Always 0. Listing voices is free. |

Sample code

  • cURL
curl --location --request POST 'https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
  "model": "qwen-voice-enrollment",
  "input": {
    "action": "list",
    "page_size": 10,
    "page_index": 0
  }
}'

Delete a voice

Delete a voice to free up quota.

Request body

The model parameter is always qwen-voice-enrollment. Do not modify this value.
{
  "model": "qwen-voice-enrollment",
  "input": {
    "action": "delete",
    "voice": "yourVoice"
  }
}

Request parameters

| Parameter | Type | Default | Required | Description |
| --- | --- | --- | --- | --- |
| model | string | - | Yes | Voice cloning model. Fixed as qwen-voice-enrollment. |
| action | string | - | Yes | Operation type. Fixed as delete. |
| voice | string | - | Yes | The voice to delete. |

Response

| Parameter | Type | Description |
| --- | --- | --- |
| request_id | string | Unique request identifier. |
| count | integer | Always 0. Deleting voices is free. |

Sample code

  • cURL
curl --location --request POST 'https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
  "model": "qwen-voice-enrollment",
  "input": {
    "action": "delete",
    "voice": "yourVoice"
  }
}'

Speech synthesis

To use cloned voices for synthesis, see the end-to-end examples in the Voice cloning guide.

Voice quota and retention

  • Account limit: 1,000 voices per account. Call List voices to check your count.
  • Automatic cleanup: Voices unused for over one year are automatically deleted.
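
As a rough sketch of quota housekeeping, the helpers below compute how many voices can still be created from a list of existing voice names and build delete request bodies for the ones you choose to remove. The function names are hypothetical, and the list of names is assumed to have been collected beforehand with the list operation.

```python
from typing import List

ACCOUNT_VOICE_LIMIT = 1000  # account limit stated above


def remaining_quota(voice_names: List[str]) -> int:
    """Return how many more voices the account can hold (never negative)."""
    return max(ACCOUNT_VOICE_LIMIT - len(voice_names), 0)


def build_delete_payloads(voice_names: List[str]) -> List[dict]:
    """Build one delete request body per voice name."""
    return [
        {
            "model": "qwen-voice-enrollment",
            "input": {"action": "delete", "voice": name},
        }
        for name in voice_names
    ]
```

Each delete is a separate request, so removing several voices means sending one body per voice.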
You are responsible for the ownership and legal rights to any voice you provide. Read the Terms of Service before using this API.