Skip to main content
Qwen-Livetranslate

Audio and video translation

LiveTranslate API reference

Translate audio and video through the OpenAI-compatible chat completions endpoint. Both streaming and non-streaming calls are supported. User guide: For tutorials and complete examples, see Audio and video translation.
The DashScope interface is not supported.

Endpoints

SDK base_urlHTTP endpoint
https://dashscope-intl.aliyuncs.com/compatible-mode/v1POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions

Request body

Required parameters

ParameterTypeDescription
modelstringModel name: qwen3-livetranslate-flash or qwen3-livetranslate-flash-2025-12-01.
messagesarrayA single user message. See Message object.
streambooleanWhether to stream the response. Use true for real-time translation progress, false for simpler integration.
translation_optionsobjectTranslation settings. Non-standard OpenAI parameter: pass in extra_body for the Python SDK, or at the top level for Node.js/HTTP. See Translation options.

Optional parameters

ParameterTypeDefaultDescription
modalitiesarray["text"]Output mode. Use ["text", "audio"] for text and audio, or ["text"] for text only.
audioobject-Audio output settings. Required when modalities includes "audio". See Audio output options.
stream_optionsobject-Streaming settings. See Stream options.
max_tokensintegerModel maximumMaximum tokens to generate. Output is truncated if exceeded.
seedinteger-Random seed for reproducible output. Range: [0, 2^31-1].

Sampling parameters

Keep these at their defaults for best translation accuracy.
ParameterTypeDefaultRangeNotes
temperaturefloat0.000001[0, 2)Controls output diversity.
top_pfloat0.8(0, 1.0]Nucleus sampling threshold.
presence_penaltyfloat0[-2.0, 2.0]Reduces repetition when positive.
top_kinteger1>= 0Candidate set size. Disabled when None or > 100 (top_p takes effect instead). Non-standard OpenAI parameter: use extra_body in the Python SDK.
repetition_penaltyfloat1.05> 0Penalizes repeated sequences. Non-standard OpenAI parameter: use extra_body in the Python SDK.

Message object

The messages array must contain exactly one user message. content array items:
FieldTypeRequiredDescription
typestringYesinput_audio for audio, video_url for video.
input_audioobjectWhen type is input_audioAudio input. See below.
video_urlobjectWhen type is video_urlVideo input. See below.
input_audio object:
FieldTypeRequiredDescription
datastringYesAudio file URL or Base64 data URL. For local files, see Send a Base64-encoded local file.
formatstringYesAudio format, such as mp3 or wav.
video_url object:
FieldTypeRequiredDescription
urlstringYesPublic video URL or Base64 data URL. For local files, see Send a Base64-encoded local file.

Translation options

FieldTypeRequiredDescription
source_langstringNoSource language (full English name). See Supported languages. If omitted, the model detects it automatically.
target_langstringYesTarget language (full English name). See Supported languages.
extra_body={"translation_options": {"source_lang": "zh", "target_lang": "en"}}

Audio output options

Required when modalities is ["text", "audio"].
FieldTypeRequiredDescription
voicestringYesOutput voice.
formatstringYesOutput audio format. Only wav is supported.

Stream options

FieldTypeDefaultDescription
include_usagebooleanfalseWhen true, the final chunk includes token usage details.

Response

The API streams chat.completion.chunk objects in three categories: text, audio, and token usage.

Text chunk

Contains incremental translated text in choices[0].delta.content:
{
  "id": "chatcmpl-c22a54b8-40cc-4a1d-988b-f84cdf86868f",
  "choices": [
    {
      "delta": {
        "content": " of",
        "role": null,
        "audio": null
      },
      "finish_reason": null,
      "index": 0
    }
  ],
  "created": 1764755440,
  "model": "qwen3-livetranslate-flash",
  "object": "chat.completion.chunk"
}

Audio chunk

Contains incremental Base64-encoded audio in choices[0].delta.audio.data:
{
  "id": "chatcmpl-c22a54b8-40cc-4a1d-988b-f84cdf86868f",
  "choices": [
    {
      "delta": {
        "content": null,
        "role": null,
        "audio": {
          "data": "///+//7////+////////////AAAAAAAAAAABA......",
          "expires_at": 1764755440,
          "id": "audio_c22a54b8-40cc-4a1d-988b-f84cdf86868f"
        }
      },
      "finish_reason": null,
      "index": 0
    }
  ],
  "created": 1764755440,
  "model": "qwen3-livetranslate-flash",
  "object": "chat.completion.chunk"
}

Token usage chunk

Sent last when include_usage is true. The choices array is empty. usage holds the token breakdown:
{
  "id": "chatcmpl-c22a54b8-40cc-4a1d-988b-f84cdf86868f",
  "choices": [],
  "created": 1764755440,
  "model": "qwen3-livetranslate-flash",
  "object": "chat.completion.chunk",
  "usage": {
    "completion_tokens": 242,
    "prompt_tokens": 415,
    "total_tokens": 657,
    "completion_tokens_details": {
      "accepted_prediction_tokens": null,
      "audio_tokens": 191,
      "reasoning_tokens": null,
      "rejected_prediction_tokens": null,
      "text_tokens": 51
    },
    "prompt_tokens_details": {
      "audio_tokens": 415,
      "cached_tokens": null,
      "text_tokens": 0,
      "video_tokens": null
    }
  }
}
For video input, prompt_tokens_details.audio_tokens includes audio tokens from the video. video_tokens reports the video-specific count.

Response fields

FieldTypeDescription
idstringRequest identifier. Same across all chunks.
choicesarrayGenerated content. Empty in the final usage chunk.
choices[].delta.contentstringIncremental translated text. null in audio chunks.
choices[].delta.audioobjectIncremental audio data. null in text chunks.
choices[].delta.audio.datastringBase64-encoded audio segment.
choices[].delta.audio.idstringOutput audio identifier.
choices[].delta.audio.expires_atintegerTimestamp when the request was created.
choices[].delta.rolestringMessage role. Present only in the first chunk.
choices[].finish_reasonstringstop when complete, length when truncated by max_tokens, null while in progress.
choices[].indexintegerAlways 0.
createdintegerRequest Unix timestamp. Same across all chunks.
modelstringModel used.
objectstringAlways chat.completion.chunk.
usageobjectToken usage. Present only in the final chunk when include_usage is true.
usage.prompt_tokensintegerTotal input tokens.
usage.completion_tokensintegerTotal output tokens.
usage.total_tokensintegerSum of prompt_tokens and completion_tokens.
usage.completion_tokens_details.audio_tokensintegerOutput audio tokens.
usage.completion_tokens_details.text_tokensintegerOutput text tokens.
usage.prompt_tokens_details.audio_tokensintegerInput audio tokens. For video input, includes audio tokens from the video.
usage.prompt_tokens_details.text_tokensintegerInput text tokens. Always 0.
usage.prompt_tokens_details.video_tokensintegerInput video tokens. Present only for video input.

Fields fixed to null

These fields exist for OpenAI compatibility but always return null: reasoning_content, function_call, refusal, tool_calls, logprobs, service_tier, system_fingerprint

References