LiveTranslate API reference
Translate audio and video through the OpenAI-compatible chat completions endpoint. Both streaming and non-streaming calls are supported.
User guide: For tutorials and complete examples, see Audio and video translation.
Keep these at their defaults for best translation accuracy.
The
Required when
The API streams
Contains incremental translated text in
Contains incremental Base64-encoded audio in
Sent last when
These fields exist for OpenAI compatibility but always return
The DashScope interface is not supported.
Endpoints
SDK base_url | HTTP endpoint |
|---|---|
https://dashscope-intl.aliyuncs.com/compatible-mode/v1 | POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions |
Request body
Required parameters
| Parameter | Type | Description |
|---|---|---|
model | string | Model name: qwen3-livetranslate-flash or qwen3-livetranslate-flash-2025-12-01. |
messages | array | A single user message. See Message object. |
stream | boolean | Whether to stream the response. Use true for real-time translation progress, false for simpler integration. |
translation_options | object | Translation settings. Non-standard OpenAI parameter: pass in extra_body for the Python SDK, or at the top level for Node.js/HTTP. See Translation options. |
Optional parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
modalities | array | ["text"] | Output mode. Use ["text", "audio"] for text and audio, or ["text"] for text only. |
audio | object | - | Audio output settings. Required when modalities includes "audio". See Audio output options. |
stream_options | object | - | Streaming settings. See Stream options. |
max_tokens | integer | Model maximum | Maximum tokens to generate. Output is truncated if exceeded. |
seed | integer | - | Random seed for reproducible output. Range: [0, 2^31-1]. |
Sampling parameters
Keep these at their defaults for best translation accuracy.
| Parameter | Type | Default | Range | Notes |
|---|---|---|---|---|
temperature | float | 0.000001 | [0, 2) | Controls output diversity. |
top_p | float | 0.8 | (0, 1.0] | Nucleus sampling threshold. |
presence_penalty | float | 0 | [-2.0, 2.0] | Reduces repetition when positive. |
top_k | integer | 1 | >= 0 | Candidate set size. Disabled when None or > 100 (top_p takes effect instead). Non-standard OpenAI parameter: use extra_body in the Python SDK. |
repetition_penalty | float | 1.05 | > 0 | Penalizes repeated sequences. Non-standard OpenAI parameter: use extra_body in the Python SDK. |
Message object
The messages array must contain exactly one user message.
content array items:
| Field | Type | Required | Description |
|---|---|---|---|
type | string | Yes | input_audio for audio, video_url for video. |
input_audio | object | When type is input_audio | Audio input. See below. |
video_url | object | When type is video_url | Video input. See below. |
input_audio object:
| Field | Type | Required | Description |
|---|---|---|---|
data | string | Yes | Audio file URL or Base64 data URL. For local files, see Send a Base64-encoded local file. |
format | string | Yes | Audio format, such as mp3 or wav. |
video_url object:
| Field | Type | Required | Description |
|---|---|---|---|
url | string | Yes | Public video URL or Base64 data URL. For local files, see Send a Base64-encoded local file. |
Translation options
| Field | Type | Required | Description |
|---|---|---|---|
source_lang | string | No | Source language (full English name). See Supported languages. If omitted, the model detects it automatically. |
target_lang | string | Yes | Target language (full English name). See Supported languages. |
Audio output options
Required when modalities is ["text", "audio"].
| Field | Type | Required | Description |
|---|---|---|---|
voice | string | Yes | Output voice. |
format | string | Yes | Output audio format. Only wav is supported. |
Stream options
| Field | Type | Default | Description |
|---|---|---|---|
include_usage | boolean | false | When true, the final chunk includes token usage details. |
Response
The API streams chat.completion.chunk objects in three categories: text, audio, and token usage.
Text chunk
Contains incremental translated text in choices[0].delta.content:
Audio chunk
Contains incremental Base64-encoded audio in choices[0].delta.audio.data:
Token usage chunk
Sent last when include_usage is true. The choices array is empty. usage holds the token breakdown:
For video input,
prompt_tokens_details.audio_tokens includes audio tokens from the video. video_tokens reports the video-specific count.Response fields
| Field | Type | Description |
|---|---|---|
id | string | Request identifier. Same across all chunks. |
choices | array | Generated content. Empty in the final usage chunk. |
choices[].delta.content | string | Incremental translated text. null in audio chunks. |
choices[].delta.audio | object | Incremental audio data. null in text chunks. |
choices[].delta.audio.data | string | Base64-encoded audio segment. |
choices[].delta.audio.id | string | Output audio identifier. |
choices[].delta.audio.expires_at | integer | Timestamp when the request was created. |
choices[].delta.role | string | Message role. Present only in the first chunk. |
choices[].finish_reason | string | stop when complete, length when truncated by max_tokens, null while in progress. |
choices[].index | integer | Always 0. |
created | integer | Request Unix timestamp. Same across all chunks. |
model | string | Model used. |
object | string | Always chat.completion.chunk. |
usage | object | Token usage. Present only in the final chunk when include_usage is true. |
usage.prompt_tokens | integer | Total input tokens. |
usage.completion_tokens | integer | Total output tokens. |
usage.total_tokens | integer | Sum of prompt_tokens and completion_tokens. |
usage.completion_tokens_details.audio_tokens | integer | Output audio tokens. |
usage.completion_tokens_details.text_tokens | integer | Output text tokens. |
usage.prompt_tokens_details.audio_tokens | integer | Input audio tokens. For video input, includes audio tokens from the video. |
usage.prompt_tokens_details.text_tokens | integer | Input text tokens. Always 0. |
usage.prompt_tokens_details.video_tokens | integer | Input video tokens. Present only for video input. |
Fields fixed to null
These fields exist for OpenAI compatibility but always return null:
reasoning_content, function_call, refusal, tool_calls, logprobs, service_tier, system_fingerprint