Translate audio and video through the OpenAI-compatible chat completions endpoint. Both streaming and non-streaming calls are supported.
The DashScope interface is not supported.
Supported models
- qwen3-livetranslate-flash
- qwen3-livetranslate-flash-2025-12-01
Prerequisites
- Get an API key
- Set the API key as an environment variable
- Install the OpenAI SDK (Python or Node.js)
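To make the examples below work without pasting the key into code, export it in the shell that runs them (a minimal sketch for macOS/Linux; the key value is a placeholder):

```shell
# Set the API key for the current shell session (placeholder value)
export DASHSCOPE_API_KEY="your-api-key"
# Confirm it is set
echo "$DASHSCOPE_API_KEY"
```

On Windows, use `set DASHSCOPE_API_KEY=...` (cmd) or `$env:DASHSCOPE_API_KEY = "..."` (PowerShell) instead.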
Endpoints
| SDK base_url | HTTP endpoint |
|---|---|
| https://dashscope-intl.aliyuncs.com/compatible-mode/v1 | POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions |
Getting started
These examples translate an audio file and stream back text and audio.
Python

```python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

completion = client.chat.completions.create(
    model="qwen3-livetranslate-flash",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "input_audio",
                    "input_audio": {
                        "data": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250211/tixcef/cherry.wav",
                        "format": "wav",
                    },
                }
            ],
        }
    ],
    modalities=["text", "audio"],
    audio={"voice": "Cherry", "format": "wav"},
    stream=True,
    stream_options={"include_usage": True},
    # Non-standard OpenAI parameter: passed via extra_body in the Python SDK
    extra_body={"translation_options": {"source_lang": "zh", "target_lang": "en"}},
)

for chunk in completion:
    print(chunk)
```
Node.js

```javascript
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.DASHSCOPE_API_KEY,
  baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
});

async function main() {
  const completion = await client.chat.completions.create({
    model: "qwen3-livetranslate-flash",
    messages: [
      {
        role: "user",
        content: [
          {
            type: "input_audio",
            input_audio: {
              data: "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250211/tixcef/cherry.wav",
              format: "wav",
            },
          },
        ],
      },
    ],
    modalities: ["text", "audio"],
    audio: { voice: "Cherry", format: "wav" },
    stream: true,
    stream_options: { include_usage: true },
    // Non-standard OpenAI parameter: passed at the top level in Node.js
    translation_options: { source_lang: "zh", target_lang: "en" },
  });

  for await (const chunk of completion) {
    console.log(JSON.stringify(chunk));
  }
}

main();
```
curl

```shell
curl -X POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions \
  -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3-livetranslate-flash",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "input_audio",
            "input_audio": {
              "data": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250211/tixcef/cherry.wav",
              "format": "wav"
            }
          }
        ]
      }
    ],
    "modalities": ["text", "audio"],
    "audio": {
      "voice": "Cherry",
      "format": "wav"
    },
    "stream": true,
    "stream_options": {
      "include_usage": true
    },
    "translation_options": {
      "source_lang": "zh",
      "target_lang": "en"
    }
  }'
```
To translate video, set the content type to video_url. All other parameters stay the same.
```python
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "video_url",
                "video_url": {
                    "url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241115/cqqkru/1.mp4"
                },
            }
        ],
    }
]
```
Request body
Required parameters
| Parameter | Type | Description |
|---|---|---|
| model | string | Model name: qwen3-livetranslate-flash or qwen3-livetranslate-flash-2025-12-01. |
| messages | array | A single user message. See Message object. |
| stream | boolean | Whether to stream the response. Use true for real-time translation progress, false for simpler integration. |
| translation_options | object | Translation settings. Non-standard OpenAI parameter: pass it in extra_body for the Python SDK, or at the top level for Node.js/HTTP. See Translation options. |
Optional parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| modalities | array | ["text"] | Output mode. Use ["text", "audio"] for text and audio, or ["text"] for text only. |
| audio | object | - | Audio output settings. Required when modalities includes "audio". See Audio output options. |
| stream_options | object | - | Streaming settings. See Stream options. |
| max_tokens | integer | Model maximum | Maximum tokens to generate. Output is truncated if exceeded. |
| seed | integer | - | Random seed for reproducible output. Range: [0, 2^31-1]. |
Sampling parameters
Keep these at their defaults for best translation accuracy.
| Parameter | Type | Default | Range | Notes |
|---|---|---|---|---|
| temperature | float | 0.000001 | [0, 2) | Controls output diversity. |
| top_p | float | 0.8 | (0, 1.0] | Nucleus sampling threshold. |
| presence_penalty | float | 0 | [-2.0, 2.0] | Reduces repetition when positive. |
| top_k | integer | 1 | >= 0 | Candidate set size. Disabled when None or > 100 (top_p takes effect instead). Non-standard OpenAI parameter: use extra_body in the Python SDK. |
| repetition_penalty | float | 1.05 | > 0 | Penalizes repeated sequences. Non-standard OpenAI parameter: use extra_body in the Python SDK. |
Message object
The messages array must contain exactly one user message.
content array items:
| Field | Type | Required | Description |
|---|---|---|---|
| type | string | Yes | input_audio for audio, video_url for video. |
| input_audio | object | When type is input_audio | Audio input. See below. |
| video_url | object | When type is video_url | Video input. See below. |
input_audio object:
| Field | Type | Required | Description |
|---|---|---|---|
| data | string | Yes | Audio file URL or Base64 data URL. For local files, see Send a Base64-encoded local file. |
| format | string | Yes | Audio format, such as mp3 or wav. |
video_url object:
| Field | Type | Required | Description |
|---|---|---|---|
| url | string | Yes | Video file URL. |
Translation options
| Field | Type | Required | Description |
|---|---|---|---|
| source_lang | string | No | Source language code (for example, zh). See Supported languages. If omitted, the model detects it automatically. |
| target_lang | string | Yes | Target language code (for example, en). See Supported languages. |
In the Python SDK, pass translation_options through extra_body:

```python
extra_body={"translation_options": {"source_lang": "zh", "target_lang": "en"}}
```
Audio output options
Required when modalities is ["text", "audio"].
| Field | Type | Required | Description |
|---|---|---|---|
| voice | string | Yes | Output voice. |
| format | string | Yes | Output audio format. Only wav is supported. |
Stream options
| Field | Type | Default | Description |
|---|---|---|---|
| include_usage | boolean | false | When true, the final chunk includes token usage details. |
Response
The API streams chat.completion.chunk objects in three categories: text, audio, and token usage.
Text chunk
Contains incremental translated text in choices[0].delta.content:
```json
{
  "id": "chatcmpl-c22a54b8-40cc-4a1d-988b-f84cdf86868f",
  "choices": [
    {
      "delta": {
        "content": " of",
        "role": null,
        "audio": null
      },
      "finish_reason": null,
      "index": 0
    }
  ],
  "created": 1764755440,
  "model": "qwen3-livetranslate-flash",
  "object": "chat.completion.chunk"
}
```
Audio chunk
Contains incremental Base64-encoded audio in choices[0].delta.audio.data:
```json
{
  "id": "chatcmpl-c22a54b8-40cc-4a1d-988b-f84cdf86868f",
  "choices": [
    {
      "delta": {
        "content": null,
        "role": null,
        "audio": {
          "data": "///+//7////+////////////AAAAAAAAAAABA......",
          "expires_at": 1764755440,
          "id": "audio_c22a54b8-40cc-4a1d-988b-f84cdf86868f"
        }
      },
      "finish_reason": null,
      "index": 0
    }
  ],
  "created": 1764755440,
  "model": "qwen3-livetranslate-flash",
  "object": "chat.completion.chunk"
}
```
Token usage chunk
Sent last when include_usage is true. The choices array is empty. usage holds the token breakdown:
```json
{
  "id": "chatcmpl-c22a54b8-40cc-4a1d-988b-f84cdf86868f",
  "choices": [],
  "created": 1764755440,
  "model": "qwen3-livetranslate-flash",
  "object": "chat.completion.chunk",
  "usage": {
    "completion_tokens": 242,
    "prompt_tokens": 415,
    "total_tokens": 657,
    "completion_tokens_details": {
      "accepted_prediction_tokens": null,
      "audio_tokens": 191,
      "reasoning_tokens": null,
      "rejected_prediction_tokens": null,
      "text_tokens": 51
    },
    "prompt_tokens_details": {
      "audio_tokens": 415,
      "cached_tokens": null,
      "text_tokens": 0,
      "video_tokens": null
    }
  }
}
```
For video input, prompt_tokens_details.audio_tokens includes audio tokens from the video. video_tokens reports the video-specific count.
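The three chunk categories can be merged in a single pass over the stream. A minimal sketch, operating on chunks parsed into dicts (as when reading the SSE stream directly); it assumes the audio chunks concatenate into one Base64 string that is decoded once at the end:

```python
import base64

def collect_stream(chunks):
    """Merge text, audio, and usage chunks from a streamed translation.

    `chunks` is an iterable of chat.completion.chunk objects as dicts.
    Returns (translated_text, audio_bytes, usage) where usage is None
    unless include_usage was set.
    """
    text_parts, audio_b64, usage = [], [], None
    for chunk in chunks:
        if not chunk.get("choices"):            # final chunk: usage only
            usage = chunk.get("usage")
            continue
        delta = chunk["choices"][0].get("delta", {})
        if delta.get("content"):                # text chunk
            text_parts.append(delta["content"])
        audio = delta.get("audio")
        if audio and audio.get("data"):         # audio chunk (Base64 data)
            audio_b64.append(audio["data"])
    audio_bytes = base64.b64decode("".join(audio_b64)) if audio_b64 else b""
    return "".join(text_parts), audio_bytes, usage
```

With the Python SDK's streamed objects, the same fields are exposed as attributes (chunk.choices, delta.content, delta.audio) rather than dict keys.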
Response fields
| Field | Type | Description |
|---|---|---|
| id | string | Request identifier. Same across all chunks. |
| choices | array | Generated content. Empty in the final usage chunk. |
| choices[].delta.content | string | Incremental translated text. null in audio chunks. |
| choices[].delta.audio | object | Incremental audio data. null in text chunks. |
| choices[].delta.audio.data | string | Base64-encoded audio segment. |
| choices[].delta.audio.id | string | Output audio identifier. |
| choices[].delta.audio.expires_at | integer | Timestamp when the request was created. |
| choices[].delta.role | string | Message role. Present only in the first chunk. |
| choices[].finish_reason | string | stop when complete, length when truncated by max_tokens, null while in progress. |
| choices[].index | integer | Always 0. |
| created | integer | Request Unix timestamp. Same across all chunks. |
| model | string | Model used. |
| object | string | Always chat.completion.chunk. |
| usage | object | Token usage. Present only in the final chunk when include_usage is true. |
| usage.prompt_tokens | integer | Total input tokens. |
| usage.completion_tokens | integer | Total output tokens. |
| usage.total_tokens | integer | Sum of prompt_tokens and completion_tokens. |
| usage.completion_tokens_details.audio_tokens | integer | Output audio tokens. |
| usage.completion_tokens_details.text_tokens | integer | Output text tokens. |
| usage.prompt_tokens_details.audio_tokens | integer | Input audio tokens. For video input, includes audio tokens from the video. |
| usage.prompt_tokens_details.text_tokens | integer | Input text tokens. Always 0. |
| usage.prompt_tokens_details.video_tokens | integer | Input video tokens. Present only for video input. |
Fields fixed to null
These fields exist for OpenAI compatibility but always return null:
reasoning_content, function_call, refusal, tool_calls, logprobs, service_tier, system_fingerprint
Send a Base64-encoded local file
To translate a local audio file, encode it as Base64 and pass it as a data URL to input_audio.data. The format is data:audio/<format>;base64,<base64_data> (for example, data:audio/wav;base64,UklGRiQAAABXQVZFZm10...).
Supported audio formats: WAV, MP3, FLAC, AAC, OGG, OPUS, M4A, WMA, AMR. Sample rate: 8kHz-48kHz.
```python
import base64

# Read the local audio file and encode it as Base64
with open("local_audio.wav", "rb") as f:
    audio_base64 = base64.b64encode(f.read()).decode("utf-8")

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "input_audio",
                "input_audio": {
                    "data": f"data:audio/wav;base64,{audio_base64}",
                    "format": "wav",
                },
            }
        ],
    }
]
```
For full examples in Python, Node.js, and curl, see Send a Base64-encoded local file in the developer guide.
Usage notes
- Streaming and non-streaming. Both streaming (stream: true) and non-streaming (stream: false) calls are supported. Use streaming for real-time translation progress; use non-streaming for simpler integration.
- Single message. The messages array accepts exactly one user message.
- Non-standard parameters. translation_options, top_k, and repetition_penalty are not in the standard OpenAI API. In the Python SDK, pass them in extra_body. In Node.js or HTTP, include them at the top level.
- Sampling defaults. Defaults are tuned for translation accuracy. Changing temperature, top_p, top_k, presence_penalty, or repetition_penalty may reduce quality.
- Output audio format. Only wav is supported.
- Auto language detection. If source_lang is omitted, the model detects the input language automatically.
References