18-language translation
Model details
| Model | Version | Context window | Max input | Max output |
|---|---|---|---|---|
| qwen3-livetranslate-flash | Stable | 53,248 tokens | 49,152 tokens | 4,096 tokens |
| qwen3-livetranslate-flash-2025-12-01 | Snapshot | 53,248 tokens | 49,152 tokens | 4,096 tokens |
qwen3-livetranslate-flash currently has the same capabilities as qwen3-livetranslate-flash-2025-12-01.
Getting started
Prerequisites
- Get an API key.
- Set it as an environment variable.
- (Optional) If you use the OpenAI SDK, install the SDK.
Use the `translation_options` parameter to set the source and target languages. The default input is audio. To translate a video file instead, uncomment the video input block in each example.

Specifying `source_lang` improves translation accuracy. Omitting it enables automatic language detection.
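A minimal Python sketch of a streaming translation request with the OpenAI SDK. The `base_url` (the international DashScope-compatible endpoint), the sample file URL, and the language pair are illustrative; substitute your own values.

```python
import os


def build_request(audio_url: str, source_lang: str, target_lang: str) -> dict:
    """Assemble the chat-completions arguments for one translation call."""
    return {
        "model": "qwen3-livetranslate-flash",
        "messages": [{
            "role": "user",
            "content": [{
                "type": "input_audio",
                "input_audio": {"data": audio_url, "format": "wav"},
            }],
            # To translate a video file instead, use:
            # "content": [{"type": "video_url",
            #              "video_url": {"url": "https://example.com/sample.mp4"}}],
        }],
        "modalities": ["text"],  # text-only output
        "stream": True,
        # translation_options is not a standard OpenAI parameter,
        # so it is passed through extra_body:
        "extra_body": {"translation_options": {
            "source_lang": source_lang,
            "target_lang": target_lang,
        }},
    }


if os.getenv("DASHSCOPE_API_KEY"):  # only call the API when a key is configured
    from openai import OpenAI

    client = OpenAI(
        api_key=os.environ["DASHSCOPE_API_KEY"],
        base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
    )
    stream = client.chat.completions.create(
        **build_request("https://example.com/sample.wav", "en", "zh"))
    for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)
```

Omitting `source_lang` from `translation_options` enables automatic language detection.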
Send a Base64-encoded local file
To translate a local audio file, read it and encode it as Base64. Pass the data as a data URI with the format `data:audio/<format>;base64,<base64_data>` (for example, `data:audio/wav;base64,UklGRiQAAABXQVZFZm10...`).
Supported audio formats: WAV, MP3, FLAC, AAC, OGG, OPUS, M4A, WMA, AMR. Sample rate: 8 kHz to 48 kHz.
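A sketch of the encoding step in Python. The helper name and the guarded request that follows are illustrative; the endpoint and file name are assumptions.

```python
import base64
import os


def audio_data_uri(path: str, fmt: str = "wav") -> str:
    """Read a local audio file and wrap it as a data:audio/<fmt>;base64,... URI."""
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("utf-8")
    return f"data:audio/{fmt};base64,{encoded}"


if os.getenv("DASHSCOPE_API_KEY"):  # only call the API when a key is configured
    from openai import OpenAI

    client = OpenAI(
        api_key=os.environ["DASHSCOPE_API_KEY"],
        base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
    )
    stream = client.chat.completions.create(
        model="qwen3-livetranslate-flash",
        messages=[{"role": "user", "content": [{
            "type": "input_audio",
            "input_audio": {"data": audio_data_uri("sample.wav"), "format": "wav"},
        }]}],
        modalities=["text"],
        stream=True,
        extra_body={"translation_options": {"target_lang": "en"}},
    )
    for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)
```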
Request parameters
Input
The `messages` array must contain exactly one message with `role` set to `user`. The `content` field holds the audio or video to translate:
- Audio: Set `type` to `input_audio`. Provide the file URL or a data URI (for example, `data:audio/wav;base64,<base64_data>`) in `input_audio.data`, and specify the format (for example, `wav`) in `input_audio.format`. See Send a Base64-encoded local file for details.
- Video: Set `type` to `video_url`. Provide the file URL in `video_url.url`.
Translation options
Specify the source and target languages in the `translation_options` parameter.

`translation_options` is not a standard OpenAI parameter. When you use the OpenAI SDK, pass it through `extra_body`.
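With the OpenAI Python SDK, that might look like the following sketch; the language codes shown are illustrative.

```python
# translation_options goes through extra_body because the OpenAI SDK
# rejects unknown top-level keyword arguments.
extra_body = {
    "translation_options": {
        "source_lang": "en",   # optional: omit to auto-detect the source language
        "target_lang": "zh",
    }
}
```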
Output modality
Control the output format with the `modalities` parameter:

| `modalities` value | Output |
|---|---|
| `["text"]` | Translated text only |
| `["text", "audio"]` | Translated text and Base64-encoded synthesized audio |

When the output includes audio, specify the voice in the `audio` parameter. See Supported voices for available options.
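For example, requesting both text and speech might look like this sketch; the voice name comes from the Supported voices table, and the `format` value is an assumption.

```python
# Request translated text plus synthesized speech.
output_kwargs = {
    "modalities": ["text", "audio"],
    "audio": {"voice": "Cherry", "format": "wav"},  # voice per the table below
}
```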
Constraints
- Single-turn only: The model handles one translation per request. Multi-turn conversations are not supported.
- No system message: The `system` role is not supported.
Parse the response
Each streaming chunk object contains:
- Text: `chunk.choices[0].delta.content`
- Audio: `chunk.choices[0].delta.audio["data"]` (Base64-encoded, 24 kHz sample rate)
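A defensive chunk-handling sketch based on the fields above; the helper name is illustrative, and the fields are probed cautiously because not every chunk carries both text and audio.

```python
from typing import Optional


def handle_chunk(chunk, audio_parts: list) -> Optional[str]:
    """Pull the text delta from one streaming chunk and stash any
    Base64 audio fragment (24 kHz) into audio_parts."""
    if not chunk.choices:
        return None  # e.g. a final usage-only chunk
    delta = chunk.choices[0].delta
    audio = getattr(delta, "audio", None)
    if isinstance(audio, dict) and audio.get("data"):
        audio_parts.append(audio["data"])
    return getattr(delta, "content", None)
```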
Save audio to a file
Concatenate all Base64 audio fragments from the stream, then decode and save the result after the stream completes.
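A Python sketch of that concatenate-then-decode step; the helper name is illustrative.

```python
import base64


def save_stream_audio(b64_fragments, path: str) -> int:
    """Join all Base64 fragments, decode once after the stream ends,
    and write the raw 24 kHz audio bytes to path. Returns the byte count."""
    raw = base64.b64decode("".join(b64_fragments))
    with open(path, "wb") as f:
        f.write(raw)
    return len(raw)
```

The decoded bytes are raw audio at 24 kHz; to produce a playable `.wav` you can wrap them with the stdlib `wave` module (16-bit mono is an assumption about the sample layout).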
Real-time playback
Decode each Base64 fragment as it arrives and play it directly. This approach requires platform-specific audio libraries.
Install `pyaudio` first:

| Platform | Installation |
|---|---|
| macOS | brew install portaudio && pip install pyaudio |
| Ubuntu / Debian | sudo apt-get install python-pyaudio python3-pyaudio or pip install pyaudio |
| CentOS | sudo yum install -y portaudio portaudio-devel && pip install pyaudio |
| Windows | python -m pip install pyaudio |
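A playback sketch in Python. The `player` callback keeps the decoding logic testable without audio hardware; when it is omitted, a `pyaudio` output stream is opened with assumed parameters (16-bit mono PCM at 24 kHz).

```python
import base64


def play_fragments(b64_fragments, player=None):
    """Decode each Base64 fragment as it arrives and hand the raw
    bytes to `player` (default: a pyaudio output stream)."""
    stream = pa = None
    if player is None:
        import pyaudio  # requires portaudio; see the installation table above
        pa = pyaudio.PyAudio()
        stream = pa.open(format=pyaudio.paInt16, channels=1,
                         rate=24000, output=True)
        player = stream.write
    try:
        for frag in b64_fragments:
            player(base64.b64decode(frag))
    finally:
        if stream is not None:
            stream.stop_stream()
            stream.close()
            pa.terminate()
```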
Billing
Audio token consumption depends on the audio characteristics (such as the sample rate). To see actual token usage, set `stream_options.include_usage` to `true` and check the `usage` field in the response. Audio shorter than 1 second is billed as 1 second.
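In the OpenAI SDK that can be sketched as follows; the helper function is illustrative.

```python
from types import SimpleNamespace

# Ask the stream to append a final usage-bearing chunk.
stream_kwargs = {
    "stream": True,
    "stream_options": {"include_usage": True},
}


def extract_usage(chunk):
    """Return the usage payload carried by the final chunk, if present."""
    return getattr(chunk, "usage", None)


# The final chunk typically has an empty choices list but a usage field.
final_chunk = SimpleNamespace(choices=[], usage={"total_tokens": 42})
print(extract_usage(final_chunk))
```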
Supported languages
The following language codes can be used for source and target languages. Some target languages support text output only.
| Language code | Language | Supported output |
|---|---|---|
| en | English | Audio, text |
| zh | Chinese | Audio, text |
| ru | Russian | Audio, text |
| fr | French | Audio, text |
| de | German | Audio, text |
| pt | Portuguese | Audio, text |
| es | Spanish | Audio, text |
| it | Italian | Audio, text |
| id | Indonesian | Text |
| ko | Korean | Audio, text |
| ja | Japanese | Audio, text |
| vi | Vietnamese | Text |
| th | Thai | Text |
| ar | Arabic | Text |
| yue | Cantonese | Audio, text |
| hi | Hindi | Text |
| el | Greek | Text |
| tr | Turkish | Text |
Supported voices
Set the `voice` parameter when the output includes synthesized audio.

| Voice name | `voice` parameter | Description | Supported languages |
|---|---|---|---|
| Cherry | Cherry | A cheerful, friendly, and genuine young woman. | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean |
| Ethan | Ethan | Standard Mandarin with a slight northern accent. Sunny, warm, energetic, and vibrant. | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean |
| Nofish | Nofish | A designer who has difficulty pronouncing retroflex consonants. | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean |
| Shanghai-Jada | Jada | A bustling and energetic Shanghai lady. | Chinese |
| Beijing-Dylan | Dylan | A young man who grew up in the hutongs of Beijing. | Chinese |
| Sichuan-Sunny | Sunny | A sweet girl from Sichuan. | Chinese |
| Tianjin-Peter | Peter | A voice in the style of a Tianjin crosstalk performer (the supporting role). | Chinese |
| Cantonese-Kiki | Kiki | A sweet best friend from Hong Kong. | Cantonese |
| Sichuan-Eric | Eric | A man from Chengdu, Sichuan, who is unconventional and stands out from the crowd. | Chinese |
Alternative: Use Qwen-Omni
You can also use Qwen-Omni (`qwen3-omni-flash`) with a translation prompt to translate audio and video files.
For full Qwen-Omni capabilities including multimodal conversation, see Audio and video file understanding.
FAQ
When I input a video file, what content is translated?
The model translates the audio track from the video. Visual information serves as context to improve translation accuracy.
For example, if the audio says "This is a mask":
- When the video shows a medical mask, the model translates it as "This is a medical mask."
- When the video shows a masquerade mask, the model translates it as "This is a masquerade mask."