Qwen-Livetranslate

Audio and video translation

LiveTranslate API reference

Translate audio and video through the OpenAI-compatible chat completions endpoint. Both streaming and non-streaming calls are supported.
The DashScope interface is not supported.

Supported models

  • qwen3-livetranslate-flash
  • qwen3-livetranslate-flash-2025-12-01

Prerequisites

  1. Get an API key
  2. Set the API key as an environment variable
  3. Install the OpenAI SDK (Python or Node.js)

Endpoints

| SDK base_url | HTTP endpoint |
| --- | --- |
| https://dashscope-intl.aliyuncs.com/compatible-mode/v1 | POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions |

Getting started

These examples translate an audio file and stream back text and audio.
  • Python
  • Node.js
  • curl
import os
from openai import OpenAI

client = OpenAI(
  api_key=os.getenv("DASHSCOPE_API_KEY"),
  base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

completion = client.chat.completions.create(
  model="qwen3-livetranslate-flash",
  messages=[
    {
      "role": "user",
      "content": [
        {
          "type": "input_audio",
          "input_audio": {
            "data": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250211/tixcef/cherry.wav",
            "format": "wav",
          },
        }
      ],
    }
  ],
  modalities=["text", "audio"],
  audio={"voice": "Cherry", "format": "wav"},
  stream=True,
  stream_options={"include_usage": True},
  extra_body={"translation_options": {"source_lang": "zh", "target_lang": "en"}},
)

for chunk in completion:
  print(chunk)
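When iterating the stream, translated text and audio arrive in separate chunks. The sketch below assembles both from chunk dicts shaped like the examples in the Response section; `collect_stream` is a hypothetical helper, not part of the SDK, and assumes chunks have already been parsed into dicts:

```python
import base64

def collect_stream(chunks):
    """Accumulate translated text and decoded audio bytes from streamed
    chunk dicts shaped like the chat.completion.chunk examples.
    Hypothetical helper for illustration only."""
    text_parts = []
    audio = bytearray()
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            delta = choice.get("delta", {})
            # Text chunks carry a string in delta.content; audio chunks
            # carry Base64 data in delta.audio.data. Either may be null.
            if delta.get("content"):
                text_parts.append(delta["content"])
            if delta.get("audio") and delta["audio"].get("data"):
                audio.extend(base64.b64decode(delta["audio"]["data"]))
    return "".join(text_parts), bytes(audio)
```

The same accumulation pattern applies whether the chunks come from the SDK's streaming iterator or from parsed SSE lines over raw HTTP.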

Video input

To translate video, set the content type to video_url. All other parameters stay the same.
messages = [
  {
    "role": "user",
    "content": [
      {
        "type": "video_url",
        "video_url": {
          "url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241115/cqqkru/1.mp4"
        },
      }
    ],
  },
]

Request body

Required parameters

| Parameter | Type | Description |
| --- | --- | --- |
| model | string | Model name: qwen3-livetranslate-flash or qwen3-livetranslate-flash-2025-12-01. |
| messages | array | A single user message. See Message object. |
| stream | boolean | Whether to stream the response. Use true for real-time translation progress, false for simpler integration. |
| translation_options | object | Translation settings. Non-standard OpenAI parameter: pass it in extra_body with the Python SDK, or at the top level for Node.js/HTTP. See Translation options. |

Optional parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| modalities | array | ["text"] | Output mode. Use ["text", "audio"] for text plus audio, or ["text"] for text only. |
| audio | object | - | Audio output settings. Required when modalities includes "audio". See Audio output options. |
| stream_options | object | - | Streaming settings. See Stream options. |
| max_tokens | integer | Model maximum | Maximum number of tokens to generate. Output is truncated if the limit is reached. |
| seed | integer | - | Random seed for reproducible output. Range: [0, 2^31-1]. |

Sampling parameters

Keep these at their defaults for best translation accuracy.
| Parameter | Type | Default | Range | Notes |
| --- | --- | --- | --- | --- |
| temperature | float | 0.000001 | [0, 2) | Controls output diversity. |
| top_p | float | 0.8 | (0, 1.0] | Nucleus sampling threshold. |
| presence_penalty | float | 0 | [-2.0, 2.0] | Reduces repetition when positive. |
| top_k | integer | 1 | >= 0 | Candidate set size. Disabled when None or greater than 100 (top_p takes effect instead). Non-standard OpenAI parameter: use extra_body in the Python SDK. |
| repetition_penalty | float | 1.05 | > 0 | Penalizes repeated sequences. Non-standard OpenAI parameter: use extra_body in the Python SDK. |
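If you do adjust these parameters, remember that the non-standard ones travel in extra_body with the Python SDK. A minimal sketch of the resulting keyword arguments for `client.chat.completions.create` (values shown are the documented defaults):

```python
# Sketch: sampling parameters for a translation request. Standard
# OpenAI parameters go at the top level; top_k and repetition_penalty
# are non-standard, so they ride in extra_body alongside
# translation_options (Python SDK only; for Node.js/HTTP, put them
# at the top level of the request body instead).
request_kwargs = {
    "model": "qwen3-livetranslate-flash",
    "temperature": 0.000001,
    "top_p": 0.8,
    "presence_penalty": 0,
    "extra_body": {
        "translation_options": {"source_lang": "zh", "target_lang": "en"},
        "top_k": 1,
        "repetition_penalty": 1.05,
    },
}
```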

Message object

The messages array must contain exactly one user message. content array items:
| Field | Type | Required | Description |
| --- | --- | --- | --- |
| type | string | Yes | input_audio for audio, video_url for video. |
| input_audio | object | When type is input_audio | Audio input. See below. |
| video_url | object | When type is video_url | Video input. See below. |

input_audio object:

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| data | string | Yes | Audio file URL or Base64 data URL. For local files, see Send a Base64-encoded local file. |
| format | string | Yes | Audio format, such as mp3 or wav. |

video_url object:

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| url | string | Yes | Public video URL or Base64 data URL. For local files, see Send a Base64-encoded local file. |

Translation options

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| source_lang | string | No | Source language (full English name). See Supported languages. If omitted, the model detects it automatically. |
| target_lang | string | Yes | Target language (full English name). See Supported languages. |

Example (Python SDK):
extra_body={"translation_options": {"source_lang": "zh", "target_lang": "en"}}

Audio output options

Required when modalities is ["text", "audio"].
| Field | Type | Required | Description |
| --- | --- | --- | --- |
| voice | string | Yes | Output voice. |
| format | string | Yes | Output audio format. Only wav is supported. |

Stream options

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| include_usage | boolean | false | When true, the final chunk includes token usage details. |

Response

The API streams chat.completion.chunk objects in three categories: text, audio, and token usage.

Text chunk

Contains incremental translated text in choices[0].delta.content:
{
  "id": "chatcmpl-c22a54b8-40cc-4a1d-988b-f84cdf86868f",
  "choices": [
    {
      "delta": {
        "content": " of",
        "role": null,
        "audio": null
      },
      "finish_reason": null,
      "index": 0
    }
  ],
  "created": 1764755440,
  "model": "qwen3-livetranslate-flash",
  "object": "chat.completion.chunk"
}

Audio chunk

Contains incremental Base64-encoded audio in choices[0].delta.audio.data:
{
  "id": "chatcmpl-c22a54b8-40cc-4a1d-988b-f84cdf86868f",
  "choices": [
    {
      "delta": {
        "content": null,
        "role": null,
        "audio": {
          "data": "///+//7////+////////////AAAAAAAAAAABA......",
          "expires_at": 1764755440,
          "id": "audio_c22a54b8-40cc-4a1d-988b-f84cdf86868f"
        }
      },
      "finish_reason": null,
      "index": 0
    }
  ],
  "created": 1764755440,
  "model": "qwen3-livetranslate-flash",
  "object": "chat.completion.chunk"
}

Token usage chunk

Sent last when include_usage is true. The choices array is empty. usage holds the token breakdown:
{
  "id": "chatcmpl-c22a54b8-40cc-4a1d-988b-f84cdf86868f",
  "choices": [],
  "created": 1764755440,
  "model": "qwen3-livetranslate-flash",
  "object": "chat.completion.chunk",
  "usage": {
    "completion_tokens": 242,
    "prompt_tokens": 415,
    "total_tokens": 657,
    "completion_tokens_details": {
      "accepted_prediction_tokens": null,
      "audio_tokens": 191,
      "reasoning_tokens": null,
      "rejected_prediction_tokens": null,
      "text_tokens": 51
    },
    "prompt_tokens_details": {
      "audio_tokens": 415,
      "cached_tokens": null,
      "text_tokens": 0,
      "video_tokens": null
    }
  }
}
For video input, prompt_tokens_details.audio_tokens includes audio tokens from the video. video_tokens reports the video-specific count.
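The totals in the usage example are internally consistent and can be sanity-checked: the text and audio output counts sum to completion_tokens, and prompt plus completion tokens give total_tokens. A quick check over a usage dict parsed from the final chunk (values copied from the example above):

```python
# Usage values taken from the token-usage chunk example above.
usage = {
    "completion_tokens": 242,
    "prompt_tokens": 415,
    "total_tokens": 657,
    "completion_tokens_details": {"audio_tokens": 191, "text_tokens": 51},
}

details = usage["completion_tokens_details"]
# Output tokens split into audio and text components.
assert details["audio_tokens"] + details["text_tokens"] == usage["completion_tokens"]
# total_tokens is the sum of input and output tokens.
assert usage["prompt_tokens"] + usage["completion_tokens"] == usage["total_tokens"]
```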

Response fields

| Field | Type | Description |
| --- | --- | --- |
| id | string | Request identifier. Same across all chunks. |
| choices | array | Generated content. Empty in the final usage chunk. |
| choices[].delta.content | string | Incremental translated text. null in audio chunks. |
| choices[].delta.audio | object | Incremental audio data. null in text chunks. |
| choices[].delta.audio.data | string | Base64-encoded audio segment. |
| choices[].delta.audio.id | string | Output audio identifier. |
| choices[].delta.audio.expires_at | integer | Timestamp when the request was created. |
| choices[].delta.role | string | Message role. Present only in the first chunk. |
| choices[].finish_reason | string | stop when complete, length when truncated by max_tokens, null while in progress. |
| choices[].index | integer | Always 0. |
| created | integer | Unix timestamp of the request. Same across all chunks. |
| model | string | Model used. |
| object | string | Always chat.completion.chunk. |
| usage | object | Token usage. Present only in the final chunk when include_usage is true. |
| usage.prompt_tokens | integer | Total input tokens. |
| usage.completion_tokens | integer | Total output tokens. |
| usage.total_tokens | integer | Sum of prompt_tokens and completion_tokens. |
| usage.completion_tokens_details.audio_tokens | integer | Output audio tokens. |
| usage.completion_tokens_details.text_tokens | integer | Output text tokens. |
| usage.prompt_tokens_details.audio_tokens | integer | Input audio tokens. For video input, includes audio tokens from the video. |
| usage.prompt_tokens_details.text_tokens | integer | Input text tokens. Always 0. |
| usage.prompt_tokens_details.video_tokens | integer | Input video tokens. Present only for video input. |

Fields fixed to null

These fields exist for OpenAI compatibility but always return null: reasoning_content, function_call, refusal, tool_calls, logprobs, service_tier, system_fingerprint

Send a Base64-encoded local file

To translate a local audio file, encode it as Base64 and pass it as a data URL to input_audio.data. The format is data:audio/<format>;base64,<base64_data> (for example, data:audio/wav;base64,UklGRiQAAABXQVZFZm10...).
Supported audio formats: WAV, MP3, FLAC, AAC, OGG, OPUS, M4A, WMA, AMR. Sample rate: 8kHz-48kHz.
import base64

with open("local_audio.wav", "rb") as f:
  audio_base64 = base64.b64encode(f.read()).decode("utf-8")

messages = [
  {
    "role": "user",
    "content": [
      {
        "type": "input_audio",
        "input_audio": {
          "data": f"data:audio/wav;base64,{audio_base64}",
          "format": "wav",
        },
      }
    ],
  }
]
For full examples in Python, Node.js, and curl, see Send a Base64-encoded local file in the developer guide.
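To guard against the 8kHz-48kHz limit before uploading, you can read a local WAV's header with the standard-library wave module and build the data URL in one step. `wav_to_data_url` is a hypothetical helper for WAV input only, sketched here for illustration:

```python
import base64
import wave

def wav_to_data_url(path):
    """Encode a local WAV file as a Base64 data URL, rejecting files
    whose sample rate falls outside the documented 8-48 kHz range.
    Hypothetical helper; handles WAV input only."""
    # Read the sample rate from the WAV header before encoding.
    with wave.open(path, "rb") as w:
        rate = w.getframerate()
    if not 8000 <= rate <= 48000:
        raise ValueError(f"sample rate {rate} Hz is outside the supported 8-48 kHz range")
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("utf-8")
    return f"data:audio/wav;base64,{encoded}"
```

The returned string can be passed directly as `input_audio.data` in the message shown above.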

Usage notes

  • Streaming and non-streaming. Both streaming (stream: true) and non-streaming (stream: false) calls are supported. Use streaming for real-time translation progress; use non-streaming for simpler integration.
  • Single message. The messages array accepts exactly one user message.
  • Non-standard parameters. translation_options, top_k, and repetition_penalty are not in the standard OpenAI API. In the Python SDK, pass them in extra_body. In Node.js or HTTP, include them at the top level.
  • Sampling defaults. Defaults are tuned for translation accuracy. Changing temperature, top_p, top_k, presence_penalty, or repetition_penalty may reduce quality.
  • Output audio format. Only wav is supported.
  • Auto language detection. If source_lang is omitted, the model detects the input language automatically.
