WebSocket client reference
Events your client sends to the server over WebSocket.
- `session.update`: updates the session configuration.
- `input_audio_buffer.append`: appends audio bytes to the input buffer.
- `input_image_buffer.append`: adds image data to the buffer from a local file or a real-time video stream.
- `session.finish`: ends the session.

Reference: Speech translation.
session.update
Updates the session configuration after you connect. The server validates parameters and returns the full configuration, or an error if any value is invalid.
| Type | Location | Required | Description |
| --- | --- | --- | --- |
| string | body | Yes | Always `"session.update"`. |
| object | body |  | Session configuration. |
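A minimal sketch of building this event in Python. The extracted reference lists field types but not field names, so `type` and `session` below are assumed names, and the configuration key shown is purely illustrative:

```python
import json

# Build a session.update event. The field names "type" and "session",
# and the configuration key below, are assumptions for illustration;
# the reference specifies only the field types (string / object).
payload = {
    "type": "session.update",           # string, required: always "session.update"
    "session": {                        # object: session configuration
        "input_audio_format": "pcm16",  # illustrative key, not from the reference
    },
}
message = json.dumps(payload)  # send this text frame over the WebSocket
```

The server validates the parameters and returns either the full configuration or an error, so check the next server event before sending further updates.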
input_audio_buffer.append
Appends audio bytes to the input buffer. The server uses this buffer for speech detection and submission timing.
| Type | Location | Required | Description |
| --- | --- | --- | --- |
| string | body | Yes | Always `"input_audio_buffer.append"`. |
| string | body | Yes | Base64-encoded audio data. |
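A sketch of encoding an audio chunk for this event. The field names `type` and `audio` are assumptions; the reference specifies only that the data is a Base64-encoded string:

```python
import base64
import json

# Encode raw audio bytes for an input_audio_buffer.append event.
# "type" and "audio" are assumed field names for illustration only.
audio_bytes = b"\x00\x01" * 160  # illustrative raw PCM chunk
event = json.dumps({
    "type": "input_audio_buffer.append",
    "audio": base64.b64encode(audio_bytes).decode("ascii"),
})
```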
input_image_buffer.append
Adds image data to the buffer from a local file or a real-time video stream.
Image limits:
- Format: JPG or JPEG. Recommended resolution: 480p or 720p. Maximum: 1080p.
- Maximum size: 500 KB (before Base64 encoding).
- Must be Base64-encoded.
- Maximum rate: 2 images per second.
- You must send at least one `input_audio_buffer.append` event first.
| Type | Location | Required | Description |
| --- | --- | --- | --- |
| string | body | Yes | Always `"input_image_buffer.append"`. |
| string | body | Yes | Base64-encoded image data. |
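A sketch that checks the documented image limits client-side before building the event. The field names `type` and `image` are assumptions; only the limits and the Base64 requirement come from the reference:

```python
import base64
import json

MAX_RAW_BYTES = 500 * 1024  # 500 KB limit applies before Base64 encoding

def build_image_event(jpeg_bytes: bytes) -> str:
    # JPEG files start with the SOI marker 0xFFD8; only JPG/JPEG is allowed.
    if not jpeg_bytes.startswith(b"\xff\xd8"):
        raise ValueError("image must be JPG/JPEG")
    if len(jpeg_bytes) > MAX_RAW_BYTES:
        raise ValueError("image exceeds 500 KB before Base64 encoding")
    # "type" and "image" are assumed field names for illustration only.
    return json.dumps({
        "type": "input_image_buffer.append",
        "image": base64.b64encode(jpeg_bytes).decode("ascii"),
    })
```

The server still enforces the rate limit of 2 images per second and requires at least one prior `input_audio_buffer.append` event; this sketch only catches format and size violations before the bytes leave the client.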
session.finish
Ends the session. The server responds based on whether it detected speech:
- Speech detected: The server finishes recognition and sends conversation.item.input_audio_transcription.completed with the result, then sends session.finished.
- No speech detected: The server sends session.finished directly.
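The two completion paths above can be handled with a small dispatch sketch. Only the event type strings come from the reference; the `type` and `transcript` field names are assumptions:

```python
import json

def handle_server_event(raw: str) -> bool:
    """Return True once session.finished arrives.

    "type" and "transcript" are assumed field names; only the event
    type strings themselves come from the reference.
    """
    event = json.loads(raw)
    if event.get("type") == "conversation.item.input_audio_transcription.completed":
        # Speech was detected: this event carries the final recognition result.
        print(event.get("transcript", ""))
    return event.get("type") == "session.finished"
```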
| Type | Location | Required | Description |
| --- | --- | --- | --- |
| string | body | Yes | Always `"session.finish"`. |
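Building the finish event itself is a one-liner; `type` as the field name is an assumption kept consistent with the other client events sketched here:

```python
import json

# session.finish carries only its type string (field name assumed).
finish_event = json.dumps({"type": "session.finish"})
```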