
LiveTranslate client events

WebSocket client reference

Events your client sends to the server over WebSocket.
Reference: Speech translation.

session.update

Updates the session configuration after you connect. The server validates parameters and returns the full configuration, or an error if any value is invalid.
Example
{
  "event_id": "event_ToPZqeobitzUJnt3QqtWg",
  "type": "session.update",
  "session": {
    "modalities": [
      "text",
      "audio"
    ],
    "voice": "Cherry",
    "input_audio_format": "pcm16",
    "output_audio_format": "pcm24",
    "input_audio_transcription": {
      "model": "qwen3-asr-flash-realtime",
      "language": "zh"
    },
    "translation": {
      "language": "en"
    }
  }
}
type
string, body, required
Always "session.update".

session
object, body
Session configuration.
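A minimal sketch of building this event in Python, mirroring the example payload above. The `event_id` scheme and the helper's parameter names are illustrative assumptions, not part of the documented API; serialize the dict with `json.dumps` before sending it over your WebSocket connection.

```python
import json
import uuid

def build_session_update(voice="Cherry", source_lang="zh", target_lang="en"):
    """Build a session.update event matching the example above.

    The event_id format is illustrative (any unique string is assumed
    to work); the parameter names are this sketch's, not the API's.
    """
    return {
        "event_id": f"event_{uuid.uuid4().hex}",
        "type": "session.update",
        "session": {
            "modalities": ["text", "audio"],
            "voice": voice,
            "input_audio_format": "pcm16",
            "output_audio_format": "pcm24",
            "input_audio_transcription": {
                "model": "qwen3-asr-flash-realtime",
                "language": source_lang,
            },
            "translation": {"language": target_lang},
        },
    }

# Serialize before sending over the WebSocket:
payload = json.dumps(build_session_update())
```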

input_audio_buffer.append

Appends audio bytes to the input buffer. The server uses this buffer for speech detection and submission timing.
Example
{
  "event_id": "event_xxx",
  "type": "input_audio_buffer.append",
  "audio": "xxx"
}
type
string, body, required
Always "input_audio_buffer.append".

audio
string, body, required
Base64-encoded audio data.
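One way to prepare these events is to split raw PCM audio into fixed-size chunks and Base64-encode each one. The chunk size below (100 ms of 16 kHz mono PCM16) is an illustrative choice, not a documented requirement:

```python
import base64

def audio_append_events(pcm_bytes, chunk_size=3200):
    """Split raw PCM16 audio into input_audio_buffer.append events.

    chunk_size=3200 bytes is 100 ms of 16 kHz mono PCM16 -- an
    assumption for this sketch; check your session's audio format.
    """
    events = []
    for i in range(0, len(pcm_bytes), chunk_size):
        chunk = pcm_bytes[i:i + chunk_size]
        events.append({
            "type": "input_audio_buffer.append",
            "audio": base64.b64encode(chunk).decode("ascii"),
        })
    return events
```

Send each event as soon as its chunk is available so the server can run speech detection with low latency.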

input_image_buffer.append

Adds image data to the buffer from a local file or a real-time video stream. Image limits:
  • Format: JPG or JPEG. Recommended resolution: 480p or 720p. Maximum: 1080p.
  • Maximum size: 500 KB (before Base64 encoding).
  • Must be Base64-encoded.
  • Maximum rate: 2 images per second.
  • You must send at least one input_audio_buffer.append event first.
Example
{
  "event_id": "event_xxx",
  "type": "input_image_buffer.append",
  "image": "xxx"
}
type
string, body, required
Always "input_image_buffer.append".

image
string, body, required
Base64-encoded image data.
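A sketch of client-side validation against the limits listed above, before encoding and wrapping the image. The JPEG magic-number check and the timing arguments are this sketch's assumptions; the server performs its own validation regardless:

```python
import base64
import time

MAX_IMAGE_BYTES = 500 * 1024  # 500 KB, measured before Base64 encoding
MIN_INTERVAL_S = 0.5          # at most 2 images per second

def build_image_append(jpeg_bytes, last_sent_at=None, now=None):
    """Validate a JPEG against the documented limits and wrap it in an
    input_image_buffer.append event. Argument names are illustrative."""
    if now is None:
        now = time.monotonic()
    if not jpeg_bytes.startswith(b"\xff\xd8"):  # JPEG SOI marker
        raise ValueError("image must be JPG/JPEG")
    if len(jpeg_bytes) > MAX_IMAGE_BYTES:
        raise ValueError("image exceeds 500 KB before encoding")
    if last_sent_at is not None and now - last_sent_at < MIN_INTERVAL_S:
        raise ValueError("rate limit: at most 2 images per second")
    return {
        "type": "input_image_buffer.append",
        "image": base64.b64encode(jpeg_bytes).decode("ascii"),
    }
```

Remember that the server rejects image events until it has received at least one input_audio_buffer.append event.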

session.finish

Ends the session. The server's response depends on whether it detected speech in the buffered audio. Disconnect only after you receive session.finished.
Example
{
  "event_id": "event_xxx",
  "type": "session.finish"
}
type
string, body, required
Always "session.finish".