WebSocket client reference
Events your client sends to the server over WebSocket.
- `session.update`: updates the session configuration.
- `input_audio_buffer.append`: appends audio bytes to the input buffer.
- `input_image_buffer.append`: adds image data to the buffer from a local file or a real-time video stream.
- `session.finish`: ends the session.

Reference: Speech translation.
session.update
Updates the session configuration after you connect. The server validates parameters and returns the full configuration, or an error if any value is invalid.
| Type | Location | Required | Description |
| --- | --- | --- | --- |
| string | body | Yes | Always `"session.update"`. |
| object | body |  | Session configuration. |
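A minimal sketch of building this event in Python. The extracted reference lists field types but not field names, so `type` and `session` below are assumed names, and the configuration key shown is purely illustrative:

```python
import json

# Build a session.update event. The field names "type" and "session",
# and the configuration key below, are assumptions for illustration;
# the reference specifies only the field types (string / object).
payload = {
    "type": "session.update",           # string, required: always "session.update"
    "session": {                        # object: session configuration
        "input_audio_format": "pcm16",  # illustrative key, not from the reference
    },
}
message = json.dumps(payload)  # send this text frame over the WebSocket
```

The server validates the parameters and returns either the full configuration or an error, so check the next server event before sending further updates.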
input_audio_buffer.append
Appends audio bytes to the input buffer. The server uses this buffer for speech detection and submission timing.
| Type | Location | Required | Description |
| --- | --- | --- | --- |
| string | body | Yes | Always `"input_audio_buffer.append"`. |
| string | body | Yes | Base64-encoded audio data. |
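A sketch of encoding an audio chunk for this event. The field names `type` and `audio` are assumptions; the reference specifies only that the data is a Base64-encoded string:

```python
import base64
import json

# Encode raw audio bytes for an input_audio_buffer.append event.
# "type" and "audio" are assumed field names for illustration only.
audio_bytes = b"\x00\x01" * 160  # illustrative raw PCM chunk
event = json.dumps({
    "type": "input_audio_buffer.append",
    "audio": base64.b64encode(audio_bytes).decode("ascii"),
})
```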
input_image_buffer.append
Adds image data to the buffer from a local file or a real-time video stream.
Image limits:
- Format: JPG or JPEG. Recommended resolution: 480p or 720p. Maximum: 1080p.
- Maximum size: 500 KB (before Base64 encoding).
- Must be Base64-encoded.
- Maximum rate: 2 images per second.
- You must send at least one `input_audio_buffer.append` event first.
| Type | Location | Required | Description |
| --- | --- | --- | --- |
| string | body | Yes | Always `"input_image_buffer.append"`. |
| string | body | Yes | Base64-encoded image data. |
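A sketch that checks the documented image limits client-side before building the event. The field names `type` and `image` are assumptions; only the limits and the Base64 requirement come from the reference:

```python
import base64
import json

MAX_RAW_BYTES = 500 * 1024  # 500 KB limit applies before Base64 encoding

def build_image_event(jpeg_bytes: bytes) -> str:
    # JPEG files start with the SOI marker 0xFFD8; only JPG/JPEG is allowed.
    if not jpeg_bytes.startswith(b"\xff\xd8"):
        raise ValueError("image must be JPG/JPEG")
    if len(jpeg_bytes) > MAX_RAW_BYTES:
        raise ValueError("image exceeds 500 KB before Base64 encoding")
    # "type" and "image" are assumed field names for illustration only.
    return json.dumps({
        "type": "input_image_buffer.append",
        "image": base64.b64encode(jpeg_bytes).decode("ascii"),
    })
```

The server still enforces the rate limit of 2 images per second and requires at least one prior `input_audio_buffer.append` event; this sketch only catches format and size violations before the bytes leave the client.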
session.finish
Ends the session. The server responds based on whether it detected speech:
- Speech detected: The server finishes recognition and sends conversation.item.input_audio_transcription.completed with the result, then sends session.finished.
- No speech detected: The server sends session.finished directly.
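The two completion paths above can be handled with a small dispatch sketch. Only the event type strings come from the reference; the `type` and `transcript` field names are assumptions:

```python
import json

def handle_server_event(raw: str) -> bool:
    """Return True once session.finished arrives.

    "type" and "transcript" are assumed field names; only the event
    type strings themselves come from the reference.
    """
    event = json.loads(raw)
    if event.get("type") == "conversation.item.input_audio_transcription.completed":
        # Speech was detected: this event carries the final recognition result.
        print(event.get("transcript", ""))
    return event.get("type") == "session.finished"
```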
| Type | Location | Required | Description |
| --- | --- | --- | --- |
| string | body | Yes | Always `"session.finish"`. |
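Building the finish event itself is a one-liner; `type` as the field name is an assumption kept consistent with the other client events sketched here:

```python
import json

# session.finish carries only its type string (field name assumed).
finish_event = json.dumps({"type": "session.finish"})
```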