WebSocket client reference
Client events are JSON messages you send over a WebSocket to configure voice settings, stream text, and signal input completion.
For the full API overview, see Realtime streaming TTS.
Endpoint
Connect to the WebSocket endpoint:
Replace {model} with your model ID, such as qwen3-tts-instruct-flash-realtime.
Event summary
| Client event | Server response | Description |
|---|---|---|
| session.update | session.updated | Set voice, audio format, mode, and other session parameters |
| input_text_buffer.append | response.created | Append text to the synthesis buffer |
| input_text_buffer.commit | input_text_buffer.committed | Commit buffered text to start synthesis |
| input_text_buffer.clear | input_text_buffer.cleared | Discard all buffered text |
| session.finish | session.finished | End the session; the server flushes remaining audio and closes the connection |
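The table above can also be kept as data in a client, for example to know which server event acknowledges each event you send. A minimal Python sketch, assuming the event type travels in a top-level `type` field (this reference does not name that field); `expected_response` is a hypothetical helper, not part of the API.

```python
import json

# Client event type -> server event that answers it, taken from the event summary table.
# Note: response.created is sent when a new response begins, not necessarily after every append.
EXPECTED_RESPONSE = {
    "session.update": "session.updated",
    "input_text_buffer.append": "response.created",
    "input_text_buffer.commit": "input_text_buffer.committed",
    "input_text_buffer.clear": "input_text_buffer.cleared",
    "session.finish": "session.finished",
}


def expected_response(client_message: str) -> str:
    """Return the server event type expected after sending a client message.

    Assumes the event type is in a top-level "type" field (field name not
    confirmed by this reference).
    """
    event_type = json.loads(client_message)["type"]
    try:
        return EXPECTED_RESPONSE[event_type]
    except KeyError:
        raise ValueError(f"unknown client event type: {event_type!r}") from None
```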
session.update
Send as the first message after the WebSocket opens. Omit to use defaults. The server responds with session.updated.
Example
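A minimal Python sketch of building this event. It assumes the three parameters listed below are named `event_id`, `type`, and `session` (this reference gives their types and descriptions but not their names), and it uses a hypothetical `mode` key inside the session object; the values server_commit and commit are the modes this reference describes.

```python
import json
import uuid
from typing import Any


def session_update(session: dict[str, Any]) -> str:
    """Build a session.update client event as a JSON string.

    The field names event_id, type, and session are assumptions.
    """
    event = {
        "event_id": str(uuid.uuid4()),  # UUID, unique within the session
        "type": "session.update",
        "session": session,             # session configuration object
    }
    return json.dumps(event)


# Hypothetical configuration: the "mode" key name is an assumption.
message = session_update({"mode": "server_commit"})
```

Send message as the first frame after the connection opens, then wait for session.updated.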
Parameters:
- string, body, required: Unique identifier for this event. Use a UUID. Must be unique within the session.
- string, body, required: Set to session.update.
- object, body: Session configuration.
input_text_buffer.append
Appends text to the synthesis buffer. In server_commit mode, the buffer is server-side; in commit mode, it is client-side.
The server responds with response.created when a new response begins.
Example
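Because text can be streamed, a client may send several append events for one passage. A minimal Python sketch that splits text into fixed-size character chunks and builds one append event per chunk; the field names `event_id`, `type`, and `text` are assumptions, and `chunk_size` is a hypothetical client-side choice, not a documented server limit.

```python
import json
import uuid


def append_events(text: str, chunk_size: int = 50) -> list[str]:
    """Build one input_text_buffer.append event per chunk of text.

    Chunks are cut by character count, so a chunk may end mid-word.
    Field names (event_id, type, text) are assumptions.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    events = []
    for start in range(0, len(text), chunk_size):
        events.append(json.dumps({
            "event_id": str(uuid.uuid4()),  # fresh UUID per event
            "type": "input_text_buffer.append",
            "text": text[start:start + chunk_size],
        }))
    return events
```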
Parameters:
- string, body, required: Unique identifier for this event. Use a UUID. Must be unique within the session.
- string, body, required: Set to input_text_buffer.append.
- string, body, required: Text to synthesize.
input_text_buffer.commit
Commits buffered text and creates a user message item. The server responds with input_text_buffer.committed. Sending this on an empty buffer returns an error.
Behavior by mode:
- server_commit: All buffered text is synthesized immediately. The server stops caching and processes all buffered text.
- commit: Creates a user message item from the buffered text.
Committing triggers speech synthesis only; it does not trigger model response generation.
Example
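Since committing an empty buffer returns an error, a client can track its own appends and refuse to send a commit when nothing is pending. A minimal Python sketch of that guard; `TextBuffer` is a hypothetical helper, and the field names `event_id`, `type`, and `text` are assumptions.

```python
import json
import uuid


class TextBuffer:
    """Client-side mirror of the text buffer (hypothetical helper).

    Tracks text appended since the last commit so the client never sends
    input_text_buffer.commit on an empty buffer. Field names are assumptions.
    """

    def __init__(self) -> None:
        self._pending = ""

    @staticmethod
    def _event(event_type: str, **fields: str) -> str:
        return json.dumps({"event_id": str(uuid.uuid4()), "type": event_type, **fields})

    def append(self, text: str) -> str:
        self._pending += text
        return self._event("input_text_buffer.append", text=text)

    def commit(self) -> str:
        if not self._pending:
            raise RuntimeError("buffer is empty; the server would return an error")
        self._pending = ""  # committed text leaves the buffer
        return self._event("input_text_buffer.commit")
```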
Parameters:
- string, body, required: Unique identifier for this event. Use a UUID. Must be unique within the session.
- string, body, required: Set to input_text_buffer.commit.
input_text_buffer.clear
Clears all text in the buffer. The server responds with input_text_buffer.cleared.
Example
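One use of clear is replacing text that has been buffered but not yet committed. A minimal Python sketch that returns a clear event followed by an append event, in the order they should be sent; `replace_pending_text` and `make_event` are hypothetical helpers, and the field names `event_id`, `type`, and `text` are assumptions.

```python
import json
import uuid


def make_event(event_type: str, **fields: str) -> str:
    # event_id and type field names are assumptions (not named in this reference).
    return json.dumps({"event_id": str(uuid.uuid4()), "type": event_type, **fields})


def replace_pending_text(new_text: str) -> list[str]:
    """Discard whatever is buffered, then buffer new_text instead.

    Returns the events in the order they should be sent.
    """
    return [
        make_event("input_text_buffer.clear"),
        make_event("input_text_buffer.append", text=new_text),
    ]
```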
Parameters:
- string, body, required: Unique identifier for this event. Use a UUID. Must be unique within the session.
- string, body, required: Set to input_text_buffer.clear.
session.finish
Signals that you have no more text to send. The server flushes remaining audio, returns session.finished, and closes the connection.
Example
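Because the server flushes remaining audio before it returns session.finished, a client should keep reading after sending session.finish. A minimal Python sketch that builds the finish event and drains incoming server messages until session.finished arrives; `finish_event` and `drain_until_finished` are hypothetical helpers, and the field names `event_id` and `type` are assumptions.

```python
import json
import uuid
from collections.abc import Iterable


def finish_event() -> str:
    # Field names are assumptions, as in the other sketches.
    return json.dumps({"event_id": str(uuid.uuid4()), "type": "session.finish"})


def drain_until_finished(messages: Iterable[str]) -> list[dict]:
    """Collect server events received after sending session.finish.

    Events that arrive before session.finished (such as flushed audio) are
    kept for processing. Raises if the stream ends without session.finished.
    """
    received = []
    for raw in messages:
        event = json.loads(raw)
        received.append(event)
        if event.get("type") == "session.finished":
            return received
    raise ConnectionError("connection closed before session.finished")
```

In a real client, messages would be the text frames read from the WebSocket connection.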
Parameters:
- string, body, required: Unique identifier for this event. Use a UUID. Must be unique within the session.
- string, body, required: Set to session.finish.