Client events are JSON messages sent over a WebSocket connection to control a Qwen-TTS Realtime API session. Use them to configure voice settings, stream text for synthesis, and signal completion.
For the full API overview, see Realtime streaming TTS.
Endpoint
Connect to the WebSocket endpoint:
Replace {model} with your model ID, such as qwen3-tts-instruct-flash-realtime.
Event summary
| Client event | Server response | Description |
|---|---|---|
| session.update | session.updated | Set voice, audio format, interaction mode, and other session parameters |
| input_text_buffer.append | -- | Append text to the synthesis buffer |
| input_text_buffer.commit | input_text_buffer.committed | Commit buffered text to trigger synthesis |
| input_text_buffer.clear | input_text_buffer.cleared | Discard all buffered text |
| session.finish | -- | End the session; the server flushes remaining audio and closes the connection |
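As a rough illustration of how the events in the table fit together, the sketch below builds the JSON messages for one short session in an order a client might send them. The `make_event` helper is hypothetical; only `event_id`, `type`, and `text` come from this reference. Sending the messages requires a WebSocket client, which is left out here.

```python
import json
import uuid


def make_event(event_type: str, **fields) -> str:
    """Serialize a client event with a fresh UUID as its event_id."""
    event = {"event_id": str(uuid.uuid4()), "type": event_type, **fields}
    return json.dumps(event)


# One possible message order for a short session.
messages = [
    make_event("session.update"),  # "session" is optional; defaults apply if omitted
    make_event("input_text_buffer.append", text="Hello, "),
    make_event("input_text_buffer.append", text="world."),
    make_event("input_text_buffer.commit"),
    make_event("session.finish"),
]

for message in messages:
    print(message)
```

Generating a new UUID per message is one simple way to keep `event_id` unique within the WebSocket session.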
session.update
Configures the session. Send as the first message after the WebSocket connection is established. If omitted, all parameters use defaults. The server confirms with a session.updated event.
Request body
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| event_id | string | Yes | Unique event identifier generated by the client (UUID recommended). Must be unique within the WebSocket session. |
| type | string | Yes | Set to session.update. |
| session | object | No | Session configuration. See the following subsections. |
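A minimal sketch of what a session.update message could look like, built from the parameters in the table. The keys inside `session` (`voice`, `mode`) and the placeholder voice value are assumptions made for illustration; this section does not list the actual session field names, so check the session parameter reference before relying on them.

```python
import json
import uuid

# NOTE: the keys inside "session" are illustrative assumptions, not confirmed
# field names. Only event_id, type, and session come from the table above.
session_update = {
    "event_id": str(uuid.uuid4()),  # must be unique within the WebSocket session
    "type": "session.update",
    "session": {
        "voice": "<voice-name>",    # assumed key, placeholder value
        "mode": "server_commit",    # assumed key; the mode name appears in this reference
    },
}

payload = json.dumps(session_update)
print(payload)
```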
input_text_buffer.append
Append text to the synthesis buffer.
- In server_commit mode, text is appended to the server-side buffer.
- In commit mode, text is appended to the client-side buffer.
Request body
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| event_id | string | Yes | Unique event identifier generated by the client (UUID recommended). Must be unique within the WebSocket session. |
| type | string | Yes | Set to input_text_buffer.append. |
| text | string | Yes | The text to synthesize. |
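Text often arrives in pieces, for example from a streaming upstream source. The sketch below turns a sequence of text chunks into input_text_buffer.append messages. The `append_events` name is hypothetical, and skipping empty chunks is a choice made in this sketch; this reference does not say how the server treats an empty `text` value.

```python
import json
import uuid
from typing import Iterable, Iterator


def append_events(chunks: Iterable[str]) -> Iterator[str]:
    """Yield one input_text_buffer.append message per non-empty text chunk."""
    for chunk in chunks:
        if not chunk:
            continue  # skip empty chunks (a client-side choice in this sketch)
        yield json.dumps({
            "event_id": str(uuid.uuid4()),
            "type": "input_text_buffer.append",
            "text": chunk,
        })


appends = list(append_events(["The weather ", "", "is nice today."]))
for message in appends:
    print(message)
```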
input_text_buffer.commit
Commits buffered text and creates a user message item. The server responds with an input_text_buffer.committed event.
Returns an error if the buffer is empty.
Behavior differs by mode:
- server_commit mode: All buffered text is synthesized immediately. The server stops caching and processes everything at once.
- commit mode: Creates a user message item from the buffered text.
Committing the buffer triggers synthesis only -- it does not generate a model response.
Request body
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| event_id | string | Yes | Unique event identifier generated by the client (UUID recommended). Must be unique within the WebSocket session. |
| type | string | Yes | Set to input_text_buffer.commit. |
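Because committing an empty buffer returns an error, a client may want to track locally whether anything has been appended since the last commit. The sketch below shows one way to do that; `TextBufferTracker` is a hypothetical helper, not part of the API. In server_commit mode the server may already have synthesized some of the text, so the local count is only an approximation there.

```python
import json
import uuid
from typing import Optional


class TextBufferTracker:
    """Client-side record of how much uncommitted text has been appended."""

    def __init__(self) -> None:
        self.pending_chars = 0

    @staticmethod
    def _event(event_type: str, **fields) -> str:
        return json.dumps(
            {"event_id": str(uuid.uuid4()), "type": event_type, **fields}
        )

    def append(self, text: str) -> str:
        """Build an append message and count its text as pending."""
        self.pending_chars += len(text)
        return self._event("input_text_buffer.append", text=text)

    def commit(self) -> Optional[str]:
        """Build a commit message, or return None if nothing is pending."""
        if self.pending_chars == 0:
            return None  # avoid the empty-buffer error
        self.pending_chars = 0
        return self._event("input_text_buffer.commit")


tracker = TextBufferTracker()
print(tracker.commit())            # None: nothing has been appended yet
tracker.append("Good morning.")
print(tracker.commit())            # a commit message
```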
input_text_buffer.clear
Discards all text in the buffer. The server responds with an input_text_buffer.cleared event.
Request body
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| event_id | string | Yes | Unique event identifier generated by the client (UUID recommended). Must be unique within the WebSocket session. |
| type | string | Yes | Set to input_text_buffer.clear. |
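A small sketch of discarding buffered text and recognizing the server's confirmation. Both function names are hypothetical. The check assumes that server events are JSON objects carrying a `type` field, as client events do; this section does not document the server event format, so treat that as an assumption.

```python
import json
import uuid


def clear_event() -> str:
    """Build an input_text_buffer.clear message."""
    return json.dumps(
        {"event_id": str(uuid.uuid4()), "type": "input_text_buffer.clear"}
    )


def is_cleared_ack(raw_message: str) -> bool:
    """Return True if a server message looks like input_text_buffer.cleared.

    Assumes server events are JSON objects with a "type" field.
    """
    try:
        event = json.loads(raw_message)
    except json.JSONDecodeError:
        return False
    return isinstance(event, dict) and event.get("type") == "input_text_buffer.cleared"


print(clear_event())
print(is_cleared_ack('{"type": "input_text_buffer.cleared"}'))
```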
session.finish
Signals that no more text will be sent. The server returns any remaining audio and closes the connection.
Request body
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| event_id | string | Yes | Unique event identifier generated by the client (UUID recommended). Must be unique within the WebSocket session. |
| type | string | Yes | Set to session.finish. |
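Since the server still returns remaining audio after session.finish, a client should keep reading until the connection closes. The sketch below shows that pattern against a stand-in transport. `finish_and_drain`, `send`, and `receive` are hypothetical names; `receive` is assumed to return `None` once the connection is closed, which a real WebSocket library would signal in its own way.

```python
import asyncio
import json
import uuid


async def finish_and_drain(send, receive) -> list:
    """Send session.finish, then collect messages until the connection closes.

    send:    async callable taking one str message (hypothetical transport).
    receive: async callable returning the next server message, or None
             once the connection is closed (an assumption of this sketch).
    """
    await send(json.dumps({"event_id": str(uuid.uuid4()), "type": "session.finish"}))
    remaining = []
    while (message := await receive()) is not None:
        remaining.append(message)
    return remaining


async def _demo():
    sent = []
    # Stand-ins for server messages; None marks the closed connection.
    incoming = ["<audio chunk 1>", "<audio chunk 2>", None]

    async def send(message):
        sent.append(message)

    async def receive():
        return incoming.pop(0)

    leftover = await finish_and_drain(send, receive)
    return sent, leftover


sent, leftover = asyncio.run(_demo())
print(sent[0])
print(leftover)
```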