Qwen-TTS client events

Client events are JSON messages you send over a WebSocket to configure voice settings, stream text, and signal input completion.

For the full API overview, see Realtime streaming TTS.

Endpoint

Connect to the WebSocket endpoint:

wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime?model={model}

Replace {model} with your model ID, such as qwen3-tts-instruct-flash-realtime.
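As a minimal sketch, the URL can be assembled from the model ID like this. The helper name `build_endpoint` is hypothetical and not part of the API; only the base URL and the `model` query parameter come from this page.

```python
from urllib.parse import urlencode

BASE_URL = "wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime"


def build_endpoint(model: str) -> str:
    """Return the WebSocket URL with the model ID as the `model` query parameter."""
    if not model:
        raise ValueError("model must be a non-empty string")
    return f"{BASE_URL}?{urlencode({'model': model})}"


print(build_endpoint("qwen3-tts-instruct-flash-realtime"))
```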

Event summary

Client event	Server response	Description
`session.update`	`session.updated`	Set voice, audio format, mode, and other session parameters
`input_text_buffer.append`	`response.created`	Append text to the synthesis buffer
`input_text_buffer.commit`	`input_text_buffer.committed`	Commit buffered text to start synthesis
`input_text_buffer.clear`	`input_text_buffer.cleared`	Discard all buffered text
`session.finish`	`session.finished`	End the session; the server flushes remaining audio and closes the connection
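A client that dispatches on server messages may find it convenient to keep the table above as a lookup. The sketch below is illustrative only; `EXPECTED_RESPONSE` and `expected_response` are hypothetical names.

```python
# Client event type -> server response type, copied from the event summary table.
EXPECTED_RESPONSE = {
    "session.update": "session.updated",
    # Returned when a new response begins, so not necessarily once per append.
    "input_text_buffer.append": "response.created",
    "input_text_buffer.commit": "input_text_buffer.committed",
    "input_text_buffer.clear": "input_text_buffer.cleared",
    "session.finish": "session.finished",
}


def expected_response(client_event_type: str) -> str:
    """Look up the server response type listed for a client event type."""
    try:
        return EXPECTED_RESPONSE[client_event_type]
    except KeyError:
        raise ValueError(f"unknown client event: {client_event_type}") from None


print(expected_response("input_text_buffer.commit"))
```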

session.update

Send as the first message after the WebSocket opens. Omit to use defaults. The server responds with session.updated.

Example

{
  "event_id": "event_123",
  "type": "session.update",
  "session": {
    "voice": "Cherry",
    "mode": "server_commit",
    "language_type": "Chinese",
    "response_format": "pcm",
    "sample_rate": 24000,
    "instructions": "",
    "optimize_instructions": false
  }
}
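One possible way to build this message in code, assuming you serialize it yourself before sending it over the socket. `make_session_update` is a hypothetical helper; the field names are taken from the example above.

```python
import json
import uuid


def make_session_update(voice: str, **options) -> str:
    """Serialize a session.update event; extra keyword arguments become session fields."""
    event = {
        "event_id": str(uuid.uuid4()),  # a fresh UUID keeps the ID unique within the session
        "type": "session.update",
        "session": {"voice": voice, **options},
    }
    return json.dumps(event)


message = make_session_update(
    "Cherry",
    mode="server_commit",
    language_type="Chinese",
    response_format="pcm",
    sample_rate=24000,
)
print(message)
```

Send the returned string as the first message after the WebSocket opens.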

event_id

string

body

required

Unique identifier for this event. Use a UUID. Must be unique within the session.

type

string

body

required

Set to session.update.

session

object

body

Session configuration.

voice

string

body

required

Voice for synthesis.

System voices: Available for Qwen3-TTS-Instruct-Flash-Realtime, Qwen3-TTS-Flash-Realtime, and Qwen-TTS-Realtime.
Custom voices:
- Voice cloning: Qwen3-TTS-VC-Realtime only.
- Voice design: Qwen3-TTS-VD-Realtime only.

mode

string

body

Controls when buffered text is synthesized. Default: server_commit.

Value	Behavior
`server_commit`	The server decides when to synthesize, balancing latency and quality. Recommended.
`commit`	You trigger synthesis by sending `input_text_buffer.commit`. Lowest latency, but you must manage sentence boundaries.

language_type

string

body

Language of the output audio. Default: Auto.

Auto -- For unknown or mixed-language text. The model matches pronunciation per segment automatically. Accuracy is not guaranteed for every segment.
Specify the language to improve quality. Supported values:

`Chinese`, `English`, `German`, `Italian`, `Portuguese`, `Spanish`, `Japanese`, `Korean`, `French`, `Russian`

response_format

string

body

Audio output format. Default: pcm.

Value	Notes
`pcm`	Default. The only format for Qwen-TTS-Realtime.
`wav`
`mp3`
`opus`	Supports configurable bitrate via `bit_rate`.

sample_rate

integer

body

Audio sample rate in Hz. Default: 24000. Supported values: 8000, 16000, 24000, 48000.

Qwen-TTS-Realtime supports only 24000.

float

body

Playback speed. Below 1.0 slows the audio; above 1.0 speeds it up. Default: 1.0. Range: 0.5--2.0.

Not supported by Qwen-TTS-Realtime.

integer

body

Audio volume. Default: 50. Range: 0--100.

Not supported by Qwen-TTS-Realtime.

float

body

Audio pitch. Default: 1.0. Range: 0.5--2.0.

Not supported by Qwen-TTS-Realtime.

bit_rate

integer

body

Audio bitrate in kbps. Higher values improve quality but increase file size. Only applies when response_format is opus. Default: 128. Range: 6--510.

Not supported by Qwen-TTS-Realtime.

instructions

string

body

Controls the style and expressiveness of the output speech. See Realtime streaming TTS for details. Max length: 1600 tokens. Supported languages: Chinese and English only.

Available for Qwen3-TTS-Instruct-Flash-Realtime only.

optimize_instructions

boolean

body

When true, the system rewrites instructions to improve naturalness and expressiveness. Use for fine-grained vocal control. Default: false. No effect if instructions is empty.

Available for Qwen3-TTS-Instruct-Flash-Realtime only.
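The limits above can be checked locally before sending session.update. The sketch below covers only response_format, sample_rate, and bit_rate. `check_session` is a hypothetical helper, and the lowercase model ID `qwen-tts-realtime` is an assumption based on the model name used on this page. How the server itself reacts to out-of-range values is not described here.

```python
SAMPLE_RATES = {8000, 16000, 24000, 48000}
FORMATS = {"pcm", "wav", "mp3", "opus"}


def check_session(session: dict, model: str) -> list[str]:
    """Return the problems found in a session config (an empty list if none)."""
    problems = []
    fmt = session.get("response_format", "pcm")   # documented default
    rate = session.get("sample_rate", 24000)      # documented default
    if fmt not in FORMATS:
        problems.append(f"unsupported response_format: {fmt}")
    if rate not in SAMPLE_RATES:
        problems.append(f"unsupported sample_rate: {rate}")
    if "bit_rate" in session:
        if fmt != "opus":
            problems.append("bit_rate only applies when response_format is opus")
        elif not 6 <= session["bit_rate"] <= 510:
            problems.append("bit_rate must be between 6 and 510 kbps")
    # Assumed model ID for Qwen-TTS-Realtime; adjust to the ID you actually use.
    if model.lower() == "qwen-tts-realtime":
        if fmt != "pcm":
            problems.append("Qwen-TTS-Realtime supports only pcm")
        if rate != 24000:
            problems.append("Qwen-TTS-Realtime supports only 24000 Hz")
        if "bit_rate" in session:
            problems.append("Qwen-TTS-Realtime does not support bit_rate")
    return problems


print(check_session({"response_format": "opus", "bit_rate": 600}, "qwen3-tts-flash-realtime"))
```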

input_text_buffer.append

Appends text to the synthesis buffer. In server_commit mode, the buffer is server-side; in commit mode, it is client-side. The server responds with response.created when a new response begins.

Example

{
  "event_id": "event_B4o9RHSTWobB5OQdEHLTo",
  "type": "input_text_buffer.append",
  "text": "Hello, I am Qwen."
}

event_id

string

body

required

Unique identifier for this event. Use a UUID. Must be unique within the session.

type

string

body

required

Set to input_text_buffer.append.

text

string

body

required

Text to synthesize.
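When text arrives in fragments (for example, from a streaming text source), each fragment can be wrapped in its own append event. This is a sketch, not a prescribed pattern: `append_events` is a hypothetical helper, and skipping empty fragments is a design choice of the sketch rather than a documented requirement.

```python
import json
import uuid
from typing import Iterable, Iterator


def append_events(chunks: Iterable[str]) -> Iterator[str]:
    """Yield one serialized input_text_buffer.append event per non-empty chunk."""
    for chunk in chunks:
        if not chunk:
            continue  # nothing to synthesize in an empty fragment
        yield json.dumps(
            {
                "event_id": str(uuid.uuid4()),  # new ID for every event
                "type": "input_text_buffer.append",
                "text": chunk,
            },
            ensure_ascii=False,  # keep non-ASCII text readable on the wire
        )


for message in append_events(["Hello, ", "", "I am Qwen."]):
    print(message)
```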

input_text_buffer.commit

Commits buffered text and creates a user message item. The server responds with input_text_buffer.committed. Sending this on an empty buffer returns an error. Behavior by mode:

- server_commit: All buffered text is synthesized immediately. The server stops caching and processes everything.
- commit: Creates a user message item from the buffered text.

Committing triggers speech synthesis only -- not model response generation.

Example

{
  "event_id": "event_B4o9RHSTWobB5OQdEHLTo",
  "type": "input_text_buffer.commit"
}

event_id

string

body

required

Unique identifier for this event. Use a UUID. Must be unique within the session.

type

string

body

required

Set to input_text_buffer.commit.
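In commit mode you decide where each synthesis unit ends. A rough sketch of one approach is to append and commit one sentence at a time. The helper names are hypothetical, and the sentence splitter is deliberately naive (it splits after `.`, `!`, or `?` followed by whitespace), so treat it as a placeholder for real boundary detection.

```python
import json
import re
import uuid


def make_event(event_type: str, **fields) -> str:
    """Serialize a client event with a fresh UUID as its event_id."""
    return json.dumps(
        {"event_id": str(uuid.uuid4()), "type": event_type, **fields},
        ensure_ascii=False,
    )


def commit_mode_messages(text: str) -> list[str]:
    """Return append/commit pairs, one pair per sentence."""
    parts = re.split(r"(?<=[.!?])\s+", text)
    sentences = [part.strip() for part in parts if part.strip()]
    messages = []
    for sentence in sentences:
        messages.append(make_event("input_text_buffer.append", text=sentence))
        # Never reached with an empty buffer: each commit follows a non-empty append.
        messages.append(make_event("input_text_buffer.commit"))
    return messages


for message in commit_mode_messages("Hello there. How are you? Fine."):
    print(message)
```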

input_text_buffer.clear

Clears all text in the buffer. The server responds with input_text_buffer.cleared.

Example

{
  "event_id": "event_2728",
  "type": "input_text_buffer.clear"
}

event_id

string

body

required

Unique identifier for this event. Use a UUID. Must be unique within the session.

type

string

body

required

Set to input_text_buffer.clear.
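Because committing an empty buffer returns an error, a client may want to track locally whether anything is pending since the last commit or clear. The class below is a hypothetical sketch of that bookkeeping. It best fits commit mode, where the server does not consume buffered text on its own schedule.

```python
class BufferTracker:
    """Local mirror of the text appended since the last commit or clear."""

    def __init__(self) -> None:
        self.pending = ""

    def on_append(self, text: str) -> None:
        # Call after sending input_text_buffer.append.
        self.pending += text

    def on_clear(self) -> None:
        # Call after sending input_text_buffer.clear.
        self.pending = ""

    def can_commit(self) -> bool:
        # False means a commit would hit an empty buffer.
        return bool(self.pending)

    def on_commit(self) -> str:
        # Call after sending input_text_buffer.commit; returns what was committed.
        if not self.pending:
            raise RuntimeError("buffer is empty; the server would return an error")
        committed, self.pending = self.pending, ""
        return committed


tracker = BufferTracker()
tracker.on_append("Draft sentence.")
tracker.on_clear()
print(tracker.can_commit())
```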

session.finish

Signals that you have no more text to send. The server flushes remaining audio, returns session.finished, and closes the connection.

Example

{
  "event_id": "event_2239",
  "type": "session.finish"
}

event_id

string

body

required

Unique identifier for this event. Use a UUID. Must be unique within the session.

type

string

body

required

Set to session.finish.
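After sending session.finish, a client typically keeps reading until session.finished arrives, since remaining audio is flushed first. The sketch below assumes server messages are JSON objects with a `type` field, mirroring the client events. That shape is an assumption and is not specified on this page. Both helper names are hypothetical.

```python
import json
import uuid


def make_finish_event() -> str:
    """Serialize the session.finish event."""
    return json.dumps({"event_id": str(uuid.uuid4()), "type": "session.finish"})


def is_session_finished(raw_message: str) -> bool:
    """Return True if a server text message is a session.finished event."""
    try:
        payload = json.loads(raw_message)
    except json.JSONDecodeError:
        return False  # not JSON, so not the event we are waiting for
    return isinstance(payload, dict) and payload.get("type") == "session.finished"


print(make_finish_event())
print(is_session_finished('{"type": "session.finished"}'))
```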
