Qwen-TTS realtime Java SDK

For models and voice options, see Realtime streaming TTS.

Getting started

The SDK supports two interaction modes, described below. Both follow the same pattern: build a QwenTtsRealtimeParam, create a QwenTtsRealtime instance with a callback, connect, configure the session with updateSession(), send text, and close. For complete examples and tutorials, see Real-time text-to-speech.

Server commit mode (server_commit): The server decides when to synthesize buffered text. Best for most use cases.
Commit mode (commit): The client calls commit() to trigger synthesis. Lowest latency, but you must manage sentence boundaries.
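
In commit mode the client decides where each synthesis request ends. The following is a minimal sketch of one way to find sentence boundaries before sending text, using the JDK's `java.text.BreakIterator`. `SentenceChunker` is a hypothetical helper and not part of the SDK; each returned sentence would be passed to appendText() and followed by commit().

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of the SDK: splits text into sentences so
// that each one can be sent with appendText() and then commit().
final class SentenceChunker {
    private SentenceChunker() {}

    // Returns the non-empty, trimmed sentences of `text` in order.
    static List<String> split(String text, Locale locale) {
        List<String> sentences = new ArrayList<>();
        BreakIterator it = BreakIterator.getSentenceInstance(locale);
        it.setText(text);
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String sentence = text.substring(start, end).trim();
            if (!sentence.isEmpty()) {
                sentences.add(sentence);
            }
        }
        return sentences;
    }
}
```

Boundary detection is locale-dependent and heuristic, so abbreviations or unpunctuated text may need extra handling in a real application.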

Request parameters

QwenTtsRealtimeParam

Build a QwenTtsRealtimeParam and pass it to the QwenTtsRealtime constructor.

Parameter	Type	Required	Description
model	String	Yes	Model name. See Supported models.
url	String	Yes	WebSocket endpoint: `wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime`.

QwenTtsRealtimeConfig

Build a QwenTtsRealtimeConfig and pass it to updateSession().

Parameter	Type	Required	Description
voice	String	Yes	Voice for synthesis. System voices: available for Qwen3-TTS-Instruct-Flash-Realtime, Qwen3-TTS-Flash-Realtime, and Qwen-TTS-Realtime. See Supported voices for samples. Custom voices: created through Voice cloning (Qwen) (Qwen3-TTS-VC-Realtime only) or Voice design (Qwen) (Qwen3-TTS-VD-Realtime only).
languageType	String	No	Language of the output audio. Default: `Auto` (the model detects language per segment; accuracy is not guaranteed). Set a specific language to improve quality for single-language text. Valid values: `Chinese`, `English`, `German`, `Italian`, `Portuguese`, `Spanish`, `Japanese`, `Korean`, `French`, `Russian`.
mode	String	No	Interaction mode. `server_commit` (default): the server decides when to synthesize buffered text, balancing latency and quality. `commit`: the client triggers synthesis for the lowest latency; requires manual sentence boundary management.
format	String	No	Audio output format. Valid values: `pcm` (default), `wav`, `mp3`, `opus`. Qwen-TTS-Realtime supports only `pcm`. See Supported models.
sampleRate	int	No	Sample rate in Hz. Valid values: `8000`, `16000`, `24000` (default), `48000`. Qwen-TTS-Realtime supports only `24000`. See Supported models.
speechRate	float	No	Speech rate multiplier. Range: [0.5, 2.0]. Default: `1.0`. Not supported by Qwen-TTS-Realtime.
volume	int	No	Audio volume. Range: [0, 100]. Default: `50`. Not supported by Qwen-TTS-Realtime.
pitchRate	float	No	Pitch multiplier. Range: [0.5, 2.0]. Default: `1.0`. Not supported by Qwen-TTS-Realtime.
bitRate	int	No	Audio bitrate in kbps. Range: [6, 510]. Default: `128`. Only for `opus` format. Higher values produce better quality at larger file sizes. Not supported by Qwen-TTS-Realtime.
instructions	String	No	Instruction text for controlling speech style. Max 1,600 tokens. Chinese and English only. Only for Qwen3-TTS-Instruct-Flash-Realtime. See Instruction control.
optimizeInstructions	boolean	No	Optimizes `instructions` for more natural synthesis. Default: `false`. No effect if `instructions` is not set. Only for Qwen3-TTS-Instruct-Flash-Realtime.
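
Because the server rejects invalid parameters, it can help to check values locally before calling updateSession(). The sketch below encodes the ranges from the table above in plain Java. `TtsConfigCheck` and its method are hypothetical and not part of the SDK, and the check ignores the model-specific restrictions (for example, Qwen-TTS-Realtime accepting only `pcm` at `24000` Hz).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical client-side pre-check, not part of the SDK. The allowed
// values and ranges are copied from the QwenTtsRealtimeConfig table.
final class TtsConfigCheck {
    private static final List<String> FORMATS = Arrays.asList("pcm", "wav", "mp3", "opus");
    private static final List<Integer> SAMPLE_RATES = Arrays.asList(8000, 16000, 24000, 48000);

    private TtsConfigCheck() {}

    // Returns a list of problems; an empty list means every value is in range.
    static List<String> validate(String format, int sampleRate, float speechRate,
                                 int volume, float pitchRate, int bitRate) {
        List<String> problems = new ArrayList<>();
        if (!FORMATS.contains(format)) {
            problems.add("format must be one of " + FORMATS);
        }
        if (!SAMPLE_RATES.contains(sampleRate)) {
            problems.add("sampleRate must be one of " + SAMPLE_RATES);
        }
        if (speechRate < 0.5f || speechRate > 2.0f) {
            problems.add("speechRate must be in [0.5, 2.0]");
        }
        if (volume < 0 || volume > 100) {
            problems.add("volume must be in [0, 100]");
        }
        if (pitchRate < 0.5f || pitchRate > 2.0f) {
            problems.add("pitchRate must be in [0.5, 2.0]");
        }
        // bitRate applies only to the opus format, so it is checked only then.
        if ("opus".equals(format) && (bitRate < 6 || bitRate > 510)) {
            problems.add("bitRate must be in [6, 510] kbps for opus");
        }
        return problems;
    }
}
```

Passing the documented defaults (`pcm`, `24000`, `1.0`, `50`, `1.0`, `128`) yields an empty list.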

API reference

QwenTtsRealtime class

`import com.alibaba.dashscope.audio.qwen_tts_realtime.QwenTtsRealtime;`

Method	Signature	Server events	Description
connect	`public void connect() throws NoApiKeyException, InterruptedException`	session.created, session.updated	Opens a WebSocket connection.
updateSession	`public void updateSession(QwenTtsRealtimeConfig config)`	session.updated	Updates the session configuration. Call immediately after `connect()` to override defaults. The server validates parameters and returns an error if invalid.

appendText	`public void appendText(String text)`	None	Appends text to the server-side input buffer. In `server_commit` mode, the server decides when to synthesize. In `commit` mode, call `commit()` to trigger synthesis.
clearAppendedText	`public void clearAppendedText()`	input_text_buffer.cleared	Clears all text in the server-side input buffer.
commit	`public void commit()`	input_text_buffer.committed, response.output_item.added, response.content_part.added, response.audio.delta, response.audio.done, response.content_part.done, response.output_item.done, response.done	Commits buffered text and starts synthesis. Returns an error if the buffer is empty. In `server_commit` mode, the server commits automatically. In `commit` mode, call this to trigger synthesis.
finish	`public void finish()`	session.finished	Stops the current task.
close	`public void close()`	None	Closes the WebSocket connection.
getSessionId	`public String getSessionId()`	None	Returns the current session ID.
getResponseId	`public String getResponseId()`	None	Returns the most recent response ID.
getFirstAudioDelay	`public long getFirstAudioDelay()`	None	Returns the first audio packet latency in milliseconds.
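
The call order for commit mode can be sketched without a network connection. In the example below, `TextSink` is a locally declared stand-in that mirrors the appendText(), commit(), and finish() methods from the table; it is not an SDK type, and `CommitModeDriver` and `RecordingSink` are likewise hypothetical. Whether a client should wait for `response.done` before the next commit is not covered here.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for the subset of QwenTtsRealtime methods used below; declared
// locally so the sketch compiles without the SDK.
interface TextSink {
    void appendText(String text);
    void commit();
    void finish();
}

// Hypothetical driver: sends each sentence as its own synthesis request.
final class CommitModeDriver {
    private CommitModeDriver() {}

    static void speak(TextSink sink, List<String> sentences) {
        for (String sentence : sentences) {
            // commit() returns an error on an empty buffer, so skip blanks.
            if (sentence.trim().isEmpty()) {
                continue;
            }
            sink.appendText(sentence);
            sink.commit();
        }
        // End the task once all text has been committed.
        sink.finish();
    }
}

// Records calls so the order can be inspected in a test.
final class RecordingSink implements TextSink {
    final List<String> calls = new ArrayList<>();

    @Override public void appendText(String text) { calls.add("appendText:" + text); }
    @Override public void commit() { calls.add("commit"); }
    @Override public void finish() { calls.add("finish"); }
}
```

In a real application, a thin adapter would forward these three calls to a connected QwenTtsRealtime instance, and close() would follow once the session has finished.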

QwenTtsRealtimeCallback interface

`import com.alibaba.dashscope.audio.qwen_tts_realtime.QwenTtsRealtimeCallback;`

Method	Parameters	Description
`onOpen()`	None	Called when the WebSocket connection opens.
`onEvent(JsonObject message)`	`message`: The server event payload.	Called when a server event arrives. For event types, see Server-side events.
`onClose(int code, String reason)`	`code`: WebSocket close status code. `reason`: Close reason.	Called when the server closes the connection.
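
An `onEvent` implementation typically branches on the event type. The sketch below shows that dispatch in plain Java, collecting audio from `response.audio.delta` events until `response.done` arrives. It assumes the caller has already extracted the event type and a Base64-encoded audio chunk from the payload as strings; that encoding and the field layout are assumptions, so check Server-side events for the actual schema. `AudioCollector` is a hypothetical helper, not part of the SDK.

```java
import java.io.ByteArrayOutputStream;
import java.util.Base64;

// Hypothetical helper, not part of the SDK. Assumes audio chunks arrive as
// Base64 strings, which should be confirmed against Server-side events.
final class AudioCollector {
    private final ByteArrayOutputStream audio = new ByteArrayOutputStream();
    private boolean done;

    // type: the server event type; base64Delta: the audio chunk for
    // response.audio.delta events (ignored, and may be null, otherwise).
    void handle(String type, String base64Delta) {
        switch (type) {
            case "response.audio.delta":
                byte[] chunk = Base64.getDecoder().decode(base64Delta);
                audio.write(chunk, 0, chunk.length);
                break;
            case "response.done":
                done = true;
                break;
            default:
                // Other events (session.created, session.updated, ...) are ignored here.
                break;
        }
    }

    boolean isDone() {
        return done;
    }

    // Returns a copy of all audio bytes collected so far.
    byte[] toByteArray() {
        return audio.toByteArray();
    }
}
```

Callbacks may run on an SDK-managed thread, so a production version would likely need synchronization if the collected audio is read from another thread.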

Next steps

Realtime streaming TTS: Models, supported voices, and instruction control.
Server-side events: WebSocket event type reference.
Voice cloning (Qwen): Create custom voices with Qwen3-TTS-VC-Realtime.
Voice design (Qwen): Design custom voices with Qwen3-TTS-VD-Realtime.
GitHub samples: More code examples.
