Qwen-TTS realtime Python SDK

Interfaces and parameters for real-time text-to-speech (Qwen) with the DashScope Python SDK. For model details, see the guides Real-time text-to-speech - Qwen or Text-to-speech - Qwen.

Getting started

Two interaction modes are available: `server_commit` (the server decides when to synthesize) and `commit` (the client triggers synthesis manually). For complete examples and tutorials, see Real-time text-to-speech.

Request parameters

Set these in the QwenTtsRealtime constructor.

Parameter	Type	Required	Description
model	str	Yes	Model name.
url	str	Yes	WebSocket endpoint: `wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime`
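
A minimal sketch of building the client from the two constructor parameters above. Several things here are assumptions not stated in this section: that the constructor also accepts a `callback` keyword argument, and that the API key is read from a `DASHSCOPE_API_KEY` environment variable. `make_client` and `INTL_URL` are hypothetical names used only for illustration.

```python
import os

# Endpoint copied from the constructor parameter table.
INTL_URL = "wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime"


def make_client(model: str, callback, url: str = INTL_URL):
    """Build a QwenTtsRealtime client (hypothetical helper).

    Assumptions: the constructor accepts a `callback` keyword, and the
    API key is available in the DASHSCOPE_API_KEY environment variable.
    """
    # Imported lazily so this module still loads without the SDK installed.
    import dashscope
    from dashscope.audio.qwen_tts_realtime import QwenTtsRealtime

    dashscope.api_key = os.environ["DASHSCOPE_API_KEY"]
    return QwenTtsRealtime(model=model, callback=callback, url=url)
```

The caller supplies the model name and a callback object; the helper only wires them to the endpoint and leaves `connect` and `update_session` to the calling code.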

Set these with update_session.

Parameter	Type	Required	Description
voice	str	Yes	Voice for synthesis. System voices: available for the Qwen3-TTS-Instruct-Flash-Realtime, Qwen3-TTS-Flash-Realtime, and Qwen-TTS-Realtime series. Custom voices: created through Voice cloning (Qwen) (Qwen3-TTS-VC-Realtime series only) or Voice design (Qwen) (Qwen3-TTS-VD-Realtime series only).
language_type	str	No	Output audio language. Default: `Auto`. Use `Auto` when the language is uncertain or the text mixes languages. Supported: `Chinese`, `English`, `German`, `Italian`, `Portuguese`, `Spanish`, `Japanese`, `Korean`, `French`, `Russian`. Setting a specific language improves quality.
mode	str	No	Interaction mode. `server_commit` (default): server decides synthesis timing, balancing latency and quality. Best for most use cases. `commit`: client triggers synthesis manually with lower latency, but you must manage sentence boundaries.
format	str	No	Audio format. Supported: `pcm` (default), `wav`, `mp3`, `opus`. Qwen-TTS-Realtime supports only `pcm`.
sample_rate	int	No	Sample rate in Hz. Supported: 8000, 16000, 24000 (default), 48000. Qwen-TTS-Realtime supports only 24000.
speech_rate	float	No	Speech rate multiplier. Default: 1.0 (normal). Range: [0.5, 2.0]. Not supported by Qwen-TTS-Realtime.
volume	int	No	Volume. Default: 50. Range: [0, 100]. Not supported by Qwen-TTS-Realtime.
pitch_rate	float	No	Pitch multiplier. Default: 1.0. Range: [0.5, 2.0]. Not supported by Qwen-TTS-Realtime.
bit_rate	int	No	Bitrate in kbps. Higher values improve quality but increase file size. Only for `opus` format. Default: 128. Range: [6, 510]. Not supported by Qwen-TTS-Realtime.
instructions	str	No	Synthesis control instructions. See Realtime streaming TTS for details. Default: None. Max: 1600 tokens. Supports Chinese and English. Qwen3-TTS-Instruct-Flash-Realtime series only.
optimize_instructions	bool	No	When True, rewrites `instructions` for better naturalness and expressiveness. Default: False. Recommended for fine-grained voice control. Requires `instructions` to be set. Qwen3-TTS-Instruct-Flash-Realtime series only.
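
The allowed values and ranges in the table can be checked on the client side before a session update is sent. The sketch below is one possible way to do that; `check_session_params` is a hypothetical helper, not part of the SDK. It only compares a plain dict against the table and does not call `update_session`. Model-specific restrictions (for example, Qwen-TTS-Realtime supporting only `pcm` at 24000 Hz) are deliberately left out.

```python
from typing import Any, Dict, List

# Allowed values copied from the table above.
CHOICES = {
    "language_type": {"Auto", "Chinese", "English", "German", "Italian",
                      "Portuguese", "Spanish", "Japanese", "Korean",
                      "French", "Russian"},
    "mode": {"server_commit", "commit"},
    "format": {"pcm", "wav", "mp3", "opus"},
    "sample_rate": {8000, 16000, 24000, 48000},
}

# Inclusive numeric ranges copied from the table above.
RANGES = {
    "speech_rate": (0.5, 2.0),
    "volume": (0, 100),
    "pitch_rate": (0.5, 2.0),
    "bit_rate": (6, 510),
}


def check_session_params(params: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means nothing was flagged."""
    problems: List[str] = []
    if not params.get("voice"):
        problems.append("voice is required")
    for name, allowed in CHOICES.items():
        if name in params and params[name] not in allowed:
            problems.append(f"{name}: unsupported value {params[name]!r}")
    for name, (low, high) in RANGES.items():
        if name in params and not low <= params[name] <= high:
            problems.append(f"{name}: {params[name]!r} is outside [{low}, {high}]")
    # bit_rate is documented for the opus format only; pcm is the default format.
    if "bit_rate" in params and params.get("format", "pcm") != "opus":
        problems.append("bit_rate applies only to the opus format")
    if params.get("optimize_instructions") and not params.get("instructions"):
        problems.append("optimize_instructions requires instructions")
    return problems
```

Returning a list of messages rather than raising on the first failure lets the caller report every problem at once.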

Key interfaces

QwenTtsRealtime class

Import with `from dashscope.audio.qwen_tts_realtime import QwenTtsRealtime`.

Method signature	Server response events (via callback)	Description
`def connect(self) -> None`	session.created - Session created. session.updated - Session configuration updated.	Connect to the server.
`def update_session(self, voice: str, response_format: AudioFormat = AudioFormat.PCM_24000HZ_MONO_16BIT, mode: str = 'server_commit', language_type: str = "Chinese", **kwargs) -> None`	session.updated - Session configuration updated.	Update session configuration. See Request parameters for details. Call immediately after connecting to override defaults. Invalid parameters return an error; valid parameters update the configuration.
`def append_text(self, text: str) -> None`	None	Append text to the cloud input buffer. In `server_commit` mode, the server decides when to synthesize buffered text. In `commit` mode, call `commit` to trigger synthesis.
`def clear_appended_text(self) -> None`	input_text_buffer.cleared - Buffer cleared.	Clear all text in the cloud buffer.
`def commit(self) -> None`	input_text_buffer.committed - Text submitted. response.output_item.added - Output item added. response.content_part.added - Content part added. response.audio.delta - Incremental audio data. response.audio.done - Audio generation done. response.content_part.done - Content streaming done. response.output_item.done - Output item done. response.done - Response done.	Submit buffered text and synthesize immediately. Returns an error if the buffer is empty. Not needed in `server_commit` mode. Required in `commit` mode to trigger synthesis.
`def finish(self) -> None`	session.finished - Session finished.	End the task.
`def close(self) -> None`	None	Close the connection.
`def get_session_id(self) -> str`	None	Get the current session ID.
`def get_last_response_id(self) -> str`	None	Get the last response ID.
`def get_first_audio_delay(self)`	None	Get the delay before the first audio packet.
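
In `commit` mode the client manages sentence boundaries, so a typical loop appends one sentence, commits it, and repeats. The sketch below illustrates that call order using only `append_text`, `commit`, and `finish` from the table. `split_sentences` and `speak_in_commit_mode` are hypothetical helpers, and `client` is assumed to be an already connected `QwenTtsRealtime` instance whose session was updated with `mode='commit'`. The sketch does not wait for server events between commits.

```python
import re
from typing import List

# Split after ., !, ? when followed by whitespace, or directly after the
# full-width forms used in Chinese and Japanese text.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+|(?<=[。！？])")


def split_sentences(text: str) -> List[str]:
    """Split text into sentences and drop empty pieces."""
    return [part.strip() for part in _BOUNDARY.split(text) if part.strip()]


def speak_in_commit_mode(client, text: str) -> int:
    """Append and commit one sentence at a time, then end the task.

    Returns the number of sentences committed.
    """
    count = 0
    for sentence in split_sentences(text):
        client.append_text(sentence)  # buffer the sentence on the server
        client.commit()               # trigger synthesis of the buffered text
        count += 1
    client.finish()
    return count
```

Because `commit` returns an error on an empty buffer, the helper filters out empty pieces before appending anything.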

Callback interface (QwenTtsRealtimeCallback)

Handle server responses by implementing callback methods. Import with `from dashscope.audio.qwen_tts_realtime import QwenTtsRealtimeCallback`.

Method	Parameters	Return value	Description
`def on_open(self) -> None`	None	None	Called when the connection is established.
`def on_event(self, message: str) -> None`	message: Server response event.	None	Called for API responses and model-generated text/audio. See Server events.
`def on_close(self, close_status_code, close_msg) -> None`	close_status_code: WebSocket close code. close_msg: WebSocket close message.	None	Called when the server closes the connection.
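
A minimal sketch of a handler with the three callback methods, which collects synthesized audio from `response.audio.delta` events. To stay runnable without the SDK, `AudioCollector` (a hypothetical name) is written as a plain class; in real use it would subclass `QwenTtsRealtimeCallback`. Two assumptions are made about the event payload: that it is either a JSON string or an already parsed dict with a `type` field, and that audio chunks arrive as Base64 text in a `delta` field.

```python
import base64
import json
from typing import Any, Dict, Optional, Union


class AudioCollector:
    """Collects audio bytes from server events (illustrative only)."""

    def __init__(self) -> None:
        self.is_open = False
        self.finished = False
        self.audio = bytearray()
        self.close_code: Optional[int] = None

    def on_open(self) -> None:
        self.is_open = True

    def on_event(self, message: Union[str, Dict[str, Any]]) -> None:
        # Accept either a JSON string or an already parsed dict.
        event = json.loads(message) if isinstance(message, str) else message
        event_type = event.get("type")
        if event_type == "response.audio.delta":
            # Assumption: the chunk is Base64-encoded in the "delta" field.
            self.audio.extend(base64.b64decode(event["delta"]))
        elif event_type == "session.finished":
            self.finished = True

    def on_close(self, close_status_code, close_msg) -> None:
        self.is_open = False
        self.close_code = close_status_code
```

After `session.finished` arrives, `bytes(collector.audio)` holds the concatenated audio in whatever format the session was configured with.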
