Real-time TTS Python SDK
Interfaces and parameters for real-time text-to-speech (Qwen) with the DashScope Python SDK.
Guide: See Real-time text-to-speech - Qwen or Text-to-speech - Qwen for model details.
Getting started
Two interaction modes are available: server commit (server decides synthesis timing) and commit (client triggers synthesis manually). For complete examples and tutorials, see Real-time text-to-speech.
Request parameters
Set these in the QwenTtsRealtime constructor.
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | str | Yes | Model name. |
| url | str | Yes | WebSocket endpoint URL: wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime |
Set these with update_session.
| Parameter | Type | Required | Description |
|---|---|---|---|
| voice | str | Yes | Voice for synthesis. System voices: available for the Qwen3-TTS-Instruct-Flash-Realtime, Qwen3-TTS-Flash-Realtime, and Qwen-TTS-Realtime series. Custom voices: created through Voice cloning (Qwen) (Qwen3-TTS-VC-Realtime series only) or Voice design (Qwen) (Qwen3-TTS-VD-Realtime series only). |
| language_type | str | No | Output audio language. Default: Auto. Use Auto when the language is uncertain or the text mixes languages. Supported: Chinese, English, German, Italian, Portuguese, Spanish, Japanese, Korean, French, Russian. Setting a specific language improves quality. |
| mode | str | No | Interaction mode. server_commit (default): server decides synthesis timing, balancing latency and quality. Best for most use cases. commit: client triggers synthesis manually with lower latency, but you must manage sentence boundaries. |
| format | str | No | Audio format. Supported: pcm (default), wav, mp3, opus. Qwen-TTS-Realtime supports only pcm. |
| sample_rate | int | No | Sample rate in Hz. Supported: 8000, 16000, 24000 (default), 48000. Qwen-TTS-Realtime supports only 24000. |
| speech_rate | float | No | Speech rate multiplier. Default: 1.0 (normal). Range: [0.5, 2.0]. Not supported by Qwen-TTS-Realtime. |
| volume | int | No | Volume. Default: 50. Range: [0, 100]. Not supported by Qwen-TTS-Realtime. |
| pitch_rate | float | No | Pitch multiplier. Default: 1.0. Range: [0.5, 2.0]. Not supported by Qwen-TTS-Realtime. |
| bit_rate | int | No | Bitrate in kbps. Higher values improve quality but increase file size. Only for opus format. Default: 128. Range: [6, 510]. Not supported by Qwen-TTS-Realtime. |
| instructions | str | No | Synthesis control instructions. See Realtime streaming TTS for details. Default: None. Max: 1600 tokens. Supports Chinese and English. Qwen3-TTS-Instruct-Flash-Realtime series only. |
| optimize_instructions | bool | No | When True, rewrites instructions for better naturalness and expressiveness. Default: False. Recommended for fine-grained voice control. Requires instructions to be set. Qwen3-TTS-Instruct-Flash-Realtime series only. |
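The ranges in the table above can be checked on the client before calling update_session. The following is a minimal sketch under stated assumptions, not part of the DashScope SDK: validate_session_params is a hypothetical helper, it uses the parameter names from the table, and it checks only the generic ranges, not the per-model restrictions (for example, Qwen-TTS-Realtime supporting only pcm).

```python
from typing import Any, Dict, List

# Allowed values copied from the parameter table above.
ALLOWED_MODES = {"server_commit", "commit"}
ALLOWED_FORMATS = {"pcm", "wav", "mp3", "opus"}
ALLOWED_SAMPLE_RATES = {8000, 16000, 24000, 48000}


def validate_session_params(params: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in params (empty when none).

    Hypothetical helper for illustration; the SDK does not provide it.
    """
    problems: List[str] = []

    if not params.get("voice"):
        problems.append("voice is required")
    if params.get("mode", "server_commit") not in ALLOWED_MODES:
        problems.append("mode must be server_commit or commit")

    audio_format = params.get("format", "pcm")
    if audio_format not in ALLOWED_FORMATS:
        problems.append("format must be one of pcm, wav, mp3, opus")
    if params.get("sample_rate", 24000) not in ALLOWED_SAMPLE_RATES:
        problems.append("sample_rate must be 8000, 16000, 24000, or 48000")

    # speech_rate and pitch_rate share the same documented range.
    for name in ("speech_rate", "pitch_rate"):
        if not 0.5 <= params.get(name, 1.0) <= 2.0:
            problems.append(f"{name} must be in [0.5, 2.0]")
    if not 0 <= params.get("volume", 50) <= 100:
        problems.append("volume must be in [0, 100]")

    # bit_rate is documented as opus-only, with range [6, 510] kbps.
    if "bit_rate" in params:
        if audio_format != "opus":
            problems.append("bit_rate applies only to opus")
        elif not 6 <= params["bit_rate"] <= 510:
            problems.append("bit_rate must be in [6, 510]")

    if params.get("optimize_instructions") and not params.get("instructions"):
        problems.append("optimize_instructions requires instructions")
    return problems
```

Returning a list instead of raising on the first problem lets a caller report every issue at once. The voice name passed in is whatever the caller chooses; this sketch does not check it against the available voices.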
Key interfaces
QwenTtsRealtime class
Import with from dashscope.audio.qwen_tts_realtime import QwenTtsRealtime.
| Method signature | Server response events (via callback) | Description |
|---|---|---|
| def connect(self) -> None | session.created - Session created. session.updated - Session configuration updated. | Connect to the server. |
| def update_session(self, voice: str, response_format: AudioFormat = AudioFormat.PCM_24000HZ_MONO_16BIT, mode: str = 'server_commit', language_type: str = "Chinese", **kwargs) -> None | session.updated - Session configuration updated. | Update session configuration. See Request parameters for details. Call immediately after connecting to override defaults. Invalid parameters return an error; valid parameters update the configuration. |
| def append_text(self, text: str) -> None | None | Append text to the cloud input buffer. In server_commit mode, the server decides when to synthesize buffered text. In commit mode, call commit to trigger synthesis. |
| def clear_appended_text(self) -> None | input_text_buffer.cleared - Buffer cleared. | Clear all text in the cloud buffer. |
| def commit(self) -> None | input_text_buffer.committed - Text submitted. response.output_item.added - Output item added. response.content_part.added - Content part added. response.audio.delta - Incremental audio data. response.audio.done - Audio generation done. response.content_part.done - Content streaming done. response.output_item.done - Output item done. response.done - Response done. | Submit buffered text and synthesize immediately. Returns an error if the buffer is empty. Not needed in server_commit mode. Required in commit mode to trigger synthesis. |
| def finish(self) -> None | session.finished - Session finished. | End the task. |
| def close(self) -> None | None | Close the connection. |
| def get_session_id(self) -> str | None | Get the current session ID. |
| def get_last_response_id(self) -> str | None | Get the last response ID. |
| def get_first_audio_delay(self) | None | Get the delay before the first audio packet. |
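The call order for commit mode described in the table can be sketched as follows. This is an illustration under stated assumptions rather than the SDK's own example: RecordingClient is a hypothetical offline stand-in that exposes the same method names as QwenTtsRealtime so the sketch runs without a network connection, and split_sentences is a hypothetical, deliberately simple way to manage sentence boundaries on the client.

```python
import re
from typing import List, Tuple


def split_sentences(text: str) -> List[str]:
    """Split after sentence-ending punctuation (simple heuristic).

    Western punctuation must be followed by whitespace; full-width
    Chinese punctuation splits even without a following space.
    """
    parts = re.split(r"(?<=[.!?])\s+|(?<=[。！？])", text)
    return [part.strip() for part in parts if part.strip()]


class RecordingClient:
    """Hypothetical stand-in for QwenTtsRealtime that records each call."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, ...]] = []

    def connect(self) -> None:
        self.calls.append(("connect",))

    def update_session(self, voice: str, mode: str = "server_commit", **kwargs) -> None:
        self.calls.append(("update_session", voice, mode))

    def append_text(self, text: str) -> None:
        self.calls.append(("append_text", text))

    def commit(self) -> None:
        self.calls.append(("commit",))

    def finish(self) -> None:
        self.calls.append(("finish",))


def synthesize_in_commit_mode(client, text: str, voice: str) -> None:
    """Send text one sentence at a time, committing after each sentence."""
    client.connect()
    # Configure the session right after connecting, as the table recommends.
    client.update_session(voice=voice, mode="commit")
    for sentence in split_sentences(text):
        client.append_text(sentence)
        # commit() errors on an empty buffer, so only commit after appending.
        client.commit()
    client.finish()
    # Assumption: a real program would wait for the session.finished event
    # in its callback before calling close().
```

With a real QwenTtsRealtime instance in place of the stand-in, the same function would drive an actual session; in server_commit mode the commit call would be dropped because the server decides when to synthesize.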
Callback interface (QwenTtsRealtimeCallback)
Handle server responses by implementing callback methods.
Import with from dashscope.audio.qwen_tts_realtime import QwenTtsRealtimeCallback.
| Method | Parameters | Return value | Description |
|---|---|---|---|
| def on_open(self) -> None | None | None | Called when the connection is established. |
| def on_event(self, message: str) -> None | message: Server response event. | None | Called for API responses and model-generated text/audio. See Server events. |
| def on_close(self, close_status_code, close_msg) -> None | close_status_code: WebSocket close code. close_msg: WebSocket close message. | None | Called when the server closes the connection. |
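A callback typically collects the audio carried by response.audio.delta events. The sketch below shows one way to do that and should be read as an assumption-laden illustration: AudioCollector is a hypothetical class shaped like a QwenTtsRealtimeCallback subclass but written without the SDK import so it runs offline, and it assumes each event is a JSON string or a dict with a type field and, for audio deltas, base64-encoded audio in a delta field. Check the Server events reference for the actual payload layout.

```python
import base64
import io
import json
import wave


class AudioCollector:
    """Hypothetical callback that accumulates PCM audio from events."""

    def __init__(self) -> None:
        self.pcm = bytearray()
        self.done = False
        self.closed = False

    def on_open(self) -> None:
        # Start each connection with an empty audio buffer.
        self.pcm.clear()

    def on_event(self, message) -> None:
        # Assumption: the event is a JSON string or an already-parsed dict.
        event = json.loads(message) if isinstance(message, str) else message
        event_type = event.get("type")
        if event_type == "response.audio.delta":
            # Assumption: audio bytes are base64-encoded in the "delta" field.
            self.pcm.extend(base64.b64decode(event["delta"]))
        elif event_type == "response.done":
            self.done = True

    def on_close(self, close_status_code, close_msg) -> None:
        self.closed = True

    def to_wav_bytes(self, sample_rate: int = 24000) -> bytes:
        """Wrap the collected mono 16-bit PCM in a WAV container."""
        buffer = io.BytesIO()
        with wave.open(buffer, "wb") as wav_file:
            wav_file.setnchannels(1)   # mono
            wav_file.setsampwidth(2)   # 16-bit samples
            wav_file.setframerate(sample_rate)
            wav_file.writeframes(bytes(self.pcm))
        return buffer.getvalue()
```

The WAV wrapping matches the default PCM_24000HZ_MONO_16BIT response format; for other formats or sample rates the raw bytes would need different handling.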