Real-time TTS Python SDK
Interfaces and parameters for real-time text-to-speech (Qwen) with the DashScope Python SDK.
Guide: See Real-time text-to-speech - Qwen or Text-to-speech - Qwen for model details.
See GitHub for more samples.
Prerequisites
DashScope Python SDK 1.25.11 or later.
Getting started
- Server commit mode
- Commit mode
Request parameters
Set these in the QwenTtsRealtime constructor.
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | str | Yes | Model name. |
| url | str | Yes | WebSocket endpoint URL: wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime |
Set these with update_session.
| Parameter | Type | Required | Description |
|---|---|---|---|
| voice | str | Yes | Voice for synthesis. System voices: available for the Qwen3-TTS-Instruct-Flash-Realtime, Qwen3-TTS-Flash-Realtime, and Qwen-TTS-Realtime series. Custom voices: created through Voice cloning (Qwen) (Qwen3-TTS-VC-Realtime series only) or Voice design (Qwen) (Qwen3-TTS-VD-Realtime series only). |
| language_type | str | No | Output audio language. Default: Auto. Use Auto when the language is uncertain or the text mixes languages. Supported: Chinese, English, German, Italian, Portuguese, Spanish, Japanese, Korean, French, Russian. Setting a specific language improves quality. |
| mode | str | No | Interaction mode. server_commit (default): server decides synthesis timing, balancing latency and quality. Best for most use cases. commit: client triggers synthesis manually with lower latency, but you must manage sentence boundaries. |
| format | str | No | Audio format. Supported: pcm (default), wav, mp3, opus. Qwen-TTS-Realtime supports only pcm. |
| sample_rate | int | No | Sample rate in Hz. Supported: 8000, 16000, 24000 (default), 48000. Qwen-TTS-Realtime supports only 24000. |
| speech_rate | float | No | Speech rate multiplier. Default: 1.0 (normal). Range: [0.5, 2.0]. Not supported by Qwen-TTS-Realtime. |
| volume | int | No | Volume. Default: 50. Range: [0, 100]. Not supported by Qwen-TTS-Realtime. |
| pitch_rate | float | No | Pitch multiplier. Default: 1.0. Range: [0.5, 2.0]. Not supported by Qwen-TTS-Realtime. |
| bit_rate | int | No | Bitrate in kbps. Higher values improve quality but increase file size. Only for opus format. Default: 128. Range: [6, 510]. Not supported by Qwen-TTS-Realtime. |
| instructions | str | No | Synthesis control instructions. See Realtime streaming TTS for details. Default: None. Max: 1600 tokens. Supports Chinese and English. Qwen3-TTS-Instruct-Flash-Realtime series only. |
| optimize_instructions | bool | No | When True, rewrites instructions for better naturalness and expressiveness. Default: False. Recommended for fine-grained voice control. Requires instructions to be set. Qwen3-TTS-Instruct-Flash-Realtime series only. |
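The ranges and allowed values in the table above can be checked on the client before calling update_session. The following is a minimal sketch, not part of the SDK: validate_session_params is a hypothetical helper, the voice name in the usage note is a placeholder, and model-specific limits (for example, Qwen-TTS-Realtime accepting only pcm at 24000 Hz) are not checked.

```python
from typing import Any, Dict, List

# Allowed values copied from the parameter table.
SUPPORTED_MODES = {"server_commit", "commit"}
SUPPORTED_FORMATS = {"pcm", "wav", "mp3", "opus"}
SUPPORTED_SAMPLE_RATES = {8000, 16000, 24000, 48000}


def validate_session_params(params: Dict[str, Any]) -> List[str]:
    """Hypothetical helper: return a list of problems; an empty list means
    the values fit the documented ranges."""
    problems: List[str] = []

    if not params.get("voice"):
        problems.append("voice is required")

    if params.get("mode", "server_commit") not in SUPPORTED_MODES:
        problems.append("mode must be server_commit or commit")

    audio_format = params.get("format", "pcm")
    if audio_format not in SUPPORTED_FORMATS:
        problems.append("format must be one of pcm, wav, mp3, opus")

    if params.get("sample_rate", 24000) not in SUPPORTED_SAMPLE_RATES:
        problems.append("sample_rate must be 8000, 16000, 24000, or 48000")

    # speech_rate and pitch_rate share the same documented range.
    for name in ("speech_rate", "pitch_rate"):
        if not 0.5 <= params.get(name, 1.0) <= 2.0:
            problems.append(f"{name} must be within [0.5, 2.0]")

    if not 0 <= params.get("volume", 50) <= 100:
        problems.append("volume must be within [0, 100]")

    # bit_rate is documented for the opus format only.
    if "bit_rate" in params:
        if audio_format != "opus":
            problems.append("bit_rate applies only to the opus format")
        elif not 6 <= params["bit_rate"] <= 510:
            problems.append("bit_rate must be within [6, 510]")

    if params.get("optimize_instructions") and not params.get("instructions"):
        problems.append("optimize_instructions requires instructions to be set")

    return problems
```

For example, validate_session_params({"voice": "my_voice", "speech_rate": 3.0}) reports that speech_rate is out of range, while {"voice": "my_voice"} alone passes because every other parameter falls back to its documented default.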
Key interfaces
QwenTtsRealtime class
Import with from dashscope.audio.qwen_tts_realtime import QwenTtsRealtime.
| Method signature | Server response events (via callback) | Description |
|---|---|---|
| def connect(self) -> None | session.created - Session created. session.updated - Session configuration updated. | Connect to the server. |
| def update_session(self, voice: str, response_format: AudioFormat = AudioFormat.PCM_24000HZ_MONO_16BIT, mode: str = 'server_commit', language_type: str = "Chinese", **kwargs) -> None | session.updated - Session configuration updated. | Update session configuration. See Request parameters for details. Call immediately after connecting to override defaults. Invalid parameters return an error; valid parameters update the configuration. |
| def append_text(self, text: str) -> None | None | Append text to the cloud input buffer. In server_commit mode, the server decides when to synthesize buffered text. In commit mode, call commit to trigger synthesis. |
| def clear_appended_text(self) -> None | input_text_buffer.cleared - Buffer cleared. | Clear all text in the cloud buffer. |
| def commit(self) -> None | input_text_buffer.committed - Text submitted. response.output_item.added - Output item added. response.content_part.added - Content part added. response.audio.delta - Incremental audio data. response.audio.done - Audio generation done. response.content_part.done - Content streaming done. response.output_item.done - Output item done. response.done - Response done. | Submit buffered text and synthesize immediately. Returns an error if the buffer is empty. Not needed in server_commit mode. Required in commit mode to trigger synthesis. |
| def finish(self) -> None | session.finished - Session finished. | End the task. |
| def close(self) -> None | None | Close the connection. |
| def get_session_id(self) -> str | None | Get the current session ID. |
| def get_last_response_id(self) -> str | None | Get the last response ID. |
| def get_first_audio_delay(self) | None | Get the delay before the first audio packet. |
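The call order implied by the method table can be sketched for both modes. The functions below accept any object that exposes the documented methods, so in real use the client would be a QwenTtsRealtime instance. RecordingClient is a hypothetical stand-in that only records calls, and split_sentences is a deliberately naive splitter (it would also split "3.14"). A real program would likely wait for response.done or session.finished in the callback before calling close; that waiting is omitted here.

```python
import re
from typing import Iterable, List


def split_sentences(text: str) -> List[str]:
    """Naive split after ., !, ? or their full-width forms, keeping the punctuation."""
    parts = re.split(r"(?<=[.!?。！？])\s*", text)
    return [part.strip() for part in parts if part.strip()]


def speak_commit_mode(client, voice: str, text: str) -> int:
    """Commit mode: the caller manages sentence boundaries and triggers synthesis."""
    client.connect()
    client.update_session(voice=voice, mode="commit")
    sentences = split_sentences(text)
    for sentence in sentences:
        client.append_text(sentence)
        client.commit()  # required in commit mode; errors if the buffer is empty
    client.finish()
    client.close()
    return len(sentences)


def speak_server_commit_mode(client, voice: str, chunks: Iterable[str]) -> None:
    """server_commit mode: append text and let the server decide when to synthesize."""
    client.connect()
    client.update_session(voice=voice, mode="server_commit")
    for chunk in chunks:
        client.append_text(chunk)  # no commit call is needed in this mode
    client.finish()
    client.close()


class RecordingClient:
    """Hypothetical stand-in (not part of the SDK) that records the calls it receives."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def connect(self) -> None:
        self.calls.append("connect")

    def update_session(self, voice: str, mode: str = "server_commit", **kwargs) -> None:
        self.calls.append(f"update_session:{mode}")

    def append_text(self, text: str) -> None:
        self.calls.append(f"append_text:{text}")

    def commit(self) -> None:
        self.calls.append("commit")

    def finish(self) -> None:
        self.calls.append("finish")

    def close(self) -> None:
        self.calls.append("close")
```

Passing a RecordingClient to speak_commit_mode shows one append_text and commit pair per sentence, which is the bookkeeping that commit mode shifts onto the client in exchange for lower latency.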
Callback interface (QwenTtsRealtimeCallback)
Handle server responses by implementing callback methods.
Import with from dashscope.audio.qwen_tts_realtime import QwenTtsRealtimeCallback.
| Method | Parameters | Return value | Description |
|---|---|---|---|
| def on_open(self) -> None | None | None | Called when the connection is established. |
| def on_event(self, message: str) -> None | message: Server response event. | None | Called for API responses and model-generated text/audio. See Server events. |
| def on_close(self, close_status_code, close_msg) -> None | close_status_code: WebSocket close code. close_msg: WebSocket close message. | None | Called when the server closes the connection. |
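A callback typically dispatches on the event type and accumulates audio. The sketch below is a plain class with the three documented method names. In real use it would subclass QwenTtsRealtimeCallback. It assumes, without confirmation from this page, that each event carries a "type" field, that response.audio.delta events carry base64-encoded PCM in a "delta" field, and that the message may arrive either as a JSON string or as an already-parsed dict. Check Server events for the actual payload layout. AudioCollector and to_wav_bytes are hypothetical names.

```python
import base64
import io
import json
import wave
from typing import Any, Dict, Union


class AudioCollector:
    """Hypothetical callback sketch: collects PCM audio from server events."""

    def __init__(self, sample_rate: int = 24000) -> None:
        self.sample_rate = sample_rate  # default sample rate per the parameter table
        self.pcm = bytearray()
        self.done = False
        self.closed = False

    def on_open(self) -> None:
        # Start from a clean state for each connection.
        self.pcm.clear()
        self.done = False

    def on_event(self, message: Union[str, Dict[str, Any]]) -> None:
        # Assumption: accept either a JSON string or a parsed dict.
        event = json.loads(message) if isinstance(message, str) else message
        event_type = event.get("type")
        if event_type == "response.audio.delta":
            # Assumption: audio arrives base64-encoded in the "delta" field.
            self.pcm.extend(base64.b64decode(event["delta"]))
        elif event_type == "response.done":
            self.done = True

    def on_close(self, close_status_code, close_msg) -> None:
        self.closed = True

    def to_wav_bytes(self) -> bytes:
        """Wrap the collected 16-bit mono PCM in a WAV container."""
        buffer = io.BytesIO()
        with wave.open(buffer, "wb") as wav_file:
            wav_file.setnchannels(1)
            wav_file.setsampwidth(2)  # 16-bit samples
            wav_file.setframerate(self.sample_rate)
            wav_file.writeframes(bytes(self.pcm))
        return buffer.getvalue()
```

Keeping the handler free of network code makes it easy to exercise with hand-built events before wiring it to a live session.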