Qwen-TTS realtime Python SDK

Real-time TTS Python SDK

Interfaces and parameters for real-time text-to-speech (Qwen) with the DashScope Python SDK. For model details, see Real-time text-to-speech - Qwen or Text-to-speech - Qwen.

Getting started

Two interaction modes are available: server_commit, in which the server decides when to synthesize, and commit, in which the client triggers synthesis manually. For complete examples and tutorials, see Real-time text-to-speech.
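
In commit mode the client decides when synthesis is triggered. When text arrives in small fragments (for example, tokens streamed from another service), one possible approach is to buffer the fragments and release them at sentence-ending punctuation. The sketch below is only an illustration under that assumption: SentenceBuffer and its terminator set are hypothetical and not part of the DashScope SDK, and the splitting is naive (it would also split at the dot in a decimal number).

```python
from typing import List


class SentenceBuffer:
    """Hypothetical helper (not part of the DashScope SDK)."""

    # Assumed sentence terminators; adjust for the languages you synthesize.
    TERMINATORS = ".!?。！？"

    def __init__(self) -> None:
        self._pending = ""

    def feed(self, fragment: str) -> List[str]:
        """Add a text fragment and return any sentences it completed."""
        self._pending += fragment
        sentences = []
        start = 0
        for i, ch in enumerate(self._pending):
            if ch in self.TERMINATORS:
                sentence = self._pending[start:i + 1].strip()
                if sentence:
                    sentences.append(sentence)
                start = i + 1
        # Keep the unfinished tail for the next call.
        self._pending = self._pending[start:]
        return sentences

    def flush(self) -> str:
        """Return whatever text is left, for example at end of input."""
        rest, self._pending = self._pending.strip(), ""
        return rest
```

With a connected client in commit mode, each completed sentence could then be passed to append_text and followed by a call to commit.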

Request parameters

Set these in the QwenTtsRealtime constructor.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| model | str | Yes | Model name. |
| url | str | Yes | wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime |

Set these with update_session.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| voice | str | Yes | Voice for synthesis. System voices: available for the Qwen3-TTS-Instruct-Flash-Realtime, Qwen3-TTS-Flash-Realtime, and Qwen-TTS-Realtime series. Custom voices: created through Voice cloning (Qwen) (Qwen3-TTS-VC-Realtime series only) or Voice design (Qwen) (Qwen3-TTS-VD-Realtime series only). |
| language_type | str | No | Output audio language. Default: Auto. Use Auto when the language is uncertain or the text mixes languages. Supported: Chinese, English, German, Italian, Portuguese, Spanish, Japanese, Korean, French, Russian. Setting a specific language improves quality. |
| mode | str | No | Interaction mode. server_commit (default): the server decides synthesis timing, balancing latency and quality. Best for most use cases. commit: the client triggers synthesis manually with lower latency, but you must manage sentence boundaries. |
| format | str | No | Audio format. Supported: pcm (default), wav, mp3, opus. Qwen-TTS-Realtime supports only pcm. |
| sample_rate | int | No | Sample rate in Hz. Supported: 8000, 16000, 24000 (default), 48000. Qwen-TTS-Realtime supports only 24000. |
| speech_rate | float | No | Speech rate multiplier. Default: 1.0 (normal). Range: [0.5, 2.0]. Not supported by Qwen-TTS-Realtime. |
| volume | int | No | Volume. Default: 50. Range: [0, 100]. Not supported by Qwen-TTS-Realtime. |
| pitch_rate | float | No | Pitch multiplier. Default: 1.0. Range: [0.5, 2.0]. Not supported by Qwen-TTS-Realtime. |
| bit_rate | int | No | Bitrate in kbps. Higher values improve quality but increase file size. Only for opus format. Default: 128. Range: [6, 510]. Not supported by Qwen-TTS-Realtime. |
| instructions | str | No | Synthesis control instructions. See Realtime streaming TTS for details. Default: None. Max: 1600 tokens. Supports Chinese and English. Qwen3-TTS-Instruct-Flash-Realtime series only. |
| optimize_instructions | bool | No | When True, rewrites instructions for better naturalness and expressiveness. Default: False. Recommended for fine-grained voice control. Requires instructions to be set. Qwen3-TTS-Instruct-Flash-Realtime series only. |
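
Because invalid session parameters are rejected by the server, it can be convenient to check values locally before sending them. The following is a minimal sketch of such a check, using only the allowed values and ranges from the update_session parameter table. check_session_params is a hypothetical helper, not part of the DashScope SDK, and it does not cover the model-specific restrictions (for example, that Qwen-TTS-Realtime supports only pcm at 24000 Hz).

```python
from typing import Any, Dict

# Allowed values and ranges, copied from the update_session parameter table.
FORMATS = {"pcm", "wav", "mp3", "opus"}
SAMPLE_RATES = {8000, 16000, 24000, 48000}
MODES = {"server_commit", "commit"}
RANGES = {
    "speech_rate": (0.5, 2.0),
    "volume": (0, 100),
    "pitch_rate": (0.5, 2.0),
    "bit_rate": (6, 510),
}


def check_session_params(params: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical pre-flight check; not part of the DashScope SDK.

    Raises ValueError on the first problem found, otherwise returns params.
    """
    if not params.get("voice"):
        raise ValueError("voice is required")
    audio_format = params.get("format", "pcm")
    if audio_format not in FORMATS:
        raise ValueError(f"unsupported format: {audio_format!r}")
    if params.get("sample_rate", 24000) not in SAMPLE_RATES:
        raise ValueError(f"unsupported sample_rate: {params['sample_rate']!r}")
    if params.get("mode", "server_commit") not in MODES:
        raise ValueError(f"unsupported mode: {params['mode']!r}")
    for name, (low, high) in RANGES.items():
        if name in params and not low <= params[name] <= high:
            raise ValueError(f"{name} must be within [{low}, {high}]")
    if "bit_rate" in params and audio_format != "opus":
        raise ValueError("bit_rate applies only to the opus format")
    if params.get("optimize_instructions") and not params.get("instructions"):
        raise ValueError("optimize_instructions requires instructions")
    return params
```

The voice value is whatever voice name your model supports; the check above only verifies that one is present.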

Key interfaces

QwenTtsRealtime class

Import with: from dashscope.audio.qwen_tts_realtime import QwenTtsRealtime

| Method signature | Server response events (via callback) | Description |
| --- | --- | --- |
| def connect(self) -> None | session.created - Session created. session.updated - Session configuration updated. | Connect to the server. |
| def update_session(self, voice: str, response_format: AudioFormat = AudioFormat.PCM_24000HZ_MONO_16BIT, mode: str = 'server_commit', language_type: str = "Chinese", **kwargs) -> None | session.updated - Session configuration updated. | Update session configuration. See Request parameters for details. Call immediately after connecting to override defaults. Invalid parameters return an error; valid parameters update the configuration. |
| def append_text(self, text: str) -> None | None | Append text to the cloud input buffer. In server_commit mode, the server decides when to synthesize buffered text. In commit mode, call commit to trigger synthesis. |
| def clear_appended_text(self) -> None | input_text_buffer.cleared - Buffer cleared. | Clear all text in the cloud buffer. |
| def commit(self) -> None | input_text_buffer.committed - Text submitted. response.output_item.added - Output item added. response.content_part.added - Content part added. response.audio.delta - Incremental audio data. response.audio.done - Audio generation done. response.content_part.done - Content streaming done. response.output_item.done - Output item done. response.done - Response done. | Submit buffered text and synthesize immediately. Returns an error if the buffer is empty. Not needed in server_commit mode. Required in commit mode to trigger synthesis. |
| def finish(self) -> None | session.finished - Session finished. | End the task. |
| def close(self) -> None | None | Close the connection. |
| def get_session_id(self) -> str | None | Get the current session ID. |
| def get_last_response_id(self) -> str | None | Get the last response ID. |
| def get_first_audio_delay(self) | None | Get the delay before the first audio packet. |
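
The QwenTtsRealtime methods are typically called in a fixed order: connect, update_session, append_text (plus commit in commit mode), finish, close. The sketch below shows one way that sequence might be wrapped. It uses only the method names and the voice and mode arguments from the signatures listed for the class. The synthesize function and its wait_until_finished hook are hypothetical: the hook stands in for whatever mechanism your callback uses to signal that session.finished has arrived, since closing immediately after finish may cut off pending audio.

```python
from typing import Callable, Iterable, Optional


def synthesize(
    client,
    chunks: Iterable[str],
    voice: str,
    mode: str = "server_commit",
    wait_until_finished: Optional[Callable[[], None]] = None,
) -> None:
    """Drive one session with a QwenTtsRealtime-like client (hypothetical helper).

    `client` is any object exposing connect, update_session, append_text,
    commit, finish, and close. `wait_until_finished` is an assumed hook that
    blocks until the callback has seen session.finished.
    """
    client.connect()
    try:
        # Override the defaults right after connecting.
        client.update_session(voice=voice, mode=mode)
        for chunk in chunks:
            if not chunk:
                continue  # commit returns an error on an empty buffer
            client.append_text(chunk)
            if mode == "commit":
                client.commit()  # required in commit mode to trigger synthesis
        client.finish()
        if wait_until_finished is not None:
            wait_until_finished()
    finally:
        client.close()
```

Accepting the client as a parameter keeps the call order easy to exercise with a stand-in object before running it against the real service.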

Callback interface (QwenTtsRealtimeCallback)

Handle server responses by implementing callback methods. Import with: from dashscope.audio.qwen_tts_realtime import QwenTtsRealtimeCallback

| Method | Parameters | Return value | Description |
| --- | --- | --- | --- |
| def on_open(self) -> None | None | None | Called when the connection is established. |
| def on_event(self, message: str) -> None | message: Server response event. | None | Called for API responses and model-generated text/audio. See Server events. |
| def on_close(self, close_status_code, close_msg) -> None | close_status_code: WebSocket close code. close_msg: WebSocket close message. | None | Called when the server closes the connection. |
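
A callback usually inspects each event and collects the audio carried by response.audio.delta events. The sketch below illustrates that pattern with the three callback methods. It rests on several assumptions not stated on this page: that each event carries a "type" field holding the event name, that the audio arrives as base64 text in a "delta" field, and that the message may be either a JSON string or an already parsed dict. AudioCollector is a hypothetical plain class so the example runs without the SDK installed; in real use it would subclass QwenTtsRealtimeCallback.

```python
import base64
import json
from typing import Any, Dict, List, Union


class AudioCollector:
    """Hypothetical callback sketch; subclass QwenTtsRealtimeCallback in real use."""

    def __init__(self) -> None:
        self.chunks: List[bytes] = []
        self.event_types: List[str] = []
        self.closed = False

    def on_open(self) -> None:
        # Start each connection with an empty audio buffer.
        self.chunks.clear()

    def on_event(self, message: Union[str, Dict[str, Any]]) -> None:
        # Accept either a JSON string or an already parsed dict (assumption).
        event = json.loads(message) if isinstance(message, str) else message
        event_type = event.get("type", "")  # assumed field name
        self.event_types.append(event_type)
        if event_type == "response.audio.delta":
            # Assumption: audio payload is base64 text in a "delta" field.
            self.chunks.append(base64.b64decode(event["delta"]))

    def on_close(self, close_status_code, close_msg) -> None:
        self.closed = True

    def audio(self) -> bytes:
        """Return all audio received so far as one byte string."""
        return b"".join(self.chunks)
```

After the session finishes, audio() returns the concatenated bytes, which can be written to a file or passed to a player in the configured format.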