Real-time TTS Python SDK

This page describes the interfaces and parameters for real-time text-to-speech (Qwen) with the DashScope Python SDK. For model details, see Real-time text-to-speech - Qwen or Text-to-speech - Qwen.

Prerequisites

DashScope Python SDK 1.25.11 or later.

Getting started

The following sample uses server_commit mode.

import os
import base64
import threading
import time
import dashscope
from dashscope.audio.qwen_tts_realtime import (
  AudioFormat,
  QwenTtsRealtime,
  QwenTtsRealtimeCallback,
)


qwen_tts_realtime: QwenTtsRealtime = None
text_to_synthesize = [
  'Right? I love supermarkets like this.',
  'Especially during Chinese New Year,',
  'I go shopping at supermarkets.',
  'And I feel',
  'absolutely thrilled!',
  'I want to buy so many things!'
]


def init_dashscope_api_key():
  """
  Set your DashScope API key. More information:
  https://github.com/aliyun/alibabacloud-bailian-speech-demo/blob/master/PREREQUISITES.md
  """

  if 'DASHSCOPE_API_KEY' in os.environ:
    # Load the API key from the DASHSCOPE_API_KEY environment variable.
    dashscope.api_key = os.environ['DASHSCOPE_API_KEY']
  else:
    # Otherwise, set the API key manually.
    dashscope.api_key = 'your-dashscope-api-key'



class MyCallback(QwenTtsRealtimeCallback):
  def __init__(self):
    self.complete_event = threading.Event()
    self.file = open('result_24k.pcm', 'wb')

  def on_open(self) -> None:
    print('connection opened, init player')

  def on_close(self, close_status_code, close_msg) -> None:
    self.file.close()
    print('connection closed with code: {}, msg: {}, destroy player'.format(close_status_code, close_msg))

  def on_event(self, response: dict) -> None:
    # Each server event is handled as a dict keyed by field name.
    try:
      event_type = response['type']
      if event_type == 'session.created':
        print('start session: {}'.format(response['session']['id']))
      elif event_type == 'response.audio.delta':
        # Audio arrives as base64-encoded chunks; decode and append them to the file.
        recv_audio_b64 = response['delta']
        self.file.write(base64.b64decode(recv_audio_b64))
      elif event_type == 'response.done':
        print(f'response {qwen_tts_realtime.get_last_response_id()} done')
      elif event_type == 'session.finished':
        print('session finished')
        self.complete_event.set()
    except Exception as e:
      print('[Error] {}'.format(e))
      return

  def wait_for_finished(self):
    self.complete_event.wait()


if __name__ == '__main__':
  init_dashscope_api_key()

  print('Initializing ...')

  callback = MyCallback()

  qwen_tts_realtime = QwenTtsRealtime(
    # To use instruction control, replace the model with qwen3-tts-instruct-flash-realtime
    model='qwen3-tts-flash-realtime',
    callback=callback,
    url='wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime',
  )

  qwen_tts_realtime.connect()
  qwen_tts_realtime.update_session(
    voice='Cherry',
    response_format=AudioFormat.PCM_24000HZ_MONO_16BIT,
    # To use instruction control, uncomment the following lines and replace the model with qwen3-tts-instruct-flash-realtime
    # instructions='Speak quickly with a rising intonation, suitable for introducing fashion products.',
    # optimize_instructions=True,
    mode='server_commit',
  )
  for text_chunk in text_to_synthesize:
    print(f'send text: {text_chunk}')
    qwen_tts_realtime.append_text(text_chunk)
    time.sleep(0.1)
  qwen_tts_realtime.finish()
  callback.wait_for_finished()
  print('[Metric] session: {}, first audio delay: {}'.format(
          qwen_tts_realtime.get_session_id(),
          qwen_tts_realtime.get_first_audio_delay(),
          ))
See GitHub for more samples.
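
The sample above writes raw PCM to result_24k.pcm, which many audio players cannot open directly. As a minimal sketch, assuming the 24 kHz, mono, 16-bit layout selected by AudioFormat.PCM_24000HZ_MONO_16BIT, the raw bytes can be wrapped in a WAV container with Python's standard wave module. The helper name pcm_to_wav and the output file name are hypothetical and not part of the SDK.

```python
import wave


def pcm_to_wav(pcm_path: str, wav_path: str,
               sample_rate: int = 24000, channels: int = 1,
               sample_width_bytes: int = 2) -> int:
    """Wrap raw PCM bytes in a WAV container; return the number of frames written."""
    with open(pcm_path, 'rb') as pcm_file:
        pcm_bytes = pcm_file.read()

    # The defaults assume 24 kHz, mono, 16-bit samples (2 bytes per sample).
    with wave.open(wav_path, 'wb') as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width_bytes)
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm_bytes)

    return len(pcm_bytes) // (channels * sample_width_bytes)
```

For example, pcm_to_wav('result_24k.pcm', 'result_24k.wav') could be called after wait_for_finished() returns. If a different sample rate is configured, the sample_rate argument must match it.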

Request parameters

Set these in the QwenTtsRealtime constructor.
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| model | str | Yes | Model name. |
| url | str | Yes | wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime |

Set these with update_session.
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| voice | str | Yes | Voice for synthesis. System voices: available for the Qwen3-TTS-Instruct-Flash-Realtime, Qwen3-TTS-Flash-Realtime, and Qwen-TTS-Realtime series. Custom voices: created through Voice cloning (Qwen) (Qwen3-TTS-VC-Realtime series only) or Voice design (Qwen) (Qwen3-TTS-VD-Realtime series only). |
| language_type | str | No | Output audio language. Default: Auto. Use Auto when the language is uncertain or the text mixes languages. Supported: Chinese, English, German, Italian, Portuguese, Spanish, Japanese, Korean, French, Russian. Setting a specific language improves quality. |
| mode | str | No | Interaction mode. server_commit (default): the server decides synthesis timing, balancing latency and quality. Best for most use cases. commit: the client triggers synthesis manually with lower latency, but you must manage sentence boundaries. |
| format | str | No | Audio format. Supported: pcm (default), wav, mp3, opus. Qwen-TTS-Realtime supports only pcm. |
| sample_rate | int | No | Sample rate in Hz. Supported: 8000, 16000, 24000 (default), 48000. Qwen-TTS-Realtime supports only 24000. |
| speech_rate | float | No | Speech rate multiplier. Default: 1.0 (normal). Range: [0.5, 2.0]. Not supported by Qwen-TTS-Realtime. |
| volume | int | No | Volume. Default: 50. Range: [0, 100]. Not supported by Qwen-TTS-Realtime. |
| pitch_rate | float | No | Pitch multiplier. Default: 1.0. Range: [0.5, 2.0]. Not supported by Qwen-TTS-Realtime. |
| bit_rate | int | No | Bitrate in kbps. Higher values improve quality but increase file size. Only for opus format. Default: 128. Range: [6, 510]. Not supported by Qwen-TTS-Realtime. |
| instructions | str | No | Synthesis control instructions. See Realtime streaming TTS for details. Default: None. Max: 1600 tokens. Supports Chinese and English. Qwen3-TTS-Instruct-Flash-Realtime series only. |
| optimize_instructions | bool | No | When True, rewrites instructions for better naturalness and expressiveness. Default: False. Recommended for fine-grained voice control. Requires instructions to be set. Qwen3-TTS-Instruct-Flash-Realtime series only. |
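
Because invalid parameters return an error from the server, it can help to check values locally first. The following is a minimal sketch, not part of the SDK: the helper build_session_kwargs and its argument name audio_format are hypothetical, and the accepted values and ranges are taken from the update_session parameters listed above. It does not apply the model-specific restrictions (for example, Qwen-TTS-Realtime supporting only pcm at 24000 Hz).

```python
from typing import Any, Dict, Optional

SUPPORTED_FORMATS = ('pcm', 'wav', 'mp3', 'opus')
SUPPORTED_SAMPLE_RATES = (8000, 16000, 24000, 48000)


def build_session_kwargs(voice: str,
                         audio_format: str = 'pcm',
                         sample_rate: int = 24000,
                         speech_rate: float = 1.0,
                         volume: int = 50,
                         pitch_rate: float = 1.0,
                         bit_rate: Optional[int] = None) -> Dict[str, Any]:
    """Validate values against the documented ranges and return them as a dict."""
    if not voice:
        raise ValueError('voice is required')
    if audio_format not in SUPPORTED_FORMATS:
        raise ValueError(f'unsupported format: {audio_format}')
    if sample_rate not in SUPPORTED_SAMPLE_RATES:
        raise ValueError(f'unsupported sample_rate: {sample_rate}')
    if not 0.5 <= speech_rate <= 2.0:
        raise ValueError('speech_rate must be in [0.5, 2.0]')
    if not 0 <= volume <= 100:
        raise ValueError('volume must be in [0, 100]')
    if not 0.5 <= pitch_rate <= 2.0:
        raise ValueError('pitch_rate must be in [0.5, 2.0]')

    kwargs: Dict[str, Any] = {
        'voice': voice,
        'format': audio_format,
        'sample_rate': sample_rate,
        'speech_rate': speech_rate,
        'volume': volume,
        'pitch_rate': pitch_rate,
    }
    if bit_rate is not None:
        # bit_rate applies only to the opus format.
        if audio_format != 'opus':
            raise ValueError('bit_rate is only valid with the opus format')
        if not 6 <= bit_rate <= 510:
            raise ValueError('bit_rate must be in [6, 510]')
        kwargs['bit_rate'] = bit_rate
    return kwargs
```

The returned dictionary uses the parameter names from the list above. How these keys combine with the response_format argument of update_session is not confirmed here and should be checked against the SDK version in use.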

Key interfaces

QwenTtsRealtime class

Import with from dashscope.audio.qwen_tts_realtime import QwenTtsRealtime.
| Method signature | Server response events (via callback) | Description |
| --- | --- | --- |
| def connect(self) -> None | session.created - Session created; session.updated - Session configuration updated. | Connect to the server. |
| def update_session(self, voice: str, response_format: AudioFormat = AudioFormat.PCM_24000HZ_MONO_16BIT, mode: str = 'server_commit', language_type: str = "Chinese", **kwargs) -> None | session.updated - Session configuration updated. | Update session configuration. See Request parameters for details. Call immediately after connecting to override defaults. Invalid parameters return an error; valid parameters update the configuration. |
| def append_text(self, text: str) -> None | None | Append text to the cloud input buffer. In server_commit mode, the server decides when to synthesize buffered text. In commit mode, call commit to trigger synthesis. |
| def clear_appended_text(self) -> None | input_text_buffer.cleared - Buffer cleared. | Clear all text in the cloud buffer. |
| def commit(self) -> None | input_text_buffer.committed - Text submitted; response.output_item.added - Output item added; response.content_part.added - Content part added; response.audio.delta - Incremental audio data; response.audio.done - Audio generation done; response.content_part.done - Content streaming done; response.output_item.done - Output item done; response.done - Response done. | Submit buffered text and synthesize immediately. Returns an error if the buffer is empty. Not needed in server_commit mode. Required in commit mode to trigger synthesis. |
| def finish(self) -> None | session.finished - Session finished. | End the task. |
| def close(self) -> None | None | Close the connection. |
| def get_session_id(self) -> str | None | Get the current session ID. |
| def get_last_response_id(self) -> str | None | Get the last response ID. |
| def get_first_audio_delay(self) | None | Get the delay before the first audio packet. |
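
In commit mode, the client decides when buffered text is synthesized, so each append_text call is typically followed by commit. The sketch below illustrates that call order under stated assumptions: the helper synthesize_in_commit_mode and its wait_for_response hook are hypothetical, and client stands for a QwenTtsRealtime instance that is already connected and configured with mode='commit'. Whether a wait is needed between commits is not specified here, so the hook is optional.

```python
from typing import Callable, Iterable, Optional


def synthesize_in_commit_mode(client,
                              sentences: Iterable[str],
                              wait_for_response: Optional[Callable[[], None]] = None) -> int:
    """Send each sentence as its own committed request; return how many were committed."""
    committed = 0
    for sentence in sentences:
        text = sentence.strip()
        if not text:
            # commit() returns an error on an empty buffer, so skip blank input.
            continue
        client.append_text(text)
        client.commit()
        committed += 1
        if wait_for_response is not None:
            # Hypothetical hook, e.g. block until the callback has seen response.done.
            wait_for_response()
    # End the task once all text has been submitted.
    client.finish()
    return committed
```

Passing the client as a parameter keeps the function independent of a global instance, which also makes the call sequence easy to check with a stand-in object.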

Callback interface (QwenTtsRealtimeCallback)

Handle server responses by implementing callback methods. Import with from dashscope.audio.qwen_tts_realtime import QwenTtsRealtimeCallback.
| Method | Parameters | Return value | Description |
| --- | --- | --- | --- |
| def on_open(self) -> None | None | None | Called when the connection is established. |
| def on_event(self, message: str) -> None | message: Server response event. | None | Called for API responses and model-generated text/audio. See Server events. |
| def on_close(self, close_status_code, close_msg) -> None | close_status_code: WebSocket close code. close_msg: WebSocket close message. | None | Called when the server closes the connection. |
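
Event handling inside on_event can be kept separate from the SDK class so it is easier to exercise in isolation. The following is a minimal sketch, assuming events arrive as dictionaries with the type, session, and delta fields used in the Getting started sample. The class AudioEventCollector is hypothetical and not part of the SDK.

```python
import base64
import threading


class AudioEventCollector:
    """Collects decoded audio from server events passed in as dictionaries."""

    def __init__(self) -> None:
        self.audio = bytearray()
        self.session_id = None
        self.finished = threading.Event()

    def handle(self, event: dict) -> None:
        event_type = event.get('type')
        if event_type == 'session.created':
            self.session_id = event['session']['id']
        elif event_type == 'response.audio.delta':
            # Audio chunks are base64-encoded; decode and append them in arrival order.
            self.audio.extend(base64.b64decode(event['delta']))
        elif event_type == 'session.finished':
            # Signal any thread waiting for the session to end.
            self.finished.set()
```

A QwenTtsRealtimeCallback subclass could forward each event from on_event to collector.handle(...) and wait on collector.finished before reading collector.audio.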