Qwen-ASR realtime Python SDK

For supported models, features, and full sample code, see Real-time speech recognition.

Prerequisites
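As a minimal setup sketch (assuming a current dashscope release that ships the qwen_omni realtime module), install the SDK and export your API key, which the SDK reads from the environment:

```shell
# Assumes the dashscope package is installed: pip install -U dashscope
export DASHSCOPE_API_KEY="your-api-key"   # read by the SDK from the environment
```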

Request parameters

OmniRealtimeConversation constructor

Create an OmniRealtimeConversation instance.
from dashscope.audio.qwen_omni import OmniRealtimeConversation, OmniRealtimeCallback

class MyCallback(OmniRealtimeCallback):
  """Callback for real-time recognition"""
  def __init__(self, conversation):
    self.conversation = conversation
    self.handlers = {
      'session.created': self._handle_session_created,
      'conversation.item.input_audio_transcription.completed': self._handle_final_text,
      'conversation.item.input_audio_transcription.text': self._handle_stash_text,
      'input_audio_buffer.speech_started': lambda r: print('======Speech Start======'),
      'input_audio_buffer.speech_stopped': lambda r: print('======Speech Stop======')
    }

  def on_open(self):
    print('Connection opened')

  def on_close(self, code, msg):
    print(f'Connection closed, code: {code}, msg: {msg}')

  def on_event(self, response):
    try:
      handler = self.handlers.get(response['type'])
      if handler:
        handler(response)
    except Exception as e:
      print(f'[Error] {e}')

  def _handle_session_created(self, response):
    print(f"Start session: {response['session']['id']}")

  def _handle_final_text(self, response):
    print(f"Final recognized text: {response['transcript']}")

  def _handle_stash_text(self, response):
    print(f"Got stash result: {response['stash']}")

callback = MyCallback(conversation=None)  # Pass None for now; injected below.
conversation = OmniRealtimeConversation(
    model='qwen3-asr-flash-realtime',
    url='wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime',
    callback=callback
)
# Inject the conversation into the callback.
callback.conversation = conversation
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| model | str | Yes | Model name. |
| callback | OmniRealtimeCallback | Yes | Callback object that handles server events. |
| url | str | Yes | WebSocket endpoint: wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime |

Session configuration

Set session parameters after connecting.
from dashscope.audio.qwen_omni import MultiModality
from dashscope.audio.qwen_omni.omni_realtime import TranscriptionParams

transcription_params = TranscriptionParams(
  language='zh',
  sample_rate=16000,
  input_audio_format="pcm"
)

conversation.update_session(
  output_modalities=[MultiModality.TEXT],
  enable_turn_detection=True,
  turn_detection_type="server_vad",
  turn_detection_threshold=0.0,
  turn_detection_silence_duration_ms=400,
  enable_input_audio_transcription=True,
  transcription_params=transcription_params
)
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| output_modalities | List[MultiModality] | Yes | Output type. Fixed to [MultiModality.TEXT]. |
| enable_turn_detection | bool | No | Enables server-side VAD. Default: True. If False, call commit() manually to trigger recognition. |
| turn_detection_type | str | No | VAD type. Fixed to server_vad. |
| turn_detection_threshold | float | No | VAD sensitivity. Default: 0.2. Recommended: 0.0. Range: [-1, 1]. Lower values increase sensitivity but may trigger on noise; higher values reduce false triggers. |
| turn_detection_silence_duration_ms | int | No | Silence duration (ms) that marks the end of an utterance. Default: 800. Recommended: 400. Range: [200, 6000]. Lower values respond faster but may split an utterance at pauses; higher values tolerate long pauses but add latency. |
| transcription_params | TranscriptionParams | No | Recognition settings. See TranscriptionParams. |
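The server's VAD algorithm is not public, but the tradeoff behind turn_detection_silence_duration_ms can be illustrated with a toy energy-based endpointer (an illustration only; the frame size, threshold, and function name here are our own):

```python
# Toy endpointer: declares end-of-utterance after `silence_ms` of low energy.
# Illustrates the turn_detection_silence_duration_ms tradeoff only; it is
# not the server_vad algorithm.

def end_of_utterance(frame_energies, threshold=0.01,
                     frame_ms=20, silence_ms=400):
    """Return the index of the frame where the utterance is considered
    finished, or None if it never is."""
    needed = silence_ms // frame_ms   # consecutive quiet frames required
    quiet = 0
    seen_speech = False
    for i, energy in enumerate(frame_energies):
        if energy >= threshold:
            seen_speech = True
            quiet = 0
        elif seen_speech:
            quiet += 1
            if quiet >= needed:
                return i
    return None

# 20 ms frames: speech, a 300 ms pause, more speech, then 500 ms of silence.
energies = [0.5] * 10 + [0.0] * 15 + [0.5] * 10 + [0.0] * 25
print(end_of_utterance(energies, silence_ms=300))  # 24: splits at the 300 ms pause
print(end_of_utterance(energies, silence_ms=400))  # 54: waits for the true end
print(end_of_utterance(energies, silence_ms=600))  # None: never fires
```

A shorter silence window ends turns faster but cuts utterances at natural pauses; a longer one rides out pauses at the cost of latency.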

TranscriptionParams

Set recognition options with TranscriptionParams.
transcription_params = TranscriptionParams(
  language='zh',
  sample_rate=16000,
  input_audio_format="pcm"
)
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| language | str | No | Audio language. Supported values: zh (Chinese: Mandarin, Sichuanese, Minnan, Wu), yue (Cantonese), en (English), ja (Japanese), ko (Korean), de (German), fr (French), es (Spanish), pt (Portuguese), it (Italian), ru (Russian), ar (Arabic), hi (Hindi), id (Indonesian), th (Thai), tr (Turkish), uk (Ukrainian), vi (Vietnamese), cs (Czech), da (Danish), fi (Finnish), fil (Filipino), is (Icelandic), ms (Malay), no (Norwegian), pl (Polish), sv (Swedish) |
| sample_rate | int | No | Audio sample rate (Hz). Default: 16000. Supported: 16000, 8000. The server upsamples 8 kHz to 16 kHz, which adds minor latency. Use 8000 only for 8 kHz sources such as telephone audio. |
| input_audio_format | str | No | Audio format. Default: pcm. Supported: pcm, opus. |
| corpus | Dict[str, Any] | No | Contextual biasing configuration as a dictionary. For a simpler string interface, use corpus_text instead. |
| corpus_text | str | No | Reference text for contextual biasing (such as entities or domain vocabulary). Maximum: 10,000 tokens. See Contextual biasing. |
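The service expects raw PCM, while recordings are often WAV files; a small sketch, using only the standard library, that strips the WAV header and verifies the audio matches the defaults above (16 kHz, mono, 16-bit; the helper name is ours):

```python
import io
import wave

def wav_to_pcm(wav_bytes: bytes, expected_rate: int = 16000) -> bytes:
    """Extract raw PCM frames from an in-memory WAV file, checking that
    it is mono 16-bit audio at the expected sample rate."""
    with wave.open(io.BytesIO(wav_bytes), 'rb') as wf:
        if wf.getnchannels() != 1:
            raise ValueError('expected mono audio')
        if wf.getsampwidth() != 2:
            raise ValueError('expected 16-bit samples')
        if wf.getframerate() != expected_rate:
            raise ValueError(f'expected {expected_rate} Hz, got {wf.getframerate()}')
        return wf.readframes(wf.getnframes())

# Build a 0.1 s silent 16 kHz mono WAV in memory and round-trip it.
buf = io.BytesIO()
with wave.open(buf, 'wb') as wf:
    wf.setnchannels(1)
    wf.setsampwidth(2)
    wf.setframerate(16000)
    wf.writeframes(b'\x00\x00' * 1600)

pcm = wav_to_pcm(buf.getvalue())
print(len(pcm))  # 3200 bytes = 1600 samples * 2 bytes
```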

Key interfaces

OmniRealtimeConversation class

from dashscope.audio.qwen_omni import OmniRealtimeConversation
| Method | Server response event | Description |
| --- | --- | --- |
| connect() | session.created, session.updated | Opens a WebSocket connection. |
| update_session(...) | session.updated | Sets session parameters. Call after connect(). Defaults apply if omitted. See Session configuration. |
| append_audio(audio_b64: str) | None | Sends Base64-encoded audio to the input buffer. With enable_turn_detection=True, the server auto-commits at speech boundaries; with False, the client controls commits. Max 15 MiB per event. Smaller chunks improve VAD responsiveness. |
| commit() | input_audio_buffer.committed | Commits buffered audio for recognition. Returns an error if the buffer is empty. Disabled when enable_turn_detection=True. |
| end_session(timeout: int = 20) | session.finished | Ends the session after the final recognition completes. In VAD mode (default), call after sending audio; in manual mode, call after commit(). Async variant: end_session_async. |
| close() | None | Stops the task and closes the connection. |
| get_session_id() | None | Returns the session ID. |
| get_last_response_id() | None | Returns the latest response ID. |
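append_audio takes Base64 strings, so raw PCM must be encoded before sending. A sketch of chunking audio into roughly 100 ms pieces (the chunk size and helper name are our suggestions; only the Base64 encoding itself is required by the API):

```python
import base64

def pcm_chunks_b64(pcm: bytes, sample_rate: int = 16000,
                   chunk_ms: int = 100, sample_width: int = 2):
    """Yield Base64-encoded chunks of raw mono PCM, about chunk_ms each."""
    chunk_bytes = sample_rate * sample_width * chunk_ms // 1000
    for i in range(0, len(pcm), chunk_bytes):
        yield base64.b64encode(pcm[i:i + chunk_bytes]).decode('ascii')

# 0.5 s of silence at 16 kHz -> five 100 ms chunks.
pcm = b'\x00\x00' * 8000
chunks = list(pcm_chunks_b64(pcm))
print(len(chunks))  # 5
# In a real session, each chunk would be sent to the server:
# for c in chunks:
#     conversation.append_audio(c)
```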

OmniRealtimeCallback interface

Subclass OmniRealtimeCallback to handle server events.
from dashscope.audio.qwen_omni import OmniRealtimeCallback
| Method | Parameters | Description |
| --- | --- | --- |
| on_open() | None | Called when the WebSocket connection opens. |
| on_event(message: dict) | message: a server event | Called when a server event arrives. |
| on_close(close_status_code, close_msg) | close_status_code: status code; close_msg: log message | Called when the WebSocket connection closes. |