Qwen ASR Python streaming
For supported models, features, and full sample code, see Real-time speech recognition.
This page covers the core workflow:
- Create an OmniRealtimeConversation instance.
- Set session parameters after connecting.
- Set recognition options with TranscriptionParams.
- Subclass OmniRealtimeCallback to handle server events.
Prerequisites
- DashScope SDK 1.25.6 or later (install)
- API key
- Familiarity with the interaction flow
Request parameters
OmniRealtimeConversation constructor
Create an OmniRealtimeConversation instance.
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | str | Yes | Model name. |
| callback | OmniRealtimeCallback | Yes | Callback object that handles server events. |
| url | str | Yes | WebSocket endpoint: wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime |
Session configuration
Set session parameters after connecting.
| Parameter | Type | Required | Description |
|---|---|---|---|
| output_modalities | List[MultiModality] | Yes | Output type. Fixed to [MultiModality.TEXT]. |
| enable_turn_detection | bool | No | Enables server-side VAD. Default: True. If False, call commit() manually to trigger recognition. |
| turn_detection_type | str | No | VAD type. Fixed to server_vad. |
| turn_detection_threshold | float | No | VAD sensitivity. Default: 0.2. Recommended: 0.0. Range: [-1, 1]. Lower values increase sensitivity but may trigger on noise. Higher values reduce false triggers. |
| turn_detection_silence_duration_ms | int | No | Silence duration (ms) that marks the end of a statement. Default: 800. Recommended: 400. Range: [200, 6000]. Lower values respond faster but may split a statement at pauses. Higher values handle long pauses but add latency. |
| transcription_params | TranscriptionParams | No | Recognition settings. See TranscriptionParams. |
TranscriptionParams
Set recognition options with TranscriptionParams.
| Parameter | Type | Required | Description |
|---|---|---|---|
| language | str | No | Audio language. Supported values: zh (Chinese: Mandarin, Sichuanese, Minnan, Wu), yue (Cantonese), en (English), ja (Japanese), ko (Korean), de (German), fr (French), es (Spanish), pt (Portuguese), it (Italian), ru (Russian), ar (Arabic), hi (Hindi), id (Indonesian), th (Thai), tr (Turkish), uk (Ukrainian), vi (Vietnamese), cs (Czech), da (Danish), fi (Finnish), fil (Filipino), is (Icelandic), ms (Malay), no (Norwegian), pl (Polish), sv (Swedish) |
| sample_rate | int | No | Audio sample rate (Hz). Default: 16000. Supported: 16000, 8000. The server upsamples 8 kHz to 16 kHz, which adds minor latency. Use 8000 only for 8 kHz sources like telephone audio. |
| input_audio_format | str | No | Audio format. Default: pcm. Supported: pcm, opus. |
| corpus | Dict[str, Any] | No | Contextual biasing configuration as a dictionary. For a simpler string interface, use corpus_text instead. |
| corpus_text | str | No | Reference text for contextual biasing (such as entities or domain vocabulary). Maximum: 10,000 tokens. See Contextual biasing. |
Key interfaces
OmniRealtimeConversation class
| Method | Server response event | Description |
|---|---|---|
| connect() | session.created, session.updated | Opens a WebSocket connection. |
| update_session(...) | session.updated | Sets session parameters. Call after connect(). Defaults apply if omitted. See Session configuration. |
| append_audio(audio_b64: str) | None | Sends Base64-encoded audio to the input buffer. With enable_turn_detection=True, the server auto-commits at speech boundaries. With False, the client controls commits. Max 15 MiB per event. Smaller chunks improve VAD responsiveness. |
| commit() | input_audio_buffer.committed | Commits buffered audio for recognition. Returns an error if the buffer is empty. Disabled when enable_turn_detection=True. |
| end_session(timeout: int = 20) | session.finished | Ends the session after the final recognition completes. In VAD mode (default), call after sending audio. In manual mode, call after commit(). Async variant: end_session_async. |
| close() | None | Stops the task and closes the connection. |
| get_session_id() | None | Returns the session ID. |
| get_last_response_id() | None | Returns the latest response ID. |
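The append_audio/commit flow can be sketched with a hypothetical helper that splits raw PCM into Base64-encoded chunks of about 100 ms (3200 bytes at 16 kHz, 16-bit mono); only the standard library is used here, and `conversation` stands for a connected, configured OmniRealtimeConversation:

```python
import base64

def send_pcm(conversation, pcm_bytes: bytes, chunk_size: int = 3200) -> None:
    """Stream raw 16 kHz, 16-bit mono PCM as Base64-encoded chunks.

    3200 bytes is roughly 100 ms of audio; small chunks keep
    server-side VAD responsive and stay far below the 15 MiB limit.
    """
    for offset in range(0, len(pcm_bytes), chunk_size):
        chunk = pcm_bytes[offset:offset + chunk_size]
        conversation.append_audio(base64.b64encode(chunk).decode("ascii"))

# Manual mode (enable_turn_detection=False), with a connected conversation:
#   send_pcm(conversation, audio_bytes)
#   conversation.commit()       # trigger recognition on the buffered audio
#   conversation.end_session()  # wait for the final result
#   conversation.close()
```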
OmniRealtimeCallback interface
Subclass OmniRealtimeCallback to handle server events.
| Method | Parameters | Description |
|---|---|---|
| on_open() | None | Called when the WebSocket connection opens. |
| on_event(message: dict) | message: a server event | Called when a server event arrives. |
| on_close(close_status_code, close_msg) | close_status_code: status code; close_msg: log message | Called when the WebSocket connection closes. |