
Qwen-ASR server events

WebSocket server reference

Events the server sends during a WebSocket session.
User guide: For an overview of features and sample code, see Realtime speech recognition.

error

Sent when a client or server error occurs.
Example
{
  "event_id": "event_B2uoU7VOt1AAITsPRPH9n",
  "type": "error",
  "error": {
    "type": "invalid_request_error",
    "code": "invalid_value",
    "message": "Invalid value: 'whisper-1xx'. Supported values are: 'whisper-1'.",
    "param": "session.input_audio_transcription.model",
    "event_id": "event_123"
  }
}
• event_id (string): Unique identifier for this event.
• type (string): Event type. Always error.
• error (object): Error details.
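As a sketch (field names taken from the example above), a client might surface error events as exceptions so that a receive loop fails loudly instead of silently dropping them:

```python
import json

def handle_error_event(raw: str) -> None:
    """Raise if a WebSocket text frame is an `error` event; ignore others."""
    event = json.loads(raw)
    if event.get("type") != "error":
        return
    err = event.get("error", {})
    raise RuntimeError(
        f"{err.get('type')}/{err.get('code')}: {err.get('message')}"
        f" (param: {err.get('param')})"
    )
```

In a receive loop you would call this on every incoming frame before dispatching; non-error events pass through untouched.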

session.created

First event after connection. Contains default session settings.
Example
{
  "event_id": "event_1234",
  "type": "session.created",
  "session": {
    "id": "sess_001",
    "object": "realtime.session",
    "model": "qwen3-asr-flash-realtime",
    "modalities": ["text"],
    "input_audio_format": "pcm16",
    "input_audio_transcription": null,
    "turn_detection": {
      "type": "server_vad",
      "threshold": 0.5,
      "silence_duration_ms": 200
    }
  }
}
• event_id (string): Unique identifier for this event.
• type (string): Event type. Always session.created.
• session (object): Session configuration.
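One way to consume these events is a small dispatcher keyed on the type field. The handler registrations below are illustrative, not part of the API:

```python
import json

def make_dispatcher(handlers):
    """Return a function that routes a raw WebSocket frame to the handler
    registered for its event `type`; unknown types are ignored."""
    def dispatch(raw: str):
        event = json.loads(raw)
        handler = handlers.get(event.get("type"))
        return handler(event) if handler is not None else None
    return dispatch

# Illustrative handler: capture the defaults announced by session.created.
dispatch = make_dispatcher({
    "session.created": lambda e: e["session"],
})
```

Because session.created is the first event after connecting, capturing its payload gives you the effective defaults before you send any session.update.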

session.updated

Sent after your session.update event is processed. If processing fails, an error event is sent instead. For other parameter descriptions, see session.created.
Example
{
  "event_id": "event_1234",
  "type": "session.updated",
  "session": {
    "id": "sess_001",
    "object": "realtime.session",
    "model": "qwen3-asr-flash-realtime",
    "modalities": ["text"],
    "input_audio_format": "pcm16",
    "input_audio_transcription": null,
    "turn_detection": {
      "type": "server_vad",
      "threshold": 0.5,
      "silence_duration_ms": 200
    }
  }
}
• event_id (string): Unique identifier for this event.
• type (string): Event type. Always session.updated.

input_audio_buffer.speech_started

Sent in VAD mode when the server detects the start of speech in the audio buffer.
It can be triggered whenever audio is appended to the buffer, unless a speech start has already been detected.
Example
{
  "event_id": "event_B1lV7FPbgTv9qGxPI1tH4",
  "type": "input_audio_buffer.speech_started",
  "audio_start_ms": 64,
  "item_id": "item_B1lV7jWLscp4mMV8hSs8c"
}
• event_id (string): Unique identifier for this event.
• type (string): Event type. Always input_audio_buffer.speech_started.
• audio_start_ms (integer): Milliseconds from the buffer start to speech detection.
• item_id (string): ID of the user message item to be created.

input_audio_buffer.speech_stopped

Sent in VAD mode when speech ends in the audio buffer. Immediately followed by a conversation.item.created event with the user message item.
Example
{
  "event_id": "event_B3GGEYh2orwNIdhUagZPz",
  "type": "input_audio_buffer.speech_stopped",
  "audio_end_ms": 28128,
  "item_id": "item_B3GGE8ry4yqbqJGzrVhEM"
}
• event_id (string): Unique identifier for this event.
• type (string): Event type. Always input_audio_buffer.speech_stopped.
• audio_end_ms (integer): Milliseconds from the session start to when speech stopped.
• item_id (string): ID of the user message item created when speech stops.
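The two VAD events can be paired by item_id to measure each detected speech segment. This is a sketch that assumes audio_start_ms and audio_end_ms are measured on the same audio timeline:

```python
import json

class SpeechSegments:
    """Pair speech_started/speech_stopped events per item_id."""

    def __init__(self):
        self.starts = {}     # item_id -> audio_start_ms
        self.durations = {}  # item_id -> segment length in ms

    def on_event(self, raw: str) -> None:
        event = json.loads(raw)
        if event.get("type") == "input_audio_buffer.speech_started":
            self.starts[event["item_id"]] = event["audio_start_ms"]
        elif event.get("type") == "input_audio_buffer.speech_stopped":
            start = self.starts.get(event["item_id"])
            if start is not None:
                self.durations[event["item_id"]] = event["audio_end_ms"] - start
```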

input_audio_buffer.committed

Sent when the input audio buffer is committed.
Example
{
  "event_id": "event_1121",
  "type": "input_audio_buffer.committed",
  "previous_item_id": "msg_001",
  "item_id": "msg_002"
}
• event_id (string): Unique identifier for this event.
• type (string): Event type. Always input_audio_buffer.committed.
• previous_item_id (string): ID of the previous conversation item.
• item_id (string): ID of the user conversation item to be created.

conversation.item.created

Sent when a conversation item is created.
Example
{
  "type": "conversation.item.created",
  "event_id": "event_B3GGKbCfBZTpqFHZ0P8vg",
  "previous_item_id": "item_B3GGE8ry4yqbqJGzrVhEM",
  "item": {
    "id": "item_B3GGEPlolCqdMiVbYIf5L",
    "object": "realtime.item",
    "type": "message",
    "status": "completed",
    "role": "user",
    "content": [
      {
        "type": "input_audio",
        "transcript": null
      }
    ]
  }
}
• event_id (string): Unique identifier for this event.
• type (string): Event type. Always conversation.item.created.
• previous_item_id (string): ID of the previous conversation item.
• item (object): The conversation item.

conversation.item.input_audio_transcription.text

Sent repeatedly during recognition to deliver incremental real-time results.
Example
{
  "event_id": "event_R7Pfu8QVBfP5HmpcbEFSd",
  "type": "conversation.item.input_audio_transcription.text",
  "item_id": "item_MpJQPNQzqVRc9aC9zMwSj",
  "content_index": 0,
  "language": "en",
  "emotion": "neutral",
  "text": "",
  "stash": "Beijing's"
}
• event_id (string): Unique identifier for this event.
• type (string): Event type. Always conversation.item.input_audio_transcription.text.
• item_id (string): ID of the associated conversation item.
• content_index (integer): Index of the content part that contains the audio.
• language (string): Detected language. If you set the language request parameter, this value matches that setting. Possible values:
  • zh: Chinese (Mandarin, Sichuanese, Minnan, and Wu)
  • yue: Cantonese
  • en: English
  • ja: Japanese
  • de: German
  • ko: Korean
  • ru: Russian
  • fr: French
  • pt: Portuguese
  • ar: Arabic
  • it: Italian
  • es: Spanish
  • hi: Hindi
  • id: Indonesian
  • th: Thai
  • tr: Turkish
  • uk: Ukrainian
  • vi: Vietnamese
• emotion (string): Detected emotion. Supported values: surprised, neutral, happy, sad, disgusted, angry, fearful.
• text (string): Confirmed text prefix. The model has finalized this part and will not change it.
• stash (string): Pre-recognized text suffix. A temporary draft that follows the confirmed part. The model may still correct it. To get the most complete preview, concatenate both fields: text + stash.
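A minimal sketch of building that preview from one incoming frame:

```python
import json

def live_preview(raw: str) -> str:
    """Concatenate the confirmed `text` prefix with the draft `stash` suffix
    to get the most complete preview of the current utterance."""
    event = json.loads(raw)
    return event.get("text", "") + event.get("stash", "")
```

With the example payload above, text is empty and stash is "Beijing's", so the preview is just the draft suffix; as recognition progresses, finalized words migrate from stash into text.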

conversation.item.input_audio_transcription.completed

Carries the final recognition result and marks the end of a conversation item.
Example
{
  "event_id": "event_B3GGEjPT2sLzjBM74W6kB",
  "type": "conversation.item.input_audio_transcription.completed",
  "item_id": "item_B3GGC53jGOuIFcjZkmEQ9",
  "content_index": 0,
  "language": "en",
  "emotion": "neutral",
  "transcript": "What's the weather like today?"
}
• event_id (string): Unique identifier for this event.
• type (string): Event type. Always conversation.item.input_audio_transcription.completed.
• item_id (string): ID of the associated conversation item.
• content_index (integer): Index of the content part that contains the audio.
• language (string): Detected language. If you set the language request parameter, this value matches that setting. Possible values:
  • zh: Chinese (Mandarin, Sichuanese, Minnan, and Wu)
  • yue: Cantonese
  • en: English
  • ja: Japanese
  • de: German
  • ko: Korean
  • ru: Russian
  • fr: French
  • pt: Portuguese
  • ar: Arabic
  • it: Italian
  • es: Spanish
  • hi: Hindi
  • id: Indonesian
  • th: Thai
  • tr: Turkish
  • uk: Ukrainian
  • vi: Vietnamese
• emotion (string): Detected emotion. Supported values: surprised, neutral, happy, sad, disgusted, angry, fearful.
• transcript (string): Transcription result.
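Since incremental text events are superseded by the completed event, a client typically keeps only the final transcript per item. A sketch that collects finals from a stream of frames:

```python
import json

FINAL = "conversation.item.input_audio_transcription.completed"

def collect_transcripts(frames):
    """Map item_id -> final transcript from an iterable of raw frames,
    ignoring incremental and unrelated events."""
    final = {}
    for raw in frames:
        event = json.loads(raw)
        if event.get("type") == FINAL:
            final[event["item_id"]] = event["transcript"]
    return final
```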

conversation.item.input_audio_transcription.failed

Sent when recognition of the input audio fails. It is delivered separately from other error events so you can identify which item failed.
Example
{
  "type": "conversation.item.input_audio_transcription.failed",
  "item_id": "<item_id>",
  "content_index": 0,
  "error": {
    "code": "<code>",
    "message": "<message>",
    "param": "<param>"
  }
}
• type (string): Event type. Always conversation.item.input_audio_transcription.failed.
• item_id (string): ID of the associated conversation item.
• content_index (integer): Index of the content part that contains the audio.
• error (object): Error details.

session.finished

Confirms that all recognition is complete. Sent after you send session.finish. You can disconnect after receiving this event.
Example
{
  "event_id": "event_2239",
  "type": "session.finished"
}
• event_id (string): Unique identifier for this event.
• type (string): Event type. Always session.finished.
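A sketch of the shutdown handshake: after sending session.finish, keep reading until session.finished arrives, and only then close the connection. Here the receive loop is modeled as a plain iterable of frames rather than a live socket:

```python
import json

def drain_until_finished(frames):
    """Consume events until session.finished; return the event types seen.

    In a real client, `frames` would be the WebSocket receive loop, and the
    caller would close the connection once this returns.
    """
    seen = []
    for raw in frames:
        event_type = json.loads(raw).get("type")
        seen.append(event_type)
        if event_type == "session.finished":
            break
    return seen
```

Draining this way ensures any final transcription events still in flight are processed before the client disconnects.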