WebSocket server reference
Server events for the Qwen-Omni-Realtime API, including function calling events.
Reference: Real-time multimodal.
error
Server error message.
Example
string
body
Unique event identifier.
string
body
Always error.
object
body
Error details.
session.created
First event after connection. Contains the default session configuration.
Example
string
body
Unique event identifier.
string
body
Always session.created.
object
body
Session configuration.
session.updated
Sent after a successful session.update request. On error, the server sends an error event instead.
Example
string
body
Unique event identifier.
string
body
Always session.updated.
object
body
Session configuration.
input_audio_buffer.speech_started
Sent in VAD mode when speech starts in the audio buffer.
May also fire each time audio is added to the buffer before speech is detected.
Example
string
body
Unique event identifier.
string
body
Always input_audio_buffer.speech_started.
integer
body
Milliseconds from the start of audio input to the first detected speech.
string
body
User message item ID, created when speech stops. This item appends user input to the conversation history for inference.
input_audio_buffer.speech_stopped
Sent in VAD mode when speech ends in the audio buffer. The server also sends conversation.item.created to create the user message item.
Example
string
body
Unique event identifier.
string
body
Always input_audio_buffer.speech_stopped.
integer
body
Milliseconds from session start to speech end.
string
body
User message item ID (will be created).
input_audio_buffer.committed
Sent when the input audio buffer is committed.
- In VAD mode, the buffer commits automatically when the user finishes speaking.
- In manual mode, sent after the client sends input_audio_buffer.commit.
Example
string
body
Unique event identifier.
string
body
Always input_audio_buffer.committed.
string
body
User message item ID (will be created).
input_audio_buffer.cleared
Sent after the client sends input_audio_buffer.clear.
Example
string
body
Unique event identifier.
string
body
Always input_audio_buffer.cleared.
conversation.item.created
Sent when a conversation item is created.
Example
string
body
Unique event identifier.
string
body
Always conversation.item.created.
object
body
Conversation item.
conversation.item.input_audio_transcription.delta
When input audio transcription is enabled, this event is sent frequently while the user is speaking. It provides real-time intermediate transcription results. Concatenate text + stash to get the most complete sentence preview at any point in time.
Example
string
body
Unique event identifier.
string
body
Always conversation.item.input_audio_transcription.delta.
string
body
The ID of the associated conversation item.
integer
body
The index of the content part that contains the audio.
string
body
The confirmed text prefix. This portion of the current sentence has been confirmed by the model and will not change.
string
body
The preliminary text suffix. This temporary draft follows the confirmed portion and may be revised by the model.
string
body
The detected language of the recognized audio.
string
body
The detected emotion of the recognized audio. Valid values: neutral, happy, sad, angry, surprised, disgusted, fearful.
Example: how text and stash fields work together
Suppose the user says: "The weather is nice today, sunny and warm."
| Time | User speech | text | stash | Display (text + stash) |
|---|---|---|---|---|
| T1 | "The weather..." | "" | "The weather" | The weather |
| T2 | "...is nice..." | "" | "The weather is nice" | The weather is nice |
| T3 | "...today," | "The weather" | " is nice today," | The weather is nice today, |
| T4 | (brief pause) | "The weather is nice today, " | "" | The weather is nice today, |
| T5 | "sunny..." | "The weather is nice today, " | "sunny" | The weather is nice today, sunny |
| T6 | "...and warm." | "The weather is nice today, " | "sunny and warm." | The weather is nice today, sunny and warm. |
| T7 | (stops) | - | - | Use conversation.item.input_audio_transcription.completed as the final result. |
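The table above can be sketched in code as follows. This is a minimal illustration, assuming each delta event has already been parsed from JSON into a dict that carries the text and stash fields described above; the helper name preview_text and the sample events are hypothetical.

```python
def preview_text(event: dict) -> str:
    """Return the current sentence preview for one transcription delta event."""
    # `text` is the confirmed prefix; `stash` is the draft suffix that may still change.
    # Missing or null fields are treated as empty strings (an assumption of this sketch).
    return (event.get("text") or "") + (event.get("stash") or "")


# Hypothetical events mirroring rows T1, T3, and T6 of the table.
events = [
    {"text": "", "stash": "The weather"},
    {"text": "The weather", "stash": " is nice today,"},
    {"text": "The weather is nice today, ", "stash": "sunny and warm."},
]

# Each preview replaces the previous one on screen; previews are not appended to each other.
previews = [preview_text(e) for e in events]
```

Because stash can be revised, a client would typically overwrite the displayed preview on every delta and switch to the text from conversation.item.input_audio_transcription.completed once it arrives.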
conversation.item.input_audio_transcription.completed
Sent after audio is buffered and transcribed. Transcription uses a separate model (qwen3-asr-flash-realtime).
The transcribed text may differ from text processed by Qwen-Omni-Realtime. Treat it as a reference.
Example
string
body
Unique event identifier.
string
body
Always conversation.item.input_audio_transcription.completed.
string
body
User message item ID.
integer
body
Fixed to 0.
string
body
Transcribed text.
conversation.item.input_audio_transcription.failed
Sent when input audio transcription fails (if enabled). Separate from the error event.
Example
string
body
Unique event identifier.
string
body
Always conversation.item.input_audio_transcription.failed.
string
body
User message item ID.
integer
body
Fixed to 0.
object
body
Error details.
response.created
Sent when the model starts generating a response.
Example
string
body
Unique event identifier.
string
body
Always response.created.
object
body
Response object.
response.done
Sent after response generation completes. The response object contains all output items except raw audio data.
Example
string
body
Unique event identifier.
string
body
Always response.done.
object
body
Response object.
response.text.delta
Sent when the output modality is text-only and the model generates a text chunk.
Example
string
body
Unique event identifier.
string
body
Always response.text.delta.
string
body
Incremental text chunk.
string
body
Response ID.
string
body
Message item ID. Use this to associate items from the same message.
integer
body
Output item index. Fixed to 0.
integer
body
Content part index. Fixed to 0.
response.text.done
Sent when text-only output finishes generating.
Also sent when the response is interrupted, incomplete, or canceled.
Example
string
body
Unique event identifier.
string
body
Always response.text.done.
string
body
Response ID.
string
body
Message item ID.
integer
body
Output item index.
integer
body
Content part index.
string
body
Complete text output.
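One way a client might combine response.text.delta and response.text.done is sketched below. The field names type, item_id, delta, and text are assumptions of this sketch (the parameter names are not shown in this section), and the TextStream class is hypothetical.

```python
class TextStream:
    """Accumulate streamed text per message item and return the final text on done."""

    def __init__(self) -> None:
        self._buffers = {}  # item_id -> text received so far

    def handle(self, event: dict):
        kind = event["type"]        # assumed field name
        item_id = event["item_id"]  # assumed field name
        if kind == "response.text.delta":
            # Append the incremental chunk to the running text for this message item.
            self._buffers[item_id] = self._buffers.get(item_id, "") + event["delta"]
            return self._buffers[item_id]
        if kind == "response.text.done":
            # The done event carries the complete text, so the buffer can be dropped.
            self._buffers.pop(item_id, None)
            return event["text"]
        return None  # not a text event


stream = TextStream()
partial = stream.handle({"type": "response.text.delta", "item_id": "item_1", "delta": "Hel"})
partial = stream.handle({"type": "response.text.delta", "item_id": "item_1", "delta": "lo"})
final = stream.handle({"type": "response.text.done", "item_id": "item_1", "text": "Hello"})
```

Since response.text.done is also sent when a response is interrupted or canceled, the final text may be shorter than a full answer.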
response.audio.delta
Sent when the output modality includes audio and the model generates an audio chunk.
Example
string
body
Unique event identifier.
string
body
Always response.audio.delta.
string
body
Response ID.
string
body
Message item ID.
integer
body
Output item index.
integer
body
Content part index.
string
body
Base64-encoded audio chunk.
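Because each audio chunk arrives Base64-encoded, a client needs to decode it before playback or storage. The sketch below shows one possible approach, assuming the chunk is carried in a field named delta and the message item ID in item_id (both assumed names); the AudioAssembler class is hypothetical, and the audio format of the decoded bytes is not specified here.

```python
import base64
from collections import defaultdict


class AudioAssembler:
    """Decode Base64 audio chunks and join them per message item."""

    def __init__(self) -> None:
        self._chunks = defaultdict(list)  # item_id -> list of decoded byte chunks

    def on_delta(self, event: dict) -> None:
        # Decode one response.audio.delta chunk and keep it in arrival order.
        self._chunks[event["item_id"]].append(base64.b64decode(event["delta"]))

    def on_done(self, event: dict) -> bytes:
        # On response.audio.done, return everything received for this item.
        return b"".join(self._chunks.pop(event["item_id"], []))


assembler = AudioAssembler()
for raw in (b"\x01\x02", b"\x03\x04"):
    encoded = base64.b64encode(raw).decode("ascii")
    assembler.on_delta({"item_id": "item_1", "delta": encoded})
audio = assembler.on_done({"item_id": "item_1"})
```

A low-latency client would more likely hand each decoded chunk to its audio player immediately instead of waiting for the done event; buffering is used here only to keep the example short.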
response.audio.done
Sent when audio output finishes generating.
Also sent when the response is interrupted, incomplete, or canceled.
Example
string
body
Unique event identifier.
string
body
Always response.audio.done.
string
body
Response ID.
string
body
Message item ID.
integer
body
Output item index.
integer
body
Content part index.
response.audio_transcript.delta
Sent when the output modality includes audio and the model generates a transcript chunk.
Example
string
body
Unique event identifier.
string
body
Always response.audio_transcript.delta.
string
body
Response ID.
string
body
Message item ID.
integer
body
Output item index.
integer
body
Content part index.
string
body
Incremental transcript text.
response.audio_transcript.done
Sent when the audio transcript finishes generating.
Example
string
body
Unique event identifier.
string
body
Always response.audio_transcript.done.
string
body
Response ID.
string
body
Message item ID.
integer
body
Output item index.
integer
body
Content part index.
string
body
Complete transcript.
response.function_call_arguments.delta
Sent for each new segment as the model streams the argument string of a function call. Concatenate the delta fields in order. The complete content is provided in the subsequent response.function_call_arguments.done event.
Example
string
body
Unique event identifier.
string
body
Always response.function_call_arguments.delta.
string
body
Response ID.
string
body
Message item ID.
integer
body
Output item index.
string
body
Unique ID for this function invocation. Consistent with the done event in the same turn.
string
body
New segment of the argument string. Concatenate segments in order.
response.function_call_arguments.done
Indicates that the function call arguments have been fully generated. The arguments field contains the complete argument string. After receiving this event, parse the arguments and call the local tool function. Use the complete arguments from this event, not the concatenated delta result.
Example
string
body
Unique event identifier.
string
body
Always response.function_call_arguments.done.
string
body
Response ID.
string
body
Message item ID.
integer
body
Output item index.
string
body
Unique ID for this function invocation.
string
body
Name of the function that was called.
string
body
Complete arguments for the function invocation, typically a JSON string.
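The delta and done events for function calls might be handled along these lines. This is a sketch under stated assumptions: the field names call_id, name, and delta are assumed (only arguments and delta are named in the descriptions above), the arguments string is assumed to be a JSON object, and FunctionCallTracker and get_weather are hypothetical.

```python
import json


class FunctionCallTracker:
    """Track streamed function-call arguments and run the tool on completion."""

    def __init__(self, tools: dict) -> None:
        self._tools = tools  # function name -> local Python callable
        self._partial = {}   # call_id -> argument string received so far

    def on_delta(self, event: dict) -> str:
        # Concatenate segments in arrival order; useful for progress display only.
        call_id = event["call_id"]
        self._partial[call_id] = self._partial.get(call_id, "") + event["delta"]
        return self._partial[call_id]

    def on_done(self, event: dict):
        # Use the complete arguments from the done event, not the concatenated deltas.
        self._partial.pop(event["call_id"], None)
        kwargs = json.loads(event["arguments"])
        return self._tools[event["name"]](**kwargs)


def get_weather(city: str) -> str:
    """Hypothetical local tool."""
    return f"sunny in {city}"


tracker = FunctionCallTracker({"get_weather": get_weather})
tracker.on_delta({"call_id": "call_1", "delta": '{"city": '})
preview = tracker.on_delta({"call_id": "call_1", "delta": '"Paris"}'})
result = tracker.on_done(
    {"call_id": "call_1", "name": "get_weather", "arguments": '{"city": "Paris"}'}
)
```

A production client would also want to handle arguments that fail to parse as JSON and function names that have no registered tool.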
response.output_item.added
Sent when a new item is created during response generation. The item type can be message or function_call.
Example
string
body
Unique event identifier.
string
body
Always response.output_item.added.
string
body
Response ID.
integer
body
Output item index.
object
body
Output item.
response.output_item.done
Sent when an output item is complete.
Example
string
body
Unique event identifier.
string
body
Always response.output_item.done.
string
body
Response ID.
integer
body
Output item index.
object
body
Output item.
response.content_part.added
Sent when a new content part is added to an assistant message during response generation.
Example
string
body
Unique event identifier.
string
body
Always response.content_part.added.
string
body
Response ID.
string
body
Message item ID.
integer
body
Output item index. Fixed to 0.
integer
body
Content part index. Fixed to 0.
object
body
Content part.
response.content_part.done
Sent when a content part in an assistant message finishes streaming.
Example
string
body
Unique event identifier.
string
body
Always response.content_part.done.
string
body
Response ID.
string
body
Message item ID.
integer
body
Output item index. Fixed to 0.
integer
body
Content part index. Fixed to 0.
object
body
Content part.