WebSocket server reference
Server events for the Qwen-Omni-Realtime API.
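- error: Server error message.
- session.created: First event after connection. Contains the default session configuration.
- session.updated: Sent after a successful session.update request.
- input_audio_buffer.speech_started: Sent in VAD mode when speech starts in the audio buffer.
- input_audio_buffer.speech_stopped: Sent in VAD mode when speech ends in the audio buffer. The server also sends conversation.item.created to create the user message item.
- input_audio_buffer.committed: Sent when the input audio buffer is committed.
- input_audio_buffer.cleared: Sent after the client sends input_audio_buffer.clear.
- conversation.item.created: Sent when a conversation item is created.
- conversation.item.input_audio_transcription.completed: Sent after audio is buffered and transcribed. Transcription uses a separate model (gummy-realtime-v1).
- conversation.item.input_audio_transcription.failed: Sent when input audio transcription fails (if enabled). Separate from the error event.
- response.created: Sent when the model starts generating a response.
- response.done: Sent after response generation completes. The response object contains all output items except raw audio data.
- response.text.delta: Sent when the output modality is text-only and the model generates a text chunk.
- response.text.done: Sent when text-only output finishes generating.
- response.audio.delta: Sent when the output modality includes audio and the model generates an audio chunk.
- response.audio.done: Sent when audio output finishes generating.
- response.audio_transcript.delta: Sent when the output modality includes audio and the model generates a transcript chunk.
- response.audio_transcript.done: Sent when the audio transcript finishes generating.
- response.output_item.added: Sent when a new output item is created during response generation.
- response.output_item.done: Sent when an output item is complete.
- response.content_part.added: Sent when a new content part is added to an assistant message during response generation.
- response.content_part.done: Sent when a content part in an assistant message finishes streaming.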
Reference: Real-time multimodal.
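A client typically reads each WebSocket text frame, parses it as JSON, and routes it by event name. The sketch below assumes each server event is a JSON object whose event name (for example "response.audio.delta") sits under a "type" key; that key name is an assumption modeled on the OpenAI Realtime API, and EventRouter is a hypothetical helper. It uses only the standard library, so it can be fed frames from any WebSocket client.

```python
import json
from typing import Callable, Dict, List

Handler = Callable[[dict], None]


class EventRouter:
    """Hypothetical router that calls a handler per server event name."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}
        self.unhandled: List[str] = []

    def on(self, event_name: str) -> Callable[[Handler], Handler]:
        """Decorator that registers a handler for one event name."""
        def register(fn: Handler) -> Handler:
            self._handlers[event_name] = fn
            return fn
        return register

    def dispatch(self, frame: str) -> None:
        """Parse one WebSocket text frame and call the matching handler."""
        event = json.loads(frame)
        # Assumption: the event name is carried in a "type" field.
        name = event.get("type", "")
        handler = self._handlers.get(name)
        if handler is None:
            # Record events with no handler so they are not silently dropped.
            self.unhandled.append(name)
        else:
            handler(event)
```

Registering handlers with a decorator keeps each event's logic in its own function, and the unhandled list makes it easy to spot events the client does not yet process.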
error
Server error message.
Body fields:
- string: Unique event identifier.
- string: Always "error".
- object: Error details.
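Because errors arrive as ordinary server events, a client can turn them into exceptions where it processes events. A minimal sketch, assuming the event name is under "type", the event identifier under "event_id", and the error details object under "error" with optional "code" and "message" members; all of these key names, and the RealtimeServerError class, are assumptions for illustration rather than documented fields.

```python
from typing import Optional


class RealtimeServerError(Exception):
    """Hypothetical exception type wrapping a server "error" event."""

    def __init__(self, code: Optional[str], message: str, event_id: Optional[str]) -> None:
        super().__init__(f"[{code}] {message}")
        self.code = code
        self.message = message
        self.event_id = event_id


def raise_if_error(event: dict) -> None:
    """Raise RealtimeServerError if the event is an "error" event; otherwise do nothing."""
    if event.get("type") != "error":
        return
    # Assumed layout: {"type": "error", "event_id": ..., "error": {"code": ..., "message": ...}}
    details = event.get("error") or {}
    raise RealtimeServerError(
        code=details.get("code"),
        message=details.get("message", "unknown error"),
        event_id=event.get("event_id"),
    )
```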
session.created
First event after connection. Contains the default session configuration.
Body fields:
- string: Unique event identifier.
- string: Always "session.created".
- object: Session configuration.
session.updated
Sent after a successful session.update request. On error, the server sends an error event instead.
Body fields:
- string: Unique event identifier.
- string: Always "session.updated".
- object: Session configuration.
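Since a session.update request is answered either by session.updated or by an error event, a client can wait for whichever arrives first. A minimal sketch, assuming the event name is under "type", the configuration under "session", and error details under "error" (assumed key names); await_session_update is a hypothetical helper that consumes an iterable of already-parsed events.

```python
from typing import Iterable, Tuple


def await_session_update(events: Iterable[dict]) -> Tuple[bool, dict]:
    """Consume events after a session.update request until the outcome is known.

    Returns (True, session_config) on session.updated, or (False, error_details)
    on an error event. Other events still in flight are skipped.
    """
    for event in events:
        kind = event.get("type")
        if kind == "session.updated":
            # Assumption: the configuration object is under a "session" key.
            return True, event.get("session", {})
        if kind == "error":
            # Assumption: error details are under an "error" key.
            return False, event.get("error", {})
    raise RuntimeError("event stream ended before session.update was acknowledged")
```

This treats the first error after the request as the update's failure; if other client requests are in flight at the same time, a client may need to correlate errors more carefully.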
input_audio_buffer.speech_started
Sent in VAD mode when speech starts in the audio buffer.
May also fire each time audio is added to the buffer before speech is detected.
Body fields:
- string: Unique event identifier.
- string: Always "input_audio_buffer.speech_started".
- integer: Milliseconds from the start of audio input to the first detected speech.
- string: User message item ID. The item is created when speech stops and appends the user input to the conversation history for inference.
input_audio_buffer.speech_stopped
Sent in VAD mode when speech ends in the audio buffer. The server also sends conversation.item.created to create the user message item.
Body fields:
- string: Unique event identifier.
- string: Always "input_audio_buffer.speech_stopped".
- integer: Milliseconds from session start to speech end.
- string: ID of the user message item that will be created.
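The two VAD events bracket one speech segment and both carry the ID of the user message item created for it, so a client can pair them to measure how long the user spoke. A minimal sketch, assuming fields named "type", "item_id", "audio_start_ms", and "audio_end_ms" (assumed names) and assuming both millisecond offsets share the same time origin, which the two field descriptions above phrase slightly differently. SpeechSegmentTracker is a hypothetical helper.

```python
from typing import Dict, Optional


class SpeechSegmentTracker:
    """Pairs speech_started / speech_stopped events by user message item ID."""

    def __init__(self) -> None:
        self._starts: Dict[str, int] = {}
        self.durations_ms: Dict[str, int] = {}

    def handle(self, event: dict) -> Optional[int]:
        """Record a VAD event; return the segment duration once speech stops."""
        kind = event.get("type")
        item_id = event.get("item_id")
        if kind == "input_audio_buffer.speech_started":
            self._starts[item_id] = event["audio_start_ms"]
        elif kind == "input_audio_buffer.speech_stopped":
            start = self._starts.pop(item_id, None)
            if start is not None:
                # Assumes both offsets are measured from a common origin.
                duration = event["audio_end_ms"] - start
                self.durations_ms[item_id] = duration
                return duration
        return None
```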
input_audio_buffer.committed
Sent when the input audio buffer is committed.
- In VAD mode, the buffer commits automatically when the user finishes speaking.
- In manual mode, sent after the client sends input_audio_buffer.commit.

Body fields:
- string: Unique event identifier.
- string: Always "input_audio_buffer.committed".
- string: ID of the user message item that will be created.
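In manual mode the client commits the buffer itself and then waits for input_audio_buffer.committed. A minimal sketch of building that client message and recognizing the acknowledgment, assuming client events are JSON objects with "type" and "event_id" keys (assumed layout; check the client event reference for the exact fields). The "event_" ID prefix and both helper names are hypothetical, and sending the frame depends on whichever WebSocket library the client uses.

```python
import json
import uuid


def make_client_event(event_type: str) -> str:
    """Serialize a minimal client event with a fresh, hypothetical event ID."""
    # Assumption: client events carry "type" plus an optional "event_id".
    return json.dumps({"event_id": f"event_{uuid.uuid4().hex}", "type": event_type})


def is_commit_ack(event: dict) -> bool:
    """True if a parsed server event confirms that the input buffer was committed."""
    return event.get("type") == "input_audio_buffer.committed"


# In manual mode (VAD off), the client commits the buffer explicitly:
commit_frame = make_client_event("input_audio_buffer.commit")
```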
input_audio_buffer.cleared
Sent after the client sends input_audio_buffer.clear.
Body fields:
- string: Unique event identifier.
- string: Always "input_audio_buffer.cleared".
conversation.item.created
Sent when a conversation item is created.
Body fields:
- string: Unique event identifier.
- string: Always "conversation.item.created".
- object: Conversation item.
conversation.item.input_audio_transcription.completed
Sent after audio is buffered and transcribed. Transcription uses a separate model (gummy-realtime-v1).
The transcribed text may differ from text processed by Qwen-Omni-Realtime. Treat it as a reference.
Body fields:
- string: Unique event identifier.
- string: Always "conversation.item.input_audio_transcription.completed".
- string: User message item ID.
- integer: Fixed to 0.
- string: Transcribed text.
conversation.item.input_audio_transcription.failed
Sent when input audio transcription fails (if enabled). Separate from the error event.
Body fields:
- string: Unique event identifier.
- string: Always "conversation.item.input_audio_transcription.failed".
- string: User message item ID.
- integer: Fixed to 0.
- object: Error details.
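Because transcription failures are reported separately from the error event, a client can record them per item instead of treating them as fatal. A minimal sketch that stores either the transcript or the failure for each user message item, assuming fields named "type", "item_id", "transcript", and "error" (assumed names). InputTranscripts is a hypothetical helper; keep in mind that the transcript comes from a separate model and is a reference only.

```python
from typing import Dict, Optional


class InputTranscripts:
    """Collects user-side transcripts or failures keyed by user message item ID."""

    def __init__(self) -> None:
        self.text: Dict[str, str] = {}
        self.failures: Dict[str, dict] = {}

    def handle(self, event: dict) -> None:
        kind = event.get("type")
        # Assumed field names: "item_id", "transcript", "error".
        if kind == "conversation.item.input_audio_transcription.completed":
            self.text[event["item_id"]] = event.get("transcript", "")
        elif kind == "conversation.item.input_audio_transcription.failed":
            self.failures[event["item_id"]] = event.get("error", {})

    def get(self, item_id: str) -> Optional[str]:
        """Return the transcript for an item, or None if it is missing or failed."""
        return self.text.get(item_id)
```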
response.created
Sent when the model starts generating a response.
Body fields:
- string: Unique event identifier.
- string: Always "response.created".
- object: Response object.
response.done
Sent after response generation completes. The response object contains all output items except raw audio data.
Body fields:
- string: Unique event identifier.
- string: Always "response.done".
- object: Response object.
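response.created and response.done bracket each response, so a client can use them to know whether the model is currently generating. A minimal sketch, assuming the response object is under a "response" key and carries an "id" field (assumed names). ResponseLifecycle is a hypothetical helper; because response.done omits raw audio, any audio must be kept from the response.audio.delta events.

```python
from typing import Dict, Set


class ResponseLifecycle:
    """Tracks responses between response.created and response.done."""

    def __init__(self) -> None:
        self.in_progress: Set[str] = set()
        self.finished: Dict[str, dict] = {}

    def handle(self, event: dict) -> None:
        kind = event.get("type")
        # Assumption: the response object is under "response" and has an "id" field.
        response = event.get("response", {})
        response_id = response.get("id")
        if kind == "response.created":
            self.in_progress.add(response_id)
        elif kind == "response.done":
            self.in_progress.discard(response_id)
            # Final response object, without raw audio data.
            self.finished[response_id] = response

    @property
    def busy(self) -> bool:
        """True while at least one response is still being generated."""
        return bool(self.in_progress)
```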
response.text.delta
Sent when the output modality is text-only and the model generates a text chunk.
Body fields:
- string: Unique event identifier.
- string: Always "response.text.delta".
- string: Incremental text chunk.
- string: Response ID.
- string: Message item ID. Use it to associate events that belong to the same message.
- integer: Output item index. Fixed to 0.
- integer: Content part index. Fixed to 0.
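Text deltas are fragments, so a client concatenates them per response and message item to rebuild the output. A minimal sketch, assuming fields named "type", "response_id", "item_id", and "delta" (assumed names). TextAccumulator is a hypothetical helper that collects chunks in a list and joins them on demand.

```python
from collections import defaultdict
from typing import DefaultDict, List, Tuple

Key = Tuple[str, str]  # (response ID, message item ID)


class TextAccumulator:
    """Concatenates response.text.delta chunks per response and message item."""

    def __init__(self) -> None:
        self._chunks: DefaultDict[Key, List[str]] = defaultdict(list)

    def handle(self, event: dict) -> None:
        if event.get("type") != "response.text.delta":
            return
        # Assumed field names: "response_id", "item_id", "delta".
        key = (event["response_id"], event["item_id"])
        self._chunks[key].append(event["delta"])

    def text(self, response_id: str, item_id: str) -> str:
        """Return the text received so far (empty if nothing arrived)."""
        return "".join(self._chunks.get((response_id, item_id), []))
```

Joining a list once avoids repeated string copies for long outputs. When response.text.done arrives, its complete text can be used to validate or replace the accumulated value.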
response.text.done
Sent when text-only output finishes generating.
Also sent when the response is interrupted, incomplete, or canceled.
Body fields:
- string: Unique event identifier.
- string: Always "response.text.done".
- string: Response ID.
- string: Message item ID.
- integer: Output item index.
- integer: Content part index.
- string: Complete text output.
response.audio.delta
Sent when the output modality includes audio and the model generates an audio chunk.
Body fields:
- string: Unique event identifier.
- string: Always "response.audio.delta".
- string: Response ID.
- string: Message item ID.
- integer: Output item index.
- integer: Content part index.
- string: Base64-encoded audio chunk.
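Each audio chunk arrives Base64-encoded, so a client decodes it before playback or storage. A minimal sketch that decodes chunks and buffers the raw bytes per message item, assuming fields named "type", "item_id", and "delta" (assumed names). AudioAccumulator is a hypothetical helper; it returns each decoded chunk so a caller could also stream it to a player immediately.

```python
import base64
from collections import defaultdict
from typing import DefaultDict


class AudioAccumulator:
    """Decodes Base64 audio chunks from response.audio.delta and buffers the bytes."""

    def __init__(self) -> None:
        self._buffers: DefaultDict[str, bytearray] = defaultdict(bytearray)

    def handle(self, event: dict) -> bytes:
        """Append one decoded chunk and return it (empty bytes for other events)."""
        if event.get("type") != "response.audio.delta":
            return b""
        # Assumed field names: "item_id" and "delta".
        chunk = base64.b64decode(event["delta"])
        self._buffers[event["item_id"]].extend(chunk)
        return chunk

    def audio(self, item_id: str) -> bytes:
        """Return all audio bytes buffered for an item so far."""
        return bytes(self._buffers.get(item_id, b""))
```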
response.audio.done
Sent when audio output finishes generating.
Also sent when the response is interrupted, incomplete, or canceled.
Body fields:
- string: Unique event identifier.
- string: Always "response.audio.done".
- string: Response ID.
- string: Message item ID.
- integer: Output item index.
- integer: Content part index.
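response.audio.done carries no audio of its own, so it is a natural point to finalize whatever PCM was buffered for that item, for example by wrapping it in a WAV container. A minimal sketch using the standard wave module, assuming a "item_id" field (assumed name) and assuming 16-bit mono PCM at 24 kHz; that audio format is an assumption, so take the real values from your session configuration. Both helper names are hypothetical.

```python
import io
import wave


def pcm_to_wav(pcm: bytes, sample_rate: int = 24000, channels: int = 1, sample_width: int = 2) -> bytes:
    """Wrap raw PCM bytes in a WAV container (format defaults are assumptions)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    # wave does not close a file object it did not open, so the buffer is still readable.
    return buffer.getvalue()


def on_audio_done(event: dict, buffers: dict) -> bytes:
    """On response.audio.done, turn the PCM buffered for that item into WAV bytes."""
    # Assumed field name: "item_id". buffers maps item IDs to buffered PCM bytes.
    pcm = buffers.pop(event["item_id"], b"")
    return pcm_to_wav(bytes(pcm))
```

Because this event is also sent when a response is interrupted, incomplete, or canceled, the resulting WAV may contain only part of the intended answer.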
response.audio_transcript.delta
Sent when the output modality includes audio and the model generates a transcript chunk.
Body fields:
- string: Unique event identifier.
- string: Always "response.audio_transcript.delta".
- string: Response ID.
- string: Message item ID.
- integer: Output item index.
- integer: Content part index.
- string: Incremental transcript text.
response.audio_transcript.done
Sent when the audio transcript finishes generating.
Body fields:
- string: Unique event identifier.
- string: Always "response.audio_transcript.done".
- string: Response ID.
- string: Message item ID.
- integer: Output item index.
- integer: Content part index.
- string: Complete transcript.
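Transcript deltas are useful for live captions, while the done event supplies the complete transcript. A minimal sketch that keeps a running partial string per response and message item and swaps in the server's complete transcript on done, assuming fields named "type", "response_id", "item_id", "delta", and "transcript" (assumed names). TranscriptAccumulator is a hypothetical helper.

```python
from typing import Dict, Tuple

Key = Tuple[str, str]  # (response ID, message item ID)


class TranscriptAccumulator:
    """Builds the assistant's audio transcript from delta events and finalizes it on done."""

    def __init__(self) -> None:
        self.partial: Dict[Key, str] = {}
        self.final: Dict[Key, str] = {}

    def handle(self, event: dict) -> None:
        kind = event.get("type")
        # Assumed field names: "response_id", "item_id", "delta", "transcript".
        key = (event.get("response_id"), event.get("item_id"))
        if kind == "response.audio_transcript.delta":
            # Keep a running string so partial captions can be displayed directly.
            self.partial[key] = self.partial.get(key, "") + event["delta"]
        elif kind == "response.audio_transcript.done":
            # Prefer the server's complete transcript over the locally joined deltas.
            self.final[key] = event.get("transcript", self.partial.get(key, ""))
            self.partial.pop(key, None)
```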
response.output_item.added
Sent when a new output item is created during response generation.
Body fields:
- string: Unique event identifier.
- string: Always "response.output_item.added".
- string: Response ID.
- integer: Output item index.
- object: Output item.
response.output_item.done
Sent when an output item is complete.
Body fields:
- string: Unique event identifier.
- string: Always "response.output_item.done".
- string: Response ID.
- integer: Output item index.
- object: Output item.
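output_item.added and output_item.done share a response ID and output item index, so a client can track each item from creation to completion. A minimal sketch, assuming fields named "type", "response_id", "output_index", and "item" (assumed names). OutputItems is a hypothetical helper that replaces the initial item snapshot with the finished one.

```python
from typing import Dict, Tuple


class OutputItems:
    """Tracks output items from response.output_item.added to response.output_item.done."""

    def __init__(self) -> None:
        self.open: Dict[Tuple[str, int], dict] = {}
        self.completed: Dict[Tuple[str, int], dict] = {}

    def handle(self, event: dict) -> None:
        kind = event.get("type")
        if kind not in ("response.output_item.added", "response.output_item.done"):
            return
        # Assumed field names: "response_id", "output_index", "item".
        key = (event["response_id"], event["output_index"])
        if kind == "response.output_item.added":
            self.open[key] = event["item"]
        else:
            self.open.pop(key, None)
            # The done event carries the finished item, which replaces the initial snapshot.
            self.completed[key] = event["item"]
```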
response.content_part.added
Sent when a new content part is added to an assistant message during response generation.
Body fields:
- string: Unique event identifier.
- string: Always "response.content_part.added".
- string: Response ID.
- string: Message item ID.
- integer: Output item index. Fixed to 0.
- integer: Content part index. Fixed to 0.
- object: Content part.
response.content_part.done
Sent when a content part in an assistant message finishes streaming.
Body fields:
- string: Unique event identifier.
- string: Always "response.content_part.done".
- string: Response ID.
- string: Message item ID.
- integer: Output item index. Fixed to 0.
- integer: Content part index. Fixed to 0.
- object: Content part.
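Content parts are opened by content_part.added and closed by content_part.done, so a client can check that the two events pair up, which helps when debugging stream handling. A minimal sketch, assuming fields named "type", "response_id", "item_id", "output_index", and "content_index" (assumed names). ContentPartChecker is a hypothetical helper; parts still open after response.done may point to an interrupted or truncated stream.

```python
from typing import List, Set, Tuple

PartKey = Tuple[str, str, int, int]  # (response ID, item ID, output index, content index)


class ContentPartChecker:
    """Checks that every content_part.done matches an earlier content_part.added."""

    def __init__(self) -> None:
        self.open_parts: Set[PartKey] = set()
        self.problems: List[str] = []

    def handle(self, event: dict) -> None:
        kind = event.get("type")
        if kind not in ("response.content_part.added", "response.content_part.done"):
            return
        # Assumed field names: "response_id", "item_id", "output_index", "content_index".
        key = (event["response_id"], event["item_id"], event["output_index"], event["content_index"])
        if kind == "response.content_part.added":
            self.open_parts.add(key)
        elif key in self.open_parts:
            self.open_parts.remove(key)
        else:
            self.problems.append(f"done without added: {key}")
```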