WebSocket server reference
Events the server sends during a WebSocket session.
- error: Sent when a client or server error occurs.
- session.created: First event after connection. Contains default session settings.
- session.updated: Sent after your session.update event is processed. If processing fails, an error event is sent instead.
- input_audio_buffer.speech_started: Sent in VAD mode when speech starts in the audio buffer.
- input_audio_buffer.speech_stopped: Sent in VAD mode when speech ends in the audio buffer. Immediately followed by a conversation.item.created event.
- input_audio_buffer.committed: Sent when the input audio buffer is committed.
- conversation.item.created: Sent when a conversation item is created.
- conversation.item.input_audio_transcription.text: Sent frequently with real-time recognition results.
- conversation.item.input_audio_transcription.completed: Contains the final recognition result and marks the end of a conversation item.
- conversation.item.input_audio_transcription.failed: Sent if recognition fails for the input audio, separate from other error events so you can identify which item failed.
- session.finished: Confirms that all recognition is complete. Sent after you send session.finish. You can disconnect after receiving this event.
User guide: For an overview of features and sample code, see Realtime speech recognition.
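The event flow above can be handled with a small dispatcher on the client side. This is a minimal sketch, assuming each server message is a JSON object carrying the event name in a `type` field (the per-event payloads are documented below):

```python
import json

# Minimal sketch: route incoming server events by their type.
# Assumes each message is JSON with a "type" field holding one of
# the event names documented in this reference.
def classify_event(raw: str) -> str:
    event_type = json.loads(raw).get("type", "")
    if event_type == "error":
        return "error"
    if event_type in ("session.created", "session.updated"):
        return "session"
    if event_type.startswith("input_audio_buffer."):
        return "audio-buffer"
    if event_type.startswith("conversation.item.input_audio_transcription."):
        return "transcription"
    if event_type == "conversation.item.created":
        return "item"
    if event_type == "session.finished":
        return "finished"  # safe to disconnect after this event
    return "unknown"
```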
error
Sent when a client or server error occurs.
Example
| Type | Location | Description |
| --- | --- | --- |
| string | body | Unique identifier for this event. |
| string | body | Event type. Always error. |
| object | body | Error details. |
session.created
First event after connection. Contains default session settings.
Example
| Type | Location | Description |
| --- | --- | --- |
| string | body | Unique identifier for this event. |
| string | body | Event type. Always session.created. |
| object | body | Session configuration. |
session.updated
Sent after your session.update event is processed. If processing fails, an error event is sent instead.
For other parameter descriptions, see session.created.
Example
| Type | Location | Description |
| --- | --- | --- |
| string | body | Unique identifier for this event. |
| string | body | Event type. Always session.updated. |

input_audio_buffer.speech_started
Sent in VAD mode when speech starts in the audio buffer.
Triggered each time audio is added to the buffer, unless speech start was already detected.
Example
| Type | Location | Description |
| --- | --- | --- |
| string | body | Unique identifier for this event. |
| string | body | Event type. Always input_audio_buffer.speech_started. |
| integer | body | Milliseconds from the buffer start to speech detection. |
| string | body | ID of the user message item to be created. |
input_audio_buffer.speech_stopped
Sent in VAD mode when speech ends in the audio buffer. Immediately followed by a conversation.item.created event with the user message item.
Example
| Type | Location | Description |
| --- | --- | --- |
| string | body | Unique identifier for this event. |
| string | body | Event type. Always input_audio_buffer.speech_stopped. |
| integer | body | Milliseconds from the session start to when speech stopped. |
| string | body | ID of the user message item created when speech stops. |
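The ordering described above (speech_stopped immediately followed by conversation.item.created for the same user message item) can be checked with a sketch like this. The `item_id` and nested `item`/`id` field names are assumptions for illustration, not confirmed by this reference:

```python
import json

# Hypothetical check that a speech_stopped event is followed by the
# conversation.item.created event for the same user message item.
# The "item_id" and nested "item"/"id" field names are assumptions.
def is_matching_followup(stopped_raw: str, created_raw: str) -> bool:
    stopped = json.loads(stopped_raw)
    created = json.loads(created_raw)
    return (
        stopped.get("type") == "input_audio_buffer.speech_stopped"
        and created.get("type") == "conversation.item.created"
        and created.get("item", {}).get("id") == stopped.get("item_id")
    )
```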
input_audio_buffer.committed
Sent when the input audio buffer is committed.
- VAD mode: Sent after you finish sending audio with input_audio_buffer.append.
- Manual mode: Sent after you finish sending audio with input_audio_buffer.append and then send input_audio_buffer.commit.
Example
| Type | Location | Description |
| --- | --- | --- |
| string | body | Unique identifier for this event. |
| string | body | Event type. Always input_audio_buffer.committed. |
| string | body | ID of the previous conversation item. |
| string | body | ID of the user conversation item to be created. |
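The two commit flows can be sketched from the client side. The client event names input_audio_buffer.append and input_audio_buffer.commit come from the list above; the `audio` field name and the base64 payload encoding are assumptions:

```python
import base64
import json

# Manual-mode sketch: stream audio chunks with input_audio_buffer.append,
# then explicitly commit. In VAD mode the commit message is not needed.
# The "audio" field name and base64 payload encoding are assumptions.
def append_message(chunk: bytes) -> str:
    return json.dumps({
        "type": "input_audio_buffer.append",
        "audio": base64.b64encode(chunk).decode("ascii"),
    })

def commit_message() -> str:
    return json.dumps({"type": "input_audio_buffer.commit"})
```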
conversation.item.created
Sent when a conversation item is created.
Example
| Type | Location | Description |
| --- | --- | --- |
| string | body | Unique identifier for this event. |
| string | body | Event type. Always conversation.item.created. |
| string | body | ID of the previous conversation item. |
| object | body | The conversation item. |
conversation.item.input_audio_transcription.text
Sent frequently with real-time recognition results.
Example
| Type | Location | Description |
| --- | --- | --- |
| string | body | Unique identifier for this event. |
| string | body | Event type. Always conversation.item.input_audio_transcription.text. |
| string | body | ID of the associated conversation item. |
| integer | body | Index of the content part that contains the audio. |
| string | body | Detected language. If you set the language request parameter, this value matches that setting. Possible values: zh (Chinese: Mandarin, Sichuanese, Minnan, and Wu), yue (Cantonese), en (English), ja (Japanese), de (German), ko (Korean), ru (Russian), fr (French), pt (Portuguese), ar (Arabic), it (Italian), es (Spanish), hi (Hindi), id (Indonesian), th (Thai), tr (Turkish), uk (Ukrainian), vi (Vietnamese). |
| string | body | Detected emotion. Supported values: surprised, neutral, happy, sad, disgusted, angry, fearful. |
| string | body | Confirmed text prefix. The model has finalized this part and will not change it. |
| string | body | Pre-recognized text suffix. A temporary draft that follows the confirmed part. The model may still correct it. |
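Because the confirmed prefix is final while the suffix may still be revised, a client can keep a live preview by joining the two fields of the latest event. A minimal sketch, assuming the fields arrive as text and stash in each event payload:

```python
import json

# Sketch: build the most complete preview from a stream of
# ...input_audio_transcription.text events by concatenating the
# confirmed "text" prefix with the provisional "stash" suffix.
def preview(raw_event: str) -> str:
    event = json.loads(raw_event)
    return event.get("text", "") + event.get("stash", "")
```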
To get the most complete preview, concatenate both fields: text + stash.

conversation.item.input_audio_transcription.completed
Contains the final recognition result and marks the end of a conversation item.
Example
| Type | Location | Description |
| --- | --- | --- |
| string | body | Unique identifier for this event. |
| string | body | Event type. Always conversation.item.input_audio_transcription.completed. |
| string | body | ID of the associated conversation item. |
| integer | body | Index of the content part that contains the audio. |
| string | body | Detected language. If you set the language request parameter, this value matches that setting. Possible values: zh (Chinese: Mandarin, Sichuanese, Minnan, and Wu), yue (Cantonese), en (English), ja (Japanese), de (German), ko (Korean), ru (Russian), fr (French), pt (Portuguese), ar (Arabic), it (Italian), es (Spanish), hi (Hindi), id (Indonesian), th (Thai), tr (Turkish), uk (Ukrainian), vi (Vietnamese). |
| string | body | Detected emotion. Supported values: surprised, neutral, happy, sad, disgusted, angry, fearful. |
| string | body | Transcription result. |
conversation.item.input_audio_transcription.failed
Sent if recognition fails for the input audio, separate from other error events so you can identify which item failed.
Example
| Type | Location | Description |
| --- | --- | --- |
| string | body | Event type. Always conversation.item.input_audio_transcription.failed. |
| string | body | ID of the associated conversation item. |
| integer | body | Index of the content part that contains the audio. |
| object | body | Error details. |
session.finished
Confirms that all recognition is complete. Sent after you send session.finish. You can disconnect after receiving this event.
Example
| Type | Location | Description |
| --- | --- | --- |
| string | body | Unique identifier for this event. |
| string | body | Event type. Always session.finished. |
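The shutdown handshake described above can be sketched as follows. The client event session.finish is named in this reference; the flat {"type": ...} message shape is an assumption:

```python
import json

# Sketch of the shutdown handshake: send session.finish, then keep
# reading events until session.finished confirms all recognition is
# done. The {"type": ...} message shape is an assumption.
def finish_message() -> str:
    return json.dumps({"type": "session.finish"})

def can_disconnect(raw_event: str) -> bool:
    return json.loads(raw_event).get("type") == "session.finished"
```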