Qwen-Omni client events

Events sent from the client to the server over WebSocket.

For non-realtime usage, Qwen-Omni is available through the Chat API.

session.update

Send this event after connecting to update the session configuration. The service validates your parameters and returns the full configuration or an error.

Example

{
  "event_id": "event_ToPZqeobitzUJnt3QqtWg",
  "type": "session.update",
  "session": {
    "modalities": ["text", "audio"],
    "voice": "Chelsie",
    "input_audio_format": "pcm16",
    "output_audio_format": "pcm24",
    "instructions": "You are an AI customer service agent for a five-star hotel. Please answer customer inquiries about room types, facilities, prices, and reservation policies accurately and in a friendly manner. Always respond with a professional and helpful attitude. Do not provide unconfirmed information or information beyond the scope of the hotel's services.",
    "turn_detection": {
      "type": "server_vad",
      "threshold": 0.5,
      "silence_duration_ms": 800
    },
    "enable_search": true,
    "search_options": {
      "enable_source": true
    },
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_current_weather",
          "description": "Useful for querying the weather in a specific city.",
          "parameters": {
            "type": "object",
            "properties": {
              "location": {
                "type": "string",
                "description": "The city or district, such as Beijing, Hangzhou, or Yuhang District."
              }
            },
            "required": ["location"]
          }
        }
      }
    ],
    "seed": 1314,
    "max_tokens": 16384,
    "repetition_penalty": 1.05,
    "presence_penalty": 0.0,
    "top_k": 50,
    "top_p": 1.0,
    "temperature": 0.9
  }
}

Note: this example lists every field for illustration. In a real request, do not enable tools and enable_search together, and set only one of temperature or top_p.

type

string

body

required

Event type. Always session.update.

session

object

body

Session configuration.

Properties of session:

modalities

array

body

Output modalities. Valid values:

["text"]: Text only.
["text", "audio"] (Default): Text and audio.

voice

string

body

Voice for audio output. Accepts preset voices or cloned voices (Qwen3.5-Omni-Realtime only) created via the Voice cloning API. See Voice list for supported preset voices. Defaults:

Qwen3.5-Omni-Realtime: Tina
Qwen3-Omni-Flash-Realtime: Cherry
Qwen-Omni-Turbo-Realtime: Chelsie

input_audio_format

string

body

Input audio format. Supports pcm16 only.

output_audio_format

string

body

Output audio format:

Qwen3.5-Omni-Realtime: pcm24 only
Qwen3-Omni-Flash-Realtime: pcm24 only
Qwen-Omni-Turbo-Realtime: pcm16 only

instructions

string

body

System message defining the model's role or goal.

turn_detection

object

body

Voice activity detection (VAD) configuration. Set to null to disable VAD and trigger responses manually. Omit to use VAD with default parameters.

Properties of turn_detection:

type

string

body

VAD type. Valid values:

server_vad (Default): Detects the end of user speech based on acoustic features.
semantic_vad: Detects the end of user speech based on semantic validity. Filters out meaningless speech such as backchannels and background noise. Supported only by qwen3.5-omni-realtime.

threshold

float

body

VAD sensitivity. Lower values detect more sounds (including background noise). Higher values require clearer speech. Range: [-1.0, 1.0]. Default: 0.5.

silence_duration_ms

integer

body

Silence duration (ms) after speech ends before the model responds. Lower values give faster responses but may trigger on brief pauses. Range: 200-6000. Default: 800.
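
The turn_detection settings above can be assembled and range-checked on the client before they are sent. The following is a minimal sketch, not part of the service's SDK: the helper name build_turn_detection is hypothetical, the field names are taken from the session.update example, and whether semantic_vad accepts the same threshold and silence_duration_ms fields is an assumption.

```python
import json


def build_turn_detection(
    vad_type: str = "server_vad",
    threshold: float = 0.5,
    silence_duration_ms: int = 800,
) -> dict:
    """Return a turn_detection object after checking the documented ranges."""
    if vad_type not in ("server_vad", "semantic_vad"):
        raise ValueError(f"unknown VAD type: {vad_type!r}")
    if not -1.0 <= threshold <= 1.0:
        raise ValueError("threshold must be within [-1.0, 1.0]")
    if not 200 <= silence_duration_ms <= 6000:
        raise ValueError("silence_duration_ms must be within 200-6000")
    return {
        "type": vad_type,
        "threshold": threshold,
        "silence_duration_ms": silence_duration_ms,
    }


# Manual mode: Python's None serializes to JSON null, which disables VAD.
manual_mode_session = {"turn_detection": None}
manual_mode_json = json.dumps(manual_mode_session)
```

Omitting the turn_detection key entirely, rather than setting it to None, leaves VAD enabled with the default parameters.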

enable_search

boolean

body

Qwen3.5-Omni-Realtime only. Whether to enable web search. Default: false. When enabled, the model can autonomously determine whether a search is needed to answer the user's real-time questions.

tools and enable_search are incompatible. Do not enable both at the same time.

search_options

object

body

Web search option configuration. Takes effect only when enable_search is enabled.

Properties of search_options:

enable_source

boolean

body

Whether to return a list of search result sources. Set to true to enable.

tools

array

body

A list of tool definitions. When configured, the model can decide whether to call a tool based on user input.

Properties of each tools entry:

type

string

body

required

The value must be function.

function.name

string

body

required

The name of the custom tool function, such as get_current_weather or get_current_time.

function.description

string

body

A description of the tool function's capabilities. The model uses this field to decide whether to use the tool function.

function.parameters

object

body

A description of the tool function's input parameters. The model uses this field to extract the input parameters. If the tool function does not require input parameters, do not specify this field.

Properties of function.parameters:

type

string

body

required

The value must be object.

properties

object

body

Describes the name, data type, and description of each input parameter. The key is the parameter name. The value is an object that contains type and description.

required

array

body

Specifies which input parameters are required.

temperature

float

body

Sampling temperature for output diversity. Higher values increase diversity; lower values increase determinism. Range: [0, 2). Set only one of temperature or top_p. Defaults:

qwen3.5-omni-realtime models: 0.7
qwen3-omni-flash-realtime models: 0.9
qwen-omni-turbo-realtime models: 1.0

qwen-omni-turbo models do not support setting this parameter.

top_p

float

body

Nucleus sampling threshold for output diversity. Higher values increase diversity; lower values increase determinism. Range: (0, 1.0]. Set only one of temperature or top_p. Defaults:

qwen3.5-omni-realtime models: 0.8
qwen3-omni-flash-realtime models: 1.0
qwen-omni-turbo-realtime models: 0.01

qwen-omni-turbo models do not support setting this parameter.

top_k

integer

body

Candidate token count for sampling. Higher values increase randomness; lower values increase determinism. If null or >100, top_k is disabled and only top_p applies. Must be >=0. Defaults:

qwen3.5-omni-realtime models: 20
qwen3-omni-flash-realtime models: 50
qwen-omni-turbo-realtime models: 20

qwen-omni-turbo models do not support setting this parameter.

max_tokens

integer

body

Maximum tokens to return. Output is truncated beyond this limit; generation itself is not affected. Defaults and maximums match the model's max output length (see Model list). Use to control word count, costs, or latency. qwen-omni-turbo models do not support this parameter.

repetition_penalty

float

body

Penalty for consecutive repetition. Higher values reduce repetition. 1.0 = no penalty. Must be >0. Defaults: qwen3.5-omni-realtime models: 1.0, qwen3-omni-flash-realtime models: 1.05. qwen-omni-turbo models do not support this parameter.

presence_penalty

float

body

Controls repetition. Range: [-2.0, 2.0]. Defaults: qwen3.5-omni-realtime models: 1.5, qwen3-omni-flash-realtime models: 0.0. Positive values reduce repetition; negative values increase it. Use higher values for creative tasks, lower values for formal content. qwen-omni-turbo models do not support this parameter.

seed

integer

body

Makes output deterministic. With the same seed and parameters, the model returns identical results. Range: 0 to 2^31-1. Default: -1. qwen-omni-turbo models do not support this parameter.
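
Several of the session parameters above carry constraints that can be checked on the client before sending session.update. The sketch below is illustrative only: check_session and session_update_event are hypothetical helper names, the event_id format simply mirrors the examples in this document, and treating "set only one of temperature or top_p" as a reportable problem is an assumption about how strictly to apply that guidance. The service performs its own validation regardless.

```python
import uuid
from typing import List


def check_session(session: dict) -> List[str]:
    """Return the documented rules that the session dict violates."""
    problems = []
    if session.get("enable_search") and session.get("tools"):
        problems.append("tools and enable_search are incompatible")
    if "temperature" in session and "top_p" in session:
        problems.append("set only one of temperature or top_p")
    temperature = session.get("temperature")
    if temperature is not None and not 0 <= temperature < 2:
        problems.append("temperature must be in [0, 2)")
    top_p = session.get("top_p")
    if top_p is not None and not 0 < top_p <= 1.0:
        problems.append("top_p must be in (0, 1.0]")
    top_k = session.get("top_k")
    if top_k is not None and top_k < 0:
        problems.append("top_k must be >= 0")
    repetition_penalty = session.get("repetition_penalty")
    if repetition_penalty is not None and repetition_penalty <= 0:
        problems.append("repetition_penalty must be > 0")
    presence_penalty = session.get("presence_penalty")
    if presence_penalty is not None and not -2.0 <= presence_penalty <= 2.0:
        problems.append("presence_penalty must be in [-2.0, 2.0]")
    return problems


def session_update_event(session: dict) -> dict:
    """Wrap a checked session dict in a session.update event."""
    problems = check_session(session)
    if problems:
        raise ValueError("; ".join(problems))
    return {
        "event_id": "event_" + uuid.uuid4().hex,  # assumed ID format
        "type": "session.update",
        "session": session,
    }
```

Serialize the returned dict with json.dumps and send it as a text frame over the WebSocket connection.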

response.create

Tells the service to generate a model response. In VAD mode, responses are automatic and you do not need this event. The service responds with response.created, then item and content events (conversation.item.created, response.content_part.added), and finally response.done.

Example

{
  "type": "response.create",
  "event_id": "event_1718624400000"
}

type

string

body

required

Event type. Always response.create.

response.cancel

Cancels an ongoing response. Returns an error if no response is in progress.

Example

{
  "event_id": "event_B4o9RHSTWobB5OQdEHLTo",
  "type": "response.cancel"
}

type

string

body

required

Event type. Always response.cancel.

input_audio_buffer.append

Appends audio bytes to the input buffer.

Example

{
  "event_id": "event_B4o9RHSTWobB5OQdEHLTo",
  "type": "input_audio_buffer.append",
  "audio": "UklGR..."
}

type

string

body

required

Event type. Always input_audio_buffer.append.

audio

string

body

required

The Base64-encoded audio data.
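
Raw audio is usually streamed as a series of append events instead of one large message. The following sketch splits a bytes object of pcm16 audio into chunks and Base64-encodes each one. The function name audio_append_events is hypothetical, the chunk size of 3200 bytes is an arbitrary choice for illustration (this document does not specify a chunk size or sample rate), and the event_id format mirrors the examples here.

```python
import base64
import uuid
from typing import Iterator


def audio_append_events(pcm: bytes, chunk_bytes: int = 3200) -> Iterator[dict]:
    """Yield one input_audio_buffer.append event per chunk of raw audio."""
    if chunk_bytes <= 0:
        raise ValueError("chunk_bytes must be positive")
    for start in range(0, len(pcm), chunk_bytes):
        chunk = pcm[start:start + chunk_bytes]
        yield {
            "event_id": "event_" + uuid.uuid4().hex,  # assumed ID format
            "type": "input_audio_buffer.append",
            # The audio field carries Base64 text, not raw bytes.
            "audio": base64.b64encode(chunk).decode("ascii"),
        }
```

Each yielded dict can be serialized with json.dumps and sent in order; the last chunk may be shorter than chunk_bytes.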

input_audio_buffer.commit

Submits the input audio buffer as a user message. Returns an error if the buffer is empty.

VAD mode: Automatic. You do not need this event.
Manual mode: Required to create a user message.

Submitting the buffer does not trigger a model response. The service responds with input_audio_buffer.committed.

If you have sent an input_image_buffer.append event, input_audio_buffer.commit submits the image buffer along with the audio buffer.

Example

{
  "event_id": "event_B4o9RHSTWobB5OQdEHLTo",
  "type": "input_audio_buffer.commit"
}

type

string

body

required

Event type. Always input_audio_buffer.commit.
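
Because committing the buffer does not itself trigger a response, a manual-mode turn consists of append events, then a commit, then response.create. The sketch below builds that sequence as plain dicts. It assumes VAD has already been disabled by setting turn_detection to null in session.update; new_event and manual_turn_events are hypothetical helper names, and the event_id format mirrors the examples in this document.

```python
import uuid
from typing import List


def new_event(event_type: str, **fields) -> dict:
    """Build a client event with a generated event_id (assumed format)."""
    return {"event_id": "event_" + uuid.uuid4().hex, "type": event_type, **fields}


def manual_turn_events(audio_chunks_b64: List[str]) -> List[dict]:
    """Return the ordered events for one user turn in manual mode."""
    if not audio_chunks_b64:
        # The service returns an error when an empty buffer is committed.
        raise ValueError("at least one audio chunk is required")
    events = [
        new_event("input_audio_buffer.append", audio=chunk)
        for chunk in audio_chunks_b64
    ]
    events.append(new_event("input_audio_buffer.commit"))  # creates the user message
    events.append(new_event("response.create"))  # asks the model to respond
    return events
```

In VAD mode neither the commit nor the response.create event is needed, so only the append events would be sent.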

input_audio_buffer.clear

Clears the audio buffer. The service responds with input_audio_buffer.cleared.

Example

{
  "event_id": "event_xxx",
  "type": "input_audio_buffer.clear"
}

type

string

body

required

Event type. Always input_audio_buffer.clear.

input_image_buffer.append

Adds image data to the image buffer from local files or video streams. Limits:

Format: JPG or JPEG. Recommended: 480p or 720p. Maximum: 1080p.
Size: ≤500 KB before Base64 encoding.
Encoding: Base64.
Frequency: 1 image per second.
Prerequisite: Send at least one input_audio_buffer.append event first.

The image buffer is submitted with the audio buffer through the input_audio_buffer.commit event.

Example

{
  "event_id": "event_xxx",
  "type": "input_image_buffer.append",
  "image": "xxx"
}

type

string

body

required

Event type. Always input_image_buffer.append.

image

string

body

required

The Base64-encoded image data.
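
The image limits listed for this event can be enforced on the client before a frame is sent. This is a minimal sketch under stated assumptions: image_append_event and frame_due are hypothetical names, 500 KB is read as 500 × 1024 bytes, "1 image per second" is read as at most one frame per second, and the JPEG check only looks at the two-byte start-of-image marker. Resolution is not checked, and the caller is responsible for having sent at least one input_audio_buffer.append event first.

```python
import base64
import uuid
from typing import Optional

MAX_IMAGE_BYTES = 500 * 1024  # assumes 1 KB = 1024 bytes


def image_append_event(jpeg: bytes) -> dict:
    """Build an input_image_buffer.append event from raw JPEG bytes."""
    if not jpeg.startswith(b"\xff\xd8"):  # JPEG start-of-image marker
        raise ValueError("image does not look like a JPEG")
    if len(jpeg) > MAX_IMAGE_BYTES:  # limit applies before Base64 encoding
        raise ValueError("image exceeds 500 KB before Base64 encoding")
    return {
        "event_id": "event_" + uuid.uuid4().hex,  # assumed ID format
        "type": "input_image_buffer.append",
        "image": base64.b64encode(jpeg).decode("ascii"),
    }


def frame_due(last_sent: Optional[float], now: float, interval: float = 1.0) -> bool:
    """Return True when at least `interval` seconds have passed since the last frame."""
    return last_sent is None or now - last_sent >= interval
```

A capture loop could call frame_due with time.monotonic() and skip frames until it returns True.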

conversation.item.create

Returns the execution result of a tool function to the server. After the model triggers a tool call, execute the tool function locally, send the result back using this event, then send a response.create event to trigger the model to generate the final response.

Currently, only items of the function_call_output type are supported.

Example

{
  "event_id": "event_55099cddb51b4f208cb95d1a994eef80",
  "type": "conversation.item.create",
  "item": {
    "id": "item_2a80d7682b4e473c9c2154da135041e9",
    "type": "function_call_output",
    "call_id": "call_62c24725afdb4c2680ac54",
    "output": "The weather in Beijing today is changing from haze to clear, with a temperature of 4/-4°C and a light breeze."
  }
}

type

string

body

required

Event type. Always conversation.item.create.

item

object

body

required

The conversation item to create.

Properties of item:

id

string

body

The conversation item ID. The client can specify an ID to align with the local state. If not provided, the server generates one.

type

string

body

required

The conversation item type. Currently, only function_call_output is supported.

call_id

string

body

required

Corresponds to the call_id returned in the response.function_call_arguments.done event.

output

string

body

required

The execution result of the tool function.
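
The tool-call round trip described for this event can be sketched as a pair of client events: the function_call_output item, followed by response.create. The helper name tool_result_events is hypothetical and the event_id format mirrors the examples in this document. The item id is omitted here so that the server generates one.

```python
import uuid
from typing import List


def tool_result_events(call_id: str, output: str) -> List[dict]:
    """Return the two events that hand a tool result back to the model."""
    if not call_id:
        raise ValueError("call_id is required")
    return [
        {
            "event_id": "event_" + uuid.uuid4().hex,  # assumed ID format
            "type": "conversation.item.create",
            "item": {
                "type": "function_call_output",
                # Must match the call_id from response.function_call_arguments.done.
                "call_id": call_id,
                "output": output,
            },
        },
        {
            "event_id": "event_" + uuid.uuid4().hex,
            "type": "response.create",  # triggers the final model response
        },
    ]
```

For example, tool_result_events("call_62c24725afdb4c2680ac54", "Clear, 4/-4°C") returns the item event followed by the response.create event, ready to be serialized and sent in that order.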
