DashScope synchronous call

POST /api/v1/services/aigc/multimodal-generation/generation

cURL

curl -X POST "https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation" \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
  "model": "qwen3-asr-flash",
  "input": {
    "messages": [
      {
        "content": [
          {
            "text": ""
          }
        ],
        "role": "system"
      },
      {
        "content": [
          {
            "audio": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"
          }
        ],
        "role": "user"
      }
    ]
  },
  "parameters": {
    "asr_options": {
      "enable_itn": false
    }
  }
}'

{
  "request_id": "568e2bf0-d6f2-97f8-9f15-a57b11dc6977",
  "output": {
    "choices": [
      {
        "finish_reason": "stop",
        "message": {
          "annotations": [
            {
              "language": "zh",
              "type": "audio_info",
              "emotion": "neutral"
            }
          ],
          "content": [
            {
              "text": "Welcome to Qwen Cloud."
            }
          ],
          "role": "assistant"
        }
      }
    ]
  },
  "usage": {
    "input_tokens_details": {
      "text_tokens": 0
    },
    "output_tokens_details": {
      "text_tokens": 6
    },
    "seconds": 1
  }
}
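If you call the endpoint from Python, you can reproduce the request above and read the transcript from its response with the standard library alone. This is not the DashScope SDK. `build_request`, `extract_transcript`, and `transcribe` are hypothetical helpers, and the sketch assumes the key is in `DASHSCOPE_API_KEY`, as in the cURL example. It joins every `text` part because it assumes the transcript could be split across several content entries.

```python
import json
import os
import urllib.request

# Endpoint and payload shape taken from the cURL example above.
URL = ("https://dashscope-intl.aliyuncs.com"
       "/api/v1/services/aigc/multimodal-generation/generation")


def build_request(api_key: str, audio: str, context: str = "",
                  enable_itn: bool = False) -> urllib.request.Request:
    """Build, but do not send, the synchronous recognition request (hypothetical helper)."""
    payload = {
        "model": "qwen3-asr-flash",
        "input": {
            "messages": [
                {"role": "system", "content": [{"text": context}]},
                {"role": "user", "content": [{"audio": audio}]},
            ]
        },
        "parameters": {"asr_options": {"enable_itn": enable_itn}},
    }
    return urllib.request.Request(
        URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )


def extract_transcript(response: dict) -> str:
    """Join the text parts of the first choice's message content."""
    message = response["output"]["choices"][0]["message"]
    return "".join(part.get("text", "") for part in message["content"])


def transcribe(audio: str) -> str:
    """Send the request and return the recognized text."""
    api_key = os.environ["DASHSCOPE_API_KEY"]  # KeyError if the variable is unset
    # urlopen raises urllib.error.HTTPError for non-2xx responses.
    with urllib.request.urlopen(build_request(api_key, audio), timeout=60) as resp:
        return extract_transcript(json.load(resp))
```

For example, you can pass the audio URL from the cURL example to `transcribe`. Keeping `build_request` separate from the network call lets you inspect the payload before you send it.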

Supported audio formats

You can pass audio as a public URL, as Base64-encoded data, or, with the SDK only, as a local file path. For HTTP calls, nest the messages field inside the input object.
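HTTP calls have no local-path option, so you must send a local file as Base64 data. The sketch below assumes the `audio` field accepts the data-URL form `data:<MIME type>;base64,<data>`. Check the supported-formats documentation before relying on that. `to_data_url` is a hypothetical helper.

```python
import base64
import mimetypes
from pathlib import Path


def to_data_url(path: str) -> str:
    """Encode a local audio file as a Base64 data URL (assumed format)."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None:
        # Assumed fallback when the file extension is not recognized.
        mime = "application/octet-stream"
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```

You would place the returned string in the `audio` field of the user message, where the cURL example puts the URL.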

Authorizations

Authorization (string, header, required)
DashScope API key, sent as Bearer <your API key>. Get your API key from the Qwen Cloud console.

Body

Content type: application/json

model (string, required)
The model name. This endpoint supports only Qwen3-ASR-Flash models, such as qwen3-asr-flash.

input (object, required)
The input object.

input.messages (object[], required)
The list of messages.

input.messages[].role (enum<string>, required)
The role of the message sender.
Available options: system, user

input.messages[].content (object[], required)
The content of the message.

input.messages[].content[].audio (string)
The audio to recognize, passed in the user message. Accepts a public URL, Base64-encoded data, or a local file path (SDK only).

input.messages[].content[].text (string)
Context for customized recognition, passed in the system message. Maximum length: 10,000 tokens.
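The schema assigns each content field to a role. The system message carries the optional context text and the user message carries the audio. The minimal check below encodes only those rules and the allowed roles. `check_messages` is a hypothetical helper. It does not enforce the 10,000-token context limit, because token counts depend on the model's tokenizer.

```python
ALLOWED_ROLES = {"system", "user"}


def check_messages(messages: list) -> list:
    """Return a list of problems found in input.messages (empty if none)."""
    problems = []
    for i, msg in enumerate(messages):
        role = msg.get("role")
        if role not in ALLOWED_ROLES:
            problems.append(f"messages[{i}]: unsupported role {role!r}")
            continue
        # Audio belongs in the user message; context text belongs in the system message.
        expected = "audio" if role == "user" else "text"
        if not any(expected in part for part in msg.get("content", [])):
            problems.append(f"messages[{i}]: {role} message has no {expected!r} part")
    return problems
```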

parameters (object)
Additional parameters.

parameters.asr_options (object)
Recognition options. Supported only by Qwen3-ASR-Flash.

parameters.asr_options.language (enum<string>)
The language of the audio. If you know it, specify it to improve recognition accuracy. Specify only one language. If the audio contains multiple languages, omit this parameter.
Available options: zh, yue, en, ja, de, ko, ru, fr, pt, ar, it, es, hi, id, th, tr, uk, vi, cs, da, fil, fi, is, ms, no, pl, sv

parameters.asr_options.enable_itn (boolean, default: false)
Specifies whether to enable Inverse Text Normalization (ITN). Applies only to Chinese and English audio.
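The rules for `asr_options` are "at most one language code from the list, omit it for mixed-language audio". Validating the code before you send the request catches typos early. A minimal sketch with a hypothetical `build_asr_options` helper; the language set copies the options listed above.

```python
from typing import Optional

# Language codes listed for parameters.asr_options.language.
SUPPORTED_LANGUAGES = frozenset(
    "zh yue en ja de ko ru fr pt ar it es hi id th tr uk vi "
    "cs da fil fi is ms no pl sv".split()
)


def build_asr_options(language: Optional[str] = None,
                      enable_itn: bool = False) -> dict:
    """Build parameters.asr_options; pass language=None for mixed-language audio."""
    options = {"enable_itn": enable_itn}  # ITN applies only to Chinese and English audio
    if language is not None:
        if language not in SUPPORTED_LANGUAGES:
            raise ValueError(f"unsupported language code: {language!r}")
        options["language"] = language
    return options
```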

Response

200 (application/json)

request_id (string)
The unique identifier of this call. The Java SDK returns this field as requestId.
Example: 568e2bf0-d6f2-97f8-9f15-a57b11dc6977

output (object)
The call result.

output.choices (object[])
The model output, included when result_format is message.

output.choices[].finish_reason (enum<string>)
null while generation is in progress, stop when the output ends normally, or length when the output exceeds the maximum length.
Available options: stop, length, null
Example: stop
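A completed synchronous call normally returns stop. A client can treat the other values as errors instead of silently using a partial transcript. A minimal sketch with a hypothetical `ensure_finished` helper. Whether to raise, retry, or keep a truncated result is an application choice, not something the API prescribes. JSON null arrives in Python as `None`.

```python
def ensure_finished(choice: dict) -> None:
    """Raise RuntimeError unless the choice finished with finish_reason == "stop"."""
    reason = choice.get("finish_reason")
    if reason == "length":
        raise RuntimeError("output was truncated at the maximum length")
    if reason != "stop":
        raise RuntimeError(f"generation not finished (finish_reason={reason!r})")
```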

output.choices[].message (object)
The message object output by the model.

output.choices[].message.role (string)
The role of the output message. Always assistant.
Example: assistant

output.choices[].message.content (object[])
The output message content.

output.choices[].message.content[].text (string)
The speech recognition result.
Example: Welcome to Qwen Cloud.

output.choices[].message.annotations (object[])
Annotations that describe the input audio.

output.choices[].message.annotations[].type (string)
Always audio_info.
Example: audio_info

output.choices[].message.annotations[].language (enum<string>)
The language of the recognized audio.
Available options: zh, yue, en, ja, de, ko, ru, fr, pt, ar, it, es, hi, id, th, tr, uk, vi, cs, da, fil, fi, is, ms, no, pl, sv
Example: zh

output.choices[].message.annotations[].emotion (enum<string>)
The emotion of the recognized audio.
Available options: surprised, neutral, happy, sad, disgusted, angry, fearful
Example: neutral
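The detected language and emotion sit inside the annotation whose type is audio_info. Looking that entry up by type is sturdier than relying on its position in the list. A minimal sketch with a hypothetical `audio_info` helper. Returning `None` when no such annotation exists is a defensive assumption, not documented behavior.

```python
from typing import Optional


def audio_info(message: dict) -> Optional[dict]:
    """Return the first annotation of type audio_info, or None if absent."""
    for annotation in message.get("annotations", []):
        if annotation.get("type") == "audio_info":
            return annotation
    return None
```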

usage (object)
Usage information for this call.

usage.input_tokens_details (object)
Input token details.

usage.input_tokens_details.text_tokens (integer)
Ignore this field.
Example: 0

usage.output_tokens_details (object)
Output token details.

usage.output_tokens_details.text_tokens (integer)
The length of the recognized text, in tokens.
Example: 6

usage.seconds (integer)
The duration of the audio, in seconds.
Example: 1
