curl -X POST "https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation" \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "qwen3-asr-flash",
"input": {
"messages": [
{
"content": [
{
"text": ""
}
],
"role": "system"
},
{
"content": [
{
"audio": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"
}
],
"role": "user"
}
]
},
"parameters": {
"asr_options": {
"enable_itn": false
}
}
}'

{
"request_id": "568e2bf0-d6f2-97f8-9f15-a57b11dc6977",
"output": {
"choices": [
{
"finish_reason": "stop",
"message": {
"annotations": [
{
"language": "zh",
"type": "audio_info",
"emotion": "neutral"
}
],
"content": [
{
"text": "Welcome to Qwen Cloud."
}
],
"role": "assistant"
}
}
]
},
"usage": {
"input_tokens_details": {
"text_tokens": 0
},
"output_tokens_details": {
"text_tokens": 6
},
"seconds": 1
}
}

Supported audio formats
You can pass audio as a Base64-encoded file, a local file path, or a public URL. For HTTP calls, nest the messages field inside the input object.

Authorizations
DashScope API key, sent as a Bearer token in the Authorization header. Get your API key from the Qwen Cloud console.
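
The request shown in the curl example can also be assembled programmatically. The following is a minimal Python sketch using only the standard library; it builds the same headers and body, with the messages list nested inside the input object. The helper names `build_request` and `audio_to_data_uri` are hypothetical, and the `data:<mime>;base64,<data>` form used for Base64 input is an assumption that should be checked against the service documentation.

```python
import base64
import json
import urllib.request

ENDPOINT = ("https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/"
            "multimodal-generation/generation")


def audio_to_data_uri(raw: bytes, mime: str = "audio/mpeg") -> str:
    """Encode raw audio bytes as a Base64 data URI (the format is an assumption)."""
    encoded = base64.b64encode(raw).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def build_request(audio: str, api_key: str) -> urllib.request.Request:
    """Build the POST request; `audio` is a public URL or a Base64 data URI."""
    body = {
        "model": "qwen3-asr-flash",
        # For HTTP calls, messages is nested inside the input object.
        "input": {
            "messages": [
                {"role": "system", "content": [{"text": ""}]},
                {"role": "user", "content": [{"audio": audio}]},
            ]
        },
        "parameters": {"asr_options": {"enable_itn": False}},
    }
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` sends the request; reading the key from the `DASHSCOPE_API_KEY` environment variable, as the curl example does, keeps it out of source code.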
Body
application/json

model: The model name. Only applicable to Qwen3-ASR-Flash.
input: The input object.
input.messages: The list of messages.
input.messages[].role: The role of the message sender.
parameters: Additional parameters.
parameters.asr_options: Specifies whether to enable certain features. Supported only by Qwen3-ASR-Flash.
parameters.asr_options.language: The language of the audio. If you know it, specify it to improve recognition accuracy. Specify only one language. If the audio contains multiple languages, omit this parameter.
parameters.asr_options.enable_itn: Specifies whether to enable Inverse Text Normalization (ITN). Applicable only to Chinese and English audio.
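
The language rule described above can be sketched as a small helper. This is an illustrative Python sketch, not part of any SDK: the function name `build_asr_options` is hypothetical, and the key name `language` is an assumption, since the request example shows only `enable_itn`.

```python
from typing import Sequence


def build_asr_options(languages: Sequence[str] = (), enable_itn: bool = False) -> dict:
    """Return an asr_options dict that follows the rules described above.

    `languages` lists the languages known to occur in the audio. The
    language hint is set only when exactly one is known; for unknown or
    mixed-language audio it is left out.
    """
    options = {"enable_itn": enable_itn}
    if len(languages) == 1:
        options["language"] = languages[0]  # key name is an assumption
    return options
```

For example, `build_asr_options(["zh"])` includes the hint, while `build_asr_options(["zh", "en"])` omits it so that the service is not given a single language for mixed audio.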
Response
request_id: The unique identifier for this call. The Java SDK returns this as requestId.
output: The call result.
output.choices: The model output, included when result_format is message.
[
{
"finish_reason": "stop",
"message": {
"annotations": [
{
"language": "zh",
"type": "audio_info",
"emotion": "neutral"
}
],
"content": [
{
"text": "Welcome to Qwen Cloud."
}
],
"role": "assistant"
}
}
]
output.choices[].finish_reason: null while generation is in progress, stop when the output ended naturally, length when the output exceeded the maximum length.
output.choices[].message: The message object output by the model.
output.choices[].message.role: The role of the output message. Always assistant.
output.choices[].message.content: The output message content.
[
{
"text": "Welcome to Qwen Cloud."
}
]
output.choices[].message.content[].text: The speech recognition result text.
output.choices[].message.annotations: Output annotation information.
[
{
"language": "zh",
"type": "audio_info",
"emotion": "neutral"
}
]
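output.choices[].message.annotations[].type: Set to audio_info.
output.choices[].message.annotations[].language: The language of the recognized audio.
output.choices[].message.annotations[].emotion: The emotion of the recognized audio.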
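
To show how the response fields fit together, here is a minimal Python sketch that pulls the transcript, the audio_info annotation, and the finish reason out of a decoded response. The `Transcript` class and `parse_response` function are hypothetical names, and the sketch assumes the first choice is the one of interest; the sample is the response shown earlier, trimmed to the fields that are read.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Transcript:
    text: str
    language: Optional[str]
    emotion: Optional[str]
    finish_reason: Optional[str]


def parse_response(response: dict) -> Transcript:
    """Extract the transcript and audio_info annotation from the first choice."""
    choice = response["output"]["choices"][0]
    message = choice["message"]
    # Join the text parts of the content list.
    text = "".join(part.get("text", "") for part in message.get("content", []))
    # Pick the annotation whose type is audio_info, if there is one.
    info = next(
        (a for a in message.get("annotations", []) if a.get("type") == "audio_info"),
        {},
    )
    return Transcript(
        text=text,
        language=info.get("language"),
        emotion=info.get("emotion"),
        finish_reason=choice.get("finish_reason"),
    )


sample = {
    "request_id": "568e2bf0-d6f2-97f8-9f15-a57b11dc6977",
    "output": {
        "choices": [
            {
                "finish_reason": "stop",
                "message": {
                    "annotations": [
                        {"language": "zh", "type": "audio_info", "emotion": "neutral"}
                    ],
                    "content": [{"text": "Welcome to Qwen Cloud."}],
                    "role": "assistant",
                },
            }
        ]
    },
}

result = parse_response(sample)
print(result.text)  # prints "Welcome to Qwen Cloud."
```

A caller that needs complete transcripts could check that `result.finish_reason` is stop before using the text, since length indicates that the output exceeded the maximum length.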