RESTful API - Qwen Cloud

User guide: For tutorials, code examples, and model details, see Audio file transcription. This service has two APIs: task submission and task query. Submit a task first, then poll the query API until it completes.

Prerequisites

Sign in to Qwen Cloud and create an API key. To avoid security risks, export the API key as an environment variable instead of hard-coding it.

To grant temporary access or restrict sensitive operations, use a temporary token. Temporary tokens expire in 60 seconds, which reduces the risk of leakage. Replace the API key in your code with the temporary token.
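A minimal sketch of reading the credential from the environment rather than hard-coding it. It assumes the key was exported as DASHSCOPE_API_KEY, the variable name used in the request samples on this page; the helper names load_api_key and auth_headers are illustrative and not part of any SDK.

```python
import os


def load_api_key(env_var: str = "DASHSCOPE_API_KEY") -> str:
    """Return the API key from the environment, failing loudly if it is missing."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Environment variable {env_var} is not set")
    return key


def auth_headers(token: str) -> dict:
    """Build the Authorization header for an API key or a temporary token."""
    return {"Authorization": f"Bearer {token}"}
```

Because a temporary token replaces the API key in the same header, auth_headers accepts either value.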

Model availability

Model	Version	Unit price	Free quota
fun-asr (currently equivalent to fun-asr-2025-11-07)	Stable	$0.000035/second	36,000 seconds (10 hours), valid for 90 days
fun-asr-2025-11-07 (improved far-field VAD over fun-asr-2025-08-25, for higher accuracy)	Snapshot	$0.000035/second	36,000 seconds (10 hours), valid for 90 days
fun-asr-2025-08-25	Snapshot	$0.000035/second	36,000 seconds (10 hours), valid for 90 days
fun-asr-mtl (currently equivalent to fun-asr-mtl-2025-08-25)	Stable	$0.000035/second	36,000 seconds (10 hours), valid for 90 days
fun-asr-mtl-2025-08-25	Snapshot	$0.000035/second	36,000 seconds (10 hours), valid for 90 days

Supported languages:
- fun-asr, fun-asr-2025-11-07, fun-asr-mtl, and fun-asr-mtl-2025-08-25: Chinese (Mandarin, Cantonese, Wu, Minnan, Hakka, Gan, Xiang, and Jin; also supports Mandarin accents from Zhongyuan, Southwest, Jilu, Jianghuai, Lanyin, Jiaoliao, Northeast, Beijing, and Hong Kong-Taiwan regions -- including Henan, Shaanxi, Hubei, Sichuan, Chongqing, Yunnan, Guizhou, Guangdong, Guangxi, Hebei, Tianjin, Shandong, Anhui, Nanjing, Jiangsu, Hangzhou, Gansu, and Ningxia), English, Japanese, Korean, Vietnamese, Indonesian, Thai, Malay, Filipino, Arabic, Bulgarian, Croatian, Czech, Danish, Dutch, Estonian, Finnish, Greek, Hindi, Hungarian, Irish, Latvian, Lithuanian, Maltese, Polish, Portuguese, Romanian, Slovak, Slovenian, and Swedish.
- fun-asr-2025-08-25: Mandarin and English.
Sample rates supported: Any
Audio formats supported: aac, amr, avi, flac, flv, m4a, mkv, mov, mp3, mp4, mpeg, ogg, opus, wav, webm, wma, wmv

Limitations

This service does not accept local file uploads or Base64 audio. You must provide a publicly accessible file URL over HTTP or HTTPS, for example https://your-domain.com/file.mp3. Specify the URL with the file_urls parameter. A single request supports up to 100 URLs.

Audio formats: aac, amr, avi, flac, flv, m4a, mkv, mov, mp3, mp4, mpeg, ogg, opus, wav, webm, wma, wmv

Many audio format variants exist. The API cannot guarantee all formats work correctly. Test your files to verify results.

Audio sample rate: Any
File size and duration: Max 2 GB, max 12 hours. For files exceeding these limits, pre-process them first. See Preprocess audio files with FFmpeg.
Batch size: Up to 100 file URLs per request.
Supported languages: fun-asr, fun-asr-2025-11-07, fun-asr-mtl, and fun-asr-mtl-2025-08-25 support Chinese and 29 other languages. fun-asr-2025-08-25 supports Chinese and English only. See Supported languages.
Frontend calls: You cannot call the API from the frontend. Use a backend proxy.
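The URL-related limits above can be checked on the client before a task is submitted. The following is a sketch under stated assumptions: the function names are hypothetical, and the extension check is only a heuristic based on the format list on this page, since the service decides from the file content whether it can decode a file.

```python
import posixpath
from urllib.parse import urlparse

MAX_URLS_PER_REQUEST = 100
KNOWN_EXTENSIONS = {
    "aac", "amr", "avi", "flac", "flv", "m4a", "mkv", "mov", "mp3",
    "mp4", "mpeg", "ogg", "opus", "wav", "webm", "wma", "wmv",
}


def validate_file_urls(urls):
    """Reject empty batches, oversized batches, and non-HTTP(S) URLs."""
    urls = list(urls)
    if not urls:
        raise ValueError("At least one file URL is required")
    if len(urls) > MAX_URLS_PER_REQUEST:
        raise ValueError(f"At most {MAX_URLS_PER_REQUEST} URLs per request")
    for url in urls:
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"Not a public HTTP/HTTPS URL: {url}")
    return urls


def unknown_extensions(urls):
    """Return URLs whose path extension is not in the documented format list."""
    flagged = []
    for url in urls:
        ext = posixpath.splitext(urlparse(url).path)[1].lstrip(".").lower()
        if ext not in KNOWN_EXTENSIONS:
            flagged.append(url)
    return flagged
```

A URL flagged by unknown_extensions may still work; it is a prompt to test that file, not a hard failure.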

Task submission API

Basic information

Item	Description
Description	Submits a speech recognition task.
URL	`https://dashscope-intl.aliyuncs.com/api/v1/services/audio/asr/transcription`
Request method	POST
Request headers	See below
Message body	See below

Request headers:

Authorization: Bearer $DASHSCOPE_API_KEY
Content-Type: application/json
X-DashScope-Async: enable

The X-DashScope-Async: enable header is required.

Message body (this example shows all request parameters; optional fields can be omitted):

{
  "model": "fun-asr",
  "input": {
    "file_urls": [
      "https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/paraformer/hello_world_female2.wav",
      "https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/paraformer/hello_world_male2.wav"
    ]
  },
  "parameters": {
    "vocabulary_id": "vocab-Xxxx",
    "channel_id": [0],
    "special_word_filter": "xxx",
    "diarization_enabled": false,
    "speaker_count": 2
  }
}
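The headers and message body above could be assembled in Python roughly as follows. This is a sketch that uses only the standard library; build_submit_request and submit_task are illustrative names, and the response handling assumes the shape shown in the response sample on this page.

```python
import json
import urllib.request

SUBMIT_URL = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/asr/transcription"


def build_submit_request(api_key, model, file_urls, parameters=None):
    """Build the POST request for the task submission API without sending it."""
    body = {"model": model, "input": {"file_urls": list(file_urls)}}
    if parameters:
        body["parameters"] = dict(parameters)  # optional fields only
    return urllib.request.Request(
        SUBMIT_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
            "X-DashScope-Async": "enable",  # required header
        },
        method="POST",
    )


def submit_task(api_key, model, file_urls, parameters=None, timeout=30):
    """Send the request and return the task_id from the response."""
    request = build_submit_request(api_key, model, file_urls, parameters)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    return payload["output"]["task_id"]
```

Separating request construction from sending keeps the network call in one place and lets the request be inspected in tests.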

Request parameters

Request sample:

curl --location 'https://dashscope-intl.aliyuncs.com/api/v1/services/audio/asr/transcription' \
     --header "Authorization: Bearer $DASHSCOPE_API_KEY" \
     --header "Content-Type: application/json" \
     --header "X-DashScope-Async: enable" \
     --data '{"model":"fun-asr","input":{"file_urls":["https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/paraformer/hello_world_female2.wav",
              "https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/paraformer/hello_world_male2.wav"]},"parameters":{"channel_id":[0]}}'

Parameter	Type	Default value	Required	Description
model	string	-	Yes	The model name. See Model availability.
file_urls	array[string]	-	Yes	A list of audio or video file URLs (HTTP/HTTPS). Up to 100 URLs per request.
vocabulary_id	string	-	No	The hotword ID. Applies the hotwords to this task. Disabled by default. See Customize hotwords.
channel_id	array[integer]	[0]	No	Audio track indexes to recognize in a multi-track file. Starts from 0. For example, `[0]` recognizes the first track, `[0, 1]` recognizes both. Defaults to the first track.
special_word_filter	string	-	No	Configures sensitive word handling. See Sensitive word filter details.
diarization_enabled	boolean	false	No	Enables speaker diarization. Single-channel audio only. When enabled, results include `speaker_id` to distinguish speakers. See Recognition results.
speaker_count	integer	-	No	A reference value for the number of speakers (2 to 100). Takes effect only when `diarization_enabled` is `true`. The algorithm tries to output this number of speakers but cannot guarantee it. Defaults to automatic detection.
language_hints	array[string]	["zh", "en"]	No	Language codes for recognition. If unset, the model detects the language automatically. See Supported languages.

Each audio track listed in channel_id is billed separately. For example, specifying [0, 1] for a single file results in two charges.
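To illustrate per-track billing, here is a rough cost estimate. It assumes the unit price from the Model availability table ($0.000035 per second), ignores the free quota, and takes the billed speech duration of each track as an input; estimate_cost is a hypothetical helper, not an API feature.

```python
from decimal import Decimal

PRICE_PER_SECOND = Decimal("0.000035")  # from the Model availability table


def estimate_cost(speech_seconds_by_track, price_per_second=PRICE_PER_SECOND):
    """Sum the billed seconds of every requested track and apply the unit price.

    speech_seconds_by_track maps a track index to its speech duration in seconds.
    """
    billed_seconds = sum(
        (Decimal(str(seconds)) for seconds in speech_seconds_by_track.values()),
        Decimal(0),
    )
    return billed_seconds * price_per_second
```

For example, a file with 600 seconds of speech on each of two tracks, submitted with channel_id [0, 1], is estimated at twice the cost of recognizing only track 0.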

Sensitive word filter details

If special_word_filter is not set, the built-in filter replaces matched words with asterisks (*) of equal length. If it is set, the following policies are available:

Replace with *: Replaces matched words with asterisks of the same length.
Filter out: Removes matched words from the result.

The value must be a JSON string:

{
  "filter_with_signed": {
    "word_list": ["test"]
  },
  "filter_with_empty": {
    "word_list": ["start", "happen"]
  },
  "system_reserved_filter": true
}

Field descriptions:

filter_with_signed
- Type: object. Required: No.
- Matched words are replaced with asterisks of the same length.
- Example: "Help me test this piece of code" becomes "Help me **** this piece of code".
- Internal field: word_list -- A string array of words to replace.
filter_with_empty
- Type: object. Required: No.
- Matched words are removed from the result.
- Example: "Is the game about to start?" becomes "Is the game about to ?".
- Internal field: word_list -- A string array of words to remove.
system_reserved_filter
- Type: Boolean. Required: No. Default: true.
- Enables the system's preset sensitive word rules. When true, words matching the Qwen Cloud sensitive word list are replaced with asterisks of the same length.
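Since the parameter value is described as a JSON string, one way to produce it is to build a dictionary and serialize it. This sketch assumes the service expects the serialized text as the value of special_word_filter; the function and argument names are illustrative.

```python
import json


def build_special_word_filter(mask_words=None, remove_words=None,
                              system_reserved_filter=True):
    """Serialize the filter configuration to a JSON string.

    mask_words   -> filter_with_signed (replaced with asterisks)
    remove_words -> filter_with_empty  (removed from the result)
    """
    config = {}
    if mask_words:
        config["filter_with_signed"] = {"word_list": list(mask_words)}
    if remove_words:
        config["filter_with_empty"] = {"word_list": list(remove_words)}
    config["system_reserved_filter"] = bool(system_reserved_filter)
    return json.dumps(config, ensure_ascii=False)
```

The returned string would then be passed as parameters["special_word_filter"] in the message body.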

Supported languages

Supported language codes by model:

fun-asr, fun-asr-2025-11-07, fun-asr-mtl, fun-asr-mtl-2025-08-25:
- zh: Chinese
- en: English
- ja: Japanese
- ko: Korean
- vi: Vietnamese
- id: Indonesian
- th: Thai
- ms: Malay
- tl: Filipino
- ar: Arabic
- bg: Bulgarian
- hr: Croatian
- cs: Czech
- da: Danish
- nl: Dutch
- et: Estonian
- fi: Finnish
- el: Greek
- hi: Hindi
- hu: Hungarian
- ga: Irish
- lv: Latvian
- lt: Lithuanian
- mt: Maltese
- pl: Polish
- pt: Portuguese
- ro: Romanian
- sk: Slovak
- sl: Slovenian
- sv: Swedish
fun-asr-2025-08-25:
- zh: Chinese
- en: English
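The language lists above can be turned into a small lookup for checking language_hints before submission. This is a client-side convenience sketch; the table is copied from this page and would need updating if the supported languages change, and unsupported_hints is a hypothetical helper.

```python
MULTILINGUAL = frozenset({
    "zh", "en", "ja", "ko", "vi", "id", "th", "ms", "tl", "ar",
    "bg", "hr", "cs", "da", "nl", "et", "fi", "el", "hi", "hu",
    "ga", "lv", "lt", "mt", "pl", "pt", "ro", "sk", "sl", "sv",
})
CHINESE_ENGLISH = frozenset({"zh", "en"})

MODEL_LANGUAGES = {
    "fun-asr": MULTILINGUAL,
    "fun-asr-2025-11-07": MULTILINGUAL,
    "fun-asr-mtl": MULTILINGUAL,
    "fun-asr-mtl-2025-08-25": MULTILINGUAL,
    "fun-asr-2025-08-25": CHINESE_ENGLISH,
}


def unsupported_hints(model, language_hints):
    """Return the hints that the given model does not list as supported."""
    supported = MODEL_LANGUAGES.get(model)
    if supported is None:
        raise ValueError(f"Unknown model: {model}")
    return [code for code in language_hints if code not in supported]
```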

Response parameters

Response sample:

{
  "output": {
    "task_status": "PENDING",
    "task_id": "c2e5d63b-96e1-4607-bb91-************"
  },
  "request_id": "77ae55ae-be17-97b8-9942-************"
}

Parameter	Type	Description
task_status	string	Task status: `PENDING`, `RUNNING`, `SUCCEEDED`, or `FAILED`.
task_id	string	The task ID. Use it with the task query API to check results.
request_id	string	The request ID.

Task query API

Basic information

Item	Description
Description	Queries the status and results of a speech recognition task.
URL	`https://dashscope-intl.aliyuncs.com/api/v1/tasks/{task_id}`
Request method	GET
Request headers	See below
Message body	None

Request headers:

Authorization: Bearer $DASHSCOPE_API_KEY

Request parameters

Request sample:

curl --location 'https://dashscope-intl.aliyuncs.com/api/v1/tasks/{task_id}' \
     --header "Authorization: Bearer $DASHSCOPE_API_KEY"

Parameter	Type	Default value	Required	Description
task_id	string	-	Yes	The task ID returned by the task submission API.
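The submit-then-poll flow could be implemented with a loop like the one below. It is a sketch, not a recommended client: the polling interval and timeout are arbitrary assumptions, the function names are illustrative, and the fetch and sleep arguments exist only so the loop can be exercised without network access.

```python
import json
import time
import urllib.request

TASK_URL = "https://dashscope-intl.aliyuncs.com/api/v1/tasks/{task_id}"
TERMINAL_STATUSES = {"SUCCEEDED", "FAILED"}


def fetch_task(task_id, api_key):
    """Call the task query API once and return the decoded JSON payload."""
    request = urllib.request.Request(
        TASK_URL.format(task_id=task_id),
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def wait_for_task(task_id, api_key, fetch=fetch_task,
                  interval=2.0, timeout=600.0, sleep=time.sleep):
    """Poll until the task reaches SUCCEEDED or FAILED, or the timeout passes."""
    deadline = time.monotonic() + timeout
    while True:
        payload = fetch(task_id, api_key)
        status = payload.get("output", {}).get("task_status")
        if status in TERMINAL_STATUSES:
            return payload
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Task {task_id} still {status} after {timeout}s")
        sleep(interval)
```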

Response parameters

For a task with multiple subtasks (one per file URL), the overall task_status is SUCCEEDED if at least one subtask succeeds. Check subtask_status in each result entry for the outcome of each file.

Response sample (success):

{
  "request_id": "f9e1afad-94d3-997e-a83b-************",
  "output": {
    "task_id": "f86ec806-4d73-485f-a24f-************",
    "task_status": "SUCCEEDED",
    "submit_time": "2024-09-12 15:11:40.041",
    "scheduled_time": "2024-09-12 15:11:40.071",
    "end_time": "2024-09-12 15:11:40.903",
    "results": [
      {
        "file_url": "https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/paraformer/hello_world_male2.wav",
        "transcription_url": "https://dashscope-result-bj.oss-cn-beijing.aliyuncs.com/pre/filetrans-16k/20240912/15%3A11/3bdf7689-b598-409d-806a-121cff5e4a31-1.json?Expires=1726211500&OSSAccessKeyId=yourOSSAccessKeyId&Signature=Fj%2BaF%2FH0Kayj3w3My2ECBeP****%3D",
        "subtask_status": "SUCCEEDED"
      },
      {
        "file_url": "https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/paraformer/hello_world_female2.wav",
        "transcription_url": "https://dashscope-result-bj.oss-cn-beijing.aliyuncs.com/pre/filetrans-16k/20240912/15%3A11/409a4b92-445b-4dd8-8c1d-f110954d82d8-1.json?Expires=1726211500&OSSAccessKeyId=yourOSSAccessKeyId&Signature=v5Owy5qoAfT7mzGmQgH0g8C****%3D",
        "subtask_status": "SUCCEEDED"
      }
    ],
    "task_metrics": {
      "TOTAL": 2,
      "SUCCEEDED": 2,
      "FAILED": 0
    }
  },
  "usage": {
    "duration": 9
  }
}

Response sample (partial failure):

The code field contains the error code, and the message field contains the error message. These fields appear only on errors.

{
  "task_id": "7bac899c-06ec-4a79-8875-xxxxxxxxxxxx",
  "task_status": "SUCCEEDED",
  "submit_time": "2024-12-16 16:30:59.170",
  "scheduled_time": "2024-12-16 16:30:59.204",
  "end_time": "2024-12-16 16:31:02.375",
  "results": [
    {
      "file_url": "https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/sensevoice/long_audio_demo_cn.mp3",
      "transcription_url": "https://dashscope-result-bj.oss-cn-beijing.aliyuncs.com/prod/paraformer-v2/20241216/xxxx",
      "subtask_status": "SUCCEEDED"
    },
    {
      "file_url": "https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/sensevoice/rich_text_exaple_1.wav",
      "code": "InvalidFile.DownloadFailed",
      "message": "The audio file cannot be downloaded.",
      "subtask_status": "FAILED"
    }
  ],
  "task_metrics": {
    "TOTAL": 2,
    "SUCCEEDED": 1,
    "FAILED": 1
  }
}

Parameter	Type	Description
task_id	string	The task ID.
task_status	string	The task status: `PENDING`, `RUNNING`, `SUCCEEDED`, or `FAILED`.
subtask_status	string	The subtask status.
file_url	string	The URL of the processed file.
transcription_url	string	The link to the recognition result. Valid for 24 hours. After expiry, you cannot query the task or download the result. The result is a JSON file you can download or read via HTTP. See Recognition results.
submit_time	string	The time the task was submitted.
scheduled_time	string	The time the task was scheduled.
end_time	string	The time the task ended.
task_metrics	object	Task metrics: `TOTAL`, `SUCCEEDED`, and `FAILED` counts.
usage	object	Usage information. `duration` is the total duration in seconds.
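Because the overall status can be SUCCEEDED while individual files fail, a client would typically inspect each entry in results. The sketch below separates the two cases; split_results is a hypothetical helper, and it accepts the fields either under output or at the top level, since both shapes appear in the samples above.

```python
def split_results(task_payload):
    """Split per-file results into succeeded and failed entries.

    Returns (succeeded, failed), where succeeded holds
    (file_url, transcription_url) pairs and failed holds
    (file_url, code, message) triples.
    """
    output = task_payload.get("output", task_payload)
    succeeded, failed = [], []
    for item in output.get("results", []):
        if item.get("subtask_status") == "SUCCEEDED":
            succeeded.append((item["file_url"], item["transcription_url"]))
        else:
            # code and message appear only on errors
            failed.append((item["file_url"], item.get("code"), item.get("message")))
    return succeeded, failed
```

Each transcription_url can then be downloaded with an ordinary HTTP GET while the link is still valid.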

Description of recognition results

The recognition result is a JSON file.

Recognition result example:

{
  "file_url": "https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/paraformer/hello_world_female2.wav",
  "properties": {
    "audio_format": "pcm_s16le",
    "channels": [0],
    "original_sampling_rate": 16000,
    "original_duration_in_milliseconds": 3834
  },
  "transcripts": [
    {
      "channel_id": 0,
      "content_duration_in_milliseconds": 3720,
      "text": "Hello world, this is Alibaba Speech Lab.",
      "sentences": [
        {
          "begin_time": 100,
          "end_time": 3820,
          "text": "Hello world, this is Alibaba Speech Lab.",
          "sentence_id": 1,
          "speaker_id": 0,
          "words": [
            {
              "begin_time": 100,
              "end_time": 596,
              "text": "Hello ",
              "punctuation": ""
            },
            {
              "begin_time": 596,
              "end_time": 844,
              "text": "world",
              "punctuation": ", "
            }
          ]
        }
      ]
    }
  ]
}

The speaker_id field appears only when speaker diarization is enabled. Other word entries are omitted for brevity.

Key parameters:

Parameter	Type	Description
audio_format	string	The audio format of the source file.
channels	array[integer]	The audio track indexes. Returns `[0]` for single-track, `[0, 1]` for dual-track, etc.
original_sampling_rate	integer	The sample rate (Hz).
original_duration_in_milliseconds	integer	The original audio duration (ms).
channel_id	integer	The transcribed track index, starting from 0.
content_duration_in_milliseconds	integer	The duration of speech content in the track (ms).
text	string	The transcription text (paragraph-level or word-level, depending on context).
sentences	array	Sentence-level transcription results.
words	array	Word-level transcription results.
begin_time	integer	The start timestamp (ms).
end_time	integer	The end timestamp (ms).
speaker_id	integer	The speaker index, starting from 0. Appears only when diarization is enabled.
punctuation	string	The predicted punctuation after the word, if any.
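As an illustration of how the fields in this table fit together, the following sketch turns a downloaded recognition result (already parsed into a dictionary) into one line per sentence. The output format and the name format_sentences are arbitrary choices made for this example.

```python
def format_sentences(result):
    """Return one readable line per sentence, with times converted to seconds."""
    lines = []
    for transcript in result.get("transcripts", []):
        channel = transcript.get("channel_id")
        for sentence in transcript.get("sentences", []):
            start = sentence["begin_time"] / 1000  # milliseconds to seconds
            end = sentence["end_time"] / 1000
            prefix = f"[ch{channel} {start:.2f}-{end:.2f}s]"
            speaker = sentence.get("speaker_id")  # present only with diarization
            if speaker is not None:
                prefix += f" speaker {speaker}:"
            lines.append(f"{prefix} {sentence['text']}")
    return lines
```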

Billing is based on speech segments only, not total file duration. Non-speech segments are not billed. Because speech detection uses an AI model, billed duration may differ slightly from expected content.