File transcription REST

User guide: For model details and selection tips, see Audio file recognition - Fun-ASR/Paraformer. This service has two APIs: task submission and task query. Submit a task first, then poll the query API until it completes.

Prerequisites

Sign in to Qwen Cloud and create an API key. To avoid security risks, export the API key as an environment variable instead of hard-coding it.
To grant temporary access or restrict sensitive operations, use a temporary token. Temporary tokens expire in 60 seconds, which reduces the risk of leakage. Replace the API key in your code with the temporary token.
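As a minimal sketch of the advice above, the helper below reads the credential from an environment variable and fails fast when it is missing. The name build_auth_headers is hypothetical, and the sketch assumes a temporary token is sent with the same Bearer scheme as an API key, as the swap instruction above implies.

```python
import os


def build_auth_headers(env_var: str = "DASHSCOPE_API_KEY") -> dict:
    """Hypothetical helper: build request headers from an environment variable."""
    credential = os.environ.get(env_var)
    if not credential:
        # Fail early instead of sending "Bearer None" to the service.
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    # Assumption: an API key and a temporary token use the same Bearer scheme.
    return {
        "Authorization": f"Bearer {credential}",
        "Content-Type": "application/json",
    }
```

Keeping the credential lookup in one place means a temporary token can be swapped in by changing only the environment variable.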

Model availability

| Model | Version | Unit price | Free quota (Note) |
| --- | --- | --- | --- |
| fun-asr (currently fun-asr-2025-11-07) | Stable | $0.000035/second | 36,000 seconds (10 hours), valid for 90 days |
| fun-asr-2025-11-07 (improved far-field VAD over fun-asr-2025-08-25 for higher accuracy) | Snapshot | $0.000035/second | 36,000 seconds (10 hours), valid for 90 days |
| fun-asr-2025-08-25 | Snapshot | $0.000035/second | 36,000 seconds (10 hours), valid for 90 days |
| fun-asr-mtl (currently fun-asr-mtl-2025-08-25) | Stable | $0.000035/second | 36,000 seconds (10 hours), valid for 90 days |
| fun-asr-mtl-2025-08-25 | Snapshot | $0.000035/second | 36,000 seconds (10 hours), valid for 90 days |
  • Supported languages:
    • fun-asr and fun-asr-2025-11-07: Mandarin, Cantonese, Wu, Minnan, Hakka, Gan, Xiang, and Jin. Also supports Mandarin accents from Zhongyuan, Southwest, Jilu, Jianghuai, Lanyin, Jiaoliao, Northeast, Beijing, and Hong Kong-Taiwan regions -- including Henan, Shaanxi, Hubei, Sichuan, Chongqing, Yunnan, Guizhou, Guangdong, Guangxi, Hebei, Tianjin, Shandong, Anhui, Nanjing, Jiangsu, Hangzhou, Gansu, and Ningxia. Also supports English and Japanese.
    • fun-asr-2025-08-25: Mandarin and English.
    • fun-asr-mtl and fun-asr-mtl-2025-08-25: Mandarin, Cantonese, English, Japanese, Korean, Vietnamese, Indonesian, Thai, Malay, Filipino, Arabic, Hindi, Bulgarian, Croatian, Czech, Danish, Dutch, Estonian, Finnish, Greek, Hungarian, Irish, Latvian, Lithuanian, Maltese, Polish, Portuguese, Romanian, Slovak, Slovenian, and Swedish.
  • Sample rates supported: Any
  • Audio formats supported: aac, amr, avi, flac, flv, m4a, mkv, mov, mp3, mp4, mpeg, ogg, opus, wav, webm, wma, wmv

Limitations

This service does not accept local file uploads or Base64 audio. You must provide a publicly accessible file URL over HTTP or HTTPS, for example https://your-domain.com/file.mp3. Specify the URL with the file_urls parameter. A single request supports up to 100 URLs.
  • Audio formats: aac, amr, avi, flac, flv, m4a, mkv, mov, mp3, mp4, mpeg, ogg, opus, wav, webm, wma, wmv
Many audio format variants exist. The API cannot guarantee all formats work correctly. Test your files to verify results.
  • Audio sample rate: Any
  • File size and duration: Max 2 GB, max 12 hours. For files exceeding these limits, pre-process them first. See Preprocess audio files with FFmpeg.
  • Batch size: Up to 100 file URLs per request.
  • Supported languages: Vary by model. See the language lists under Model availability.
  • Frontend calls: You cannot call the API from the frontend. Use a backend proxy.
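The URL-related limits above can be checked on the client before a task is submitted. The following is a minimal sketch, not part of the service: validate_file_urls is a hypothetical helper that only checks the batch size and the URL scheme. It cannot tell whether a URL is actually publicly reachable or whether the file is within the size and duration limits.

```python
from urllib.parse import urlparse

MAX_URLS_PER_REQUEST = 100  # Batch limit stated in the Limitations list.


def validate_file_urls(file_urls: list) -> list:
    """Hypothetical helper: return a list of problems; empty means the batch looks valid."""
    problems = []
    if not file_urls:
        problems.append("file_urls is empty")
    if len(file_urls) > MAX_URLS_PER_REQUEST:
        problems.append(
            f"{len(file_urls)} URLs exceeds the limit of {MAX_URLS_PER_REQUEST}"
        )
    for url in file_urls:
        parsed = urlparse(url)
        # Local paths, file:// URLs, and Base64 data: URIs are rejected here.
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            problems.append(f"not an HTTP/HTTPS URL: {url}")
    return problems
```

Returning a list of problems rather than raising on the first one lets a caller report every bad entry in a large batch at once.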

Task submission API

Basic information

| Item | Description |
| --- | --- |
| Description | Submits a speech recognition task. |
| URL | https://dashscope-intl.aliyuncs.com/api/v1/services/audio/asr/transcription |
| Request method | POST |
| Request headers | See below |
| Message body | See below |
Request headers:
Authorization: Bearer $DASHSCOPE_API_KEY
Content-Type: application/json
X-DashScope-Async: enable
The X-DashScope-Async: enable header is required.
Message body (contains all request parameters. You can omit optional fields):
{
  "model": "fun-asr",
  "input": {
    "file_urls": [
      "https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/paraformer/hello_world_female2.wav",
      "https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/paraformer/hello_world_male2.wav"
    ]
  },
  "parameters": {
    "vocabulary_id": "vocab-Xxxx",
    "channel_id": [0],
    "special_word_filter": "xxx",
    "diarization_enabled": false,
    "speaker_count": 2
  }
}

Request parameters

curl --location 'https://dashscope-intl.aliyuncs.com/api/v1/services/audio/asr/transcription' \
     --header "Authorization: Bearer $DASHSCOPE_API_KEY" \
     --header "Content-Type: application/json" \
     --header "X-DashScope-Async: enable" \
     --data '{"model":"fun-asr","input":{"file_urls":["https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/paraformer/hello_world_female2.wav",
              "https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/paraformer/hello_world_male2.wav"]},"parameters":{"channel_id":[0]}}'
| Parameter | Type | Default value | Required | Description |
| --- | --- | --- | --- | --- |
| model | string | - | Yes | The model name. See Model availability. |
| file_urls | array[string] | - | Yes | A list of audio or video file URLs (HTTP/HTTPS). Up to 100 URLs per request. |
| vocabulary_id | string | - | No | The hotword ID. Applies the hotwords to this task. Disabled by default. See Customize hotwords. |
| channel_id | array[integer] | [0] | No | Audio track indexes to recognize in a multi-track file. Starts from 0. For example, [0] recognizes the first track, [0, 1] recognizes both. Defaults to the first track. |
| special_word_filter | string | - | No | Configures sensitive word handling. See Sensitive word filter details. |
| diarization_enabled | boolean | false | No | Enables speaker diarization. Single-channel audio only. When enabled, results include speaker_id to distinguish speakers. See Recognition results. |
| speaker_count | integer | - | No | A reference value for the number of speakers (2 to 100). Takes effect only when diarization_enabled is true. The algorithm tries to output this number of speakers but cannot guarantee it. Defaults to automatic detection. |
| language_hints | array[string] | ["zh", "en"] | No | Language codes for recognition. If unset, the model detects the language automatically. See Supported languages. |
Each audio track listed in channel_id is billed separately. For example, [0, 1] on one file incurs two charges.
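The parameter rules above can be sketched as a small request-body builder. This is an illustration under stated assumptions: build_request_body and billed_track_count are hypothetical names, the builder chooses to drop speaker_count when diarization is off (the table says it has no effect in that case), and the track count assumes every file in the batch contains every requested track.

```python
def build_request_body(model, file_urls, channel_id=None, diarization_enabled=False,
                       speaker_count=None, vocabulary_id=None, language_hints=None):
    """Hypothetical helper: build the JSON body, omitting unset optional fields."""
    parameters = {}
    if channel_id is not None:
        parameters["channel_id"] = list(channel_id)
    if vocabulary_id is not None:
        parameters["vocabulary_id"] = vocabulary_id
    if language_hints is not None:
        parameters["language_hints"] = list(language_hints)
    if diarization_enabled:
        parameters["diarization_enabled"] = True
        if speaker_count is not None:
            # Documented range for the speaker count reference value.
            if not 2 <= speaker_count <= 100:
                raise ValueError("speaker_count must be between 2 and 100")
            parameters["speaker_count"] = speaker_count
    body = {"model": model, "input": {"file_urls": list(file_urls)}}
    if parameters:
        body["parameters"] = parameters
    return body


def billed_track_count(body):
    """Number of (file, track) pairs in the request; each is billed separately."""
    channels = body.get("parameters", {}).get("channel_id", [0])  # default track is 0
    return len(body["input"]["file_urls"]) * len(channels)
```

For one file with channel_id set to [0, 1], billed_track_count returns 2, matching the two charges described above.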

Sensitive word filter details

If special_word_filter is not set, the built-in filter replaces matched words with asterisks (*) of equal length. If set, you can use these policies:
  • Replace with *: Replaces matched words with asterisks of the same length.
  • Filter out: Removes matched words from the result.
The value must be a JSON string:
{
  "filter_with_signed": {
    "word_list": ["test"]
  },
  "filter_with_empty": {
    "word_list": ["start", "happen"]
  },
  "system_reserved_filter": true
}
Field descriptions:
  • filter_with_signed
    • Type: object. Required: No.
    • Matched words are replaced with asterisks of the same length.
    • Example: "Help me test this piece of code" becomes "Help me **** this piece of code".
    • Internal field: word_list -- A string array of words to replace.
  • filter_with_empty
    • Type: object. Required: No.
    • Matched words are removed from the result.
    • Example: "Is the game about to start?" becomes "Is the game about to ?".
    • Internal field: word_list -- A string array of words to remove.
  • system_reserved_filter
    • Type: Boolean. Required: No. Default: true.
    • Enables the system's preset sensitive word rules. When true, words matching the Qwen Cloud sensitive word list are replaced with asterisks of the same length.
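Because special_word_filter is typed as a string, the filter object has to be serialized before it goes into the request parameters. A minimal sketch follows; build_special_word_filter and its argument names are hypothetical, while the JSON field names come from the description above.

```python
import json


def build_special_word_filter(mask_words=(), remove_words=(), use_system_filter=True) -> str:
    """Hypothetical helper: serialize the filter policies into a JSON string."""
    config = {"system_reserved_filter": use_system_filter}
    if mask_words:
        # Matched words are replaced with asterisks of the same length.
        config["filter_with_signed"] = {"word_list": list(mask_words)}
    if remove_words:
        # Matched words are removed from the result.
        config["filter_with_empty"] = {"word_list": list(remove_words)}
    # ensure_ascii=False keeps non-ASCII words (for example Chinese) readable.
    return json.dumps(config, ensure_ascii=False)


# The returned string is the value of "special_word_filter" in "parameters".
parameters = {
    "special_word_filter": build_special_word_filter(["test"], ["start", "happen"]),
}
```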

Supported languages

Supported language codes by model:
  • fun-asr, fun-asr-2025-11-07:
    • zh: Chinese
    • en: English
    • ja: Japanese
  • fun-asr-2025-08-25:
    • zh: Chinese
    • en: English
  • fun-asr-mtl, fun-asr-mtl-2025-08-25:
    • zh: Chinese
    • en: English
    • ja: Japanese
    • ko: Korean
    • vi: Vietnamese
    • id: Indonesian
    • th: Thai
    • ms: Malay
    • tl: Filipino
    • ar: Arabic
    • hi: Hindi
    • bg: Bulgarian
    • hr: Croatian
    • cs: Czech
    • da: Danish
    • nl: Dutch
    • et: Estonian
    • fi: Finnish
    • el: Greek
    • hu: Hungarian
    • ga: Irish
    • lv: Latvian
    • lt: Lithuanian
    • mt: Maltese
    • pl: Polish
    • pt: Portuguese
    • ro: Romanian
    • sk: Slovak
    • sl: Slovenian
    • sv: Swedish
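The lists above can be turned into a client-side check for language_hints. This is a sketch only: the table is copied from this page and would need updating when models change, and unsupported_hints is a hypothetical helper. How the service itself reacts to an unsupported code is not specified here.

```python
_BASE = {"zh", "en"}
_FUN_ASR = _BASE | {"ja"}
_MTL = _FUN_ASR | {
    "ko", "vi", "id", "th", "ms", "tl", "ar", "hi", "bg", "hr", "cs", "da",
    "nl", "et", "fi", "el", "hu", "ga", "lv", "lt", "mt", "pl", "pt", "ro",
    "sk", "sl", "sv",
}

# Language codes per model, copied from the Supported languages list.
SUPPORTED_LANGUAGE_CODES = {
    "fun-asr": _FUN_ASR,
    "fun-asr-2025-11-07": _FUN_ASR,
    "fun-asr-2025-08-25": _BASE,
    "fun-asr-mtl": _MTL,
    "fun-asr-mtl-2025-08-25": _MTL,
}


def unsupported_hints(model: str, language_hints: list) -> list:
    """Hypothetical helper: return the hints that the given model does not list."""
    if model not in SUPPORTED_LANGUAGE_CODES:
        raise ValueError(f"unknown model: {model}")
    supported = SUPPORTED_LANGUAGE_CODES[model]
    return [code for code in language_hints if code not in supported]
```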

Response parameters

{
  "output": {
    "task_status": "PENDING",
    "task_id": "c2e5d63b-96e1-4607-bb91-************"
  },
  "request_id": "77ae55ae-be17-97b8-9942-************"
}
| Parameter | Type | Description |
| --- | --- | --- |
| task_status | string | Task status: PENDING, RUNNING, SUCCEEDED, or FAILED. |
| task_id | string | The task ID. Use it with the task query API to check results. |
| request_id | string | The request ID. |

Task query API

Basic information

| Item | Description |
| --- | --- |
| Description | Queries the status and results of a speech recognition task. |
| URL | https://dashscope-intl.aliyuncs.com/api/v1/tasks/{task_id} |
| Request method | GET |
| Request headers | See below |
| Message body | None |
Request headers:
Authorization: Bearer $DASHSCOPE_API_KEY

Request parameters

curl --location 'https://dashscope-intl.aliyuncs.com/api/v1/tasks/{task_id}' \
     --header "Authorization: Bearer $DASHSCOPE_API_KEY"
| Parameter | Type | Default value | Required | Description |
| --- | --- | --- | --- | --- |
| task_id | string | - | Yes | The task ID returned by the task submission API. |

Response parameters

For a task with multiple files (subtasks), the overall task_status is SUCCEEDED if any subtask succeeds. Check the subtask_status of each entry in results for the per-file outcome.
{
  "request_id": "f9e1afad-94d3-997e-a83b-************",
  "output": {
    "task_id": "f86ec806-4d73-485f-a24f-************",
    "task_status": "SUCCEEDED",
    "submit_time": "2024-09-12 15:11:40.041",
    "scheduled_time": "2024-09-12 15:11:40.071",
    "end_time": "2024-09-12 15:11:40.903",
    "results": [
      {
        "file_url": "https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/paraformer/hello_world_male2.wav",
        "transcription_url": "https://dashscope-result-bj.oss-cn-beijing.aliyuncs.com/pre/filetrans-16k/20240912/15%3A11/3bdf7689-b598-409d-806a-121cff5e4a31-1.json?Expires=1726211500&OSSAccessKeyId=yourOSSAccessKeyId&Signature=Fj%2BaF%2FH0Kayj3w3My2ECBeP****%3D",
        "subtask_status": "SUCCEEDED"
      },
      {
        "file_url": "https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/paraformer/hello_world_female2.wav",
        "transcription_url": "https://dashscope-result-bj.oss-cn-beijing.aliyuncs.com/pre/filetrans-16k/20240912/15%3A11/409a4b92-445b-4dd8-8c1d-f110954d82d8-1.json?Expires=1726211500&OSSAccessKeyId=yourOSSAccessKeyId&Signature=v5Owy5qoAfT7mzGmQgH0g8C****%3D",
        "subtask_status": "SUCCEEDED"
      }
    ],
    "task_metrics": {
      "TOTAL": 2,
      "SUCCEEDED": 2,
      "FAILED": 0
    }
  },
  "usage": {
    "duration": 9
  }
}
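When a subtask fails, its entry in results contains a code field with the error code and a message field with the error message. These fields appear only on errors. The following output object shows one succeeded and one failed subtask: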
{
  "task_id": "7bac899c-06ec-4a79-8875-xxxxxxxxxxxx",
  "task_status": "SUCCEEDED",
  "submit_time": "2024-12-16 16:30:59.170",
  "scheduled_time": "2024-12-16 16:30:59.204",
  "end_time": "2024-12-16 16:31:02.375",
  "results": [
    {
      "file_url": "https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/sensevoice/long_audio_demo_cn.mp3",
      "transcription_url": "https://dashscope-result-bj.oss-cn-beijing.aliyuncs.com/prod/paraformer-v2/20241216/xxxx",
      "subtask_status": "SUCCEEDED"
    },
    {
      "file_url": "https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/sensevoice/rich_text_exaple_1.wav",
      "code": "InvalidFile.DownloadFailed",
      "message": "The audio file cannot be downloaded.",
      "subtask_status": "FAILED"
    }
  ],
  "task_metrics": {
    "TOTAL": 2,
    "SUCCEEDED": 1,
    "FAILED": 1
  }
}
| Parameter | Type | Description |
| --- | --- | --- |
| task_id | string | The task ID. |
| task_status | string | The task status. |
| subtask_status | string | The subtask status. |
| file_url | string | The URL of the processed file. |
| transcription_url | string | The link to the recognition result. Valid for 24 hours. After expiry, you cannot query the task or download the result. The result is a JSON file you can download or read via HTTP. See Recognition results. |
| submit_time | string | The time the task was submitted. |
| scheduled_time | string | The time the task was scheduled. |
| end_time | string | The time the task ended. |
| task_metrics | object | Task metrics: TOTAL, SUCCEEDED, and FAILED counts. |
| usage | object | Usage information. duration is the total duration in seconds. |
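Since an overall SUCCEEDED status can hide failed files, results are best handled per subtask. The sketch below splits the results array by subtask_status and downloads a result file with the standard library. Both function names are hypothetical, and the download assumes that transcription_url can be fetched with a plain GET before it expires.

```python
import json
import urllib.request


def split_results(output: dict) -> tuple:
    """Hypothetical helper: separate succeeded and failed subtasks."""
    succeeded, failed = [], []
    for item in output.get("results", []):
        if item.get("subtask_status") == "SUCCEEDED":
            succeeded.append((item["file_url"], item["transcription_url"]))
        else:
            # code and message appear only on errors.
            failed.append((item["file_url"], item.get("code"), item.get("message")))
    return succeeded, failed


def download_transcript(transcription_url: str, timeout: float = 30.0) -> dict:
    """Fetch the recognition result JSON (the link is valid for 24 hours)."""
    with urllib.request.urlopen(transcription_url, timeout=timeout) as response:
        return json.load(response)
```

A caller would typically loop over the succeeded list, call download_transcript for each URL, and log or retry the failed entries separately.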

Description of recognition results

The recognition result is a JSON file.
{
  "file_url": "https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/paraformer/hello_world_female2.wav",
  "properties": {
    "audio_format": "pcm_s16le",
    "channels": [0],
    "original_sampling_rate": 16000,
    "original_duration_in_milliseconds": 3834
  },
  "transcripts": [
    {
      "channel_id": 0,
      "content_duration_in_milliseconds": 3720,
      "text": "Hello world, this is Alibaba Speech Lab.",
      "sentences": [
        {
          "begin_time": 100,
          "end_time": 3820,
          "text": "Hello world, this is Alibaba Speech Lab.",
          "sentence_id": 1,
          "speaker_id": 0,
          "words": [
            {
              "begin_time": 100,
              "end_time": 596,
              "text": "Hello ",
              "punctuation": ""
            },
            {
              "begin_time": 596,
              "end_time": 844,
              "text": "world",
              "punctuation": ", "
            }
          ]
        }
      ]
    }
  ]
}
The speaker_id field appears only when speaker diarization is enabled. Other word entries are omitted for brevity.
Key parameters:
| Parameter | Type | Description |
| --- | --- | --- |
| audio_format | string | The audio format of the source file. |
| channels | array[integer] | The audio track indexes. Returns [0] for single-track, [0, 1] for dual-track, etc. |
| original_sampling_rate | integer | The sample rate (Hz). |
| original_duration_in_milliseconds | integer | The original audio duration (ms). |
| channel_id | integer | The transcribed track index, starting from 0. |
| content_duration_in_milliseconds | integer | The duration of speech content in the track (ms). |
| text | string | The transcription text (paragraph-level or word-level, depending on context). |
| sentences | array | Sentence-level transcription results. |
| words | array | Word-level transcription results. |
| begin_time | integer | The start timestamp (ms). |
| end_time | integer | The end timestamp (ms). |
| speaker_id | integer | The speaker index, starting from 0. Appears only when diarization is enabled. |
| punctuation | string | The predicted punctuation after the word, if any. |
Billing is based on speech segments only, not total file duration. Non-speech segments are not billed. Because speech detection uses an AI model, billed duration may differ slightly from expected content.
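The result structure above can be flattened into timestamped lines, and the speech duration can give a rough cost figure. This is a sketch under assumptions: the function names are hypothetical, the estimate assumes that content_duration_in_milliseconds approximates the billed speech duration, and it uses the $0.000035/second price from Model availability while ignoring the free quota and any rounding the service applies.

```python
PRICE_PER_SECOND = 0.000035  # USD, from the Model availability table


def sentence_lines(result: dict) -> list:
    """Hypothetical helper: one formatted line per sentence, across all tracks."""
    lines = []
    for transcript in result.get("transcripts", []):
        label = f"ch{transcript['channel_id']}"
        for sentence in transcript.get("sentences", []):
            who = label
            # speaker_id is present only when diarization is enabled.
            if sentence.get("speaker_id") is not None:
                who += f" spk{sentence['speaker_id']}"
            start = sentence["begin_time"] / 1000  # ms to seconds
            end = sentence["end_time"] / 1000
            lines.append(f"[{start:.3f}-{end:.3f}s] {who}: {sentence['text']}")
    return lines


def speech_seconds(result: dict) -> float:
    """Sum of speech content duration over all transcribed tracks, in seconds."""
    total_ms = sum(
        t.get("content_duration_in_milliseconds", 0)
        for t in result.get("transcripts", [])
    )
    return total_ms / 1000


def estimated_cost(seconds: float, price_per_second: float = PRICE_PER_SECOND) -> float:
    """Rough estimate only; the billed duration may differ slightly."""
    return seconds * price_per_second
```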

Complete sample

Use any HTTP library to submit tasks and poll for results. This Python sample demonstrates the workflow:
import requests
import json
import os
import time

# If you have not configured environment variables, replace the following line with your API key: api_key = "sk-xxx"
api_key = os.getenv("DASHSCOPE_API_KEY")
file_urls = [
  "https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/paraformer/hello_world_female2.wav",
  "https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/paraformer/hello_world_male2.wav",
]

region = "dashscope-intl.aliyuncs.com"

# Submit a file transcription task, including a list of file URLs to be transcribed
def submit_task(apikey, file_urls) -> str:

  headers = {
    "Authorization": f"Bearer {apikey}",
    "Content-Type": "application/json",
    "X-DashScope-Async": "enable",
  }
  data = {
    "model": "fun-asr",
    "input": {"file_urls": file_urls},
    "parameters": {
      "channel_id": [0],
      # "vocabulary_id": "vocab-Xxxx", # Optional, hotword ID.
    },
  }
  # URL of the audio file transcription service
  service_url = (
    f"https://{region}/api/v1/services/audio/asr/transcription"
  )
  response = requests.post(
    service_url, headers=headers, data=json.dumps(data)
  )

  # Return the task ID on success; otherwise print the error and return None
  if response.status_code == 200:
    return response.json()["output"]["task_id"]
  else:
    print("task failed!")
    print(response.json())
    return None


# Poll the task status until the task reaches a final state
def wait_for_complete(task_id):
  # The query API is a GET request and only needs the Authorization header
  headers = {
    "Authorization": f"Bearer {api_key}",
  }
  # URL of the task status query service
  service_url = f"https://{region}/api/v1/tasks/{task_id}"

  while True:
    response = requests.get(service_url, headers=headers)
    if response.status_code != 200:
      print("query failed!")
      print(response.text)
      return None
    output = response.json()["output"]
    status = output["task_status"]
    if status == "SUCCEEDED":
      print("task succeeded!")
      return output["results"]
    if status not in ("RUNNING", "PENDING"):
      print("task failed!")
      print(response.json())
      return None
    # Still PENDING or RUNNING: wait before querying again
    time.sleep(1)


task_id = submit_task(apikey=api_key, file_urls=file_urls)
print("task_id: ", task_id)
# Only poll if the submission succeeded
if task_id:
  result = wait_for_complete(task_id)
  print("transcription result: ", result)
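The sample above polls in a simple loop with no upper bound on waiting time. For long files, a bounded loop with a growing interval may be preferable. The following is a minimal sketch of that idea, not part of the service or any SDK: poll_until_done is a hypothetical helper, the interval and timeout defaults are arbitrary, and query stands for any callable that returns the output object of the task query API for a given task ID.

```python
import time


def poll_until_done(query, task_id, interval=1.0, max_interval=10.0,
                    timeout=600.0, sleep=time.sleep, clock=time.monotonic):
    """Hypothetical helper: poll until the task is SUCCEEDED or FAILED."""
    deadline = clock() + timeout
    while True:
        output = query(task_id)
        status = output.get("task_status")
        if status in ("SUCCEEDED", "FAILED"):
            return output
        if status not in ("PENDING", "RUNNING"):
            raise RuntimeError(f"unexpected task_status: {status!r}")
        if clock() >= deadline:
            raise TimeoutError(f"task {task_id} did not finish in {timeout} seconds")
        sleep(interval)
        # Double the wait each round, up to max_interval, to reduce request volume.
        interval = min(interval * 2, max_interval)
```

Passing sleep and clock as parameters keeps the loop testable without real delays; in production code the defaults are used and query wraps the GET request to the task query URL.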