
Fun-ASR recording Python SDK


User guide: For model details and recommendations, see Audio file recognition - Fun-ASR/Paraformer.

Prerequisites

To grant a third-party application temporary access, use a temporary token instead of your API key. Temporary tokens expire after 60 seconds, which limits the damage if one leaks.

Model availability

| Model | Version | Unit price | Free quota (Note) |
| --- | --- | --- | --- |
| fun-asr (currently fun-asr-2025-11-07) | Stable | $0.000035/second | 36,000 seconds (10 hours), valid for 90 days |
| fun-asr-2025-11-07 (improved far-field VAD over fun-asr-2025-08-25 for higher accuracy) | Snapshot | $0.000035/second | 36,000 seconds (10 hours), valid for 90 days |
| fun-asr-2025-08-25 | Snapshot | $0.000035/second | 36,000 seconds (10 hours), valid for 90 days |
| fun-asr-mtl (currently fun-asr-mtl-2025-08-25) | Stable | $0.000035/second | 36,000 seconds (10 hours), valid for 90 days |
| fun-asr-mtl-2025-08-25 | Snapshot | $0.000035/second | 36,000 seconds (10 hours), valid for 90 days |
  • Supported languages:
    • fun-asr and fun-asr-2025-11-07: Mandarin, Cantonese, Wu, Minnan, Hakka, Gan, Xiang, Jin, English, and Japanese. Also supports Mandarin accents from Zhongyuan, Southwest, Jilu, Jianghuai, Lanyin, Jiaoliao, Northeast, Beijing, and Hong Kong-Taiwan regions.
    • fun-asr-2025-08-25: Mandarin and English.
    • fun-asr-mtl and fun-asr-mtl-2025-08-25: Mandarin, Cantonese, English, Japanese, Korean, Vietnamese, Indonesian, Thai, Malay, Filipino, Arabic, Hindi, Bulgarian, Croatian, Czech, Danish, Dutch, Estonian, Finnish, Greek, Hungarian, Irish, Latvian, Lithuanian, Maltese, Polish, Portuguese, Romanian, Slovak, Slovenian, and Swedish.
  • Sample rates supported: Any
  • Audio formats supported: aac, amr, avi, flac, flv, m4a, mkv, mov, mp3, mp4, mpeg, ogg, opus, wav, webm, wma, wmv

Limitations

Files must be at public URLs (HTTP/HTTPS, such as https://your-domain.com/file.mp3). Local files and Base64 encoding are not supported. Pass URLs with the file_urls parameter. Up to 100 URLs per request.
  • Audio formats: aac, amr, avi, flac, flv, m4a, mkv, mov, mp3, mp4, mpeg, ogg, opus, wav, webm, wma, wmv
Not all format variants are tested. Test your files to verify results.
  • Audio sample rate: Any
  • File size and duration: Max 2 GB and 12 hours. For larger files, see Audio trimming.
  • Batch processing: Up to 100 URLs per request.
  • Languages: vary by model. See Supported languages for the full list of language codes each model accepts.
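The limits above can be checked client-side before submitting a task. The sketch below mirrors the documented constraints; validate_file_urls and its constants are illustrative helpers, not part of the DashScope SDK:

```python
# Illustrative pre-flight checks mirroring the documented limits;
# these helpers are not part of the DashScope SDK.
SUPPORTED_FORMATS = {
    'aac', 'amr', 'avi', 'flac', 'flv', 'm4a', 'mkv', 'mov', 'mp3', 'mp4',
    'mpeg', 'ogg', 'opus', 'wav', 'webm', 'wma', 'wmv',
}
MAX_URLS_PER_REQUEST = 100

def validate_file_urls(urls):
    """Raise ValueError if the URL list violates a documented limit."""
    if not urls or len(urls) > MAX_URLS_PER_REQUEST:
        raise ValueError(f'file_urls must contain 1-{MAX_URLS_PER_REQUEST} URLs')
    for url in urls:
        # Files must be reachable over public HTTP/HTTPS.
        if not url.startswith(('http://', 'https://')):
            raise ValueError(f'not a public HTTP(S) URL: {url}')
        # Infer the format from the file extension.
        ext = url.rsplit('.', 1)[-1].lower()
        if ext not in SUPPORTED_FORMATS:
            raise ValueError(f'unsupported extension .{ext}: {url}')
    return urls
```

Note this only inspects the URL string; it cannot verify the actual container format, size, or duration of the remote file.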

Getting started

The Transcription class has two usage patterns:
  • Async submit + sync wait: Submit a task and block until done.
  • Async submit + async query: Submit a task and poll for results.

Async submit and sync wait

Step 1: Submit the task

Call async_call on the Transcription class with the request parameters.
  • Tasks start in PENDING state. Queue time depends on queue length and file duration. Processing is fast once started.
  • Results and download URLs expire 24 hours after completion.
Step 2: Wait for the result

Call wait on the Transcription class to block until the task completes. Task statuses are PENDING, RUNNING, SUCCEEDED, and FAILED; wait blocks while the task is PENDING or RUNNING and returns once it reaches SUCCEEDED or FAILED. The return value is a TranscriptionResponse.
from http import HTTPStatus
from dashscope.audio.asr import Transcription
import dashscope
import os
import json

dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'

# If you have not configured an environment variable, replace the following line with your API key: dashscope.api_key = "sk-xxx"
dashscope.api_key = os.getenv("DASHSCOPE_API_KEY")

task_response = Transcription.async_call(
  model='fun-asr',
  file_urls=['https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/paraformer/hello_world_female2.wav',
             'https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/paraformer/hello_world_male2.wav']
)

transcribe_response = Transcription.wait(task=task_response.output.task_id)
if transcribe_response.status_code == HTTPStatus.OK:
  print(json.dumps(transcribe_response.output, indent=4, ensure_ascii=False))
  print('transcription done!')
else:
  print('transcription failed!')
  print(json.dumps(transcribe_response.output, indent=4, ensure_ascii=False))

Async submit and async query

Step 1: Submit the task

Call async_call on the Transcription class with the request parameters.
  • Tasks start in PENDING state. Queue time depends on queue length and file duration. Processing is fast once started.
  • Results and download URLs expire 24 hours after completion.
Step 2: Poll for the result

Call fetch on the Transcription class repeatedly until the task reaches SUCCEEDED or FAILED. Each call returns a TranscriptionResponse.
from http import HTTPStatus
from dashscope.audio.asr import Transcription
import dashscope
import os
import json

dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'

# If you have not configured an environment variable, replace the following line with your API key: dashscope.api_key = "sk-xxx"
dashscope.api_key = os.getenv("DASHSCOPE_API_KEY")

transcribe_response = Transcription.async_call(
  model='fun-asr',
  file_urls=['https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/paraformer/hello_world_female2.wav',
             'https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/paraformer/hello_world_male2.wav']
)

import time

while True:
  if transcribe_response.output.task_status in ('SUCCEEDED', 'FAILED'):
    break
  time.sleep(1)  # poll at a short interval instead of busy-waiting
  transcribe_response = Transcription.fetch(task=transcribe_response.output.task_id)

if transcribe_response.status_code == HTTPStatus.OK:
  print(json.dumps(transcribe_response.output, indent=4, ensure_ascii=False))
  print('transcription done!')
else:
  print('transcription failed!')
  print(json.dumps(transcribe_response.output, indent=4, ensure_ascii=False))

Request parameters

Pass these parameters to async_call on the Transcription class.
| Parameter | Type | Default | Required | Description |
| --- | --- | --- | --- | --- |
| model | str | - | Yes | Model ID. See Model availability. |
| file_urls | list[str] | - | Yes | Audio/video file URLs (HTTP/HTTPS). Up to 100 per request. |
| vocabulary_id | str | - | No | Hotword vocabulary ID for this task. Disabled by default. See Customize hotwords. |
| channel_id | list[int] | [0] | No | Audio track indexes to recognize (0-based). [0] = first track, [0, 1] = first and second. Each track is billed separately. |
| special_word_filter | str | - | No | Sensitive word filter config. See Sensitive word filter. |
| diarization_enabled | bool | False | No | Enable speaker diarization (single-channel only). Results include speaker_id. See Recognition result. |
| speaker_count | int | - | No | Expected speaker count (2-100). Only applies when diarization_enabled is true. Auto-detected by default. Guides the algorithm but does not guarantee the exact count. |
| language_hints | list[str] | ["zh", "en"] | No | Language codes. Leave unset for auto-detection. See Supported languages. |
| speech_noise_threshold | float | - | No | Speech noise threshold. |
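A sketch of how the optional parameters from the table fit together in one request. The URL is a placeholder, and the chosen values are illustrative, not recommendations:

```python
# Optional parameters from the table above, assembled for a single call.
# The URL is a placeholder; replace it with one of your own public files.
request_kwargs = {
    'model': 'fun-asr',
    'file_urls': ['https://your-domain.com/meeting.mp3'],
    'diarization_enabled': True,   # single-channel audio only
    'speaker_count': 2,            # hint used only when diarization is on
    'language_hints': ['zh'],      # omit to let the service auto-detect
}
# Submit with: Transcription.async_call(**request_kwargs)
```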

Sensitive word filter

By default, words on the Qwen Cloud sensitive word list are replaced with asterisks (*). With special_word_filter, you can:
  • Replace with *: Matched words become asterisks.
  • Filter out: Matched words are removed.
Value must be a JSON string:
{
  "filter_with_signed": {
    "word_list": ["test"]
  },
  "filter_with_empty": {
    "word_list": ["start", "happen"]
  },
  "system_reserved_filter": true
}
Fields:
  • filter_with_signed (object, optional): Words to replace with *.
    • Example: "Help me test this code" becomes "Help me **** this code"
    • word_list: Words to replace.
  • filter_with_empty (object, optional): Words to remove.
    • Example: "Is the game about to start?" becomes "Is the game about to?"
    • word_list: Words to remove.
  • system_reserved_filter (boolean, optional, default: true): Enable system filtering. When true, words on the Qwen Cloud sensitive word list are replaced with *.
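Because special_word_filter takes a JSON string rather than a dict, it is easiest to build the config as a Python dict and serialize it. The word lists below are illustrative:

```python
import json

# Build the special_word_filter payload as a dict, then serialize it to
# the JSON string the parameter expects. Word lists are illustrative.
filter_config = {
    "filter_with_signed": {"word_list": ["test"]},            # replaced with ****
    "filter_with_empty": {"word_list": ["start", "happen"]},  # removed entirely
    "system_reserved_filter": True,                           # keep system list on
}
special_word_filter = json.dumps(filter_config)
# Pass the string with the request, e.g.:
# Transcription.async_call(model='fun-asr', file_urls=[...],
#                          special_word_filter=special_word_filter)
```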

Supported languages

Language codes by model:
  • fun-asr, fun-asr-2025-11-07:
    • zh: Chinese
    • en: English
    • ja: Japanese
  • fun-asr-2025-08-25:
    • zh: Chinese
    • en: English
  • fun-asr-mtl, fun-asr-mtl-2025-08-25:
    • zh: Chinese
    • en: English
    • ja: Japanese
    • ko: Korean
    • vi: Vietnamese
    • id: Indonesian
    • th: Thai
    • ms: Malay
    • tl: Filipino
    • ar: Arabic
    • hi: Hindi
    • bg: Bulgarian
    • hr: Croatian
    • cs: Czech
    • da: Danish
    • nl: Dutch
    • et: Estonian
    • fi: Finnish
    • el: Greek
    • hu: Hungarian
    • ga: Irish
    • lv: Latvian
    • lt: Lithuanian
    • mt: Maltese
    • pl: Polish
    • pt: Portuguese
    • ro: Romanian
    • sk: Slovak
    • sl: Slovenian
    • sv: Swedish

Response results

TranscriptionResponse

TranscriptionResponse contains task info (task_id, task_status) and results in output. See TranscriptionOutput.
The example below shows a task in the PENDING state; RUNNING, SUCCEEDED, and FAILED responses share the same structure with a different task_status and, once the task finishes, a populated results array.
{
  "status_code": 200,
  "request_id": "251aceab-a6aa-9fc4-b7f7-0cc6d3e2a9f3",
  "code": null,
  "message": "",
  "output": {
    "task_id": "7d0a58a3-1dbe-4de9-8cff-5f48213128b0",
    "task_status": "PENDING",
    "submit_time": "2025-02-13 16:55:08.573",
    "scheduled_time": "2025-02-13 16:55:08.592",
    "task_metrics": {
      "TOTAL": 2,
      "SUCCEEDED": 0,
      "FAILED": 0
    }
  },
  "usage": null
}
Key parameters:
| Parameter | Description |
| --- | --- |
| status_code | HTTP status code. |
| code | Ignore the top-level code. Check output.results[].code for errors. |
| message | Ignore the top-level message. Check output.results[].message for errors. |
| task_id | Task ID. |
| task_status | Task status: PENDING, RUNNING, SUCCEEDED, FAILED. If any subtask succeeds, the task is SUCCEEDED. Check subtask_status for individual results. |
| results | Subtask results. |
| subtask_status | Subtask status: PENDING, RUNNING, SUCCEEDED, FAILED. |
| file_url | Audio file URL. |
| transcription_url | Result URL (JSON file). Download or read via HTTP. See Recognition result. |
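Because a task is SUCCEEDED as soon as any subtask succeeds, each entry in output.results should be checked individually. The helper below is an illustrative sketch that assumes output has been converted to a plain dict:

```python
def succeeded_transcription_urls(output):
    """Return transcription_url for every subtask that SUCCEEDED.

    `output` is assumed to be the response output as a plain dict,
    with the results/subtask_status fields described above.
    """
    return [r['transcription_url']
            for r in output.get('results', [])
            if r.get('subtask_status') == 'SUCCEEDED']
```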

TranscriptionOutput

TranscriptionOutput is the output property of TranscriptionResponse.
The example below shows a task in the PENDING state; RUNNING, SUCCEEDED, and FAILED outputs share the same structure with a different task_status and, once the task finishes, a populated results array.
{
  "task_id": "f2f7c2fa-0cd9-4bb2-a283-27b26ee4bb67",
  "task_status": "PENDING",
  "submit_time": "2025-02-13 17:59:27.754",
  "scheduled_time": "2025-02-13 17:59:27.789",
  "task_metrics": {
    "TOTAL": 2,
    "SUCCEEDED": 0,
    "FAILED": 0
  }
}
Key parameters:
| Parameter | Description |
| --- | --- |
| code | Error code. |
| message | Error message. |
| task_id | Task ID. |
| task_status | Task status: PENDING, RUNNING, SUCCEEDED, FAILED. If any subtask succeeds, the task is SUCCEEDED. Check subtask_status for individual results. |
| results | Subtask results. |
| subtask_status | Subtask status: PENDING, RUNNING, SUCCEEDED, FAILED. |
| file_url | Audio file URL. |
| transcription_url | Result URL (JSON file). Download or read via HTTP. See Recognition result. |

Recognition result

Results are JSON files.
{
  "file_url": "https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/paraformer/hello_world_female2.wav",
  "properties": {
    "audio_format": "pcm_s16le",
    "channels": [
      0
    ],
    "original_sampling_rate": 16000,
    "original_duration_in_milliseconds": 3834
  },
  "transcripts": [
    {
      "channel_id": 0,
      "content_duration_in_milliseconds": 3720,
      "text": "Hello world, this is Alibaba Speech Lab.",
      "sentences": [
        {
          "begin_time": 100,
          "end_time": 3820,
          "text": "Hello world, this is Alibaba Speech Lab.",
          "sentence_id": 1,
          "speaker_id": 0,
          "words": [
            {
              "begin_time": 100,
              "end_time": 596,
              "text": "Hello ",
              "punctuation": ""
            },
            {
              "begin_time": 596,
              "end_time": 844,
              "text": "world",
              "punctuation": ", "
            }
          ]
        }
      ]
    }
  ]
}
speaker_id appears only when speaker diarization is enabled.
Key parameters:
| Parameter | Type | Description |
| --- | --- | --- |
| audio_format | string | Audio format. |
| channels | array[integer] | Track indexes. [0] = single-track, [0, 1] = dual-track. |
| original_sampling_rate | integer | Sample rate (Hz). |
| original_duration_in_milliseconds | integer | Audio duration (ms). |
| channel_id | integer | Track index (0-based). |
| content_duration_in_milliseconds | integer | Speech duration (ms). Only speech is transcribed and billed; non-speech is excluded, so speech duration is usually shorter than audio duration. |
| transcript | string | Paragraph-level text. |
| sentences | array | Sentence-level results. |
| words | array | Word-level results. |
| begin_time | integer | Start time (ms). |
| end_time | integer | End time (ms). |
| text | string | Transcription text. |
| speaker_id | integer | Speaker index (0-based). Only present when diarization is enabled. |
| punctuation | string | Predicted punctuation after the word. |
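Once you have a transcription_url, the result file can be fetched with the standard library and flattened to sentence text. Both helpers are illustrative sketches; the field names follow the sample JSON above:

```python
import json
import urllib.request

def download_result(transcription_url):
    """Fetch and parse the JSON recognition result behind the URL.

    Remember that result URLs expire 24 hours after task completion.
    """
    with urllib.request.urlopen(transcription_url) as resp:
        return json.loads(resp.read().decode('utf-8'))

def sentence_texts(result):
    """Flatten sentence-level text from a parsed recognition result."""
    return [sentence['text']
            for transcript in result.get('transcripts', [])
            for sentence in transcript.get('sentences', [])]
```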

Transcription class

Import with from dashscope.audio.asr import Transcription.
| Method | Signature | Description |
| --- | --- | --- |
| async_call | @classmethod def async_call(cls, model: str, file_urls: List[str], phrase_id: str = None, api_key: str = None, workspace: str = None, **kwargs) -> TranscriptionResponse | Submit a recognition task. |
| wait | @classmethod def wait(cls, task: Union[str, TranscriptionResponse], api_key: str = None, workspace: str = None, **kwargs) -> TranscriptionResponse | Block until the task reaches SUCCEEDED or FAILED. Returns a TranscriptionResponse. |
| fetch | @classmethod def fetch(cls, task: Union[str, TranscriptionResponse], api_key: str = None, workspace: str = None, **kwargs) -> TranscriptionResponse | Query the task status once. Returns a TranscriptionResponse. |