Fun-ASR recording Python SDK

File transcription Python

User guide: For model details and recommendations, see Audio file recognition - Fun-ASR/Paraformer.

Prerequisites

For temporary access to third-party apps, use a temporary token. Tokens expire in 60 seconds, limiting leakage risk.

Model availability

| Model | Version | Unit price | Free quota |
| --- | --- | --- | --- |
| fun-asr (currently fun-asr-2025-11-07) | Stable | $0.000035/second | 36,000 seconds (10 hours), valid for 90 days |
| fun-asr-2025-11-07 (improved far-field VAD over fun-asr-2025-08-25 for higher accuracy) | Snapshot | $0.000035/second | 36,000 seconds (10 hours), valid for 90 days |
| fun-asr-2025-08-25 | Snapshot | $0.000035/second | 36,000 seconds (10 hours), valid for 90 days |
| fun-asr-mtl (currently fun-asr-mtl-2025-08-25) | Stable | $0.000035/second | 36,000 seconds (10 hours), valid for 90 days |
| fun-asr-mtl-2025-08-25 | Snapshot | $0.000035/second | 36,000 seconds (10 hours), valid for 90 days |
  • Supported languages:
    • fun-asr, fun-asr-2025-11-07, fun-asr-mtl, and fun-asr-mtl-2025-08-25: Chinese (Mandarin, Cantonese, Wu, Minnan, Hakka, Gan, Xiang, and Jin; also supports Mandarin accents from Zhongyuan, Southwest, Jilu, Jianghuai, Lanyin, Jiaoliao, Northeast, Beijing, and Hong Kong-Taiwan regions -- including Henan, Shaanxi, Hubei, Sichuan, Chongqing, Yunnan, Guizhou, Guangdong, Guangxi, Hebei, Tianjin, Shandong, Anhui, Nanjing, Jiangsu, Hangzhou, Gansu, and Ningxia), English, Japanese, Korean, Vietnamese, Indonesian, Thai, Malay, Filipino, Arabic, Bulgarian, Croatian, Czech, Danish, Dutch, Estonian, Finnish, Greek, Hindi, Hungarian, Irish, Latvian, Lithuanian, Maltese, Polish, Portuguese, Romanian, Slovak, Slovenian, and Swedish.
    • fun-asr-2025-08-25: Mandarin and English.
  • Sample rates supported: Any
  • Audio formats supported: aac, amr, avi, flac, flv, m4a, mkv, mov, mp3, mp4, mpeg, ogg, opus, wav, webm, wma, wmv

Limitations

Files must be at public URLs (HTTP/HTTPS, such as https://your-domain.com/file.mp3). Local files and Base64 encoding are not supported. Pass URLs with the file_urls parameter. Up to 100 URLs per request.
  • Audio formats: aac, amr, avi, flac, flv, m4a, mkv, mov, mp3, mp4, mpeg, ogg, opus, wav, webm, wma, wmv. Not all format variants are tested. Test your files to verify results.
  • Audio sample rate: Any
  • File size and duration: Max 2 GB and 12 hours. For larger files, see Audio trimming.
  • Batch processing: Up to 100 URLs per request.
  • Languages: fun-asr, fun-asr-2025-11-07, fun-asr-mtl, and fun-asr-mtl-2025-08-25 support Chinese and 29 other languages. fun-asr-2025-08-25 supports Mandarin and English only. See Supported languages.
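
The URL limits above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the SDK: validate_url and batch_urls are hypothetical helper names. It only checks the URL scheme and host and cannot confirm that a file is publicly reachable.

```python
from urllib.parse import urlparse

MAX_URLS_PER_REQUEST = 100  # limit stated in the Limitations section


def validate_url(url: str) -> None:
    """Raise ValueError unless the URL uses HTTP or HTTPS and names a host."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"not an HTTP/HTTPS URL: {url!r}")


def batch_urls(urls: list[str], size: int = MAX_URLS_PER_REQUEST) -> list[list[str]]:
    """Validate every URL, then split the list into request-sized batches."""
    for url in urls:
        validate_url(url)
    return [urls[i:i + size] for i in range(0, len(urls), size)]


if __name__ == "__main__":
    urls = [f"https://your-domain.com/audio/{n}.mp3" for n in range(250)]
    print([len(batch) for batch in batch_urls(urls)])  # [100, 100, 50]
```

Each batch can then be passed as file_urls in its own request.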

Request parameters

Pass these parameters to async_call on the Transcription class.
| Parameter | Type | Default | Required | Description |
| --- | --- | --- | --- | --- |
| model | str | - | Yes | Model ID. See Model availability. |
| file_urls | list[str] | - | Yes | Audio/video file URLs (HTTP/HTTPS). Up to 100 per request. |
| vocabulary_id | str | - | No | Hotword vocabulary ID for this task. Disabled by default. See Customize hotwords. |
| channel_id | list[int] | [0] | No | Audio track indexes to recognize (0-based). [0] = first track, [0, 1] = first and second. Each track is billed separately. |
| special_word_filter | str | - | No | Sensitive word filter config. See Sensitive word filter. |
| diarization_enabled | bool | False | No | Enable speaker diarization (single-channel only). Results include speaker_id. See Recognition result. |
| speaker_count | int | - | No | Expected speaker count (2-100). Only applies when diarization_enabled is true. Auto-detected by default. Guides the algorithm but does not guarantee exact count. |
| language_hints | list[str] | ["zh", "en"] | No | Language codes. Leave unset for auto-detection. See Supported languages. |
| speech_noise_threshold | float | - | No | Speech noise threshold. |
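
As an illustration of how these parameters combine, the sketch below assembles keyword arguments for async_call and checks the documented ranges first. build_transcription_kwargs and submit are hypothetical helpers, not SDK functions. Rejecting speaker_count without diarization is a choice made in this sketch to surface a likely mistake; the service may simply ignore the value.

```python
from typing import Any, Optional


def build_transcription_kwargs(
    model: str,
    file_urls: list[str],
    diarization_enabled: bool = False,
    speaker_count: Optional[int] = None,
    channel_id: Optional[list[int]] = None,
    language_hints: Optional[list[str]] = None,
) -> dict[str, Any]:
    """Check documented constraints and return keyword arguments for async_call."""
    if not 1 <= len(file_urls) <= 100:
        raise ValueError("file_urls must contain 1 to 100 URLs")
    kwargs: dict[str, Any] = {"model": model, "file_urls": file_urls}
    if channel_id is not None:
        kwargs["channel_id"] = channel_id
    if language_hints is not None:
        kwargs["language_hints"] = language_hints
    if diarization_enabled:
        kwargs["diarization_enabled"] = True
        if speaker_count is not None:
            if not 2 <= speaker_count <= 100:
                raise ValueError("speaker_count must be between 2 and 100")
            kwargs["speaker_count"] = speaker_count
    elif speaker_count is not None:
        # Sketch-level choice: flag the mismatch instead of sending it.
        raise ValueError("speaker_count only applies when diarization_enabled is True")
    return kwargs


def submit(kwargs: dict[str, Any]):
    """Submit the task. Needs the dashscope package and a configured API key."""
    from dashscope.audio.asr import Transcription  # imported lazily on purpose

    return Transcription.async_call(**kwargs)


if __name__ == "__main__":
    print(build_transcription_kwargs(
        "fun-asr",
        ["https://your-domain.com/file.mp3"],
        diarization_enabled=True,
        speaker_count=2,
    ))
```

Parameters left as None are omitted from the request so that the service defaults in the table apply.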

Sensitive word filter

By default, words on the Qwen Cloud sensitive word list are replaced with asterisks (*). With special_word_filter, you can:
  • Replace with *: Matched words become asterisks.
  • Filter out: Matched words are removed.
Value must be a JSON string:
{
  "filter_with_signed": {
    "word_list": ["test"]
  },
  "filter_with_empty": {
    "word_list": ["start", "happen"]
  },
  "system_reserved_filter": true
}
Fields:
  • filter_with_signed (object, optional): Words to replace with *.
    • Example: "Help me test this code" becomes "Help me **** this code"
    • word_list: Words to replace.
  • filter_with_empty (object, optional): Words to remove.
    • Example: "Is the game about to start?" becomes "Is the game about to?"
    • word_list: Words to remove.
  • system_reserved_filter (boolean, optional, default: true): Enable system filtering. When true, words on the Qwen Cloud sensitive word list are replaced with *.
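
Because special_word_filter takes a JSON string rather than a dict, it can be easier to build the value in code. A minimal sketch follows; build_special_word_filter is a hypothetical helper name, and the field names come from the list above.

```python
import json
from typing import Optional


def build_special_word_filter(
    mask_words: Optional[list[str]] = None,
    remove_words: Optional[list[str]] = None,
    system_reserved_filter: bool = True,
) -> str:
    """Return the JSON string expected by the special_word_filter parameter."""
    config: dict = {}
    if mask_words:
        # Words replaced with asterisks in the transcript.
        config["filter_with_signed"] = {"word_list": list(mask_words)}
    if remove_words:
        # Words removed from the transcript.
        config["filter_with_empty"] = {"word_list": list(remove_words)}
    config["system_reserved_filter"] = system_reserved_filter
    # ensure_ascii=False keeps non-ASCII words (for example Chinese) readable.
    return json.dumps(config, ensure_ascii=False)


if __name__ == "__main__":
    print(build_special_word_filter(["test"], ["start", "happen"]))
```

The returned string is passed as special_word_filter in the request.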

Supported languages

Language codes by model:
  • fun-asr, fun-asr-2025-11-07, fun-asr-mtl, fun-asr-mtl-2025-08-25:
    • zh: Chinese
    • en: English
    • ja: Japanese
    • ko: Korean
    • vi: Vietnamese
    • id: Indonesian
    • th: Thai
    • ms: Malay
    • tl: Filipino
    • ar: Arabic
    • bg: Bulgarian
    • hr: Croatian
    • cs: Czech
    • da: Danish
    • nl: Dutch
    • et: Estonian
    • fi: Finnish
    • el: Greek
    • hi: Hindi
    • hu: Hungarian
    • ga: Irish
    • lv: Latvian
    • lt: Lithuanian
    • mt: Maltese
    • pl: Polish
    • pt: Portuguese
    • ro: Romanian
    • sk: Slovak
    • sl: Slovenian
    • sv: Swedish
  • fun-asr-2025-08-25:
    • zh: Chinese
    • en: English
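
The per-model code lists above can be used to check language_hints before submitting a task. This is a sketch under the assumption that the lists are complete; LANGUAGE_CODES_BY_MODEL and unsupported_hints are hypothetical names, not SDK features.

```python
# The 30 codes listed for the multilingual models.
MULTILINGUAL_CODES = frozenset(
    "zh en ja ko vi id th ms tl ar bg hr cs da nl et fi el hi hu "
    "ga lv lt mt pl pt ro sk sl sv".split()
)

LANGUAGE_CODES_BY_MODEL = {
    "fun-asr": MULTILINGUAL_CODES,
    "fun-asr-2025-11-07": MULTILINGUAL_CODES,
    "fun-asr-mtl": MULTILINGUAL_CODES,
    "fun-asr-mtl-2025-08-25": MULTILINGUAL_CODES,
    "fun-asr-2025-08-25": frozenset({"zh", "en"}),
}


def unsupported_hints(model: str, hints: list[str]) -> list[str]:
    """Return the hints the given model does not list as supported."""
    try:
        supported = LANGUAGE_CODES_BY_MODEL[model]
    except KeyError:
        raise ValueError(f"unknown model: {model!r}") from None
    return [code for code in hints if code not in supported]


if __name__ == "__main__":
    print(unsupported_hints("fun-asr-2025-08-25", ["zh", "ja"]))  # ['ja']
```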

Response results

TranscriptionResponse

TranscriptionResponse contains task info (task_id, task_status) and results in output. See TranscriptionOutput.
Example response in PENDING status (task_status can also be RUNNING, SUCCEEDED, or FAILED):
{
  "status_code": 200,
  "request_id": "251aceab-a6aa-9fc4-b7f7-0cc6d3e2a9f3",
  "code": null,
  "message": "",
  "output": {
    "task_id": "7d0a58a3-1dbe-4de9-8cff-5f48213128b0",
    "task_status": "PENDING",
    "submit_time": "2025-02-13 16:55:08.573",
    "scheduled_time": "2025-02-13 16:55:08.592",
    "task_metrics": {
      "TOTAL": 2,
      "SUCCEEDED": 0,
      "FAILED": 0
    }
  },
  "usage": null
}
Key parameters:
| Parameter | Description |
| --- | --- |
| status_code | HTTP status code. |
| code | Ignore top-level code. Check output.results[].code for errors. |
| message | Ignore top-level message. Check output.results[].message for errors. |
| task_id | Task ID. |
| task_status | Task status: PENDING, RUNNING, SUCCEEDED, FAILED. If any subtask succeeds, the task is SUCCEEDED. Check subtask_status for individual results. |
| results | Subtask results. |
| subtask_status | Subtask status: PENDING, RUNNING, SUCCEEDED, FAILED. |
| file_url | Audio file URL. |
| transcription_url | Result URL (JSON file). Download or read via HTTP. See Recognition result. |
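
Because a task is SUCCEEDED as soon as any subtask succeeds, each entry in results still needs its own check. The sketch below works on a plain dict shaped like output. SAMPLE_OUTPUT is invented for illustration (its task ID, URLs, error code, and message are placeholders), and only the field names come from the table above.

```python
def split_subtasks(output: dict) -> tuple[list[dict], list[dict]]:
    """Return (succeeded, failed) subtask entries from a task output mapping."""
    succeeded, failed = [], []
    for item in output.get("results", []):
        status = item.get("subtask_status")
        if status == "SUCCEEDED":
            succeeded.append(item)
        elif status == "FAILED":
            failed.append(item)
        # PENDING and RUNNING subtasks are left out of both lists.
    return succeeded, failed


# Hypothetical output for a two-file task; all values are placeholders.
SAMPLE_OUTPUT = {
    "task_id": "example-task-id",
    "task_status": "SUCCEEDED",
    "results": [
        {
            "file_url": "https://your-domain.com/a.mp3",
            "subtask_status": "SUCCEEDED",
            "transcription_url": "https://your-domain.com/results/a.json",
        },
        {
            "file_url": "https://your-domain.com/b.mp3",
            "subtask_status": "FAILED",
            "code": "ExampleErrorCode",
            "message": "example error message",
        },
    ],
}

if __name__ == "__main__":
    ok, bad = split_subtasks(SAMPLE_OUTPUT)
    for item in ok:
        print("done:", item["file_url"], "->", item["transcription_url"])
    for item in bad:
        print("failed:", item["file_url"], item.get("code"), item.get("message"))
```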

TranscriptionOutput

TranscriptionOutput is the output property of TranscriptionResponse.
Example output in PENDING status (task_status can also be RUNNING, SUCCEEDED, or FAILED):
{
  "task_id": "f2f7c2fa-0cd9-4bb2-a283-27b26ee4bb67",
  "task_status": "PENDING",
  "submit_time": "2025-02-13 17:59:27.754",
  "scheduled_time": "2025-02-13 17:59:27.789",
  "task_metrics": {
    "TOTAL": 2,
    "SUCCEEDED": 0,
    "FAILED": 0
  }
}
Key parameters:
| Parameter | Description |
| --- | --- |
| code | Error code. |
| message | Error message. |
| task_id | Task ID. |
| task_status | Task status: PENDING, RUNNING, SUCCEEDED, FAILED. If any subtask succeeds, the task is SUCCEEDED. Check subtask_status for individual results. |
| results | Subtask results. |
| subtask_status | Subtask status: PENDING, RUNNING, SUCCEEDED, FAILED. |
| file_url | Audio file URL. |
| transcription_url | Result URL (JSON file). Download or read via HTTP. See Recognition result. |

Recognition result

Results are JSON files.
{
  "file_url": "https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/paraformer/hello_world_female2.wav",
  "properties": {
    "audio_format": "pcm_s16le",
    "channels": [
      0
    ],
    "original_sampling_rate": 16000,
    "original_duration_in_milliseconds": 3834
  },
  "transcripts": [
    {
      "channel_id": 0,
      "content_duration_in_milliseconds": 3720,
      "text": "Hello world, this is Alibaba Speech Lab.",
      "sentences": [
        {
          "begin_time": 100,
          "end_time": 3820,
          "text": "Hello world, this is Alibaba Speech Lab.",
          "sentence_id": 1,
          "speaker_id": 0,
          "words": [
            {
              "begin_time": 100,
              "end_time": 596,
              "text": "Hello ",
              "punctuation": ""
            },
            {
              "begin_time": 596,
              "end_time": 844,
              "text": "world",
              "punctuation": ", "
            }
          ]
        }
      ]
    }
  ]
}
speaker_id appears only when speaker diarization is enabled.
Key parameters:
| Parameter | Type | Description |
| --- | --- | --- |
| audio_format | string | Audio format. |
| channels | array[integer] | Track indexes. [0] = single-track, [0, 1] = dual-track. |
| original_sampling_rate | integer | Sample rate (Hz). |
| original_duration_in_milliseconds | integer | Audio duration (ms). |
| channel_id | integer | Track index (0-based). |
| content_duration_in_milliseconds | integer | Speech duration (ms). Only speech is transcribed and billed. Non-speech is excluded. Speech duration is usually shorter than audio duration. |
| transcript | string | Paragraph-level text. |
| sentences | array | Sentence-level results. |
| words | array | Word-level results. |
| begin_time | integer | Start time (ms). |
| end_time | integer | End time (ms). |
| text | string | Transcription text. |
| speaker_id | integer | Speaker index (0-based). Only present when diarization is enabled. |
| punctuation | string | Predicted punctuation after the word. |
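
Once the result JSON has been downloaded and parsed, the fields above can be read as in the sketch below. billed_seconds and sentences_by_speaker are hypothetical helpers. The cost figure is a rough estimate that assumes billing equals speech duration per track times the listed unit price, and it ignores the free quota. SAMPLE_RESULT is a trimmed copy of the example above.

```python
PRICE_PER_SECOND_USD = 0.000035  # unit price listed under Model availability


def billed_seconds(result: dict) -> float:
    """Sum speech duration over all transcripts (one per track), in seconds."""
    total_ms = sum(t["content_duration_in_milliseconds"] for t in result["transcripts"])
    return total_ms / 1000


def sentences_by_speaker(result: dict) -> dict:
    """Group (begin_ms, end_ms, text) tuples by speaker_id.

    The key is None when diarization was not enabled.
    """
    grouped: dict = {}
    for transcript in result["transcripts"]:
        for sentence in transcript["sentences"]:
            speaker = sentence.get("speaker_id")
            grouped.setdefault(speaker, []).append(
                (sentence["begin_time"], sentence["end_time"], sentence["text"])
            )
    return grouped


# Trimmed copy of the sample result shown in this section.
SAMPLE_RESULT = {
    "properties": {"original_duration_in_milliseconds": 3834},
    "transcripts": [
        {
            "channel_id": 0,
            "content_duration_in_milliseconds": 3720,
            "sentences": [
                {
                    "begin_time": 100,
                    "end_time": 3820,
                    "text": "Hello world, this is Alibaba Speech Lab.",
                    "speaker_id": 0,
                }
            ],
        }
    ],
}

if __name__ == "__main__":
    seconds = billed_seconds(SAMPLE_RESULT)
    print(f"speech: {seconds} s, estimated cost: ${seconds * PRICE_PER_SECOND_USD:.7f}")
    print(sentences_by_speaker(SAMPLE_RESULT))
```

For the sample, 3,720 ms of speech at $0.000035 per second comes to about $0.00013.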

Transcription class

Import with from dashscope.audio.asr import Transcription.
| Method | Signature | Description |
| --- | --- | --- |
| async_call | @classmethod def async_call(cls, model: str, file_urls: List[str], phrase_id: str = None, api_key: str = None, workspace: str = None, **kwargs) -> TranscriptionResponse | Submit a recognition task. |
| wait | @classmethod def wait(cls, task: Union[str, TranscriptionResponse], api_key: str = None, workspace: str = None, **kwargs) -> TranscriptionResponse | Block until done (SUCCEEDED or FAILED). Returns a TranscriptionResponse. |
| fetch | @classmethod def fetch(cls, task: Union[str, TranscriptionResponse], api_key: str = None, workspace: str = None, **kwargs) -> TranscriptionResponse | Query task status. Returns a TranscriptionResponse. |
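
The three methods fit a submit-then-wait flow. The sketch below shows two variants under stated assumptions: submit_and_wait relies on wait as described above and needs the dashscope package plus a configured API key, while poll_until_done is a hypothetical, SDK-independent loop for callers who prefer fetch with their own interval and timeout. The callable passed to poll_until_done is assumed to return the current task_status string.

```python
import time
from typing import Callable

TERMINAL_STATUSES = {"SUCCEEDED", "FAILED"}


def poll_until_done(
    fetch_status: Callable[[], str],
    interval_s: float = 2.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fetch_status until it returns a terminal status or the timeout passes."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task still {status} after {timeout_s} s")
        sleep(interval_s)


def submit_and_wait(model: str, file_urls: list[str]):
    """Submit a task and block until it finishes, using the SDK's own wait()."""
    from dashscope.audio.asr import Transcription  # lazy import: needs dashscope

    task = Transcription.async_call(model=model, file_urls=file_urls)
    return Transcription.wait(task=task)


if __name__ == "__main__":
    # Stand-in status source so the loop can be tried without network access.
    statuses = iter(["PENDING", "RUNNING", "SUCCEEDED"])
    print(poll_until_done(lambda: next(statuses), interval_s=0.0))  # SUCCEEDED
```

With the SDK, fetch_status would wrap a Transcription.fetch call and return the task_status from its output.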