File transcription Python
User guide: For model details and recommendations, see Audio file recognition - Fun-ASR/Paraformer.
Prerequisites
- Sign in to Qwen Cloud and create an API key. Set the API key as an environment variable.
For temporary access to third-party apps, use a temporary token. Tokens expire in 60 seconds, limiting leakage risk.
Model availability
| Model | Version | Unit price | Free quota |
|---|---|---|---|
| fun-asr (currently fun-asr-2025-11-07) | Stable | $0.000035/second | 36,000 seconds (10 hours), valid for 90 days |
| fun-asr-2025-11-07 (improved far-field VAD over fun-asr-2025-08-25 for higher accuracy) | Snapshot | $0.000035/second | 36,000 seconds (10 hours), valid for 90 days |
| fun-asr-2025-08-25 | Snapshot | $0.000035/second | 36,000 seconds (10 hours), valid for 90 days |
| fun-asr-mtl (currently fun-asr-mtl-2025-08-25) | Stable | $0.000035/second | 36,000 seconds (10 hours), valid for 90 days |
| fun-asr-mtl-2025-08-25 | Snapshot | $0.000035/second | 36,000 seconds (10 hours), valid for 90 days |
- Supported languages:
  - fun-asr and fun-asr-2025-11-07: Mandarin, Cantonese, Wu, Minnan, Hakka, Gan, Xiang, Jin, English, and Japanese. Also supports Mandarin accents from Zhongyuan, Southwest, Jilu, Jianghuai, Lanyin, Jiaoliao, Northeast, Beijing, and Hong Kong-Taiwan regions.
  - fun-asr-2025-08-25: Mandarin and English.
  - fun-asr-mtl and fun-asr-mtl-2025-08-25: Mandarin, Cantonese, English, Japanese, Korean, Vietnamese, Indonesian, Thai, Malay, Filipino, Arabic, Hindi, Bulgarian, Croatian, Czech, Danish, Dutch, Estonian, Finnish, Greek, Hungarian, Irish, Latvian, Lithuanian, Maltese, Polish, Portuguese, Romanian, Slovak, Slovenian, and Swedish.
- Sample rates supported: Any
- Audio formats supported: aac, amr, avi, flac, flv, m4a, mkv, mov, mp3, mp4, mpeg, ogg, opus, wav, webm, wma, wmv
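The unit price and free quota in the table reduce to simple arithmetic. The following is a minimal sketch, assuming the remaining free quota is consumed before any paid usage; the function name estimate_cost_usd is hypothetical, and actual invoices may round differently.

```python
UNIT_PRICE_USD_PER_SECOND = 0.000035  # unit price from the table above
FREE_QUOTA_SECONDS = 36_000           # 10 hours, valid for 90 days


def estimate_cost_usd(billed_seconds: float, free_seconds_left: float = 0.0) -> float:
    """Estimate the charge for a number of billed seconds.

    Assumption: the remaining free quota is consumed before paid usage.
    """
    if billed_seconds < 0 or free_seconds_left < 0:
        raise ValueError("durations must be non-negative")
    paid_seconds = max(0.0, billed_seconds - free_seconds_left)
    return paid_seconds * UNIT_PRICE_USD_PER_SECOND


# One hour with no free quota left: 3,600 s x $0.000035/s, roughly $0.126.
print(f"{estimate_cost_usd(3600):.4f}")
# Twelve hours with the full free quota: only 2 hours are paid, roughly $0.252.
print(f"{estimate_cost_usd(12 * 3600, FREE_QUOTA_SECONDS):.4f}")
```

Pass the speech duration rather than the file duration when it is known, and add up the seconds of every recognized track, because each track is billed separately.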
Limitations
Files must be at public URLs (HTTP/HTTPS, such as https://your-domain.com/file.mp3). Local files and Base64 encoding are not supported.
Pass URLs with the file_urls parameter. Up to 100 URLs per request.
- Audio formats: aac, amr, avi, flac, flv, m4a, mkv, mov, mp3, mp4, mpeg, ogg, opus, wav, webm, wma, wmv. Not all format variants are tested. Test your files to verify results.
- Audio sample rate: Any
- File size and duration: Max 2 GB and 12 hours. For larger files, see Audio trimming.
- Batch processing: Up to 100 URLs per request.
- Languages: Depend on the model. See Model availability for the languages each model recognizes and Supported languages for the language codes.
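The URL and batch limits above can be checked on the client before a task is submitted. This is a minimal sketch using only the standard library; the helper names are hypothetical, and the check only inspects the shape of each URL, so it cannot tell whether a file is actually reachable from the public internet.

```python
from urllib.parse import urlparse

MAX_URLS_PER_REQUEST = 100  # limit stated above


def is_http_url(url: str) -> bool:
    """Return True if the URL uses HTTP or HTTPS and names a host."""
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def batch_file_urls(urls, batch_size=MAX_URLS_PER_REQUEST):
    """Validate URLs and split them into request-sized batches."""
    urls = list(urls)
    bad = [u for u in urls if not is_http_url(u)]
    if bad:
        # Local paths and Base64 payloads are rejected here.
        raise ValueError(f"not HTTP/HTTPS URLs: {bad}")
    return [urls[i:i + batch_size] for i in range(0, len(urls), batch_size)]
```

Each returned batch can then be passed as file_urls in its own request.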
Getting started
The Transcription class has two usage patterns:
- Async submit + sync wait: Submit a task and block until done.
- Async submit + async query: Submit a task and poll for results.
Async submit and sync wait
1. Submit the task
Call async_call on the Transcription class with the request parameters.
- Tasks start in the PENDING state. Queue time depends on queue length and file duration. Processing is fast once started.
- Results and download URLs expire 24 hours after completion.
2. Wait for the result
Call wait on the Transcription class to block until the task is done.
Task statuses: PENDING, RUNNING, SUCCEEDED, FAILED. wait blocks on PENDING/RUNNING and returns on SUCCEEDED or FAILED.
Returns a TranscriptionResponse.
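The submit-and-wait pattern can be sketched as follows. This is an illustration rather than the official sample: the function name transcribe_and_wait and its client parameter are hypothetical, and the sketch assumes the dashscope SDK is installed and can find the API key from the environment variable set in Prerequisites.

```python
def transcribe_and_wait(file_urls, model="fun-asr", client=None, **params):
    """Submit a transcription task and block until it finishes.

    `client` defaults to the SDK's Transcription class. It is injectable
    (a hypothetical convenience) so the flow can be exercised offline.
    """
    if client is None:
        # Imported lazily so the sketch can be read and tested without the SDK.
        from dashscope.audio.asr import Transcription
        client = Transcription

    # Step 1: submit. The task starts in the PENDING state.
    task_response = client.async_call(model=model, file_urls=file_urls, **params)

    # Step 2: block until the task is SUCCEEDED or FAILED.
    # wait accepts either a task ID or the response returned by async_call.
    return client.wait(task=task_response)
```

A call such as transcribe_and_wait(["https://your-domain.com/file.mp3"], language_hints=["en"]) returns a TranscriptionResponse whose output holds the per-file results.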
Async submit and async query
1. Submit the task
Call async_call on the Transcription class with the request parameters.
- Tasks start in the PENDING state. Queue time depends on queue length and file duration. Processing is fast once started.
- Results and download URLs expire 24 hours after completion.
2. Poll for the result
Call fetch on the Transcription class until the task reaches SUCCEEDED or FAILED.
Returns a TranscriptionResponse.
Request parameters
Pass these parameters to async_call on the Transcription class.
| Parameter | Type | Default | Required | Description |
|---|---|---|---|---|
| model | str | - | Yes | Model ID. See Model availability. |
| file_urls | list[str] | - | Yes | Audio/video file URLs (HTTP/HTTPS). Up to 100 per request. |
| vocabulary_id | str | - | No | Hotword vocabulary ID for this task. Disabled by default. See Customize hotwords. |
| channel_id | list[int] | [0] | No | Audio track indexes to recognize (0-based). [0] = first track, [0, 1] = first and second. Each track is billed separately. |
| special_word_filter | str | - | No | Sensitive word filter config. See Sensitive word filter. |
| diarization_enabled | bool | False | No | Enable speaker diarization (single-channel only). Results include speaker_id. See Recognition result. |
| speaker_count | int | - | No | Expected speaker count (2-100). Only applies when diarization_enabled is true. Auto-detected by default. Guides the algorithm but does not guarantee exact count. |
| language_hints | list[str] | ["zh", "en"] | No | Language codes. Leave unset for auto-detection. See Supported languages. |
| speech_noise_threshold | float | - | No | Speech noise threshold. |
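The constraints in the table can be enforced before a request is sent. The helper below is a hypothetical sketch, not part of the SDK; in particular, it reads "single-channel only" as "at most one entry in channel_id", which is an assumption.

```python
def build_request_params(diarization_enabled=False, speaker_count=None,
                         channel_id=None, language_hints=None):
    """Collect optional async_call parameters, omitting anything left unset."""
    if speaker_count is not None:
        if not diarization_enabled:
            raise ValueError("speaker_count only applies when diarization_enabled is True")
        if not 2 <= speaker_count <= 100:
            raise ValueError("speaker_count must be between 2 and 100")
    # Assumption: diarization on more than one track is rejected.
    if diarization_enabled and channel_id is not None and len(channel_id) != 1:
        raise ValueError("speaker diarization is single-channel only")

    params = {}
    if diarization_enabled:
        params["diarization_enabled"] = True
    if speaker_count is not None:
        params["speaker_count"] = speaker_count
    if channel_id is not None:
        params["channel_id"] = list(channel_id)
    if language_hints is not None:
        params["language_hints"] = list(language_hints)
    return params
```

The resulting dictionary can be unpacked into the call, for example async_call(model="fun-asr", file_urls=urls, **params).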
Sensitive word filter
By default, words on the Qwen Cloud sensitive word list are replaced with asterisks (*).
With special_word_filter, you can:
- Replace with *: Matched words become asterisks.
- Filter out: Matched words are removed.
Fields:
- filter_with_signed (object, optional): Words to replace with *.
  - word_list: Words to replace.
  - Example: "Help me test this code" becomes "Help me **** this code"
- filter_with_empty (object, optional): Words to remove.
  - word_list: Words to remove.
  - Example: "Is the game about to start?" becomes "Is the game about to?"
- system_reserved_filter (boolean, optional, default: true): Enable system filtering. When true, words on the Qwen Cloud sensitive word list are replaced with *.
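Because special_word_filter is typed as str in the request parameters, one plausible way to build it is to serialize the fields above as JSON. This is a sketch under that assumption; the helper name build_special_word_filter is hypothetical.

```python
import json


def build_special_word_filter(replace_words=(), remove_words=(),
                              system_reserved_filter=True):
    """Serialize a sensitive word filter config to a JSON string."""
    config = {"system_reserved_filter": system_reserved_filter}
    if replace_words:
        # Matched words are replaced with asterisks.
        config["filter_with_signed"] = {"word_list": list(replace_words)}
    if remove_words:
        # Matched words are dropped from the transcript.
        config["filter_with_empty"] = {"word_list": list(remove_words)}
    return json.dumps(config, ensure_ascii=False)


print(build_special_word_filter(replace_words=["test"], remove_words=["start"]))
```

The returned string is what would be passed as special_word_filter in async_call.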
Supported languages
Language codes by model:
- fun-asr, fun-asr-2025-11-07: zh (Chinese), en (English), ja (Japanese)
- fun-asr-2025-08-25: zh (Chinese), en (English)
- fun-asr-mtl, fun-asr-mtl-2025-08-25: zh (Chinese), en (English), ja (Japanese), ko (Korean), vi (Vietnamese), id (Indonesian), th (Thai), ms (Malay), tl (Filipino), ar (Arabic), hi (Hindi), bg (Bulgarian), hr (Croatian), cs (Czech), da (Danish), nl (Dutch), et (Estonian), fi (Finnish), el (Greek), hu (Hungarian), ga (Irish), lv (Latvian), lt (Lithuanian), mt (Maltese), pl (Polish), pt (Portuguese), ro (Romanian), sk (Slovak), sl (Slovenian), sv (Swedish)
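The per-model codes can be kept in a lookup table so that language_hints is checked before a request is sent. This is a minimal sketch; the names SUPPORTED_LANGUAGE_CODES and unsupported_hints are hypothetical, and the table mirrors only the codes listed in this section.

```python
_FUN_ASR = {"zh", "en", "ja"}
_FUN_ASR_MTL = {
    "zh", "en", "ja", "ko", "vi", "id", "th", "ms", "tl", "ar",
    "hi", "bg", "hr", "cs", "da", "nl", "et", "fi", "el", "hu",
    "ga", "lv", "lt", "mt", "pl", "pt", "ro", "sk", "sl", "sv",
}

SUPPORTED_LANGUAGE_CODES = {
    "fun-asr": _FUN_ASR,
    "fun-asr-2025-11-07": _FUN_ASR,
    "fun-asr-2025-08-25": {"zh", "en"},
    "fun-asr-mtl": _FUN_ASR_MTL,
    "fun-asr-mtl-2025-08-25": _FUN_ASR_MTL,
}


def unsupported_hints(model, language_hints):
    """Return the hints that the given model does not list, in input order."""
    supported = SUPPORTED_LANGUAGE_CODES[model]  # KeyError for unknown models
    return [code for code in language_hints if code not in supported]
```

An empty return value means every hint is listed for that model.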
Response results
TranscriptionResponse
TranscriptionResponse contains task info (task_id, task_status) and results in output. See TranscriptionOutput.
| Parameter | Description |
|---|---|
| status_code | HTTP status code. |
| code | Ignore top-level code. Check output.results[].code for errors. |
| message | Ignore top-level message. Check output.results[].message for errors. |
| task_id | Task ID. |
| task_status | Task status: PENDING, RUNNING, SUCCEEDED, FAILED. If any subtask succeeds, the task is SUCCEEDED. Check subtask_status for individual results. |
| results | Subtask results. |
| subtask_status | Subtask status: PENDING, RUNNING, SUCCEEDED, FAILED. |
| file_url | Audio file URL. |
| transcription_url | Result URL (JSON file). Download or read via HTTP. See Recognition result. |
TranscriptionOutput
TranscriptionOutput is the output property of TranscriptionResponse.
| Parameter | Description |
|---|---|
| code | Error code. |
| message | Error message. |
| task_id | Task ID. |
| task_status | Task status: PENDING, RUNNING, SUCCEEDED, FAILED. If any subtask succeeds, the task is SUCCEEDED. Check subtask_status for individual results. |
| results | Subtask results. |
| subtask_status | Subtask status: PENDING, RUNNING, SUCCEEDED, FAILED. |
| file_url | Audio file URL. |
| transcription_url | Result URL (JSON file). Download or read via HTTP. See Recognition result. |
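Because a task is SUCCEEDED as soon as any subtask succeeds, each entry in results needs its own check. The sketch below assumes output behaves like a mapping and that each entry in results is a dictionary with the fields in the table; the helper name split_results is hypothetical.

```python
def split_results(output):
    """Partition subtask results into (succeeded, failed) lists."""
    succeeded, failed = [], []
    for item in output.get("results") or []:
        status = item.get("subtask_status")
        if status == "SUCCEEDED":
            succeeded.append(item)
        elif status == "FAILED":
            failed.append(item)
        # PENDING and RUNNING subtasks are left out of both lists.
    return succeeded, failed


sample_output = {
    "task_status": "SUCCEEDED",
    "results": [
        {"file_url": "https://your-domain.com/a.mp3",
         "subtask_status": "SUCCEEDED",
         "transcription_url": "https://your-domain.com/a.json"},
        {"file_url": "https://your-domain.com/b.mp3",
         "subtask_status": "FAILED",
         "code": "ExampleErrorCode", "message": "example message"},
    ],
}
ok, bad = split_results(sample_output)
for item in bad:
    print(item["file_url"], item.get("code"), item.get("message"))
```

The URLs, error code, and message in sample_output are placeholders, not values returned by the service.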
Recognition result
Results are JSON files.
speaker_id appears only when speaker diarization is enabled.
| Parameter | Type | Description |
|---|---|---|
| audio_format | string | Audio format. |
| channels | array[integer] | Track indexes. [0] = single-track, [0, 1] = dual-track. |
| original_sampling_rate | integer | Sample rate (Hz). |
| original_duration_in_milliseconds | integer | Audio duration (ms). |
| channel_id | integer | Track index (0-based). |
| content_duration_in_milliseconds | integer | Speech duration (ms). Only speech is transcribed and billed. Non-speech is excluded. Speech duration is usually shorter than audio duration. |
| transcript | string | Paragraph-level text. |
| sentences | array | Sentence-level results. |
| words | array | Word-level results. |
| begin_time | integer | Start time (ms). |
| end_time | integer | End time (ms). |
| text | string | Transcription text. |
| speaker_id | integer | Speaker index (0-based). Only present when diarization is enabled. |
| punctuation | string | Predicted punctuation after the word. |
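Once the JSON at transcription_url has been downloaded and parsed, the fields above can be turned into a readable transcript. The sketch below assumes a nesting that this page does not show: a top-level transcripts list with one entry per track, each holding a sentences list. Treat those two key paths, and the helper name format_sentences, as assumptions to verify against a real result file.

```python
def format_sentences(result):
    """Render one line per sentence with timestamps in seconds."""
    lines = []
    # Assumption: result["transcripts"] holds one entry per recognized track.
    for transcript in result.get("transcripts", []):
        channel = transcript.get("channel_id")
        for sentence in transcript.get("sentences", []):
            start_s = sentence["begin_time"] / 1000  # ms to s
            end_s = sentence["end_time"] / 1000
            label = f"ch{channel}"
            # speaker_id is present only when diarization is enabled.
            if sentence.get("speaker_id") is not None:
                label += f" spk{sentence['speaker_id']}"
            lines.append(f"[{start_s:.2f}-{end_s:.2f}] {label}: {sentence['text']}")
    return lines


sample_result = {
    "transcripts": [{
        "channel_id": 0,
        "sentences": [
            {"begin_time": 100, "end_time": 3820, "text": "Hello world.",
             "speaker_id": 0},
        ],
    }],
}
print("\n".join(format_sentences(sample_result)))
```

The sample_result dictionary is hand-written for illustration and is not actual service output.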
Transcription class
Import with from dashscope.audio.asr import Transcription.
| Method | Signature | Description |
|---|---|---|
| async_call | @classmethod def async_call(cls, model: str, file_urls: List[str], phrase_id: str = None, api_key: str = None, workspace: str = None, **kwargs) -> TranscriptionResponse | Submit a recognition task. |
| wait | @classmethod def wait(cls, task: Union[str, TranscriptionResponse], api_key: str = None, workspace: str = None, **kwargs) -> TranscriptionResponse | Block until done (SUCCEEDED or FAILED). Returns a TranscriptionResponse. |
| fetch | @classmethod def fetch(cls, task: Union[str, TranscriptionResponse], api_key: str = None, workspace: str = None, **kwargs) -> TranscriptionResponse | Query task status. Returns a TranscriptionResponse. |