File transcription Java
For model details, see Audio file recognition - Fun-ASR/Paraformer.
Prerequisites
For temporary access or high-risk operations, use a temporary token instead of the API key. Tokens expire after 60 seconds, which reduces leakage risk. Replace the API key in your code with the token.
Model availability
| Model | Version | Unit price | Free quota |
|---|---|---|---|
| fun-asr (currently fun-asr-2025-11-07) | Stable | $0.000035/second | 36,000 seconds (10 hours), valid for 90 days |
| fun-asr-2025-11-07 (improved far-field VAD over fun-asr-2025-08-25 for higher accuracy) | Snapshot | $0.000035/second | 36,000 seconds (10 hours), valid for 90 days |
| fun-asr-2025-08-25 | Snapshot | $0.000035/second | 36,000 seconds (10 hours), valid for 90 days |
| fun-asr-mtl (currently fun-asr-mtl-2025-08-25) | Stable | $0.000035/second | 36,000 seconds (10 hours), valid for 90 days |
| fun-asr-mtl-2025-08-25 | Snapshot | $0.000035/second | 36,000 seconds (10 hours), valid for 90 days |
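As a rough illustration of the pricing table, the sketch below estimates a charge from a number of speech seconds. It assumes that any remaining free quota is consumed before billing starts and that usage is counted in whole seconds; both are assumptions, and the class and method names are hypothetical rather than part of the SDK.

```java
import java.math.BigDecimal;

public class TranscriptionCostEstimate {
    // Values taken from the pricing table.
    static final BigDecimal PRICE_PER_SECOND = new BigDecimal("0.000035");
    static final long FREE_QUOTA_SECONDS = 36_000L;

    /**
     * Estimates the charge in USD for the given speech duration.
     * Assumption: remaining free quota is subtracted before billing.
     */
    static BigDecimal estimateCost(long speechSeconds, long freeSecondsRemaining) {
        long free = Math.max(0L, freeSecondsRemaining);
        long billable = Math.max(0L, speechSeconds - free);
        return PRICE_PER_SECOND.multiply(BigDecimal.valueOf(billable));
    }

    public static void main(String[] args) {
        // 20 hours of speech against an untouched quota:
        // 72,000 - 36,000 = 36,000 billable seconds, about 1.26 USD.
        System.out.println(estimateCost(72_000L, FREE_QUOTA_SECONDS).toPlainString());
    }
}
```

BigDecimal is used instead of double so that the small per-second price does not pick up floating-point rounding noise.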
- Supported languages:
  - fun-asr, fun-asr-2025-11-07, fun-asr-mtl, and fun-asr-mtl-2025-08-25: Chinese (Mandarin, Cantonese, Wu, Minnan, Hakka, Gan, Xiang, and Jin; also supports Mandarin accents from Zhongyuan, Southwest, Jilu, Jianghuai, Lanyin, Jiaoliao, Northeast, Beijing, and Hong Kong-Taiwan regions, including Henan, Shaanxi, Hubei, Sichuan, Chongqing, Yunnan, Guizhou, Guangdong, Guangxi, Hebei, Tianjin, Shandong, Anhui, Nanjing, Jiangsu, Hangzhou, Gansu, and Ningxia), English, Japanese, Korean, Vietnamese, Indonesian, Thai, Malay, Filipino, Arabic, Bulgarian, Croatian, Czech, Danish, Dutch, Estonian, Finnish, Greek, Hindi, Hungarian, Irish, Latvian, Lithuanian, Maltese, Polish, Portuguese, Romanian, Slovak, Slovenian, and Swedish.
  - fun-asr-2025-08-25: Mandarin and English.
- Supported sample rates: Any
- Supported audio formats: aac, amr, avi, flac, flv, m4a, mkv, mov, mp3, mp4, mpeg, ogg, opus, wav, webm, wma, wmv
Limitations
Input format: Only publicly accessible file URLs (HTTP/HTTPS) are accepted. Local files and Base64-encoded audio are not supported.
Example: https://your-domain.com/file.mp3
Set file URLs with the fileUrls parameter. Each request accepts up to 100 URLs.
- Audio formats: aac, amr, avi, flac, flv, m4a, mkv, mov, mp3, mp4, mpeg, ogg, opus, wav, webm, wma, wmv
  Many audio and video format variants exist, and the API cannot guarantee correct recognition for all of them. Test your files to verify results.
- Audio sample rate: Any
- File size and duration: Up to 2 GB and 12 hours. If speaker diarization is enabled, keep the audio duration under 2 hours. For larger files, see preprocessing best practices.
- Batch size: Up to 100 file URLs per request.
- Supported languages: fun-asr, fun-asr-mtl, and their snapshot versions support Chinese and 29 other languages. fun-asr-2025-08-25 supports Chinese and English only. See Supported languages.
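The input limits above can be checked on the client before a task is submitted. The following is a minimal sketch, not SDK code: FileUrlBatcher, isHttpUrl, and toBatches are hypothetical names, and the check only looks at the URL scheme and host. It cannot tell whether a URL is publicly accessible.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.List;

public class FileUrlBatcher {
    // Limit taken from the Limitations list: up to 100 file URLs per request.
    static final int MAX_URLS_PER_REQUEST = 100;

    /** Returns true only for syntactically valid http:// or https:// URLs with a host. */
    static boolean isHttpUrl(String url) {
        try {
            URI uri = new URI(url);
            String scheme = uri.getScheme();
            return scheme != null
                    && uri.getHost() != null
                    && (scheme.equalsIgnoreCase("http") || scheme.equalsIgnoreCase("https"));
        } catch (URISyntaxException e) {
            return false; // for example, a Windows path with backslashes
        }
    }

    /** Validates every URL, then splits the list into chunks of at most 100. */
    static List<List<String>> toBatches(List<String> urls) {
        for (String url : urls) {
            if (!isHttpUrl(url)) {
                throw new IllegalArgumentException("Not an HTTP/HTTPS URL: " + url);
            }
        }
        List<List<String>> batches = new ArrayList<>();
        for (int i = 0; i < urls.size(); i += MAX_URLS_PER_REQUEST) {
            int end = Math.min(i + MAX_URLS_PER_REQUEST, urls.size());
            batches.add(new ArrayList<>(urls.subList(i, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        System.out.println(isHttpUrl("https://your-domain.com/file.mp3"));
        System.out.println(isHttpUrl("file:///tmp/local.wav"));
    }
}
```

Each batch returned by toBatches could then be passed as the fileUrls value of one request.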
Request parameters
Set request parameters with TranscriptionParam builder methods.
| Parameter | Type | Default | Required | Description |
|---|---|---|---|---|
| model | String | - | Yes | The model for transcription. See Model availability. |
| fileUrls | List<String> | - | Yes | URLs of audio or video files to transcribe. HTTP and HTTPS are supported. Up to 100 URLs per request. |
| vocabularyId | String | - | No | Hotword vocabulary ID. Hotwords in this vocabulary apply during recognition. Disabled by default. See Customize hotwords. |
| channelId | List<Integer> | [0] | No | Audio track indexes to recognize (starting from 0). [0] recognizes the first track only; [0, 1] recognizes both. Each track is billed separately. |
| specialWordFilter | String | - | No | Sensitive words to filter during recognition. See Sensitive word filter details. |
| diarizationEnabled | Boolean | false | No | Enable speaker diarization (single-channel audio only). Results include speaker_id to distinguish speakers. See Recognition result. |
| speakerCount | Integer | - | No | Expected speaker count (2-100). Only works when diarizationEnabled is true. Guides the algorithm but does not guarantee exact output count. |
| language_hints | String[] | ["zh", "en"] | No | Language codes. Leave unset for auto-detection. See Supported languages. |
| apiKey | String | - | No | Your API key. Not needed if set as an environment variable. |
Sensitive word filter details
If specialWordFilter is not set, built-in filtering applies (matches from the Qwen Cloud sensitive word list are replaced with *).
When set, you can use these policies:
- Replace with *: Replaces matched words with asterisks of the same length.
- Filter out: Removes matched words from the result.

Field descriptions:
- filter_with_signed
  - Type: object (optional)
  - Replaces matched words with * of the same length.
  - Example: "Help me test this piece of code" -> "Help me **** this piece of code"
  - Field: word_list (string array of words to replace)
- filter_with_empty
  - Type: object (optional)
  - Removes matched words from results.
  - Example: "Is the game about to start?" -> "Is the game about to?"
  - Field: word_list (string array of words to remove)
- system_reserved_filter
  - Type: boolean (default: true)
  - Enables preset sensitive word rules. When true, built-in filtering applies (Qwen Cloud word list matches replaced with *).
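Because specialWordFilter is a String parameter, one plausible way to supply the three fields is as a JSON string. The sketch below builds such a string with plain Java. The exact JSON layout expected by the service is an assumption inferred from the field descriptions, and SpecialWordFilterJson is a hypothetical helper, not an SDK class.

```java
import java.util.List;
import java.util.stream.Collectors;

public class SpecialWordFilterJson {
    /** Quotes a word as a JSON string. Only backslashes and double quotes are escaped. */
    static String quote(String word) {
        return "\"" + word.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /** Renders a list of words as a JSON array of strings. */
    static String wordList(List<String> words) {
        return words.stream()
                .map(SpecialWordFilterJson::quote)
                .collect(Collectors.joining(",", "[", "]"));
    }

    /**
     * Builds the filter JSON (assumed layout).
     * maskWords are replaced with *, dropWords are removed,
     * useBuiltIn toggles system_reserved_filter.
     */
    static String build(List<String> maskWords, List<String> dropWords, boolean useBuiltIn) {
        return "{"
                + "\"filter_with_signed\":{\"word_list\":" + wordList(maskWords) + "},"
                + "\"filter_with_empty\":{\"word_list\":" + wordList(dropWords) + "},"
                + "\"system_reserved_filter\":" + useBuiltIn
                + "}";
    }

    public static void main(String[] args) {
        // Mirrors the two examples in the field descriptions: mask "test", drop "start".
        System.out.println(build(List.of("test"), List.of("start"), true));
    }
}
```

In real code a JSON library would be a safer choice than manual string building, since it also handles control characters and Unicode escapes.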
Supported languages
Supported language codes by model:
- fun-asr, fun-asr-2025-11-07, fun-asr-mtl, fun-asr-mtl-2025-08-25:
  - zh: Chinese
  - en: English
  - ja: Japanese
  - ko: Korean
  - vi: Vietnamese
  - id: Indonesian
  - th: Thai
  - ms: Malay
  - tl: Filipino
  - ar: Arabic
  - bg: Bulgarian
  - hr: Croatian
  - cs: Czech
  - da: Danish
  - nl: Dutch
  - et: Estonian
  - fi: Finnish
  - el: Greek
  - hi: Hindi
  - hu: Hungarian
  - ga: Irish
  - lv: Latvian
  - lt: Lithuanian
  - mt: Maltese
  - pl: Polish
  - pt: Portuguese
  - ro: Romanian
  - sk: Slovak
  - sl: Slovenian
  - sv: Swedish
- fun-asr-2025-08-25:
  - zh: Chinese
  - en: English
Set language_hints with either the parameter method or the parameters method of TranscriptionParam.
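Since the accepted codes differ by model, it can help to check language hints on the client before they are passed to TranscriptionParam. The sketch below encodes the two code lists from this section. LanguageHintCheck and its methods are hypothetical names, and whether the service rejects or ignores an unsupported hint is not stated here, so the check is purely advisory.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class LanguageHintCheck {
    static final Set<String> ZH_EN_ONLY = Set.of("zh", "en");
    static final Set<String> MULTILINGUAL = Set.of(
            "zh", "en", "ja", "ko", "vi", "id", "th", "ms", "tl", "ar",
            "bg", "hr", "cs", "da", "nl", "et", "fi", "el", "hi", "hu",
            "ga", "lv", "lt", "mt", "pl", "pt", "ro", "sk", "sl", "sv");

    /** Returns the language codes listed for the given model name. */
    static Set<String> supportedCodes(String model) {
        switch (model) {
            case "fun-asr":
            case "fun-asr-2025-11-07":
            case "fun-asr-mtl":
            case "fun-asr-mtl-2025-08-25":
                return MULTILINGUAL;
            case "fun-asr-2025-08-25":
                return ZH_EN_ONLY;
            default:
                throw new IllegalArgumentException("Unknown model: " + model);
        }
    }

    /** Returns the hints that are not listed for the model, in input order. */
    static List<String> unsupportedHints(String model, String... hints) {
        Set<String> supported = supportedCodes(model);
        List<String> rejected = new ArrayList<>();
        for (String hint : hints) {
            if (!supported.contains(hint)) {
                rejected.add(hint);
            }
        }
        return rejected;
    }

    public static void main(String[] args) {
        // "ja" is not listed for the fun-asr-2025-08-25 snapshot.
        System.out.println(unsupportedHints("fun-asr-2025-08-25", "zh", "ja"));
    }
}
```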
Response
Task result (TranscriptionResult)
TranscriptionResult holds the task result.
| Method | Parameter | Return value | Description |
|---|---|---|---|
| public String getRequestId() | None | requestId | Gets the request ID. |
| public String getTaskId() | None | taskId | Gets the task ID. |
| public TaskStatus getTaskStatus() | None | TaskStatus | Gets the task status (PENDING, RUNNING, SUCCEEDED, or FAILED). A task with multiple subtasks shows SUCCEEDED if at least one subtask succeeds. Check subtask_status for each subtask. |
| public List<TranscriptionTaskResult> getResults() | None | List<TranscriptionTaskResult> | Gets subtask results. Each file creates one subtask. |
| public JsonObject getOutput() | None | JSON | Gets the result as JSON. See JSON output examples. |
JSON output examples
Success example
code and message fields appear only on error.
Subtask result (TranscriptionTaskResult)
TranscriptionTaskResult holds the result for a single file.
| Method | Parameter | Return value | Description |
|---|---|---|---|
| public String getFileUrl() | None | File URL | Gets the URL of the recognized file. |
| public String getTranscriptionUrl() | None | Result URL | Gets the result URL (valid for 24 hours). The result is a JSON file you can download or read via HTTP. See Recognition result. |
| public TaskStatus getSubTaskStatus() | None | TaskStatus | Gets the subtask status (PENDING, RUNNING, SUCCEEDED, or FAILED). |
| public String getMessage() | None | Message (may be empty) | Gets error details. Check this if a task fails. |
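Because a task reports SUCCEEDED as soon as one subtask succeeds, the overall status alone does not show whether every file was transcribed. The sketch below shows the per-file check that this implies. To stay runnable without the SDK it uses local stand-in types (Status and Subtask are hypothetical). With the SDK, the same loop would read getFileUrl(), getSubTaskStatus(), and getMessage() from each TranscriptionTaskResult.

```java
import java.util.ArrayList;
import java.util.List;

public class SubtaskCheck {
    // Local stand-in for the SDK's TaskStatus values.
    enum Status { PENDING, RUNNING, SUCCEEDED, FAILED }

    // Local stand-in for one per-file result.
    static final class Subtask {
        final String fileUrl;
        final Status status;
        final String message;

        Subtask(String fileUrl, Status status, String message) {
            this.fileUrl = fileUrl;
            this.status = status;
            this.message = message;
        }
    }

    /** Returns one "url: message" line for every subtask that did not succeed. */
    static List<String> describeFailures(List<Subtask> subtasks) {
        List<String> failures = new ArrayList<>();
        for (Subtask subtask : subtasks) {
            if (subtask.status != Status.SUCCEEDED) {
                failures.add(subtask.fileUrl + ": " + subtask.message);
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        List<Subtask> subtasks = List.of(
                new Subtask("https://your-domain.com/a.mp3", Status.SUCCEEDED, ""),
                new Subtask("https://your-domain.com/b.mp3", Status.FAILED, "download failed"));
        // The overall task would report SUCCEEDED here, yet one file still failed.
        describeFailures(subtasks).forEach(System.out::println);
    }
}
```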
Recognition result
The recognition result is a JSON file.
| Parameter | Type | Description |
|---|---|---|
| audio_format | string | Audio format of the source file. |
| channels | array[integer] | Audio track indexes. Returns [0] for single-track, [0, 1] for dual-track, and so on. |
| original_sampling_rate | integer | Sample rate (Hz). |
| original_duration_in_milliseconds | integer | Original audio duration (ms). |
| channel_id | integer | Transcribed audio track index (starting from 0). |
| content_duration | integer | Speech content duration (ms). Billing is based on speech duration only. Non-speech content is not billed. AI-determined speech duration may differ from total audio duration. |
| transcript | string | Paragraph-level transcription. |
| sentences | array | Sentence-level transcription. |
| words | array | Word-level transcription. |
| begin_time | integer | Start timestamp (ms). |
| end_time | integer | End timestamp (ms). |
| text | string | Transcribed text. |
| speaker_id | integer | Speaker index (starting from 0). Only present when diarization is enabled. |
| punctuation | string | Predicted punctuation after the word, if any. |
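The begin_time and end_time fields are millisecond offsets, so turning a sentence entry into a readable transcript line is mostly a formatting step. The sketch below assumes the values have already been read from the result JSON with a JSON library of your choice. SentenceFormatter and the output layout are illustrative choices, not something the API defines.

```java
import java.util.Locale;

public class SentenceFormatter {
    /** Formats a millisecond offset as HH:MM:SS.mmm. */
    static String formatMillis(long ms) {
        long hours = ms / 3_600_000;
        long minutes = (ms / 60_000) % 60;
        long seconds = (ms / 1_000) % 60;
        long millis = ms % 1_000;
        return String.format(Locale.ROOT, "%02d:%02d:%02d.%03d", hours, minutes, seconds, millis);
    }

    /**
     * Formats one sentence entry. speakerId may be null, because speaker_id
     * is only present when diarization is enabled.
     */
    static String formatSentence(long beginMs, long endMs, Integer speakerId, String text) {
        String speaker = (speakerId == null) ? "" : "speaker " + speakerId + ": ";
        return "[" + formatMillis(beginMs) + " - " + formatMillis(endMs) + "] " + speaker + text;
    }

    public static void main(String[] args) {
        System.out.println(formatSentence(0, 1_500, 1, "Hello"));
        System.out.println(formatSentence(3_725_042, 3_727_000, null, "No diarization here."));
    }
}
```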
Key interfaces
Query parameter class (TranscriptionQueryParam)
Use TranscriptionQueryParam when calling wait or fetch on a Transcription instance.
Create one with the static method FromTranscriptionParam.
| Method | Parameter | Return value | Description |
|---|---|---|---|
| public static TranscriptionQueryParam FromTranscriptionParam(TranscriptionParam param, String taskId) | param: A TranscriptionParam instance, taskId: The task ID | A TranscriptionQueryParam instance | Creates a TranscriptionQueryParam instance. |
Core class (Transcription)
Import with import com.alibaba.dashscope.audio.asr.transcription.*; The key methods are:
| Method | Parameter | Return value | Description |
|---|---|---|---|
| public TranscriptionResult asyncCall(TranscriptionParam param) | param: A TranscriptionParam instance | TranscriptionResult | Submits a transcription task asynchronously. |
| public TranscriptionResult wait(TranscriptionQueryParam queryParam) | queryParam: A TranscriptionQueryParam instance | TranscriptionResult | Blocks until the task reaches SUCCEEDED or FAILED. |
| public TranscriptionResult fetch(TranscriptionQueryParam queryParam) | queryParam: A TranscriptionQueryParam instance | TranscriptionResult | Queries the current task result. |
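The difference between wait and fetch matters when a caller needs its own timeout: wait blocks until a terminal status, whereas fetch returns the current state and can be called in a loop. The sketch below shows such a loop in isolation. It does not use the SDK: the Supplier stands in for a call such as fetch followed by getTaskStatus(), and PollingSketch, pollUntilDone, and the attempt and delay values are hypothetical.

```java
import java.util.function.Supplier;

public class PollingSketch {
    // Local stand-in for the SDK's TaskStatus values.
    enum Status { PENDING, RUNNING, SUCCEEDED, FAILED }

    /**
     * Calls fetchStatus up to maxAttempts times, sleeping between calls,
     * and returns the last status seen. The caller decides what to do
     * if the result is still PENDING or RUNNING.
     */
    static Status pollUntilDone(Supplier<Status> fetchStatus, int maxAttempts, long delayMillis) {
        Status status = fetchStatus.get();
        for (int attempt = 1;
             attempt < maxAttempts && status != Status.SUCCEEDED && status != Status.FAILED;
             attempt++) {
            try {
                Thread.sleep(delayMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return status;
            }
            status = fetchStatus.get();
        }
        return status;
    }

    public static void main(String[] args) {
        // Simulated task that finishes on the third query.
        int[] calls = {0};
        Status result = pollUntilDone(
                () -> ++calls[0] < 3 ? Status.RUNNING : Status.SUCCEEDED, 10, 10L);
        System.out.println(result + " after " + calls[0] + " queries");
    }
}
```

A fixed delay keeps the sketch short. A production loop would more likely use a longer or growing interval, since transcription of long files can take a while.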