File transcription (Java)
For model details, see Audio file recognition - Fun-ASR/Paraformer.
Prerequisites
For temporary access or high-risk operations, use a temporary token instead of the API key. Tokens expire after 60 seconds, which reduces the risk of leakage. Replace the API key in your code with the token.
Model availability
| Model | Version | Unit price | Free quota |
|---|---|---|---|
| fun-asr (currently fun-asr-2025-11-07) | Stable | $0.000035/second | 36,000 seconds (10 hours), valid for 90 days |
| fun-asr-2025-11-07 (improved far-field VAD over fun-asr-2025-08-25 for higher accuracy) | Snapshot | $0.000035/second | 36,000 seconds (10 hours), valid for 90 days |
| fun-asr-2025-08-25 | Snapshot | $0.000035/second | 36,000 seconds (10 hours), valid for 90 days |
| fun-asr-mtl (currently fun-asr-mtl-2025-08-25) | Stable | $0.000035/second | 36,000 seconds (10 hours), valid for 90 days |
| fun-asr-mtl-2025-08-25 | Snapshot | $0.000035/second | 36,000 seconds (10 hours), valid for 90 days |
- Supported languages:
- fun-asr and fun-asr-2025-11-07: Mandarin, Cantonese, Wu, Minnan, Hakka, Gan, Xiang, and Jin, plus Mandarin accents from the Zhongyuan, Southwest, Jilu, Jianghuai, Lanyin, Jiaoliao, Northeast, Beijing, and Hong Kong-Taiwan regions (including Henan, Shaanxi, Hubei, Sichuan, Chongqing, Yunnan, Guizhou, Guangdong, Guangxi, Hebei, Tianjin, Shandong, Anhui, Nanjing, Jiangsu, Hangzhou, Gansu, and Ningxia). English and Japanese are also supported.
- fun-asr-2025-08-25: Mandarin and English.
- fun-asr-mtl and fun-asr-mtl-2025-08-25: Mandarin, Cantonese, English, Japanese, Korean, Vietnamese, Indonesian, Thai, Malay, Filipino, Arabic, Hindi, Bulgarian, Croatian, Czech, Danish, Dutch, Estonian, Finnish, Greek, Hungarian, Irish, Latvian, Lithuanian, Maltese, Polish, Portuguese, Romanian, Slovak, Slovenian, and Swedish.
- Supported sample rates: Any
- Supported audio formats: aac, amr, avi, flac, flv, m4a, mkv, mov, mp3, mp4, mpeg, ogg, opus, wav, webm, wma, wmv
Limitations
Input format: Only publicly accessible file URLs (HTTP/HTTPS) are accepted. Local files and Base64-encoded audio are not supported.
Example: https://your-domain.com/file.mp3
Set file URLs with the fileUrls parameter. Each request accepts up to 100 URLs.
- Audio formats:
aac, amr, avi, flac, flv, m4a, mkv, mov, mp3, mp4, mpeg, ogg, opus, wav, webm, wma, wmv
Many audio and video format variants exist. The API cannot guarantee correct recognition for all of them. Test your files to verify results.
- Audio sample rate: Any
- File size and duration: Up to 2 GB and 12 hours. For larger files, see preprocessing best practices.
- Batch size: Up to 100 file URLs per request.
- Supported languages: model-dependent. See Supported languages for the per-model list of language codes.
Getting started
The Transcription class provides async and sync methods for submitting tasks and retrieving results. Two approaches:
- Submit a task and block until it completes.
- Submit a task and poll for results.
Async submission + sync wait
1. Set request parameters.
2. Create a Transcription instance.
3. Submit the task: call asyncCall on the Transcription instance. Tasks enter the PENDING state after submission. Queue time depends on queue length and file duration (usually a few minutes). Recognition runs at accelerated speed once processing starts. Results and download URLs expire after 24 hours.
4. Wait for the task to finish: call wait to block until the task reaches SUCCEEDED or FAILED status. Returns a TranscriptionResult.
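The steps above can be sketched as follows. This is a minimal sketch, not the official sample: the file URL is a placeholder, and the model and fileUrls builder methods are assumed to mirror the request-parameter names documented below.

```java
import java.util.Arrays;
import com.alibaba.dashscope.audio.asr.transcription.*;

public class TranscribeAndWait {
    public static void main(String[] args) throws Exception {
        // 1. Set request parameters (replace the URL with your own file).
        TranscriptionParam param = TranscriptionParam.builder()
                .model("fun-asr")
                .fileUrls(Arrays.asList("https://your-domain.com/file.mp3"))
                .build();

        // 2. Create a Transcription instance.
        Transcription transcription = new Transcription();

        // 3. Submit the task; it stays PENDING until processing starts.
        TranscriptionResult submitResult = transcription.asyncCall(param);
        String taskId = submitResult.getTaskId();

        // 4. Block until the task reaches SUCCEEDED or FAILED.
        TranscriptionResult result = transcription.wait(
                TranscriptionQueryParam.FromTranscriptionParam(param, taskId));
        System.out.println(result.getOutput());
    }
}
```

The API key is read from the environment here; pass it to the builder instead if you cannot set an environment variable.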
Async submission + async query
1. Set request parameters.
2. Create a Transcription instance.
3. Submit the task: call asyncCall on the Transcription instance. Tasks enter the PENDING state after submission. Queue time depends on queue length and file duration (usually a few minutes). Recognition runs at accelerated speed once processing starts. Results and download URLs expire after 24 hours.
4. Poll for the result: call fetch repeatedly until the task reaches SUCCEEDED or FAILED status. Returns a TranscriptionResult.
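A minimal polling sketch using only the methods documented on this page; the file URL is a placeholder and the sleep interval is an arbitrary choice, not a recommended value.

```java
import java.util.Arrays;
import com.alibaba.dashscope.audio.asr.transcription.*;

public class TranscribeAndPoll {
    public static void main(String[] args) throws Exception {
        TranscriptionParam param = TranscriptionParam.builder()
                .model("fun-asr")
                .fileUrls(Arrays.asList("https://your-domain.com/file.mp3"))
                .build();

        Transcription transcription = new Transcription();
        String taskId = transcription.asyncCall(param).getTaskId();
        TranscriptionQueryParam query =
                TranscriptionQueryParam.FromTranscriptionParam(param, taskId);

        // Poll until the task reaches a terminal status.
        while (true) {
            TranscriptionResult result = transcription.fetch(query);
            String status = result.getTaskStatus().name();
            if (status.equals("SUCCEEDED") || status.equals("FAILED")) {
                System.out.println(result.getOutput());
                break;
            }
            Thread.sleep(5000); // back off while the task is PENDING/RUNNING
        }
    }
}
```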
Request parameters
Set request parameters with TranscriptionParam builder methods.
| Parameter | Type | Default | Required | Description |
|---|---|---|---|---|
| model | String | - | Yes | The model for transcription. See Model availability. |
| fileUrls | List<String> | - | Yes | URLs of audio or video files to transcribe. HTTP and HTTPS are supported. Up to 100 URLs per request. |
| vocabularyId | String | - | No | Hotword vocabulary ID. Hotwords in this vocabulary apply during recognition. Disabled by default. See Customize hotwords. |
| channelId | List<Integer> | [0] | No | Audio track indexes to recognize (starting from 0). [0] recognizes the first track only; [0, 1] recognizes both. Each track is billed separately. |
| specialWordFilter | String | - | No | Sensitive words to filter during recognition. See Sensitive word filter details. |
| diarizationEnabled | Boolean | false | No | Enable speaker diarization (single-channel audio only). Results include speaker_id to distinguish speakers. See Recognition result. |
| speakerCount | Integer | - | No | Expected speaker count (2-100). Only works when diarizationEnabled is true. Guides the algorithm but does not guarantee exact output count. |
| language_hints | String[] | ["zh", "en"] | No | Language codes. Leave unset for auto-detection. See Supported languages. |
| apiKey | String | - | No | Your API key. Not needed if set as an environment variable. |
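A sketch combining the common optional parameters from the table. Builder method names are assumed to mirror the parameter names; verify them against your SDK version. Note that diarization applies to single-channel audio only, so channelId stays at [0] here.

```java
import java.util.Arrays;
import com.alibaba.dashscope.audio.asr.transcription.*;

public class BuildParamExample {
    public static void main(String[] args) {
        TranscriptionParam param = TranscriptionParam.builder()
                .model("fun-asr")
                .fileUrls(Arrays.asList("https://your-domain.com/file.mp3"))
                .channelId(Arrays.asList(0))            // first audio track only
                .diarizationEnabled(true)               // single-channel audio only
                .speakerCount(2)                        // a hint, not a guarantee
                .parameter("language_hints", new String[] {"zh", "en"})
                .build();
    }
}
```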
Sensitive word filter details
If specialWordFilter is not set, built-in filtering applies (matches from the Qwen Cloud sensitive word list are replaced with *).
When set, you can use these policies:
- Replace with *: replaces matched words with asterisks of the same length.
- Filter out: removes matched words from the result.

filter_with_signed
- Type: object (optional)
- Replaces matched words with * of the same length.
- Field: word_list (string array of words to replace)
- Example: "Help me test this piece of code" -> "Help me **** this piece of code"

filter_with_empty
- Type: object (optional)
- Removes matched words from results.
- Field: word_list (string array of words to remove)
- Example: "Is the game about to start?" -> "Is the game about to?"

system_reserved_filter
- Type: boolean (default: true)
- Enables the preset sensitive word rules. When true, built-in filtering applies (Qwen Cloud word list matches are replaced with *).
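Since specialWordFilter is a String parameter, the fields above have to be serialized to JSON. The helper below sketches that by hand; the exact top-level JSON shape (three sibling keys) is an assumption inferred from the field descriptions, so verify it against the API reference before relying on it.

```java
import java.util.List;
import java.util.stream.Collectors;

public class SpecialWordFilterExample {
    // Joins words into a JSON string array, e.g. ["a","b"].
    static String jsonArray(List<String> words) {
        return words.stream()
                .map(w -> "\"" + w + "\"")
                .collect(Collectors.joining(",", "[", "]"));
    }

    // Builds the specialWordFilter value combining both policies.
    static String buildFilter(List<String> replaceWords, List<String> removeWords,
                              boolean systemReserved) {
        return "{"
                + "\"filter_with_signed\":{\"word_list\":" + jsonArray(replaceWords) + "},"
                + "\"filter_with_empty\":{\"word_list\":" + jsonArray(removeWords) + "},"
                + "\"system_reserved_filter\":" + systemReserved
                + "}";
    }

    public static void main(String[] args) {
        System.out.println(buildFilter(List.of("test"), List.of(), true));
    }
}
```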
Supported languages
Supported language codes by model:
- fun-asr, fun-asr-2025-11-07:
- zh: Chinese
- en: English
- ja: Japanese
- fun-asr-2025-08-25:
- zh: Chinese
- en: English
- fun-asr-mtl, fun-asr-mtl-2025-08-25:
- zh: Chinese
- en: English
- ja: Japanese
- ko: Korean
- vi: Vietnamese
- id: Indonesian
- th: Thai
- ms: Malay
- tl: Filipino
- ar: Arabic
- hi: Hindi
- bg: Bulgarian
- hr: Croatian
- cs: Czech
- da: Danish
- nl: Dutch
- et: Estonian
- fi: Finnish
- el: Greek
- hu: Hungarian
- ga: Irish
- lv: Latvian
- lt: Lithuanian
- mt: Maltese
- pl: Polish
- pt: Portuguese
- ro: Romanian
- sk: Slovak
- sl: Slovenian
- sv: Swedish
Set language_hints with the parameter or parameters method of TranscriptionParam.
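Both options are sketched below (the file URL is a placeholder): parameter sets a single key, while parameters takes a map of extra keys.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import com.alibaba.dashscope.audio.asr.transcription.*;

public class LanguageHintsExample {
    public static void main(String[] args) {
        // Option 1: set a single key with parameter.
        TranscriptionParam p1 = TranscriptionParam.builder()
                .model("fun-asr")
                .fileUrls(Arrays.asList("https://your-domain.com/file.mp3"))
                .parameter("language_hints", new String[] {"ja"})
                .build();

        // Option 2: pass a map of extra keys with parameters.
        Map<String, Object> extra = new HashMap<>();
        extra.put("language_hints", new String[] {"zh", "en"});
        TranscriptionParam p2 = TranscriptionParam.builder()
                .model("fun-asr")
                .fileUrls(Arrays.asList("https://your-domain.com/file.mp3"))
                .parameters(extra)
                .build();
    }
}
```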
Response
Task result (TranscriptionResult)
TranscriptionResult holds the task result.
| Method | Parameter | Return value | Description |
|---|---|---|---|
| public String getRequestId() | None | requestId | Gets the request ID. |
| public String getTaskId() | None | taskId | Gets the task ID. |
| public TaskStatus getTaskStatus() | None | TaskStatus | Gets the task status (PENDING, RUNNING, SUCCEEDED, or FAILED). A task with multiple subtasks shows SUCCEEDED if at least one subtask succeeds. Check subtask_status for each subtask. |
| public List<TranscriptionTaskResult> getResults() | None | List<TranscriptionTaskResult> | Gets subtask results. Each file creates one subtask. |
| public JsonObject getOutput() | None | JsonObject | Gets the result as JSON. See JSON output examples. |
JSON output examples
Success example
code and message fields appear only on error.
Subtask result (TranscriptionTaskResult)
TranscriptionTaskResult holds the result for a single file.
| Method | Parameter | Return value | Description |
|---|---|---|---|
| public String getFileUrl() | None | File URL | Gets the URL of the recognized file. |
| public String getTranscriptionUrl() | None | Result URL | Gets the result URL (valid for 24 hours). The result is a JSON file you can download or read via HTTP. See Recognition result. |
| public TaskStatus getSubTaskStatus() | None | TaskStatus | Gets the subtask status (PENDING, RUNNING, SUCCEEDED, or FAILED). |
| public String getMessage() | None | Message (may be empty) | Gets error details. Check this if a task fails. |
Recognition result
The recognition result is a JSON file.
| Parameter | Type | Description |
|---|---|---|
| audio_format | string | Audio format of the source file. |
| channels | array[integer] | Audio track indexes. Returns [0] for single-track, [0, 1] for dual-track, and so on. |
| original_sampling_rate | integer | Sample rate (Hz). |
| original_duration_in_milliseconds | integer | Original audio duration (ms). |
| channel_id | integer | Transcribed audio track index (starting from 0). |
| content_duration | integer | Speech content duration (ms). Billing is based on speech duration only. Non-speech content is not billed. AI-determined speech duration may differ from total audio duration. |
| transcript | string | Paragraph-level transcription. |
| sentences | array | Sentence-level transcription. |
| words | array | Word-level transcription. |
| begin_time | integer | Start timestamp (ms). |
| end_time | integer | End timestamp (ms). |
| text | string | Transcribed text. |
| speaker_id | integer | Speaker index (starting from 0). Only present when diarization is enabled. |
| punctuation | string | Predicted punctuation after the word, if any. |
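Because getTranscriptionUrl returns a plain HTTPS URL, the result JSON can be fetched with the JDK's built-in HTTP client; no SDK call is needed. A sketch (the URL in main is a placeholder):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class FetchTranscript {
    // Builds a GET request for a transcription result URL.
    static HttpRequest buildRequest(String transcriptionUrl) {
        return HttpRequest.newBuilder(URI.create(transcriptionUrl)).GET().build();
    }

    // Downloads the recognition-result JSON (the URL is valid for 24 hours).
    static String download(String transcriptionUrl) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        return client.send(buildRequest(transcriptionUrl),
                HttpResponse.BodyHandlers.ofString()).body();
    }

    public static void main(String[] args) throws Exception {
        // Replace with the value from TranscriptionTaskResult.getTranscriptionUrl().
        System.out.println(download("https://example.com/result.json"));
    }
}
```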
Key interfaces
Query parameter class (TranscriptionQueryParam)
Use TranscriptionQueryParam when calling wait or fetch on a Transcription instance.
Create one with the static method FromTranscriptionParam.
| Method | Parameter | Return value | Description |
|---|---|---|---|
| public static TranscriptionQueryParam FromTranscriptionParam(TranscriptionParam param, String taskId) | param: A TranscriptionParam instance, taskId: The task ID | A TranscriptionQueryParam instance | Creates a TranscriptionQueryParam instance. |
Core class (Transcription)
Import with import com.alibaba.dashscope.audio.asr.transcription.*;. Key methods:
| Method | Parameter | Return value | Description |
|---|---|---|---|
| public TranscriptionResult asyncCall(TranscriptionParam param) | param: A TranscriptionParam instance | TranscriptionResult | Submits a transcription task asynchronously. |
| public TranscriptionResult wait(TranscriptionQueryParam queryParam) | queryParam: A TranscriptionQueryParam instance | TranscriptionResult | Blocks until the task reaches SUCCEEDED or FAILED. |
| public TranscriptionResult fetch(TranscriptionQueryParam queryParam) | queryParam: A TranscriptionQueryParam instance | TranscriptionResult | Queries the current task result. |