File transcription (Java)
For model details, see Audio file recognition - Fun-ASR/Paraformer.
Prerequisites
For temporary access or high-risk operations, use a temporary token instead of the API key. Tokens expire after 60 seconds, which reduces the risk of leakage. Replace the API key in your code with the token.
Model availability
| Model | Version | Unit price | Free quota |
|---|---|---|---|
| fun-asr (currently fun-asr-2025-11-07) | Stable | $0.000035/second | 36,000 seconds (10 hours), valid for 90 days |
| fun-asr-2025-11-07 (improved far-field VAD over fun-asr-2025-08-25 for higher accuracy) | Snapshot | $0.000035/second | 36,000 seconds (10 hours), valid for 90 days |
| fun-asr-2025-08-25 | Snapshot | $0.000035/second | 36,000 seconds (10 hours), valid for 90 days |
| fun-asr-mtl (currently fun-asr-mtl-2025-08-25) | Stable | $0.000035/second | 36,000 seconds (10 hours), valid for 90 days |
| fun-asr-mtl-2025-08-25 | Snapshot | $0.000035/second | 36,000 seconds (10 hours), valid for 90 days |
- Supported languages:
- fun-asr and fun-asr-2025-11-07: Mandarin, Cantonese, Wu, Minnan, Hakka, Gan, Xiang, and Jin, plus Mandarin accents from the Zhongyuan, Southwest, Jilu, Jianghuai, Lanyin, Jiaoliao, Northeast, Beijing, and Hong Kong-Taiwan regions (including Henan, Shaanxi, Hubei, Sichuan, Chongqing, Yunnan, Guizhou, Guangdong, Guangxi, Hebei, Tianjin, Shandong, Anhui, Nanjing, Jiangsu, Hangzhou, Gansu, and Ningxia). English and Japanese are also supported.
- fun-asr-2025-08-25: Mandarin and English.
- fun-asr-mtl and fun-asr-mtl-2025-08-25: Mandarin, Cantonese, English, Japanese, Korean, Vietnamese, Indonesian, Thai, Malay, Filipino, Arabic, Hindi, Bulgarian, Croatian, Czech, Danish, Dutch, Estonian, Finnish, Greek, Hungarian, Irish, Latvian, Lithuanian, Maltese, Polish, Portuguese, Romanian, Slovak, Slovenian, and Swedish.
- Supported sample rates: Any
- Supported audio formats: aac, amr, avi, flac, flv, m4a, mkv, mov, mp3, mp4, mpeg, ogg, opus, wav, webm, wma, wmv
Limitations
Input format: Only publicly accessible file URLs (HTTP/HTTPS) are accepted. Local files and Base64-encoded audio are not supported.
Example: https://your-domain.com/file.mp3
Set file URLs with the fileUrls parameter. Each request accepts up to 100 URLs.
- Audio formats:
aac, amr, avi, flac, flv, m4a, mkv, mov, mp3, mp4, mpeg, ogg, opus, wav, webm, wma, wmv
Many audio and video format variants exist. The API cannot guarantee correct recognition for all of them. Test your files to verify results.
- Audio sample rate: Any
- File size and duration: Up to 2 GB and 12 hours. For larger files, see preprocessing best practices.
- Batch size: Up to 100 file URLs per request.
- Supported languages: model-dependent. See Supported languages for the per-model list of language codes.
Getting started
The Transcription class provides async and sync methods for submitting tasks and retrieving results. Two approaches:
- Submit a task and block until it completes.
- Submit a task and poll for results.
Async submission + sync wait
1. Set request parameters.
2. Create a Transcription instance.
3. Submit the task: call asyncCall on the Transcription instance. Tasks enter the PENDING state after submission. Queue time depends on queue length and file duration (usually a few minutes). Recognition runs at accelerated speed once processing starts. Results and download URLs expire after 24 hours.
4. Wait for the task to finish: call wait to block until the task reaches SUCCEEDED or FAILED status. Returns a TranscriptionResult.
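The steps above can be sketched as follows. This is a minimal sketch, not the official sample: the file URL is a placeholder, and the model and fileUrls builder methods are assumed to mirror the request-parameter names documented below.

```java
import java.util.Arrays;
import com.alibaba.dashscope.audio.asr.transcription.*;

public class TranscribeAndWait {
    public static void main(String[] args) throws Exception {
        // 1. Set request parameters (replace the URL with your own file).
        TranscriptionParam param = TranscriptionParam.builder()
                .model("fun-asr")
                .fileUrls(Arrays.asList("https://your-domain.com/file.mp3"))
                .build();

        // 2. Create a Transcription instance.
        Transcription transcription = new Transcription();

        // 3. Submit the task; it stays PENDING until processing starts.
        TranscriptionResult submitResult = transcription.asyncCall(param);
        String taskId = submitResult.getTaskId();

        // 4. Block until the task reaches SUCCEEDED or FAILED.
        TranscriptionResult result = transcription.wait(
                TranscriptionQueryParam.FromTranscriptionParam(param, taskId));
        System.out.println(result.getOutput());
    }
}
```

The API key is read from the environment here; pass it to the builder instead if you cannot set an environment variable.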
Async submission + async query
1. Set request parameters.
2. Create a Transcription instance.
3. Submit the task: call asyncCall on the Transcription instance. Tasks enter the PENDING state after submission. Queue time depends on queue length and file duration (usually a few minutes). Recognition runs at accelerated speed once processing starts. Results and download URLs expire after 24 hours.
4. Poll for the result: call fetch repeatedly until the task reaches SUCCEEDED or FAILED status. Returns a TranscriptionResult.
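A minimal polling sketch using only the methods documented on this page; the file URL is a placeholder and the sleep interval is an arbitrary choice, not a recommended value.

```java
import java.util.Arrays;
import com.alibaba.dashscope.audio.asr.transcription.*;

public class TranscribeAndPoll {
    public static void main(String[] args) throws Exception {
        TranscriptionParam param = TranscriptionParam.builder()
                .model("fun-asr")
                .fileUrls(Arrays.asList("https://your-domain.com/file.mp3"))
                .build();

        Transcription transcription = new Transcription();
        String taskId = transcription.asyncCall(param).getTaskId();
        TranscriptionQueryParam query =
                TranscriptionQueryParam.FromTranscriptionParam(param, taskId);

        // Poll until the task reaches a terminal status.
        while (true) {
            TranscriptionResult result = transcription.fetch(query);
            String status = result.getTaskStatus().name();
            if (status.equals("SUCCEEDED") || status.equals("FAILED")) {
                System.out.println(result.getOutput());
                break;
            }
            Thread.sleep(5000); // back off while the task is PENDING/RUNNING
        }
    }
}
```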
Request parameters
Set request parameters with TranscriptionParam builder methods.
| Parameter | Type | Default | Required | Description |
|---|---|---|---|---|
| model | String | - | Yes | The model for transcription. See Model availability. |
| fileUrls | List<String> | - | Yes | URLs of audio or video files to transcribe. HTTP and HTTPS are supported. Up to 100 URLs per request. |
| vocabularyId | String | - | No | Hotword vocabulary ID. Hotwords in this vocabulary apply during recognition. Disabled by default. See Customize hotwords. |
| channelId | List<Integer> | [0] | No | Audio track indexes to recognize (starting from 0). [0] recognizes the first track only; [0, 1] recognizes both. Each track is billed separately. |
| specialWordFilter | String | - | No | Sensitive words to filter during recognition. See Sensitive word filter details. |
| diarizationEnabled | Boolean | false | No | Enable speaker diarization (single-channel audio only). Results include speaker_id to distinguish speakers. See Recognition result. |
| speakerCount | Integer | - | No | Expected speaker count (2-100). Only works when diarizationEnabled is true. Guides the algorithm but does not guarantee exact output count. |
| language_hints | String[] | ["zh", "en"] | No | Language codes. Leave unset for auto-detection. See Supported languages. |
| apiKey | String | - | No | Your API key. Not needed if set as an environment variable. |
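A sketch combining the common optional parameters from the table. Builder method names are assumed to mirror the parameter names; verify them against your SDK version. Note that diarization applies to single-channel audio only, so channelId stays at [0] here.

```java
import java.util.Arrays;
import com.alibaba.dashscope.audio.asr.transcription.*;

public class BuildParamExample {
    public static void main(String[] args) {
        TranscriptionParam param = TranscriptionParam.builder()
                .model("fun-asr")
                .fileUrls(Arrays.asList("https://your-domain.com/file.mp3"))
                .channelId(Arrays.asList(0))            // first audio track only
                .diarizationEnabled(true)               // single-channel audio only
                .speakerCount(2)                        // a hint, not a guarantee
                .parameter("language_hints", new String[] {"zh", "en"})
                .build();
    }
}
```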
Sensitive word filter details
If specialWordFilter is not set, built-in filtering applies (matches from the Qwen Cloud sensitive word list are replaced with *).
When set, you can use these policies:
- Replace with *: replaces matched words with asterisks of the same length.
- Filter out: removes matched words from the result.

filter_with_signed
- Type: object (optional)
- Replaces matched words with * of the same length.
- Field: word_list (string array of words to replace)
- Example: "Help me test this piece of code" -> "Help me **** this piece of code"

filter_with_empty
- Type: object (optional)
- Removes matched words from results.
- Field: word_list (string array of words to remove)
- Example: "Is the game about to start?" -> "Is the game about to?"

system_reserved_filter
- Type: boolean (default: true)
- Enables the preset sensitive word rules. When true, built-in filtering applies (Qwen Cloud word list matches are replaced with *).
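Since specialWordFilter is a String parameter, the fields above have to be serialized to JSON. The helper below sketches that by hand; the exact top-level JSON shape (three sibling keys) is an assumption inferred from the field descriptions, so verify it against the API reference before relying on it.

```java
import java.util.List;
import java.util.stream.Collectors;

public class SpecialWordFilterExample {
    // Joins words into a JSON string array, e.g. ["a","b"].
    static String jsonArray(List<String> words) {
        return words.stream()
                .map(w -> "\"" + w + "\"")
                .collect(Collectors.joining(",", "[", "]"));
    }

    // Builds the specialWordFilter value combining both policies.
    static String buildFilter(List<String> replaceWords, List<String> removeWords,
                              boolean systemReserved) {
        return "{"
                + "\"filter_with_signed\":{\"word_list\":" + jsonArray(replaceWords) + "},"
                + "\"filter_with_empty\":{\"word_list\":" + jsonArray(removeWords) + "},"
                + "\"system_reserved_filter\":" + systemReserved
                + "}";
    }

    public static void main(String[] args) {
        System.out.println(buildFilter(List.of("test"), List.of(), true));
    }
}
```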
Supported languages
Supported language codes by model:
- fun-asr, fun-asr-2025-11-07:
- zh: Chinese
- en: English
- ja: Japanese
- fun-asr-2025-08-25:
- zh: Chinese
- en: English
- fun-asr-mtl, fun-asr-mtl-2025-08-25:
- zh: Chinese
- en: English
- ja: Japanese
- ko: Korean
- vi: Vietnamese
- id: Indonesian
- th: Thai
- ms: Malay
- tl: Filipino
- ar: Arabic
- hi: Hindi
- bg: Bulgarian
- hr: Croatian
- cs: Czech
- da: Danish
- nl: Dutch
- et: Estonian
- fi: Finnish
- el: Greek
- hu: Hungarian
- ga: Irish
- lv: Latvian
- lt: Lithuanian
- mt: Maltese
- pl: Polish
- pt: Portuguese
- ro: Romanian
- sk: Slovak
- sl: Slovenian
- sv: Swedish
Set language_hints with the parameter or parameters method of TranscriptionParam.
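Both options are sketched below (the file URL is a placeholder): parameter sets a single key, while parameters takes a map of extra keys.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import com.alibaba.dashscope.audio.asr.transcription.*;

public class LanguageHintsExample {
    public static void main(String[] args) {
        // Option 1: set a single key with parameter.
        TranscriptionParam p1 = TranscriptionParam.builder()
                .model("fun-asr")
                .fileUrls(Arrays.asList("https://your-domain.com/file.mp3"))
                .parameter("language_hints", new String[] {"ja"})
                .build();

        // Option 2: pass a map of extra keys with parameters.
        Map<String, Object> extra = new HashMap<>();
        extra.put("language_hints", new String[] {"zh", "en"});
        TranscriptionParam p2 = TranscriptionParam.builder()
                .model("fun-asr")
                .fileUrls(Arrays.asList("https://your-domain.com/file.mp3"))
                .parameters(extra)
                .build();
    }
}
```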
Response
Task result (TranscriptionResult)
TranscriptionResult holds the task result.
| Method | Parameter | Return value | Description |
|---|---|---|---|
| public String getRequestId() | None | requestId | Gets the request ID. |
| public String getTaskId() | None | taskId | Gets the task ID. |
| public TaskStatus getTaskStatus() | None | TaskStatus | Gets the task status (PENDING, RUNNING, SUCCEEDED, or FAILED). A task with multiple subtasks shows SUCCEEDED if at least one subtask succeeds. Check subtask_status for each subtask. |
| public List<TranscriptionTaskResult> getResults() | None | List<TranscriptionTaskResult> | Gets subtask results. Each file creates one subtask. |
| public JsonObject getOutput() | None | JsonObject | Gets the result as JSON. See JSON output examples. |
JSON output examples
Success example
code and message fields appear only on error.
Subtask result (TranscriptionTaskResult)
TranscriptionTaskResult holds the result for a single file.
| Method | Parameter | Return value | Description |
|---|---|---|---|
| public String getFileUrl() | None | File URL | Gets the URL of the recognized file. |
| public String getTranscriptionUrl() | None | Result URL | Gets the result URL (valid for 24 hours). The result is a JSON file you can download or read via HTTP. See Recognition result. |
| public TaskStatus getSubTaskStatus() | None | TaskStatus | Gets the subtask status (PENDING, RUNNING, SUCCEEDED, or FAILED). |
| public String getMessage() | None | Message (may be empty) | Gets error details. Check this if a task fails. |
Recognition result
The recognition result is a JSON file.
| Parameter | Type | Description |
|---|---|---|
| audio_format | string | Audio format of the source file. |
| channels | array[integer] | Audio track indexes. Returns [0] for single-track, [0, 1] for dual-track, and so on. |
| original_sampling_rate | integer | Sample rate (Hz). |
| original_duration_in_milliseconds | integer | Original audio duration (ms). |
| channel_id | integer | Transcribed audio track index (starting from 0). |
| content_duration | integer | Speech content duration (ms). Billing is based on speech duration only. Non-speech content is not billed. AI-determined speech duration may differ from total audio duration. |
| transcript | string | Paragraph-level transcription. |
| sentences | array | Sentence-level transcription. |
| words | array | Word-level transcription. |
| begin_time | integer | Start timestamp (ms). |
| end_time | integer | End timestamp (ms). |
| text | string | Transcribed text. |
| speaker_id | integer | Speaker index (starting from 0). Only present when diarization is enabled. |
| punctuation | string | Predicted punctuation after the word, if any. |
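Because getTranscriptionUrl returns a plain HTTPS URL, the result JSON can be fetched with the JDK's built-in HTTP client; no SDK call is needed. A sketch (the URL in main is a placeholder):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class FetchTranscript {
    // Builds a GET request for a transcription result URL.
    static HttpRequest buildRequest(String transcriptionUrl) {
        return HttpRequest.newBuilder(URI.create(transcriptionUrl)).GET().build();
    }

    // Downloads the recognition-result JSON (the URL is valid for 24 hours).
    static String download(String transcriptionUrl) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        return client.send(buildRequest(transcriptionUrl),
                HttpResponse.BodyHandlers.ofString()).body();
    }

    public static void main(String[] args) throws Exception {
        // Replace with the value from TranscriptionTaskResult.getTranscriptionUrl().
        System.out.println(download("https://example.com/result.json"));
    }
}
```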
Key interfaces
Query parameter class (TranscriptionQueryParam)
Use TranscriptionQueryParam when calling wait or fetch on a Transcription instance.
Create one with the static method FromTranscriptionParam.
| Method | Parameter | Return value | Description |
|---|---|---|---|
| public static TranscriptionQueryParam FromTranscriptionParam(TranscriptionParam param, String taskId) | param: A TranscriptionParam instance, taskId: The task ID | A TranscriptionQueryParam instance | Creates a TranscriptionQueryParam instance. |
Core class (Transcription)
Import with import com.alibaba.dashscope.audio.asr.transcription.*;. Key methods:
| Method | Parameter | Return value | Description |
|---|---|---|---|
| public TranscriptionResult asyncCall(TranscriptionParam param) | param: A TranscriptionParam instance | TranscriptionResult | Submits a transcription task asynchronously. |
| public TranscriptionResult wait(TranscriptionQueryParam queryParam) | queryParam: A TranscriptionQueryParam instance | TranscriptionResult | Blocks until the task reaches SUCCEEDED or FAILED. |
| public TranscriptionResult fetch(TranscriptionQueryParam queryParam) | queryParam: A TranscriptionQueryParam instance | TranscriptionResult | Queries the current task result. |