
Fun-ASR recording Java SDK

File transcription Java

For model details, see Audio file recognition - Fun-ASR/Paraformer.

Prerequisites

For temporary access or high-risk operations, use a temporary token instead of your API key. Tokens expire after 60 seconds, which reduces the risk of leakage. Replace the API key in your code with the token.

Model availability

Model | Version | Unit price | Free quota (Note)
fun-asr (currently fun-asr-2025-11-07) | Stable | $0.000035/second | 36,000 seconds (10 hours), valid for 90 days
fun-asr-2025-11-07 (improved far-field VAD over fun-asr-2025-08-25 for higher accuracy) | Snapshot | $0.000035/second | 36,000 seconds (10 hours), valid for 90 days
fun-asr-2025-08-25 | Snapshot | $0.000035/second | 36,000 seconds (10 hours), valid for 90 days
fun-asr-mtl (currently fun-asr-mtl-2025-08-25) | Stable | $0.000035/second | 36,000 seconds (10 hours), valid for 90 days
fun-asr-mtl-2025-08-25 | Snapshot | $0.000035/second | 36,000 seconds (10 hours), valid for 90 days
  • Supported languages:
    • fun-asr, fun-asr-2025-11-07, fun-asr-mtl, and fun-asr-mtl-2025-08-25: Chinese (Mandarin, Cantonese, Wu, Minnan, Hakka, Gan, Xiang, and Jin; also supports Mandarin accents from Zhongyuan, Southwest, Jilu, Jianghuai, Lanyin, Jiaoliao, Northeast, Beijing, and Hong Kong-Taiwan regions -- including Henan, Shaanxi, Hubei, Sichuan, Chongqing, Yunnan, Guizhou, Guangdong, Guangxi, Hebei, Tianjin, Shandong, Anhui, Nanjing, Jiangsu, Hangzhou, Gansu, and Ningxia), English, Japanese, Korean, Vietnamese, Indonesian, Thai, Malay, Filipino, Arabic, Bulgarian, Croatian, Czech, Danish, Dutch, Estonian, Finnish, Greek, Hindi, Hungarian, Irish, Latvian, Lithuanian, Maltese, Polish, Portuguese, Romanian, Slovak, Slovenian, and Swedish.
    • fun-asr-2025-08-25: Mandarin and English.
  • Supported sample rates: Any
  • Supported audio formats: aac, amr, avi, flac, flv, m4a, mkv, mov, mp3, mp4, mpeg, ogg, opus, wav, webm, wma, wmv

Limitations

  • Input format: Only publicly accessible file URLs (HTTP/HTTPS) are accepted, for example https://your-domain.com/file.mp3. Local files and Base64-encoded audio are not supported. Set file URLs with the fileUrls parameter. Each request accepts up to 100 URLs.
  • Audio formats: aac, amr, avi, flac, flv, m4a, mkv, mov, mp3, mp4, mpeg, ogg, opus, wav, webm, wma, wmv
Many audio and video format variants exist. The API cannot guarantee correct recognition for all of them. Test your files to verify results.
  • Audio sample rate: Any
  • File size and duration: Up to 2 GB and 12 hours. If speaker diarization is enabled, keep the audio duration under 2 hours. For larger files, see preprocessing best practices.
  • Batch size: Up to 100 file URLs per request.
  • Supported languages: fun-asr, fun-asr-2025-11-07, fun-asr-mtl, and fun-asr-mtl-2025-08-25 support Chinese and 29 other languages. fun-asr-2025-08-25 supports Chinese (Mandarin) and English only. See Supported languages.
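The input limits above can be checked on the client before a request is submitted. The following is a minimal sketch using only the Java standard library. FileUrlBatcher and its methods are hypothetical helper names, not part of the SDK, and the check covers only the URL scheme and the 100-URL batch limit (it cannot tell whether a URL is publicly accessible).

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of the SDK) that pre-checks file URLs against the documented limits. */
final class FileUrlBatcher {
    /** Documented limit: up to 100 file URLs per request. */
    static final int MAX_URLS_PER_REQUEST = 100;

    /** Returns true only for absolute HTTP or HTTPS URLs; local paths and other schemes are rejected. */
    static boolean isSupportedUrl(String url) {
        if (url == null) {
            return false;
        }
        try {
            URI uri = URI.create(url);
            String scheme = uri.getScheme();
            return uri.getHost() != null
                    && ("http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme));
        } catch (IllegalArgumentException e) {
            return false; // Not a syntactically valid URI, for example a Windows file path.
        }
    }

    /** Splits the URLs into consecutive batches of at most 100, rejecting unsupported entries. */
    static List<List<String>> toBatches(List<String> urls) {
        List<List<String>> batches = new ArrayList<>();
        for (int start = 0; start < urls.size(); start += MAX_URLS_PER_REQUEST) {
            int end = Math.min(start + MAX_URLS_PER_REQUEST, urls.size());
            List<String> batch = new ArrayList<>(urls.subList(start, end));
            for (String url : batch) {
                if (!isSupportedUrl(url)) {
                    throw new IllegalArgumentException("Unsupported file URL: " + url);
                }
            }
            batches.add(batch);
        }
        return batches;
    }
}
```

Each returned batch is sized to fit one request's fileUrls list. Submitting one request per batch is a design choice of this sketch, not something the source prescribes.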

Request parameters

Set request parameters with TranscriptionParam builder methods.
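import com.alibaba.dashscope.audio.asr.transcription.TranscriptionParam;
import java.util.Arrays;

// Build the request. model and fileUrls are the two required parameters.
TranscriptionParam param = TranscriptionParam.builder()
    .model("fun-asr")
    .fileUrls(
        Arrays.asList(
            "https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/paraformer/hello_world_female2.wav",
            "https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/paraformer/hello_world_male2.wav"))
    .build();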
Parameter | Type | Default | Required | Description
model | String | - | Yes | The model for transcription. See Model availability.
fileUrls | List<String> | - | Yes | URLs of audio or video files to transcribe. HTTP and HTTPS are supported. Up to 100 URLs per request.
vocabularyId | String | - | No | Hotword vocabulary ID. Hotwords in this vocabulary apply during recognition. Disabled by default. See Customize hotwords.
channelId | List<Integer> | [0] | No | Audio track indexes to recognize (starting from 0). [0] recognizes the first track only; [0, 1] recognizes both. Each track is billed separately.
specialWordFilter | String | - | No | Sensitive words to filter during recognition. See Sensitive word filter details.
diarizationEnabled | Boolean | false | No | Enable speaker diarization (single-channel audio only). Results include speaker_id to distinguish speakers. See Recognition result.
speakerCount | Integer | - | No | Expected speaker count (2-100). Only works when diarizationEnabled is true. Guides the algorithm but does not guarantee exact output count.
language_hints | String[] | ["zh", "en"] | No | Language codes. Leave unset for auto-detection. See Supported languages.
apiKey | String | - | No | Your API key. Not needed if set as an environment variable.

Sensitive word filter details

If specialWordFilter is not set, built-in filtering applies (matches from the Qwen Cloud sensitive word list are replaced with *). When set, you can use these policies:
  • Replace with *: Replaces matched words with asterisks of the same length.
  • Filter out: Removes matched words from the result.
The value must be a JSON string:
{
  "filter_with_signed": {
    "word_list": ["test"]
  },
  "filter_with_empty": {
    "word_list": ["start", "happen"]
  },
  "system_reserved_filter": true
}
Field descriptions:
  • filter_with_signed
    • Type: object (optional)
    • Replaces matched words with * of the same length.
    • Example: "Help me test this piece of code" -> "Help me **** this piece of code"
    • Field: word_list (string array of words to replace)
  • filter_with_empty
    • Type: object (optional)
    • Removes matched words from results.
    • Example: "Is the game about to start?" -> "Is the game about to?"
    • Field: word_list (string array of words to remove)
  • system_reserved_filter
    • Type: boolean (default: true)
    • Enables preset sensitive word rules. When true, built-in filtering applies (Qwen Cloud word list matches replaced with *).
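The JSON value described above can be assembled as a plain string. The following is a minimal sketch using only the Java standard library. SpecialWordFilterJson is a hypothetical helper name, and the escaping covers only backslashes and double quotes, so words containing control characters would need a real JSON library.

```java
import java.util.List;
import java.util.StringJoiner;

/** Hypothetical helper (not part of the SDK) that builds the specialWordFilter JSON string. */
final class SpecialWordFilterJson {
    /** Escapes backslashes and double quotes so a word can sit inside a JSON string literal. */
    static String quote(String word) {
        return "\"" + word.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /** Renders {"word_list":[...]} for one policy. An empty list yields an empty array. */
    static String wordList(List<String> words) {
        StringJoiner joiner = new StringJoiner(",", "{\"word_list\":[", "]}");
        for (String word : words) {
            joiner.add(quote(word));
        }
        return joiner.toString();
    }

    /**
     * @param replaceWithAsterisks words for filter_with_signed (replaced with * of the same length)
     * @param remove               words for filter_with_empty (removed from the result)
     * @param systemReservedFilter value for system_reserved_filter
     */
    static String build(List<String> replaceWithAsterisks, List<String> remove,
                        boolean systemReservedFilter) {
        return "{\"filter_with_signed\":" + wordList(replaceWithAsterisks)
                + ",\"filter_with_empty\":" + wordList(remove)
                + ",\"system_reserved_filter\":" + systemReservedFilter + "}";
    }
}
```

Calling build with the word lists ["test"] and ["start", "happen"] and true produces a compact form of the JSON shown earlier in this section. The resulting string is the kind of value the specialWordFilter parameter expects.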

Supported languages

Supported language codes by model:
  • fun-asr, fun-asr-2025-11-07, fun-asr-mtl, fun-asr-mtl-2025-08-25:
    • zh: Chinese
    • en: English
    • ja: Japanese
    • ko: Korean
    • vi: Vietnamese
    • id: Indonesian
    • th: Thai
    • ms: Malay
    • tl: Filipino
    • ar: Arabic
    • bg: Bulgarian
    • hr: Croatian
    • cs: Czech
    • da: Danish
    • nl: Dutch
    • et: Estonian
    • fi: Finnish
    • el: Greek
    • hi: Hindi
    • hu: Hungarian
    • ga: Irish
    • lv: Latvian
    • lt: Lithuanian
    • mt: Maltese
    • pl: Polish
    • pt: Portuguese
    • ro: Romanian
    • sk: Slovak
    • sl: Slovenian
    • sv: Swedish
  • fun-asr-2025-08-25:
    • zh: Chinese
    • en: English
Set language_hints with the parameter or parameters method of TranscriptionParam. The following example uses parameter:
TranscriptionParam param = TranscriptionParam.builder()
  .model("fun-asr")
  .parameter("language_hints", new String[]{"zh", "en"})
  .build();

Response

Task result (TranscriptionResult)

TranscriptionResult holds the task result.
Method | Parameter | Return value | Description
public String getRequestId() | None | requestId | Gets the request ID.
public String getTaskId() | None | taskId | Gets the task ID.
public TaskStatus getTaskStatus() | None | TaskStatus | Gets the task status (PENDING, RUNNING, SUCCEEDED, or FAILED). A task with multiple subtasks shows SUCCEEDED if at least one subtask succeeds. Check subtask_status for each subtask.
public List<TranscriptionTaskResult> getResults() | None | List<TranscriptionTaskResult> | Gets subtask results. Each file creates one subtask.
public JsonObject getOutput() | None | JSON | Gets the result as JSON. See JSON output examples.

JSON output examples

Success example
{
  "task_id":"0795ff8c-b666-4e91-bb8b-xxx",
  "task_status":"SUCCEEDED",
  "submit_time":"2025-02-13 16:12:09.109",
  "scheduled_time":"2025-02-13 16:12:09.128",
  "end_time":"2025-02-13 16:12:10.189",
  "results":[
    {
      "file_url":"https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/paraformer/hello_world_male2.wav",
      "transcription_url":"https://dashscope-result-bj.oss-cn-beijing.aliyuncs.com/prod/paraformer-v2/20250213/16%3A12/34604a7b-579a-4223-8797-5116a49b07ec-1.json?Expires=1739520730&OSSAccessKeyId=yourOSSAccessKeyId&Signature=tMqyH56oB5rDW9%2FFqD8Yo%2F3WaPk%3D",
      "subtask_status":"SUCCEEDED"
    },
    {
      "file_url":"https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/paraformer/hello_world_female2.wav",
      "transcription_url":"https://dashscope-result-bj.oss-cn-beijing.aliyuncs.com/prod/paraformer-v2/20250213/16%3A12/3baafe5f-d09d-46c6-8b01-724927670edb-1.json?Expires=1739520730&OSSAccessKeyId=yourOSSAccessKeyId&Signature=BF7vPxlsJN9hkJlY%2BLReezxOwK8%3D",
      "subtask_status":"SUCCEEDED"
    }
  ],
  "task_metrics":{
    "TOTAL":2,
    "SUCCEEDED":2,
    "FAILED":0
  }
}
Error example
The code and message fields appear only on error.
{
  "task_id": "7bac899c-06ec-4a79-8875-xxxxxxxxxxxx",
  "task_status": "SUCCEEDED",
  "submit_time": "2024-12-16 16:30:59.170",
  "scheduled_time": "2024-12-16 16:30:59.204",
  "end_time": "2024-12-16 16:31:02.375",
  "results": [
    {
      "file_url": "https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/sensevoice/long_audio_demo_cn.mp3",
      "transcription_url": "https://dashscope-result-bj.oss-cn-beijing.aliyuncs.com/prod/paraformer-v2/20241216/xxxx",
      "subtask_status": "SUCCEEDED"
    },
    {
      "file_url": "https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/sensevoice/rich_text_exaple_1.wav",
      "code": "InvalidFile.DownloadFailed",
      "message": "The audio file cannot be downloaded.",
      "subtask_status": "FAILED"
    }
  ],
  "task_metrics": {
    "TOTAL": 2,
    "SUCCEEDED": 1,
    "FAILED": 1
  }
}
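Because task_status can be SUCCEEDED while an individual file failed, as in the error example, each subtask needs its own check. The following is a minimal sketch of that check using only the Java standard library. SubtaskReport and Subtask are hypothetical stand-ins for the entries of the results array, not SDK classes.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-ins for the JSON "results" entries; not SDK classes. */
final class SubtaskReport {
    static final class Subtask {
        final String fileUrl;
        final String subtaskStatus;
        final String code;    // null unless the subtask failed
        final String message; // null unless the subtask failed

        Subtask(String fileUrl, String subtaskStatus, String code, String message) {
            this.fileUrl = fileUrl;
            this.subtaskStatus = subtaskStatus;
            this.code = code;
            this.message = message;
        }
    }

    /** Returns every subtask whose subtask_status is FAILED, whatever the overall task_status says. */
    static List<Subtask> failedSubtasks(List<Subtask> results) {
        List<Subtask> failed = new ArrayList<>();
        for (Subtask subtask : results) {
            if ("FAILED".equals(subtask.subtaskStatus)) {
                failed.add(subtask);
            }
        }
        return failed;
    }
}
```

Applied to the error example, this returns one entry with code InvalidFile.DownloadFailed, which a caller could log or resubmit.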

Subtask result (TranscriptionTaskResult)

TranscriptionTaskResult holds the result for a single file.
Method | Parameter | Return value | Description
public String getFileUrl() | None | File URL | Gets the URL of the recognized file.
public String getTranscriptionUrl() | None | Result URL | Gets the result URL (valid for 24 hours). The result is a JSON file you can download or read via HTTP. See Recognition result.
public TaskStatus getSubTaskStatus() | None | TaskStatus | Gets the subtask status (PENDING, RUNNING, SUCCEEDED, or FAILED).
public String getMessage() | None | Message (may be empty) | Gets error details. Check this if a task fails.
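Since the result URL is valid for only 24 hours, it can help to know when a given URL stops working. The sample URLs in the JSON output examples carry an Expires query parameter that looks like a Unix timestamp in seconds. The following sketch reads it with the Java standard library. Treating Expires this way is an assumption based on those samples, not documented behavior, and ResultUrlExpiry is a hypothetical helper name.

```java
import java.net.URI;
import java.time.Instant;
import java.util.Optional;

/** Hypothetical helper (not part of the SDK); assumes Expires is a Unix timestamp in seconds. */
final class ResultUrlExpiry {
    /** Returns the instant encoded in the Expires query parameter, or empty if it is absent or malformed. */
    static Optional<Instant> expiresAt(String transcriptionUrl) {
        String query = URI.create(transcriptionUrl).getRawQuery();
        if (query == null) {
            return Optional.empty();
        }
        for (String pair : query.split("&")) {
            if (pair.startsWith("Expires=")) {
                try {
                    long epochSeconds = Long.parseLong(pair.substring("Expires=".length()));
                    return Optional.of(Instant.ofEpochSecond(epochSeconds));
                } catch (NumberFormatException e) {
                    return Optional.empty(); // Present but not a number.
                }
            }
        }
        return Optional.empty();
    }
}
```

A caller could compare the returned instant with Instant.now() before downloading. If the value is empty, the safest course is to download the result soon after the task finishes.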

Recognition result

The recognition result is a JSON file.
{
  "file_url":"https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/paraformer/hello_world_female2.wav",
  "properties":{
    "audio_format":"pcm_s16le",
    "channels":[
      0
    ],
    "original_sampling_rate":16000,
    "original_duration_in_milliseconds":3834
  },
  "transcripts":[
    {
      "channel_id":0,
      "content_duration_in_milliseconds":3720,
      "text":"Hello world, this is Alibaba Speech Lab.",
      "sentences":[
        {
          "begin_time":100,
          "end_time":3820,
          "text":"Hello world, this is Alibaba Speech Lab.",
          "sentence_id":1,
          "speaker_id":0,
          "words":[
            {
              "begin_time":100,
              "end_time":596,
              "text":"Hello ",
              "punctuation":""
            },
            {
              "begin_time":596,
              "end_time":844,
              "text":"world",
              "punctuation":", "
            }
          ]
        }
      ]
    }
  ]
}
Key fields:
Field | Type | Description
audio_format | string | Audio format of the source file.
channels | array[integer] | Audio track indexes. Returns [0] for single-track, [0, 1] for dual-track, and so on.
original_sampling_rate | integer | Sample rate (Hz).
original_duration_in_milliseconds | integer | Original audio duration (ms).
channel_id | integer | Transcribed audio track index (starting from 0).
content_duration_in_milliseconds | integer | Speech content duration (ms). Billing is based on speech duration only. Non-speech content is not billed. AI-determined speech duration may differ from total audio duration.
transcripts | array | Transcription results. In each entry, text holds the paragraph-level transcription.
sentences | array | Sentence-level transcription.
words | array | Word-level transcription.
begin_time | integer | Start timestamp (ms).
end_time | integer | End timestamp (ms).
text | string | Transcribed text.
speaker_id | integer | Speaker index (starting from 0). Only present when diarization is enabled.
punctuation | string | Predicted punctuation after the word, if any.
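Because billing follows speech duration, a rough cost estimate can be derived from content_duration_in_milliseconds and the listed unit price of $0.000035 per second. The following is a minimal sketch using only the Java standard library. TranscriptionCostEstimate is a hypothetical helper name, and the sketch ignores the free quota and any rounding the billing system may apply, since the source does not describe them.

```java
import java.math.BigDecimal;

/** Hypothetical helper (not part of the SDK) that estimates cost from speech duration. */
final class TranscriptionCostEstimate {
    /** Listed unit price for the fun-asr models: $0.000035 per second. */
    static final BigDecimal USD_PER_SECOND = new BigDecimal("0.000035");

    /** Converts milliseconds to seconds exactly, then multiplies by the unit price. */
    static BigDecimal estimateUsd(long contentDurationMs) {
        BigDecimal seconds = BigDecimal.valueOf(contentDurationMs).movePointLeft(3);
        return seconds.multiply(USD_PER_SECOND);
    }
}
```

For the sample result (3720 ms of speech) the estimate is $0.0001302. BigDecimal is used instead of double so that the small per-second price is not distorted by floating-point rounding. When several tracks are recognized, each track's duration would be estimated and summed, since each track is billed separately.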

Key interfaces

Query parameter class (TranscriptionQueryParam)

Use TranscriptionQueryParam when calling wait or fetch on a Transcription instance. Create one with the static method FromTranscriptionParam.
// Create transcription request parameters.
TranscriptionParam param =
    TranscriptionParam.builder()
        // If you have not configured the API key as an environment variable, replace apiKey with your own API key.
        //.apiKey("apikey")
        .model("fun-asr")
        .fileUrls(
            Arrays.asList(
                "https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/paraformer/hello_world_female2.wav",
                "https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/paraformer/hello_world_male2.wav"))
        .build();
try {
  Transcription transcription = new Transcription();
  // Submit the transcription request.
  TranscriptionResult result = transcription.asyncCall(param);
  System.out.println("RequestId: " + result.getRequestId());
  TranscriptionQueryParam queryParam = TranscriptionQueryParam.FromTranscriptionParam(param, result.getTaskId());

} catch (Exception e) {
  System.out.println("error: " + e);
}
Method | Parameter | Return value | Description
public static TranscriptionQueryParam FromTranscriptionParam(TranscriptionParam param, String taskId) | param: A TranscriptionParam instance; taskId: The task ID | A TranscriptionQueryParam instance | Creates a TranscriptionQueryParam instance.

Core class (Transcription)

The class is in the package com.alibaba.dashscope.audio.asr.transcription. Import it with:
import com.alibaba.dashscope.audio.asr.transcription.*;
Key methods:
Method | Parameter | Return value | Description
public TranscriptionResult asyncCall(TranscriptionParam param) | param: A TranscriptionParam instance | TranscriptionResult | Submits a transcription task asynchronously.
public TranscriptionResult wait(TranscriptionQueryParam queryParam) | queryParam: A TranscriptionQueryParam instance | TranscriptionResult | Blocks until the task reaches SUCCEEDED or FAILED.
public TranscriptionResult fetch(TranscriptionQueryParam queryParam) | queryParam: A TranscriptionQueryParam instance | TranscriptionResult | Queries the current task result.
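Where blocking in wait is not desirable, fetch can be called repeatedly until the task reaches SUCCEEDED or FAILED. The following is a minimal sketch of such a polling loop using only the Java standard library. TaskPoller is a hypothetical helper name, and the Supplier stands in for a call that fetches the task and returns its status as a string. The wiring to the SDK is left out on purpose.

```java
import java.util.function.Supplier;

/** Hypothetical polling helper (not part of the SDK). */
final class TaskPoller {
    /** SUCCEEDED and FAILED are the two terminal states listed for a task. */
    static boolean isTerminal(String status) {
        return "SUCCEEDED".equals(status) || "FAILED".equals(status);
    }

    /**
     * Asks statusSource for the status up to maxAttempts times, sleeping delayMillis between attempts.
     * Returns the last status seen, which may still be PENDING or RUNNING if attempts run out.
     */
    static String pollUntilTerminal(Supplier<String> statusSource, int maxAttempts, long delayMillis) {
        String status = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            status = statusSource.get();
            if (isTerminal(status)) {
                return status;
            }
            if (attempt < maxAttempts - 1) {
                try {
                    Thread.sleep(delayMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // Preserve the interrupt and stop polling.
                    return status;
                }
            }
        }
        return status;
    }
}
```

The attempt cap and the fixed delay are choices made for this sketch. Suitable values depend on file length and are not specified in the source.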