Fun-ASR file transcription (Java SDK)

For model details, see Audio file recognition - Fun-ASR/Paraformer.

Prerequisites

For temporary access or high-risk operations, use a temporary token instead of your API key. Tokens expire after 60 seconds, which reduces the risk of leakage. Replace the API key in your code with the token.

Model availability

| Model | Version | Unit price | Free quota |
| --- | --- | --- | --- |
| fun-asr (currently points to fun-asr-2025-11-07) | Stable | $0.000035/second | 36,000 seconds (10 hours), valid for 90 days |
| fun-asr-2025-11-07 (improved far-field VAD over fun-asr-2025-08-25 for higher accuracy) | Snapshot | $0.000035/second | 36,000 seconds (10 hours), valid for 90 days |
| fun-asr-2025-08-25 | Snapshot | $0.000035/second | 36,000 seconds (10 hours), valid for 90 days |
| fun-asr-mtl (currently points to fun-asr-mtl-2025-08-25) | Stable | $0.000035/second | 36,000 seconds (10 hours), valid for 90 days |
| fun-asr-mtl-2025-08-25 | Snapshot | $0.000035/second | 36,000 seconds (10 hours), valid for 90 days |
  • Supported languages:
    • fun-asr and fun-asr-2025-11-07: Mandarin, Cantonese, Wu, Minnan, Hakka, Gan, Xiang, and Jin. Also supports Mandarin accents from Zhongyuan, Southwest, Jilu, Jianghuai, Lanyin, Jiaoliao, Northeast, Beijing, and Hong Kong-Taiwan regions -- including Henan, Shaanxi, Hubei, Sichuan, Chongqing, Yunnan, Guizhou, Guangdong, Guangxi, Hebei, Tianjin, Shandong, Anhui, Nanjing, Jiangsu, Hangzhou, Gansu, and Ningxia. Also supports English and Japanese.
    • fun-asr-2025-08-25: Mandarin and English.
    • fun-asr-mtl and fun-asr-mtl-2025-08-25: Mandarin, Cantonese, English, Japanese, Korean, Vietnamese, Indonesian, Thai, Malay, Filipino, Arabic, Hindi, Bulgarian, Croatian, Czech, Danish, Dutch, Estonian, Finnish, Greek, Hungarian, Irish, Latvian, Lithuanian, Maltese, Polish, Portuguese, Romanian, Slovak, Slovenian, and Swedish.
  • Supported sample rates: Any
  • Supported audio formats: aac, amr, avi, flac, flv, m4a, mkv, mov, mp3, mp4, mpeg, ogg, opus, wav, webm, wma, wmv

Limitations

  • Input format: Only publicly accessible file URLs (HTTP/HTTPS) are accepted; local files and Base64-encoded audio are not supported (example: https://your-domain.com/file.mp3). Set file URLs with the fileUrls parameter. Each request accepts up to 100 URLs.
  • Audio formats: aac, amr, avi, flac, flv, m4a, mkv, mov, mp3, mp4, mpeg, ogg, opus, wav, webm, wma, wmv
Many audio and video format variants exist. The API cannot guarantee correct recognition for all of them. Test your files to verify results.
  • Audio sample rate: Any
  • File size and duration: Up to 2 GB and 12 hours. For larger files, see preprocessing best practices.
  • Batch size: Up to 100 file URLs per request.
  • Supported languages: vary by model; see Model availability and Supported languages for the full list per model.
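A quick pre-flight check can catch input that violates these limits before a request is submitted. The following is a minimal sketch; the UrlCheck helper is hypothetical and not part of the SDK:

```java
import java.net.URI;
import java.util.List;

public class UrlCheck {
  // Illustrative pre-flight validation mirroring the limitations above:
  // only HTTP/HTTPS URLs, at most 100 file URLs per request.
  public static void validate(List<String> fileUrls) {
    if (fileUrls.isEmpty() || fileUrls.size() > 100) {
      throw new IllegalArgumentException("1 to 100 file URLs per request");
    }
    for (String url : fileUrls) {
      String scheme = URI.create(url).getScheme();
      if (!"http".equals(scheme) && !"https".equals(scheme)) {
        throw new IllegalArgumentException("Only HTTP/HTTPS URLs are accepted: " + url);
      }
    }
  }
}
```

File size and duration cannot be checked this way without downloading the file, so those limits are still enforced server-side.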

Getting started

The Transcription class provides async and sync methods for submitting tasks and retrieving results. Two approaches:
  • Submit a task and block until it completes.
  • Submit a task and poll for results.

Async submission + sync wait

Flowchart:
  1. Set request parameters.
  2. Create a Transcription instance.
  3. Submit the task: call asyncCall on the Transcription instance.
     Tasks enter the PENDING state after submission. Queue time depends on queue length and file duration (usually a few minutes). Recognition runs at accelerated speed once processing starts. Results and download URLs expire after 24 hours.
  4. Wait for the task to finish: call wait to block until the task reaches SUCCEEDED or FAILED status. Returns a TranscriptionResult.
import com.alibaba.dashscope.audio.asr.transcription.*;
import com.alibaba.dashscope.utils.Constants;
import com.google.gson.*;

import java.util.Arrays;

public class Main {
  public static void main(String[] args) {
    Constants.baseHttpApiUrl = "https://dashscope-intl.aliyuncs.com/api/v1";
    // Create transcription request parameters.
    TranscriptionParam param =
        TranscriptionParam.builder()
            // If the API key is not set as an environment variable, uncomment the next line and use your API key:
            // .apiKey("sk-xxx")
            .model("fun-asr") // Here, fun-asr is used as an example. You can change the model name as needed.
            .fileUrls(
                Arrays.asList(
                    "https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/paraformer/hello_world_female2.wav",
                    "https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/paraformer/hello_world_male2.wav"))
            .build();
    try {
      Transcription transcription = new Transcription();
      // Submit the transcription request.
      TranscriptionResult result = transcription.asyncCall(param);
      System.out.println("RequestId: " + result.getRequestId());
      // Block and wait for the task to complete, then get the result.
      result = transcription.wait(
          TranscriptionQueryParam.FromTranscriptionParam(param, result.getTaskId()));
      // Print the result.
      System.out.println(new GsonBuilder().setPrettyPrinting().create().toJson(result.getOutput()));
    } catch (Exception e) {
      System.out.println("error: " + e);
    }
    System.exit(0);
  }
}

Async submission + async query

Flowchart:
  1. Set request parameters.
  2. Create a Transcription instance.
  3. Submit the task: call asyncCall on the Transcription instance.
     Tasks enter the PENDING state after submission. Queue time depends on queue length and file duration (usually a few minutes). Recognition runs at accelerated speed once processing starts. Results and download URLs expire after 24 hours.
  4. Poll for the result: call fetch repeatedly until the task reaches SUCCEEDED or FAILED status. Returns a TranscriptionResult.
import com.alibaba.dashscope.audio.asr.transcription.*;
import com.alibaba.dashscope.common.TaskStatus;
import com.alibaba.dashscope.utils.Constants;
import com.google.gson.*;

import java.util.Arrays;

public class Main {
  public static void main(String[] args) {
    Constants.baseHttpApiUrl = "https://dashscope-intl.aliyuncs.com/api/v1";
    // Create transcription request parameters.
    TranscriptionParam param =
        TranscriptionParam.builder()
            // If the API key is not set as an environment variable, uncomment the next line and use your API key:
            // .apiKey("sk-xxx")
            .model("fun-asr") // Here, fun-asr is used as an example. You can change the model name as needed.
            .fileUrls(
                Arrays.asList(
                    "https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/paraformer/hello_world_female2.wav",
                    "https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/paraformer/hello_world_male2.wav"))
            .build();
    try {
      Transcription transcription = new Transcription();
      // Submit the transcription request.
      TranscriptionResult result = transcription.asyncCall(param);
      System.out.println("RequestId: " + result.getRequestId());
      // Poll for the task execution result until the task is complete.
      while (true) {
        result = transcription.fetch(TranscriptionQueryParam.FromTranscriptionParam(param, result.getTaskId()));
        if (result.getTaskStatus() == TaskStatus.SUCCEEDED || result.getTaskStatus() == TaskStatus.FAILED) {
          break;
        }
        Thread.sleep(1000);
      }
      // Print the result.
      System.out.println(new GsonBuilder().setPrettyPrinting().create().toJson(result.getOutput()));
    } catch (Exception e) {
      System.out.println("error: " + e);
    }
    System.exit(0);
  }
}

Request parameters

Set request parameters with TranscriptionParam builder methods.
TranscriptionParam param = TranscriptionParam.builder()
  .model("fun-asr")
  .fileUrls(
          Arrays.asList(
                  "https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/paraformer/hello_world_female2.wav",
                  "https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/paraformer/hello_world_male2.wav"))
  .build();
| Parameter | Type | Default | Required | Description |
| --- | --- | --- | --- | --- |
| model | String | - | Yes | The model for transcription. See Model availability. |
| fileUrls | List<String> | - | Yes | URLs of audio or video files to transcribe. HTTP and HTTPS are supported. Up to 100 URLs per request. |
| vocabularyId | String | - | No | Hotword vocabulary ID. Hotwords in this vocabulary apply during recognition. Disabled by default. See Customize hotwords. |
| channelId | List<Integer> | [0] | No | Audio track indexes to recognize (starting from 0). [0] recognizes the first track only; [0, 1] recognizes both. Each track is billed separately. |
| specialWordFilter | String | - | No | Sensitive words to filter during recognition. See Sensitive word filter details. |
| diarizationEnabled | Boolean | false | No | Enable speaker diarization (single-channel audio only). Results include speaker_id to distinguish speakers. See Recognition result. |
| speakerCount | Integer | - | No | Expected speaker count (2-100). Only takes effect when diarizationEnabled is true. Guides the algorithm but does not guarantee the exact output count. |
| language_hints | String[] | ["zh", "en"] | No | Language codes. Leave unset for auto-detection. See Supported languages. |
| apiKey | String | - | No | Your API key. Not needed if set as an environment variable. |

Sensitive word filter details

If specialWordFilter is not set, built-in filtering applies (matches from the Qwen Cloud sensitive word list are replaced with *). When set, you can use these policies:
  • Replace with *: Replaces matched words with asterisks of the same length.
  • Filter out: Removes matched words from the result.
The value must be a JSON string:
{
  "filter_with_signed": {
    "word_list": ["test"]
  },
  "filter_with_empty": {
    "word_list": ["start", "happen"]
  },
  "system_reserved_filter": true
}
Field descriptions:
  • filter_with_signed
    • Type: object (optional)
    • Replaces matched words with * of the same length.
    • Example: "Help me test this piece of code" -> "Help me **** this piece of code"
    • Field: word_list (string array of words to replace)
  • filter_with_empty
    • Type: object (optional)
    • Removes matched words from results.
    • Example: "Is the game about to start?" -> "Is the game about to?"
    • Field: word_list (string array of words to remove)
  • system_reserved_filter
    • Type: boolean (default: true)
    • Enables preset sensitive word rules. When true, built-in filtering applies (Qwen Cloud word list matches replaced with *).
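Rather than hand-writing the JSON string, the filter configuration can be assembled programmatically. Below is a minimal standard-library sketch; the FilterConfig helper is illustrative and not part of the SDK, and in a real project a JSON library such as Gson (already used in the samples above) is preferable:

```java
import java.util.List;
import java.util.stream.Collectors;

public class FilterConfig {
  // Render a list of words as a JSON string array, e.g. ["a","b"].
  static String quoteList(List<String> words) {
    return words.stream()
        .map(w -> "\"" + w + "\"")
        .collect(Collectors.joining(",", "[", "]"));
  }

  // Build the specialWordFilter JSON string: `mask` words are replaced
  // with asterisks, `drop` words are removed from the result.
  public static String buildFilter(List<String> mask, List<String> drop) {
    return "{"
        + "\"filter_with_signed\":{\"word_list\":" + quoteList(mask) + "},"
        + "\"filter_with_empty\":{\"word_list\":" + quoteList(drop) + "},"
        + "\"system_reserved_filter\":true"
        + "}";
  }

  public static void main(String[] args) {
    System.out.println(buildFilter(List.of("test"), List.of("start", "happen")));
  }
}
```

The resulting string is what you would pass as the specialWordFilter request parameter.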

Supported languages

Supported language codes by model:
  • fun-asr, fun-asr-2025-11-07:
    • zh: Chinese
    • en: English
    • ja: Japanese
  • fun-asr-2025-08-25:
    • zh: Chinese
    • en: English
  • fun-asr-mtl, fun-asr-mtl-2025-08-25:
    • zh: Chinese
    • en: English
    • ja: Japanese
    • ko: Korean
    • vi: Vietnamese
    • id: Indonesian
    • th: Thai
    • ms: Malay
    • tl: Filipino
    • ar: Arabic
    • hi: Hindi
    • bg: Bulgarian
    • hr: Croatian
    • cs: Czech
    • da: Danish
    • nl: Dutch
    • et: Estonian
    • fi: Finnish
    • el: Greek
    • hu: Hungarian
    • ga: Irish
    • lv: Latvian
    • lt: Lithuanian
    • mt: Maltese
    • pl: Polish
    • pt: Portuguese
    • ro: Romanian
    • sk: Slovak
    • sl: Slovenian
    • sv: Swedish
Set language_hints with the parameter or parameters method of TranscriptionParam. For example, using parameter:
TranscriptionParam param = TranscriptionParam.builder()
  .model("fun-asr")
  .parameter("language_hints", new String[]{"zh", "en"})
  .build();

Response

Task result (TranscriptionResult)

TranscriptionResult holds the task result.
| Method | Parameter | Return value | Description |
| --- | --- | --- | --- |
| public String getRequestId() | None | requestId | Gets the request ID. |
| public String getTaskId() | None | taskId | Gets the task ID. |
| public TaskStatus getTaskStatus() | None | TaskStatus | Gets the task status (PENDING, RUNNING, SUCCEEDED, or FAILED). A task with multiple subtasks shows SUCCEEDED if at least one subtask succeeds; check subtask_status for each subtask. |
| public List<TranscriptionTaskResult> getResults() | None | List<TranscriptionTaskResult> | Gets subtask results. Each file creates one subtask. |
| public JsonObject getOutput() | None | JsonObject | Gets the result as JSON. See JSON output examples. |

JSON output examples

Success example
{
  "task_id":"0795ff8c-b666-4e91-bb8b-xxx",
  "task_status":"SUCCEEDED",
  "submit_time":"2025-02-13 16:12:09.109",
  "scheduled_time":"2025-02-13 16:12:09.128",
  "end_time":"2025-02-13 16:12:10.189",
  "results":[
    {
      "file_url":"https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/paraformer/hello_world_male2.wav",
      "transcription_url":"https://dashscope-result-bj.oss-cn-beijing.aliyuncs.com/prod/paraformer-v2/20250213/16%3A12/34604a7b-579a-4223-8797-5116a49b07ec-1.json?Expires=1739520730&OSSAccessKeyId=yourOSSAccessKeyId&Signature=tMqyH56oB5rDW9%2FFqD8Yo%2F3WaPk%3D",
      "subtask_status":"SUCCEEDED"
    },
    {
      "file_url":"https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/paraformer/hello_world_female2.wav",
      "transcription_url":"https://dashscope-result-bj.oss-cn-beijing.aliyuncs.com/prod/paraformer-v2/20250213/16%3A12/3baafe5f-d09d-46c6-8b01-724927670edb-1.json?Expires=1739520730&OSSAccessKeyId=yourOSSAccessKeyId&Signature=BF7vPxlsJN9hkJlY%2BLReezxOwK8%3D",
      "subtask_status":"SUCCEEDED"
    }
  ],
  "task_metrics":{
    "TOTAL":2,
    "SUCCEEDED":2,
    "FAILED":0
  }
}
Error example (the code and message fields appear only on error):
{
  "task_id": "7bac899c-06ec-4a79-8875-xxxxxxxxxxxx",
  "task_status": "SUCCEEDED",
  "submit_time": "2024-12-16 16:30:59.170",
  "scheduled_time": "2024-12-16 16:30:59.204",
  "end_time": "2024-12-16 16:31:02.375",
  "results": [
    {
      "file_url": "https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/sensevoice/long_audio_demo_cn.mp3",
      "transcription_url": "https://dashscope-result-bj.oss-cn-beijing.aliyuncs.com/prod/paraformer-v2/20241216/xxxx",
      "subtask_status": "SUCCEEDED"
    },
    {
      "file_url": "https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/sensevoice/rich_text_exaple_1.wav",
      "code": "InvalidFile.DownloadFailed",
      "message": "The audio file cannot be downloaded.",
      "subtask_status": "FAILED"
    }
  ],
  "task_metrics": {
    "TOTAL": 2,
    "SUCCEEDED": 1,
    "FAILED": 1
  }
}
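Because a task can report SUCCEEDED while individual subtasks failed, it is worth scanning the output for failed files. The following is a minimal sketch using Gson (already a dependency of the samples above); the FailedSubtasks helper is illustrative, not part of the SDK:

```java
import com.google.gson.JsonElement;
import com.google.gson.JsonObject;
import com.google.gson.JsonParser;

import java.util.ArrayList;
import java.util.List;

public class FailedSubtasks {
  // Collect the file_url of each FAILED subtask from the task output JSON
  // (same shape as the error example above), so the failed files can be
  // resubmitted in a follow-up request.
  public static List<String> failedFiles(String outputJson) {
    List<String> failed = new ArrayList<>();
    JsonObject root = JsonParser.parseString(outputJson).getAsJsonObject();
    for (JsonElement el : root.getAsJsonArray("results")) {
      JsonObject sub = el.getAsJsonObject();
      if ("FAILED".equals(sub.get("subtask_status").getAsString())) {
        failed.add(sub.get("file_url").getAsString());
      }
    }
    return failed;
  }
}
```

The returned list can be passed straight back to fileUrls on a new request.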

Subtask result (TranscriptionTaskResult)

TranscriptionTaskResult holds the result for a single file.
| Method | Parameter | Return value | Description |
| --- | --- | --- | --- |
| public String getFileUrl() | None | File URL | Gets the URL of the recognized file. |
| public String getTranscriptionUrl() | None | Result URL | Gets the result URL (valid for 24 hours). The result is a JSON file you can download or read over HTTP. See Recognition result. |
| public TaskStatus getSubTaskStatus() | None | TaskStatus | Gets the subtask status (PENDING, RUNNING, SUCCEEDED, or FAILED). |
| public String getMessage() | None | Message (may be empty) | Gets error details. Check this if a task fails. |

Recognition result

The recognition result is a JSON file.
{
  "file_url":"https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/paraformer/hello_world_female2.wav",
  "properties":{
    "audio_format":"pcm_s16le",
    "channels":[
      0
    ],
    "original_sampling_rate":16000,
    "original_duration_in_milliseconds":3834
  },
  "transcripts":[
    {
      "channel_id":0,
      "content_duration_in_milliseconds":3720,
      "text":"Hello world, this is Alibaba Speech Lab.",
      "sentences":[
        {
          "begin_time":100,
          "end_time":3820,
          "text":"Hello world, this is Alibaba Speech Lab.",
          "sentence_id":1,
          "speaker_id":0,
          "words":[
            {
              "begin_time":100,
              "end_time":596,
              "text":"Hello ",
              "punctuation":""
            },
            {
              "begin_time":596,
              "end_time":844,
              "text":"world",
              "punctuation":", "
            }
          ]
        }
      ]
    }
  ]
}
Key fields:
| Parameter | Type | Description |
| --- | --- | --- |
| audio_format | string | Audio format of the source file. |
| channels | array[integer] | Audio track indexes. Returns [0] for single-track, [0, 1] for dual-track, and so on. |
| original_sampling_rate | integer | Sample rate (Hz). |
| original_duration_in_milliseconds | integer | Original audio duration (ms). |
| channel_id | integer | Transcribed audio track index (starting from 0). |
| content_duration_in_milliseconds | integer | Speech content duration (ms). Billing is based on speech duration only; non-speech content is not billed. AI-determined speech duration may differ from total audio duration. |
| transcripts | array | Paragraph-level transcription, one entry per channel; the text field holds the full paragraph text. |
| sentences | array | Sentence-level transcription. |
| words | array | Word-level transcription. |
| begin_time | integer | Start timestamp (ms). |
| end_time | integer | End timestamp (ms). |
| text | string | Transcribed text. |
| speaker_id | integer | Speaker index (starting from 0). Present only when diarization is enabled. |
| punctuation | string | Predicted punctuation after the word, if any. |
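Once downloaded from transcription_url, the result JSON can be navigated with Gson (already used in the samples above). Below is a minimal sketch assuming the structure shown above; the ResultParser helper is illustrative, not part of the SDK:

```java
import com.google.gson.JsonArray;
import com.google.gson.JsonObject;
import com.google.gson.JsonParser;

public class ResultParser {
  // Extract the first sentence's text and timestamps from a downloaded
  // recognition-result JSON (structure as documented above).
  public static String firstSentence(String resultJson) {
    JsonObject root = JsonParser.parseString(resultJson).getAsJsonObject();
    JsonArray transcripts = root.getAsJsonArray("transcripts");
    JsonObject firstChannel = transcripts.get(0).getAsJsonObject();
    JsonArray sentences = firstChannel.getAsJsonArray("sentences");
    JsonObject s = sentences.get(0).getAsJsonObject();
    return s.get("begin_time").getAsInt() + "-" + s.get("end_time").getAsInt()
        + "ms: " + s.get("text").getAsString();
  }

  public static void main(String[] args) {
    String json = "{\"transcripts\":[{\"channel_id\":0,\"sentences\":"
        + "[{\"begin_time\":100,\"end_time\":3820,\"text\":\"Hello world.\"}]}]}";
    System.out.println(firstSentence(json));
  }
}
```

The same pattern extends to iterating all sentences, or drilling into words for word-level timestamps and punctuation.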

Key interfaces

Query parameter class (TranscriptionQueryParam)

Use TranscriptionQueryParam when calling wait or fetch on a Transcription instance. Create one with the static method FromTranscriptionParam.
// Create transcription request parameters.
TranscriptionParam param =
    TranscriptionParam.builder()
        // If the API key is not set as an environment variable, uncomment the next line and use your API key:
        // .apiKey("sk-xxx")
        .model("fun-asr")
        .fileUrls(
            Arrays.asList(
                "https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/paraformer/hello_world_female2.wav",
                "https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/paraformer/hello_world_male2.wav"))
        .build();
try {
  Transcription transcription = new Transcription();
  // Submit the transcription request.
  TranscriptionResult result = transcription.asyncCall(param);
  System.out.println("RequestId: " + result.getRequestId());
  TranscriptionQueryParam queryParam = TranscriptionQueryParam.FromTranscriptionParam(param, result.getTaskId());

} catch (Exception e) {
  System.out.println("error: " + e);
}
| Method | Parameter | Return value | Description |
| --- | --- | --- | --- |
| public static TranscriptionQueryParam FromTranscriptionParam(TranscriptionParam param, String taskId) | param: a TranscriptionParam instance; taskId: the task ID | A TranscriptionQueryParam instance | Creates a TranscriptionQueryParam instance. |

Core class (Transcription)

Import it with import com.alibaba.dashscope.audio.asr.transcription.*;. Key methods:

| Method | Parameter | Return value | Description |
| --- | --- | --- | --- |
| public TranscriptionResult asyncCall(TranscriptionParam param) | param: a TranscriptionParam instance | TranscriptionResult | Submits a transcription task asynchronously. |
| public TranscriptionResult wait(TranscriptionQueryParam queryParam) | queryParam: a TranscriptionQueryParam instance | TranscriptionResult | Blocks until the task reaches SUCCEEDED or FAILED. |
| public TranscriptionResult fetch(TranscriptionQueryParam queryParam) | queryParam: a TranscriptionQueryParam instance | TranscriptionResult | Queries the current task result. |