
LiveTranslate Java SDK

Prerequisites

1. Install the SDK

Install the DashScope SDK, version 2.22.5 or later.

2. Get an API key

3. Set the API key

Linux / macOS:

export DASHSCOPE_API_KEY=YOUR_API_KEY
source ~/.bashrc

Windows (CMD):

setx DASHSCOPE_API_KEY "YOUR_API_KEY"

4. Review the model overview

Getting started

Connect, stream audio, and receive translations:
import com.alibaba.dashscope.audio.omni.*;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.google.gson.JsonObject;
import java.util.Arrays;

// 1. Build connection parameters
OmniRealtimeParam param = OmniRealtimeParam.builder()
    .model("qwen3-livetranslate-flash-realtime")
    .url("wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime")
    .apikey(System.getenv("DASHSCOPE_API_KEY"))
    .build();

// 2. Define a callback to handle server events
OmniRealtimeCallback callback = new OmniRealtimeCallback() {
  @Override public void onOpen() { System.out.println("Connected"); }

  @Override
  public void onEvent(JsonObject message) {
    String type = message.get("type").getAsString();
    if ("response.audio_transcript.done".equals(type)) {
      System.out.println("Translation: " + message.get("transcript").getAsString());
    }
  }

  @Override
  public void onClose(int code, String reason) {
    System.out.println("Closed: " + code + " " + reason);
  }
};

// 3. Create the conversation and open the connection
OmniRealtimeConversation conversation = new OmniRealtimeConversation(param, callback);
conversation.connect();

// 4. Configure translation target language
OmniRealtimeConfig config = OmniRealtimeConfig.builder()
    .modalities(Arrays.asList(OmniRealtimeModality.AUDIO, OmniRealtimeModality.TEXT))
    .translationConfig(OmniRealtimeTranslationParam.builder()
        .language("en")
        .build())
    .build();
conversation.updateSession(config);

// 5. Send Base64-encoded audio chunks (PCM 16 kHz, 16-bit, mono)
conversation.appendAudio(audioBase64);

// 6. End the session when done
conversation.endSession();

Configuration overview

Three builder objects control a translation session:
OmniRealtimeParam          --> Connection: model, endpoint, API key
  +-- OmniRealtimeConfig   --> Session: audio formats, voice, modalities
       +-- OmniRealtimeTranslationParam  --> Translation: target language, custom terminology
Pass OmniRealtimeParam to the constructor. After connecting, call updateSession() with OmniRealtimeConfig to set audio and translation options. Defaults apply if you skip updateSession().
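If the defaults suit you, the lifecycle reduces to construct, connect, stream, and end. A minimal sketch of that reduced flow (chunk stands in for your Base64-encoded PCM data; it is a placeholder, not an SDK symbol):
// Minimal lifecycle relying on session defaults: target language "en",
// PCM 16 kHz mono input, PCM 24 kHz mono output.
OmniRealtimeConversation conversation = new OmniRealtimeConversation(param, callback);
conversation.connect();            // fires session.created and session.updated
conversation.appendAudio(chunk);   // chunk: your Base64-encoded PCM audio (placeholder)
conversation.endSession();         // server completes in-progress translations first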

Request parameters

OmniRealtimeParam

Build connection parameters with OmniRealtimeParam.builder().
OmniRealtimeParam param = OmniRealtimeParam.builder()
  .model("qwen3-livetranslate-flash-realtime")
  .url("wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime")
  // If the environment variable is not set, pass your key directly: .apikey("YOUR_API_KEY")
  .apikey(System.getenv("DASHSCOPE_API_KEY"))
  .build();
Parameter | Type | Required | Description
--------- | ---- | -------- | -----------
model | String | Yes | Model name. Use qwen3-livetranslate-flash-realtime.
url | String | Yes | WebSocket endpoint. Use wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime.
apikey | String | No | API key. Defaults to the DASHSCOPE_API_KEY environment variable.

OmniRealtimeConfig

Build session parameters with OmniRealtimeConfig.builder(), then call conversation.updateSession(config).
// Set custom translation phrases
Map<String, Object> phrases = new HashMap<>();
phrases.put("Inteligencia Artificial", "Artificial Intelligence");
phrases.put("Aprendizaje Automático", "Machine Learning");

OmniRealtimeConfig config = OmniRealtimeConfig.builder()
  .modalities(Arrays.asList(OmniRealtimeModality.AUDIO, OmniRealtimeModality.TEXT))
  .voice("Cherry")
  .inputAudioFormat(OmniRealtimeAudioFormat.PCM_16000HZ_MONO_16BIT)
  .outputAudioFormat(OmniRealtimeAudioFormat.PCM_24000HZ_MONO_16BIT)
  .InputAudioTranscription("qwen3-asr-flash-realtime")
  .translationConfig(OmniRealtimeTranslationParam.builder()
    .language("en")
    .corpus(OmniRealtimeTranslationParam.Corpus.builder()
      .phrases(phrases)
      .build())
    .build())
  .build();

conversation.updateSession(config);
Parameter | Type | Required | Description
--------- | ---- | -------- | -----------
modalities | List<OmniRealtimeModality> | No | Output modalities. Default: [AUDIO, TEXT]. Set [TEXT] for text only.
voice | String | No | Voice for synthesized speech. Default: Cherry. See supported voices.
inputAudioFormat | OmniRealtimeAudioFormat | No | Input audio format. Default: PCM_16000HZ_MONO_16BIT.
outputAudioFormat | OmniRealtimeAudioFormat | No | Output audio format. Default: PCM_24000HZ_MONO_16BIT.
InputAudioTranscription | String | No | ASR model for transcribing input speech. Set to qwen3-asr-flash-realtime to receive source-language transcription with translation.
translationConfig | OmniRealtimeTranslationParam | No | Translation settings. See OmniRealtimeTranslationParam below.
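For example, a text-only session is a matter of restricting modalities to TEXT so no speech is synthesized. A minimal sketch, reusing the conversation object from above:
// Sketch: translated text only; voice and output audio format are not used.
OmniRealtimeConfig textOnly = OmniRealtimeConfig.builder()
    .modalities(Arrays.asList(OmniRealtimeModality.TEXT))
    .translationConfig(OmniRealtimeTranslationParam.builder()
        .language("en")
        .build())
    .build();
conversation.updateSession(textOnly);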

OmniRealtimeTranslationParam

Build translation parameters with OmniRealtimeTranslationParam.builder().
// Set translation phrases
Map<String, Object> phrases = new HashMap<>();
phrases.put("Inteligencia Artificial", "Artificial Intelligence");  // Source language word: Target language translation
phrases.put("Aprendizaje Automático", "Machine Learning");

OmniRealtimeTranslationParam translationParam = OmniRealtimeTranslationParam.builder()
  .language("en")  // Target language code
  .corpus(OmniRealtimeTranslationParam.Corpus.builder()
    .phrases(phrases)
    .build())
  .build();
Parameter | Type | Required | Description
--------- | ---- | -------- | -----------
language | String | No | Target language code. Default: en. See supported languages.
corpus | Corpus | No | Custom terminology for domain-specific terms.
corpus.phrases | Map<String, Object> | No | Term mappings. Keys: source terms; values: target translations. Example: {"Inteligencia Artificial": "Artificial Intelligence"}

Key interfaces

OmniRealtimeConversation

Manages the WebSocket connection and audio streaming. Import: com.alibaba.dashscope.audio.omni.OmniRealtimeConversation
Method | Description
------ | -----------
OmniRealtimeConversation(OmniRealtimeParam param, OmniRealtimeCallback callback) | Creates a conversation with connection parameters and an event callback.
void connect() | Opens the WebSocket connection. Triggers session.created and session.updated. Throws NoApiKeyException, InterruptedException.
void updateSession(OmniRealtimeConfig config) | Updates session configuration. Triggers session.updated. Omitted parameters use defaults.
void appendAudio(String audioBase64) | Sends a Base64-encoded audio chunk. The server detects speech boundaries and triggers translation automatically.
void endSession() | Ends the session. The server finishes in-progress translations before sending session.finished. Throws InterruptedException.
void close(int code, String reason) | Stops the task and closes the WebSocket connection.
String getSessionId() | Returns the session ID.
String getResponseId() | Returns the response ID of the latest server response.
long getFirstTextDelay() | Returns the first text delay of the latest response in milliseconds.
long getFirstAudioDelay() | Returns the first audio delay of the latest response in milliseconds.
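Pre-recorded audio works the same way as live capture: send it through appendAudio() in paced chunks. A sketch, assuming input.pcm (a hypothetical file name) contains raw 16 kHz, 16-bit, mono PCM:
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.Base64;

// Stream a raw PCM file in 100 ms chunks (3200 bytes at 16 kHz, 16-bit, mono).
byte[] pcm = Files.readAllBytes(Paths.get("input.pcm"));  // hypothetical file
for (int off = 0; off < pcm.length; off += 3200) {
    int end = Math.min(off + 3200, pcm.length);
    conversation.appendAudio(Base64.getEncoder()
        .encodeToString(Arrays.copyOfRange(pcm, off, end)));
    Thread.sleep(100);  // pace roughly in real time so speech boundaries are detected naturally
}
conversation.endSession();

// Once responses have arrived, the delay getters report first-token latency.
System.out.println("First text delay:  " + conversation.getFirstTextDelay() + " ms");
System.out.println("First audio delay: " + conversation.getFirstAudioDelay() + " ms");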

OmniRealtimeCallback

Handles server events over WebSocket. Extend this class and implement each method. Import: com.alibaba.dashscope.audio.omni.OmniRealtimeCallback
Method | Parameters | Description
------ | ---------- | -----------
void onOpen() | None | Called when the WebSocket connection opens.
abstract void onEvent(JsonObject message) | message: a JSON object containing a server event. | Called for each server event. Parse the type field to identify the event.
abstract void onClose(int code, String reason) | code: WebSocket status code. reason: closure description. | Called when the WebSocket closes.
Common event types in onEvent:
Event type | Description
---------- | -----------
input_audio_buffer.speech_started | Speech detected in the audio stream.
input_audio_buffer.speech_stopped | End of a speech segment detected.
conversation.item.input_audio_transcription.completed | Source-language transcription ready. Read message.get("transcript"). Requires InputAudioTranscription.
response.audio_transcript.done | Translated text ready. Read message.get("transcript").
response.audio.delta | Translated audio chunk available. Read message.get("delta") for Base64-encoded audio.
error | An error occurred. Read message.get("error").getAsJsonObject().get("message") for details.
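To save the translated speech instead of playing it live, one approach (a sketch; the buffering strategy is an assumption, not part of the SDK) is to accumulate the response.audio.delta payloads as raw PCM:
import java.io.ByteArrayOutputStream;
import java.util.Base64;

// Collects decoded audio deltas; the result is raw PCM
// (24 kHz, 16-bit, mono with the default output format).
ByteArrayOutputStream pcmOut = new ByteArrayOutputStream();

// Inside onEvent:
if ("response.audio.delta".equals(type)) {
    byte[] bytes = Base64.getDecoder().decode(message.get("delta").getAsString());
    pcmOut.write(bytes, 0, bytes.length);  // this overload throws no checked exception
}
// After the session ends, write pcmOut.toByteArray() to disk,
// for example after prepending a WAV header.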

Complete example

This example captures microphone audio, translates it in real time, and plays the translated speech through the speaker. What it does:
  1. Connects to Qwen-LiveTranslate over WebSocket.
  2. Sets up Spanish-to-English translation with custom terminology.
  3. Streams microphone audio in 100 ms chunks.
  4. Prints the original transcription and translation.
  5. Plays translated audio.
import com.alibaba.dashscope.audio.omni.*;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.google.gson.JsonObject;

import javax.sound.sampled.*;
import java.util.*;
import java.util.concurrent.atomic.AtomicBoolean;

/**
 * Example of using a microphone with the real-time audio and video translation model.
 */
public class Main {
    private static final int INPUT_CHUNK_SIZE = 3200;   // 100 ms of 16 kHz, 16-bit, mono audio
    private static final int OUTPUT_CHUNK_SIZE = 4800;  // 100 ms of 24 kHz, 16-bit, mono audio
    private static final AtomicBoolean running = new AtomicBoolean(true);
    private static SourceDataLine speaker;  // Speaker

    public static void main(String[] args) throws InterruptedException {
        String apiKey = System.getenv("DASHSCOPE_API_KEY");
        if (apiKey == null || apiKey.isEmpty()) {
            System.err.println("Set the DASHSCOPE_API_KEY environment variable.");
            System.exit(1);
        }

        // Create connection parameters.
        OmniRealtimeParam param = OmniRealtimeParam.builder()
                .model("qwen3-livetranslate-flash-realtime")
                .url("wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime")
                .apikey(apiKey)
                .build();

        // Create a callback handler.
        OmniRealtimeCallback callback = new OmniRealtimeCallback() {
            @Override
            public void onOpen() {
                System.out.println("[Connection established]");
            }

            @Override
            public void onEvent(JsonObject message) {
                String type = message.get("type").getAsString();
                switch (type) {
                    case "input_audio_buffer.speech_started":
                        System.out.println("====== Speech input detected ======");
                        break;
                    case "input_audio_buffer.speech_stopped":
                        System.out.println("====== Speech input ended ======");
                        break;
                    case "conversation.item.input_audio_transcription.completed":
                        String originalText = message.get("transcript").getAsString();
                        System.out.println("[Original text] " + originalText);
                        break;
                    case "response.audio_transcript.done":
                        String translatedText = message.get("transcript").getAsString();
                        System.out.println("[Translation result] " + translatedText);
                        break;
                    case "response.audio.delta":
                        // Decode and play the translated audio.
                        String audioB64 = message.get("delta").getAsString();
                        byte[] audioBytes = Base64.getDecoder().decode(audioB64);
                        if (speaker != null) {
                            speaker.write(audioBytes, 0, audioBytes.length);
                        }
                        break;
                    case "error":
                        JsonObject error = message.get("error").getAsJsonObject();
                        System.err.println("[Error] " + error.get("message").getAsString());
                        break;
                }
            }

            @Override
            public void onClose(int code, String reason) {
                System.out.println("[Connection closed] code: " + code + ", reason: " + reason);
            }
        };

        // Create a session.
        OmniRealtimeConversation conversation = new OmniRealtimeConversation(param, callback);

        try {
            // Initialize the speaker (for playing the translated speech).
            AudioFormat speakerFormat = new AudioFormat(24000, 16, 1, true, false);
            DataLine.Info speakerInfo = new DataLine.Info(SourceDataLine.class, speakerFormat);
            speaker = (SourceDataLine) AudioSystem.getLine(speakerInfo);
            speaker.open(speakerFormat, OUTPUT_CHUNK_SIZE * 4);
            speaker.start();

            // Initialize the microphone (for capturing speech input).
            AudioFormat micFormat = new AudioFormat(16000, 16, 1, true, false);
            DataLine.Info micInfo = new DataLine.Info(TargetDataLine.class, micFormat);
            if (!AudioSystem.isLineSupported(micInfo)) {
                System.err.println("Microphone is not available.");
                System.exit(1);
            }
            TargetDataLine microphone = (TargetDataLine) AudioSystem.getLine(micInfo);
            microphone.open(micFormat);
            microphone.start();

            // Connect to the server.
            conversation.connect();

            // Configure translation parameters.
            Map<String, Object> phrases = new HashMap<>();
            phrases.put("Inteligencia Artificial", "Artificial Intelligence");
            phrases.put("Aprendizaje Automático", "Machine Learning");

            OmniRealtimeConfig config = OmniRealtimeConfig.builder()
                    .modalities(Arrays.asList(OmniRealtimeModality.AUDIO, OmniRealtimeModality.TEXT))
                    .voice("Cherry")
                    .inputAudioFormat(OmniRealtimeAudioFormat.PCM_16000HZ_MONO_16BIT)
                    .outputAudioFormat(OmniRealtimeAudioFormat.PCM_24000HZ_MONO_16BIT)
                    .InputAudioTranscription("qwen3-asr-flash-realtime")
                    .translationConfig(OmniRealtimeTranslationParam.builder()
                            .language("en")
                            .corpus(OmniRealtimeTranslationParam.Corpus.builder()
                                    .phrases(phrases)
                                    .build())
                            .build())
                    .build();

            conversation.updateSession(config);

            // Register a shutdown hook.
            Runtime.getRuntime().addShutdownHook(new Thread(() -> {
                System.out.println("\n[Exiting...]");
                running.set(false);
                microphone.stop();
                microphone.close();
                speaker.stop();
                speaker.close();
                conversation.close(1000, "User stopped");
            }));

            System.out.println("[Starting real-time translation] Speak into the microphone. Press Ctrl+C to exit.");

            // Continuously capture and send microphone audio.
            byte[] buffer = new byte[INPUT_CHUNK_SIZE];
            while (running.get()) {
                int bytesRead = microphone.read(buffer, 0, buffer.length);
                if (bytesRead > 0) {
                    // Encode only the bytes actually read to avoid sending stale buffer data.
                    byte[] chunk = Arrays.copyOf(buffer, bytesRead);
                    conversation.appendAudio(Base64.getEncoder().encodeToString(chunk));
                }
            }

        } catch (NoApiKeyException e) {
            System.err.println("API Key error: " + e.getMessage());
        } catch (Exception e) {
            System.err.println("An exception occurred: " + e.getMessage());
            e.printStackTrace();
        }
    }
}
