LiveTranslate Java SDK

User guide: For tutorials and complete examples, see Real-time translation.

Configuration overview

Three builder objects control a translation session:
OmniRealtimeParam          --> Connection: model, endpoint, API key
  +-- OmniRealtimeConfig   --> Session: audio formats, voice, modalities
       +-- OmniRealtimeTranslationParam  --> Translation: target language, custom terminology
Pass OmniRealtimeParam to the OmniRealtimeConversation constructor. After connecting, call updateSession() with an OmniRealtimeConfig to set audio and translation options. If you skip updateSession(), the defaults apply.

Request parameters

OmniRealtimeParam

Build connection parameters with OmniRealtimeParam.builder().
OmniRealtimeParam param = OmniRealtimeParam.builder()
  .model("qwen3-livetranslate-flash-realtime")
  .url("wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime")
  // If no environment variable is set, replace the next line with your API key: .apikey("YOUR_API_KEY")
  .apikey(System.getenv("DASHSCOPE_API_KEY"))
  .build();
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| model | String | Yes | Model name. Use qwen3-livetranslate-flash-realtime. |
| url | String | Yes | WebSocket endpoint. Use wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime. |
| apikey | String | No | API key. Defaults to the DASHSCOPE_API_KEY environment variable. |
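The apikey fallback described in the table can be checked before connecting, so a missing key surfaces early rather than as a NoApiKeyException from connect(). The following is a minimal sketch of that rule only, not the SDK's own implementation; ApiKeyResolver and its resolve method are hypothetical names introduced for illustration.

```java
import java.util.Map;

// Hypothetical helper (not part of the SDK): mirrors the documented rule that
// an explicit key wins and DASHSCOPE_API_KEY is the fallback.
final class ApiKeyResolver {
    static String resolve(String explicitKey, Map<String, String> env) {
        if (explicitKey != null && !explicitKey.trim().isEmpty()) {
            return explicitKey;
        }
        String fromEnv = env.get("DASHSCOPE_API_KEY");
        if (fromEnv == null || fromEnv.trim().isEmpty()) {
            throw new IllegalStateException(
                "No API key: set DASHSCOPE_API_KEY or pass a key explicitly");
        }
        return fromEnv;
    }

    public static void main(String[] args) {
        try {
            String key = resolve(null, System.getenv());
            System.out.println("API key found, length " + key.length());
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The resolved value could then be passed to .apikey(...) on the builder shown above.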

OmniRealtimeConfig

Build session parameters with OmniRealtimeConfig.builder(), then call conversation.updateSession(config).
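// Requires java.util.Arrays, java.util.HashMap, and java.util.Map.
// Set custom translation phrases (source term -> target translation)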
Map<String, Object> phrases = new HashMap<>();
phrases.put("Inteligencia Artificial", "Artificial Intelligence");
phrases.put("Aprendizaje Automático", "Machine Learning");

OmniRealtimeConfig config = OmniRealtimeConfig.builder()
  .modalities(Arrays.asList(OmniRealtimeModality.AUDIO, OmniRealtimeModality.TEXT))
  .voice("Cherry")
  .inputAudioFormat(OmniRealtimeAudioFormat.PCM_16000HZ_MONO_16BIT)
  .outputAudioFormat(OmniRealtimeAudioFormat.PCM_24000HZ_MONO_16BIT)
  .InputAudioTranscription("qwen3-asr-flash-realtime")
  .translationConfig(OmniRealtimeTranslationParam.builder()
    .language("en")
    .corpus(OmniRealtimeTranslationParam.Corpus.builder()
      .phrases(phrases)
      .build())
    .build())
  .build();

// conversation is an OmniRealtimeConversation on which connect() has already been called
conversation.updateSession(config);
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| modalities | List<OmniRealtimeModality> | No | Output modalities. Default: [AUDIO, TEXT]. Set [TEXT] for text only. |
| voice | String | No | Voice for synthesized speech. Default: Cherry. See supported voices. |
| inputAudioFormat | OmniRealtimeAudioFormat | No | Input audio format. Default: PCM_16000HZ_MONO_16BIT. |
| outputAudioFormat | OmniRealtimeAudioFormat | No | Output audio format. Default: PCM_24000HZ_MONO_16BIT. |
| InputAudioTranscription | String | No | ASR model for transcribing input speech. Set to qwen3-asr-flash-realtime to receive source-language transcription with translation. |
| translationConfig | OmniRealtimeTranslationParam | No | Translation settings. See OmniRealtimeTranslationParam below. |
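Reading the default format names literally (16 kHz or 24 kHz, mono, 16-bit PCM), the byte rate of each stream follows from simple arithmetic: sample rate × 2 bytes × 1 channel. The sketch below works that out; PcmMath is a hypothetical helper, and the interpretation of the constant names is an assumption rather than something this reference states.

```java
// Hypothetical helper: byte/duration arithmetic for mono 16-bit PCM.
final class PcmMath {
    static final int BYTES_PER_SAMPLE = 2; // 16-bit samples
    static final int CHANNELS = 1;         // mono

    // Number of bytes needed to hold `millis` of audio at the given sample rate.
    static int bytesForMillis(int sampleRateHz, int millis) {
        return sampleRateHz * BYTES_PER_SAMPLE * CHANNELS * millis / 1000;
    }

    // Playback duration in milliseconds of `byteCount` bytes of audio.
    static long millisForBytes(int sampleRateHz, long byteCount) {
        return byteCount * 1000 / ((long) sampleRateHz * BYTES_PER_SAMPLE * CHANNELS);
    }

    public static void main(String[] args) {
        // Input side: 100 ms at 16 kHz
        System.out.println(bytesForMillis(16000, 100));   // prints 3200
        // Output side: 48,000 bytes at 24 kHz
        System.out.println(millisForBytes(24000, 48000)); // prints 1000
    }
}
```

These figures help when sizing capture buffers for the input stream or estimating how much playback time a batch of output audio represents.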

OmniRealtimeTranslationParam

Build translation parameters with OmniRealtimeTranslationParam.builder().
// Set translation phrases
Map<String, Object> phrases = new HashMap<>();
phrases.put("Inteligencia Artificial", "Artificial Intelligence");  // Source language word: Target language translation
phrases.put("Aprendizaje Automático", "Machine Learning");

OmniRealtimeTranslationParam translationParam = OmniRealtimeTranslationParam.builder()
  .language("en")  // Target language code
  .corpus(OmniRealtimeTranslationParam.Corpus.builder()
    .phrases(phrases)
    .build())
  .build();
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| language | String | No | Target language code. Default: en. See supported languages. |
| corpus | Corpus | No | Custom terminology for domain-specific terms. |
| corpus.phrases | Map<String, Object> | No | Term mappings. Keys: source terms; values: target translations. Example: {"Inteligencia Artificial": "Artificial Intelligence"} |

Key interfaces

OmniRealtimeConversation

Manages the WebSocket connection and audio streaming. Import: com.alibaba.dashscope.audio.omni.OmniRealtimeConversation
| Method | Description |
| --- | --- |
| OmniRealtimeConversation(OmniRealtimeParam param, OmniRealtimeCallback callback) | Creates a conversation with connection parameters and an event callback. |
| void connect() | Opens the WebSocket connection. Triggers session.created and session.updated. Throws NoApiKeyException, InterruptedException. |
| void updateSession(OmniRealtimeConfig config) | Updates session configuration. Triggers session.updated. Omitted parameters use defaults. |
| void appendAudio(String audioBase64) | Sends a Base64-encoded audio chunk. The server detects speech boundaries and triggers translation automatically. |
| void endSession() | Ends the session. The server finishes in-progress translations before sending session.finished. Throws InterruptedException. |
| void close(int code, String reason) | Stops the task and closes the WebSocket connection. |
| String getSessionId() | Returns the session ID. |
| String getResponseId() | Returns the response ID of the latest server response. |
| long getFirstTextDelay() | Returns the first text delay of the latest response in milliseconds. |
| long getFirstAudioDelay() | Returns the first audio delay of the latest response in milliseconds. |
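Because appendAudio takes a Base64 string, raw PCM has to be split into chunks and encoded before sending. The sketch below shows one way to prepare those strings with the standard library. AudioChunker is a hypothetical helper, and the 3200-byte chunk size (about 100 ms at 16 kHz, mono, 16-bit) is an assumption; this reference does not specify a recommended chunk size.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Base64;
import java.util.List;

// Hypothetical helper: prepares Base64 strings suitable for appendAudio(...).
final class AudioChunker {
    // Splits raw PCM into chunks of at most chunkBytes and Base64-encodes each one.
    static List<String> toBase64Chunks(byte[] pcm, int chunkBytes) {
        if (chunkBytes <= 0) {
            throw new IllegalArgumentException("chunkBytes must be positive");
        }
        List<String> chunks = new ArrayList<>();
        for (int offset = 0; offset < pcm.length; offset += chunkBytes) {
            int end = Math.min(offset + chunkBytes, pcm.length);
            byte[] slice = Arrays.copyOfRange(pcm, offset, end);
            chunks.add(Base64.getEncoder().encodeToString(slice));
        }
        return chunks;
    }

    public static void main(String[] args) {
        byte[] oneSecondOfSilence = new byte[32000]; // 1 s at 16 kHz, mono, 16-bit
        List<String> chunks = toBase64Chunks(oneSecondOfSilence, 3200);
        // Each element would be passed to conversation.appendAudio(chunk).
        System.out.println(chunks.size() + " chunks"); // prints "10 chunks"
    }
}
```

In a live session the chunks would typically be sent as they are captured rather than all at once, so that the server's speech-boundary detection sees the audio in real time.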

OmniRealtimeCallback

Handles server events over WebSocket. Extend this class and implement the abstract methods onEvent and onClose; override onOpen if you need to react when the connection opens. Import: com.alibaba.dashscope.audio.omni.OmniRealtimeCallback
| Method | Parameters | Description |
| --- | --- | --- |
| void onOpen() | None | Called when the WebSocket connection opens. |
| abstract void onEvent(JsonObject message) | message: A JSON object containing a server event. | Called for each server event. Parse the type field to identify the event. |
| abstract void onClose(int code, String reason) | code: WebSocket status code. reason: Closure description. | Called when the WebSocket closes. |
Common event types in onEvent:
| Event type | Description |
| --- | --- |
| input_audio_buffer.speech_started | Speech detected in the audio stream. |
| input_audio_buffer.speech_stopped | End of a speech segment detected. |
| conversation.item.input_audio_transcription.completed | Source-language transcription ready. Read message.get("transcript"). Requires InputAudioTranscription. |
| response.audio_transcript.done | Translated text ready. Read message.get("transcript"). |
| response.audio.delta | Translated audio chunk available. Read message.get("delta") for Base64-encoded audio. |
| error | An error occurred. Read message.get("error").getAsJsonObject().get("message") for details. |
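The event table above maps naturally onto a dispatch on the type string. The sketch below shows that routing with the standard library only: it takes the event type and the already-extracted payload string (the transcript or delta field), so it compiles without the SDK. TranslationEventSink is a hypothetical class; inside a real onEvent implementation the two arguments would be read from the JsonObject message first.

```java
import java.io.ByteArrayOutputStream;
import java.util.Base64;

// Hypothetical collector: routes event payloads by type, as onEvent might.
final class TranslationEventSink {
    private final StringBuilder sourceText = new StringBuilder();
    private final StringBuilder translatedText = new StringBuilder();
    private final ByteArrayOutputStream translatedAudio = new ByteArrayOutputStream();

    // Handles one event. Returns false for event types this sketch ignores.
    boolean handle(String type, String payload) {
        switch (type) {
            case "conversation.item.input_audio_transcription.completed":
                sourceText.append(payload).append('\n');
                return true;
            case "response.audio_transcript.done":
                translatedText.append(payload).append('\n');
                return true;
            case "response.audio.delta":
                // The delta field carries Base64-encoded audio.
                byte[] pcm = Base64.getDecoder().decode(payload);
                translatedAudio.write(pcm, 0, pcm.length);
                return true;
            default:
                return false;
        }
    }

    String getSourceText() { return sourceText.toString(); }
    String getTranslatedText() { return translatedText.toString(); }
    byte[] getTranslatedAudio() { return translatedAudio.toByteArray(); }

    public static void main(String[] args) {
        TranslationEventSink sink = new TranslationEventSink();
        sink.handle("response.audio_transcript.done", "Artificial Intelligence");
        sink.handle("response.audio.delta", Base64.getEncoder().encodeToString(new byte[480]));
        System.out.println(sink.getTranslatedText().trim());                  // prints "Artificial Intelligence"
        System.out.println(sink.getTranslatedAudio().length + " bytes of audio"); // prints "480 bytes of audio"
    }
}
```

Keeping the routing separate from JSON parsing makes it testable without a live connection; error events are left out of this sketch and would need their own branch.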

References