LiveTranslate Java SDK
Prerequisites
1. Install the SDK: install the DashScope SDK, version 2.22.5 or later.
2. Get an API key.
3. Set the API key as an environment variable (Linux/macOS or Windows).
4. Review the model overview: supported languages and voices.
Getting started
Connect, stream audio, and receive translations:
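A minimal sketch of that flow, assembled from the builder classes and callback methods documented in the sections below (error handling and audio playback omitted; the silent PCM buffer is a stand-in for real microphone audio):

```java
import com.alibaba.dashscope.audio.omni.*;
import com.google.gson.JsonObject;
import java.util.Base64;

public class QuickStart {
    public static void main(String[] args) throws Exception {
        // Connection parameters; the API key falls back to DASHSCOPE_API_KEY.
        OmniRealtimeParam param = OmniRealtimeParam.builder()
                .model("qwen3-livetranslate-flash-realtime")
                .url("wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime")
                .build();

        OmniRealtimeConversation conversation = new OmniRealtimeConversation(param,
                new OmniRealtimeCallback() {
                    @Override
                    public void onOpen() {
                        System.out.println("connected");
                    }

                    @Override
                    public void onEvent(JsonObject message) {
                        String type = message.get("type").getAsString();
                        if ("response.audio_transcript.done".equals(type)) {
                            System.out.println("translation: " + message.get("transcript").getAsString());
                        }
                    }

                    @Override
                    public void onClose(int code, String reason) {
                        System.out.println("closed: " + reason);
                    }
                });

        conversation.connect();                                           // open the WebSocket
        conversation.updateSession(OmniRealtimeConfig.builder().build()); // session defaults
        byte[] pcmChunk = new byte[3200];                                 // 100 ms of 16 kHz 16-bit mono PCM
        conversation.appendAudio(Base64.getEncoder().encodeToString(pcmChunk));
        conversation.endSession();                                        // flush in-progress translations
    }
}
```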
Configuration overview
Three builder objects control a translation session: pass OmniRealtimeParam to the OmniRealtimeConversation constructor, then, after connecting, call updateSession() with OmniRealtimeConfig to set audio and translation options, including an embedded OmniRealtimeTranslationParam. Defaults apply if you skip updateSession().
Request parameters
OmniRealtimeParam
Build connection parameters with OmniRealtimeParam.builder().
Sample code
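A sketch of the builder call, using the parameter names from the table below; the apikey setter is optional when DASHSCOPE_API_KEY is set in the environment:

```java
import com.alibaba.dashscope.audio.omni.OmniRealtimeParam;

OmniRealtimeParam param = OmniRealtimeParam.builder()
        .model("qwen3-livetranslate-flash-realtime")
        .url("wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime")
        // .apikey("YOUR_API_KEY")  // optional; defaults to DASHSCOPE_API_KEY
        .build();
```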
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | String | Yes | Model name. Use qwen3-livetranslate-flash-realtime. |
| url | String | Yes | WebSocket endpoint. Use wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime. |
| apikey | String | No | API key. Defaults to the DASHSCOPE_API_KEY environment variable. |
OmniRealtimeConfig
Build session parameters with OmniRealtimeConfig.builder(), then call conversation.updateSession(config).
Sample code
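A sketch of a config matching the defaults in the table below; the setter names are assumed to mirror the parameter names, and the conversation variable is an already-constructed OmniRealtimeConversation:

```java
import java.util.Arrays;
import com.alibaba.dashscope.audio.omni.OmniRealtimeAudioFormat;
import com.alibaba.dashscope.audio.omni.OmniRealtimeConfig;
import com.alibaba.dashscope.audio.omni.OmniRealtimeModality;

OmniRealtimeConfig config = OmniRealtimeConfig.builder()
        .modalities(Arrays.asList(OmniRealtimeModality.AUDIO, OmniRealtimeModality.TEXT))
        .voice("Cherry")                                          // default voice
        .inputAudioFormat(OmniRealtimeAudioFormat.PCM_16000HZ_MONO_16BIT)
        .outputAudioFormat(OmniRealtimeAudioFormat.PCM_24000HZ_MONO_16BIT)
        .build();
conversation.updateSession(config);                               // triggers session.updated
```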
| Parameter | Type | Required | Description |
|---|---|---|---|
| modalities | List<OmniRealtimeModality> | No | Output modalities. Default: [AUDIO, TEXT]. Set [TEXT] for text only. |
| voice | String | No | Voice for synthesized speech. Default: Cherry. See supported voices. |
| inputAudioFormat | OmniRealtimeAudioFormat | No | Input audio format. Default: PCM_16000HZ_MONO_16BIT. |
| outputAudioFormat | OmniRealtimeAudioFormat | No | Output audio format. Default: PCM_24000HZ_MONO_16BIT. |
| InputAudioTranscription | String | No | ASR model for transcribing input speech. Set to qwen3-asr-flash-realtime to receive source-language transcription with translation. |
| translationConfig | OmniRealtimeTranslationParam | No | Translation settings. See OmniRealtimeTranslationParam below. |
OmniRealtimeTranslationParam
Build translation parameters with OmniRealtimeTranslationParam.builder().
Sample code
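A sketch of the translation builder, based on the table below. The exact construction of the Corpus object is not shown here, since its builder API is not documented on this page; check the SDK for details:

```java
import com.alibaba.dashscope.audio.omni.OmniRealtimeTranslationParam;

OmniRealtimeTranslationParam translation = OmniRealtimeTranslationParam.builder()
        .language("en")   // target language code; default is en
        // Custom terminology goes through the corpus parameter, whose phrases map
        // sends source terms to target translations, e.g.
        // {"Inteligencia Artificial": "Artificial Intelligence"}.
        // Corpus construction is SDK-specific and omitted here.
        .build();
```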
| Parameter | Type | Required | Description |
|---|---|---|---|
| language | String | No | Target language code. Default: en. See supported languages. |
| corpus | Corpus | No | Custom terminology for domain-specific terms. |
| corpus.phrases | Map<String, Object> | No | Term mappings. Keys: source terms; values: target translations. Example: {"Inteligencia Artificial": "Artificial Intelligence"} |
Key interfaces
OmniRealtimeConversation
Manages the WebSocket connection and audio streaming.
Import: com.alibaba.dashscope.audio.omni.OmniRealtimeConversation
| Method | Description |
|---|---|
| OmniRealtimeConversation(OmniRealtimeParam param, OmniRealtimeCallback callback) | Creates a conversation with connection parameters and an event callback. |
| void connect() | Opens the WebSocket connection. Triggers session.created and session.updated. Throws NoApiKeyException, InterruptedException. |
| void updateSession(OmniRealtimeConfig config) | Updates session configuration. Triggers session.updated. Omitted parameters use defaults. |
| void appendAudio(String audioBase64) | Sends a Base64-encoded audio chunk. The server detects speech boundaries and triggers translation automatically. |
| void endSession() | Ends the session. The server finishes in-progress translations before sending session.finished. Throws InterruptedException. |
| void close(int code, String reason) | Stops the task and closes the WebSocket connection. |
| String getSessionId() | Returns the session ID. |
| String getResponseId() | Returns the response ID of the latest server response. |
| long getFirstTextDelay() | Returns the first text delay of the latest response in milliseconds. |
| long getFirstAudioDelay() | Returns the first audio delay of the latest response in milliseconds. |
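appendAudio() takes Base64 text, so raw PCM has to be encoded chunk by chunk before sending. The helper below (plain JDK, no SDK types) splits a PCM buffer into 100 ms chunks at the default input format of 16 kHz 16-bit mono, i.e. 3200 bytes per chunk, and Base64-encodes each one:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Base64;
import java.util.List;

public class PcmChunker {
    // 16000 samples/s * 2 bytes/sample * 0.1 s = 3200 bytes per 100 ms chunk
    static final int CHUNK_BYTES = 3200;

    /** Splits raw PCM into Base64 strings, one per call to appendAudio(). */
    static List<String> toBase64Chunks(byte[] pcm) {
        List<String> chunks = new ArrayList<>();
        for (int off = 0; off < pcm.length; off += CHUNK_BYTES) {
            int end = Math.min(off + CHUNK_BYTES, pcm.length);
            chunks.add(Base64.getEncoder().encodeToString(Arrays.copyOfRange(pcm, off, end)));
        }
        return chunks;
    }

    public static void main(String[] args) {
        byte[] pcm = new byte[8000];               // 250 ms of silence
        List<String> chunks = toBase64Chunks(pcm);
        System.out.println(chunks.size());         // 3 chunks: 3200 + 3200 + 1600 bytes
    }
}
```

Each returned string can be passed directly to conversation.appendAudio(); the final chunk may be shorter than 3200 bytes.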
OmniRealtimeCallback
Handles server events over WebSocket. Extend this class and implement the abstract methods.
Import: com.alibaba.dashscope.audio.omni.OmniRealtimeCallback
| Method | Parameters | Description |
|---|---|---|
| void onOpen() | None | Called when the WebSocket connection opens. |
| abstract void onEvent(JsonObject message) | message: A JSON object containing a server event. | Called for each server event. Parse the type field to identify the event. |
| abstract void onClose(int code, String reason) | code: WebSocket status code. reason: Closure description. | Called when the WebSocket closes. |
Common event types delivered to onEvent:
| Event type | Description |
|---|---|
| input_audio_buffer.speech_started | Speech detected in the audio stream. |
| input_audio_buffer.speech_stopped | End of a speech segment detected. |
| conversation.item.input_audio_transcription.completed | Source-language transcription ready. Read message.get("transcript"). Requires InputAudioTranscription. |
| response.audio_transcript.done | Translated text ready. Read message.get("transcript"). |
| response.audio.delta | Translated audio chunk available. Read message.get("delta") for Base64-encoded audio. |
| error | An error occurred. Read message.get("error").getAsJsonObject().get("message") for details. |
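A callback sketch that dispatches on the event types above. Gson's JsonObject ships with the SDK; decoding of the audio delta is shown, playback is left as a comment:

```java
import com.alibaba.dashscope.audio.omni.OmniRealtimeCallback;
import com.google.gson.JsonObject;
import java.util.Base64;

OmniRealtimeCallback callback = new OmniRealtimeCallback() {
    @Override
    public void onOpen() {
        System.out.println("WebSocket open");
    }

    @Override
    public void onEvent(JsonObject message) {
        String type = message.get("type").getAsString();
        switch (type) {
            case "conversation.item.input_audio_transcription.completed":
                System.out.println("source: " + message.get("transcript").getAsString());
                break;
            case "response.audio_transcript.done":
                System.out.println("translation: " + message.get("transcript").getAsString());
                break;
            case "response.audio.delta":
                byte[] audio = Base64.getDecoder().decode(message.get("delta").getAsString());
                // hand `audio` (24 kHz 16-bit mono PCM by default) to your playback buffer
                break;
            case "error":
                System.err.println(message.get("error").getAsJsonObject().get("message"));
                break;
            default:
                break;
        }
    }

    @Override
    public void onClose(int code, String reason) {
        System.out.println("closed (" + code + "): " + reason);
    }
};
```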
Complete example
This example captures microphone audio, translates it in real time, and plays the translated speech through the speaker.
What it does:
- Connects to Qwen-LiveTranslate over WebSocket.
- Sets up Spanish-to-English translation with custom terminology.
- Streams microphone audio in 100 ms chunks.
- Prints the original transcription and translation.
- Plays translated audio.
Real-time microphone translation
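A sketch of that example, assembled from the API tables on this page. The SDK calls follow the signatures documented above; microphone capture uses the JDK's javax.sound.sampled, the 30-second capture window is illustrative, and playback of the returned audio is left as a comment. Custom terminology and input transcription are noted but not wired up, since their exact builder setters are not documented on this page:

```java
import com.alibaba.dashscope.audio.omni.*;
import com.google.gson.JsonObject;
import javax.sound.sampled.*;
import java.util.Arrays;
import java.util.Base64;

public class RealtimeMicTranslation {
    public static void main(String[] args) throws Exception {
        // Connection parameters; the API key falls back to DASHSCOPE_API_KEY.
        OmniRealtimeParam param = OmniRealtimeParam.builder()
                .model("qwen3-livetranslate-flash-realtime")
                .url("wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime")
                .build();

        OmniRealtimeConversation conversation = new OmniRealtimeConversation(param,
                new OmniRealtimeCallback() {
                    @Override
                    public void onOpen() { System.out.println("connected"); }

                    @Override
                    public void onEvent(JsonObject message) {
                        String type = message.get("type").getAsString();
                        if ("conversation.item.input_audio_transcription.completed".equals(type)) {
                            System.out.println("source: " + message.get("transcript").getAsString());
                        } else if ("response.audio_transcript.done".equals(type)) {
                            System.out.println("translation: " + message.get("transcript").getAsString());
                        } else if ("response.audio.delta".equals(type)) {
                            byte[] audio = Base64.getDecoder().decode(message.get("delta").getAsString());
                            // queue `audio` (24 kHz 16-bit mono PCM) on a SourceDataLine for playback
                        }
                    }

                    @Override
                    public void onClose(int code, String reason) {
                        System.out.println("closed: " + reason);
                    }
                });
        conversation.connect();

        // Translate to English; the source language is detected from the audio.
        // To print source transcription, also set the InputAudioTranscription
        // parameter to qwen3-asr-flash-realtime, and add a corpus to the
        // translation config for custom terminology (setters per your SDK version).
        OmniRealtimeConfig config = OmniRealtimeConfig.builder()
                .modalities(Arrays.asList(OmniRealtimeModality.AUDIO, OmniRealtimeModality.TEXT))
                .voice("Cherry")
                .translationConfig(OmniRealtimeTranslationParam.builder()
                        .language("en")
                        .build())
                .build();
        conversation.updateSession(config);

        // Capture 16 kHz 16-bit mono little-endian PCM from the default microphone.
        AudioFormat fmt = new AudioFormat(16000f, 16, 1, true, false);
        TargetDataLine mic = AudioSystem.getTargetDataLine(fmt);
        mic.open(fmt);
        mic.start();

        byte[] buf = new byte[3200];                     // 100 ms per chunk
        long deadline = System.currentTimeMillis() + 30_000;  // stream for 30 s
        while (System.currentTimeMillis() < deadline) {
            int n = mic.read(buf, 0, buf.length);
            if (n > 0) {
                conversation.appendAudio(Base64.getEncoder().encodeToString(Arrays.copyOf(buf, n)));
            }
        }
        mic.stop();
        mic.close();

        conversation.endSession();                       // let the server flush translations
        Thread.sleep(2000);                              // allow final events to arrive
        conversation.close(1000, "done");
    }
}
```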
| Placeholder | Description | Example |
|---|---|---|
| YOUR_API_KEY | Your API key | YOUR_API_KEY |
Learn more
- Qwen-LiveTranslate model overview -- Supported languages, voices, and capabilities
- Server events reference -- Event types, JSON schemas, and error codes
- Install the DashScope SDK -- Installation and dependency setup
- Get an API key -- API key creation and management