Qwen-TTS realtime Java SDK

For models and voice options, see Realtime streaming TTS.

Getting started

The SDK offers two interaction modes. Both follow the same pattern: build a QwenTtsRealtimeParam, create a QwenTtsRealtime instance with a callback, call connect(), configure the session with updateSession(), send text with appendText(), and call close(). For complete examples and tutorials, see Real-time text-to-speech.
  • Server commit mode (server_commit): The server decides when to synthesize buffered text. Best for most use cases.
  • Commit mode (commit): The client calls commit() to trigger synthesis. Lowest latency, but you must manage sentence boundaries.
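In commit mode the client decides where each synthesis unit ends. The following is a minimal sketch, using only the JDK, of one way to do that: a hypothetical SentenceChunker class (not part of the SDK) buffers streamed text fragments and releases a sentence whenever it sees terminal punctuation. Each released sentence would then be passed to appendText() followed by commit(); those SDK calls appear only as comments, because the sketch does not depend on the SDK.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the DashScope SDK: splits streamed text
// into sentences so that a commit-mode client knows when to call commit().
class SentenceChunker {
    // Terminal punctuation treated as a sentence boundary: . ! ? and the
    // CJK full stop, full-width exclamation mark, and full-width question mark.
    private static final String TERMINATORS = ".!?\u3002\uFF01\uFF1F";
    private final StringBuilder buffer = new StringBuilder();

    // Appends a fragment and returns every sentence it completes, in order.
    List<String> push(String fragment) {
        buffer.append(fragment);
        List<String> sentences = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < buffer.length(); i++) {
            if (TERMINATORS.indexOf(buffer.charAt(i)) < 0) {
                continue;
            }
            // Treat runs such as "..." or "?!" as a single boundary.
            while (i + 1 < buffer.length() && TERMINATORS.indexOf(buffer.charAt(i + 1)) >= 0) {
                i++;
            }
            String sentence = buffer.substring(start, i + 1).trim();
            if (!sentence.isEmpty()) {
                sentences.add(sentence);
            }
            start = i + 1;
        }
        // Keep only the unfinished tail for the next call.
        buffer.delete(0, start);
        return sentences;
    }

    // Returns the unterminated remainder (possibly empty) and clears the buffer.
    String flush() {
        String rest = buffer.toString().trim();
        buffer.setLength(0);
        return rest;
    }

    public static void main(String[] args) {
        SentenceChunker chunker = new SentenceChunker();
        String[] fragments = {"Hello wor", "ld. How are", " you? Fine"};
        for (String fragment : fragments) {
            for (String sentence : chunker.push(fragment)) {
                // With the SDK (tts is a QwenTtsRealtime): tts.appendText(sentence); tts.commit();
                System.out.println("commit: " + sentence);
            }
        }
        String tail = chunker.flush();
        if (!tail.isEmpty()) {
            // Send whatever is left once the text stream ends.
            System.out.println("commit: " + tail);
        }
    }
}
```

The boundary rule is deliberately naive: it also splits on decimal points and abbreviations such as "e.g.", so a production client would likely need a smarter segmenter.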

Request parameters

QwenTtsRealtimeParam

Build a QwenTtsRealtimeParam and pass it to the QwenTtsRealtime constructor.
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | String | Yes | Model name. See Supported models. |
| url | String | Yes | WebSocket endpoint: wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime |

QwenTtsRealtimeConfig

Build a QwenTtsRealtimeConfig and pass it to updateSession().
| Parameter | Type | Required | Description |
|---|---|---|---|
| voice | String | Yes | Voice used for synthesis. System voices are available for Qwen3-TTS-Instruct-Flash-Realtime, Qwen3-TTS-Flash-Realtime, and Qwen-TTS-Realtime; see Supported voices for samples. Custom voices are created through Voice cloning (Qwen), for Qwen3-TTS-VC-Realtime only, or Voice design (Qwen), for Qwen3-TTS-VD-Realtime only. |
| languageType | String | No | Language of the output audio. Default: Auto, in which the model detects the language of each segment (detection accuracy is not guaranteed). For single-language text, setting a specific language improves quality. Valid values: Chinese, English, German, Italian, Portuguese, Spanish, Japanese, Korean, French, Russian. |
| mode | String | No | Interaction mode. server_commit (default): the server decides when to synthesize buffered text, balancing latency and quality. commit: the client triggers synthesis, which gives the lowest latency but requires the client to manage sentence boundaries. |
| format | String | No | Audio output format. Valid values: pcm (default), wav, mp3, opus. Qwen-TTS-Realtime supports only pcm. See Supported models. |
| sampleRate | int | No | Sample rate in Hz. Valid values: 8000, 16000, 24000 (default), 48000. Qwen-TTS-Realtime supports only 24000. See Supported models. |
| speechRate | float | No | Speech rate multiplier. Range: [0.5, 2.0]. Default: 1.0. Not supported by Qwen-TTS-Realtime. |
| volume | int | No | Output volume. Range: [0, 100]. Default: 50. Not supported by Qwen-TTS-Realtime. |
| pitchRate | float | No | Pitch multiplier. Range: [0.5, 2.0]. Default: 1.0. Not supported by Qwen-TTS-Realtime. |
| bitRate | int | No | Audio bitrate in kbps. Range: [6, 510]. Default: 128. Applies only to the opus format; higher values give better quality at larger file sizes. Not supported by Qwen-TTS-Realtime. |
| instructions | String | No | Instruction text that controls speaking style. Maximum length: 1,600 tokens. Supports Chinese and English only. Applies only to Qwen3-TTS-Instruct-Flash-Realtime. See Instruction control. |
| optimizeInstructions | boolean | No | Whether to optimize the instructions for more natural synthesis. Default: false. Has no effect if instructions is not set. Applies only to Qwen3-TTS-Instruct-Flash-Realtime. |
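The ranges and model restrictions in the table above are easy to violate when a configuration is assembled from user input. As a hedged sketch using only the JDK, the hypothetical TtsConfigCheck class below (not part of the SDK) checks values against the documented rules before they would be placed in a QwenTtsRealtimeConfig. A null argument stands for a parameter that is left unset, in which case the documented defaults (pcm, 24000 Hz) are assumed. The server remains the authority and validates parameters in updateSession() regardless.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical client-side pre-check, not part of the DashScope SDK. It mirrors
// the documented QwenTtsRealtimeConfig rules; the server still validates
// parameters itself when updateSession() is called.
final class TtsConfigCheck {
    private static final Set<String> FORMATS = Set.of("pcm", "wav", "mp3", "opus");
    private static final Set<Integer> SAMPLE_RATES = Set.of(8000, 16000, 24000, 48000);

    private TtsConfigCheck() {}

    // Returns the list of rule violations (empty if none). Null means "not set".
    // qwenTtsRealtime is true when the target model is Qwen-TTS-Realtime.
    static List<String> check(boolean qwenTtsRealtime, String format, Integer sampleRate,
                              Float speechRate, Integer volume, Float pitchRate, Integer bitRate) {
        List<String> problems = new ArrayList<>();
        String fmt = (format == null) ? "pcm" : format;        // documented default
        int rate = (sampleRate == null) ? 24000 : sampleRate; // documented default

        if (!FORMATS.contains(fmt)) {
            problems.add("format must be pcm, wav, mp3, or opus");
        }
        if (!SAMPLE_RATES.contains(rate)) {
            problems.add("sampleRate must be 8000, 16000, 24000, or 48000");
        }
        if (speechRate != null && (speechRate < 0.5f || speechRate > 2.0f)) {
            problems.add("speechRate must be in [0.5, 2.0]");
        }
        if (pitchRate != null && (pitchRate < 0.5f || pitchRate > 2.0f)) {
            problems.add("pitchRate must be in [0.5, 2.0]");
        }
        if (volume != null && (volume < 0 || volume > 100)) {
            problems.add("volume must be in [0, 100]");
        }
        if (bitRate != null) {
            if (!fmt.equals("opus")) {
                problems.add("bitRate is used only with the opus format");
            } else if (bitRate < 6 || bitRate > 510) {
                problems.add("bitRate must be in [6, 510] kbps");
            }
        }
        if (qwenTtsRealtime) {
            if (!fmt.equals("pcm")) {
                problems.add("Qwen-TTS-Realtime supports only pcm");
            }
            if (rate != 24000) {
                problems.add("Qwen-TTS-Realtime supports only 24000 Hz");
            }
            if (speechRate != null || volume != null || pitchRate != null || bitRate != null) {
                problems.add("Qwen-TTS-Realtime does not support speechRate, volume, pitchRate, or bitRate");
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        // An opus configuration with an out-of-range bitrate.
        System.out.println(check(false, "opus", 48000, null, null, null, 600));
    }
}
```

Returning a list rather than throwing on the first violation lets a caller report every problem in one pass, which is convenient when the values come from a form or a configuration file.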

API reference

QwenTtsRealtime class

import com.alibaba.dashscope.audio.qwen_tts_realtime.QwenTtsRealtime;
| Method | Signature | Server events returned | Description |
|---|---|---|---|
| connect | public void connect() throws NoApiKeyException, InterruptedException | session.created, session.updated | Opens a WebSocket connection. |
| updateSession | public void updateSession(QwenTtsRealtimeConfig config) | session.updated | Updates the session configuration. Call it immediately after connect() to override the defaults. The server validates the parameters and returns an error if any are invalid. |
| appendText | public void appendText(String text) | None | Appends text to the server-side input buffer. In server_commit mode, the server decides when to synthesize; in commit mode, call commit() to trigger synthesis. |
| clearAppendedText | public void clearAppendedText() | input_text_buffer.cleared | Clears all text in the server-side input buffer. |
| commit | public void commit() | input_text_buffer.committed, response.output_item.added, response.content_part.added, response.audio.delta, response.audio.done, response.content_part.done, response.output_item.done, response.done | Commits the buffered text and starts synthesis. The server returns an error if the buffer is empty. In server_commit mode, the server commits automatically; in commit mode, call this method to trigger synthesis. |
| finish | public void finish() | session.finished | Stops the current task. |
| close | public void close() | None | Closes the WebSocket connection. |
| getSessionId | public String getSessionId() | None | Returns the current session ID. |
| getResponseId | public String getResponseId() | None | Returns the ID of the most recent response. |
| getFirstAudioDelay | public long getFirstAudioDelay() | None | Returns the latency of the first audio packet, in milliseconds. |
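The table lists the methods but not the order in which they are meant to be called. The following minimal sketch, using only the JDK, encodes the order implied by this page (connect(), then updateSession(), then appendText() and commit(), then finish() and close()) in a hypothetical SessionCallOrder class that is not part of the SDK. Its method names mirror QwenTtsRealtime for readability, but it makes no network calls. Two rules are the sketch's own assumptions rather than documented behavior: that no further text is accepted after finish(), and that close() is allowed from any state.

```java
// Hypothetical guard, not part of the DashScope SDK: tracks the call order
// implied by the API table so that misuse fails fast on the client side.
final class SessionCallOrder {
    enum State { NEW, CONNECTED, FINISHED, CLOSED }

    private State state = State.NEW;
    // Client-side estimate of the server input buffer. In server_commit mode the
    // server commits on its own, so this count is only meaningful in commit mode.
    private int bufferedChars = 0;

    void connect() {
        require(State.NEW, "connect");
        state = State.CONNECTED;
    }

    void updateSession() {
        require(State.CONNECTED, "updateSession");
    }

    void appendText(String text) {
        require(State.CONNECTED, "appendText");
        bufferedChars += text.length();
    }

    void clearAppendedText() {
        require(State.CONNECTED, "clearAppendedText");
        bufferedChars = 0;
    }

    void commit() {
        require(State.CONNECTED, "commit");
        // Mirrors the documented server behavior: committing an empty buffer is an error.
        if (bufferedChars == 0) {
            throw new IllegalStateException("commit: input buffer is empty");
        }
        bufferedChars = 0;
    }

    void finish() {
        require(State.CONNECTED, "finish");
        state = State.FINISHED;
    }

    void close() {
        state = State.CLOSED; // assumption: closing is always permitted
    }

    State state() {
        return state;
    }

    private void require(State expected, String call) {
        if (state != expected) {
            throw new IllegalStateException(call + " called in state " + state);
        }
    }

    public static void main(String[] args) {
        SessionCallOrder s = new SessionCallOrder();
        s.connect();
        s.updateSession();
        s.appendText("Hello.");
        s.commit();
        s.finish();
        s.close();
        System.out.println("final state: " + s.state());
        try {
            s.appendText("too late");
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```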

QwenTtsRealtimeCallback interface

import com.alibaba.dashscope.audio.qwen_tts_realtime.QwenTtsRealtimeCallback;
| Method | Parameters | Description |
|---|---|---|
| onOpen() | None | Called when the WebSocket connection opens. |
| onEvent(JsonObject message) | message: the server event payload | Called when a server event arrives. For event types, see Server-side events. |
| onClose(int code, String reason) | code: WebSocket close status code; reason: close reason | Called when the server closes the connection. |
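A common job for onEvent() is to collect synthesized audio until the response completes. The sketch below, using only the JDK, shows that pattern in a hypothetical AudioCollector class that is not part of the SDK; a real onEvent(JsonObject message) implementation would extract the event type and payload from the message and forward them. It assumes, without confirmation from this page, that each response.audio.delta event carries base64-encoded audio in a delta field and that pcm output is 16-bit mono at the default 24000 Hz. Check Server-side events for the actual payload fields before relying on it.

```java
import java.io.ByteArrayOutputStream;
import java.util.Base64;

// Hypothetical event handler, not part of the DashScope SDK. Assumptions:
// response.audio.delta carries base64 audio in a "delta" field, and pcm output
// is 16-bit mono at the default sample rate.
final class AudioCollector {
    private static final int SAMPLE_RATE = 24000;  // documented default sampleRate
    private static final int BYTES_PER_SAMPLE = 2; // assumption: 16-bit mono PCM

    private final ByteArrayOutputStream pcm = new ByteArrayOutputStream();
    private boolean responseDone = false;

    // Call from onEvent(JsonObject) with the event type and, for audio events, the delta.
    void onEvent(String type, String delta) {
        switch (type) {
            case "response.audio.delta":
                byte[] chunk = Base64.getDecoder().decode(delta);
                pcm.write(chunk, 0, chunk.length);
                break;
            case "response.done":
                responseDone = true;
                break;
            default:
                // Other events (session.created, response.audio.done, ...) are ignored here.
                break;
        }
    }

    boolean isResponseDone() {
        return responseDone;
    }

    byte[] audio() {
        return pcm.toByteArray();
    }

    // Duration of the collected audio in milliseconds, under the PCM assumptions above.
    long durationMillis() {
        return pcm.size() * 1000L / (SAMPLE_RATE * BYTES_PER_SAMPLE);
    }

    public static void main(String[] args) {
        AudioCollector collector = new AudioCollector();
        // Simulate two 100 ms chunks of silence (4,800 bytes each at 24 kHz, 16-bit mono).
        String chunk = Base64.getEncoder().encodeToString(new byte[4800]);
        collector.onEvent("session.created", null);
        collector.onEvent("response.audio.delta", chunk);
        collector.onEvent("response.audio.delta", chunk);
        collector.onEvent("response.done", null);
        System.out.println(collector.durationMillis() + " ms, done=" + collector.isResponseDone());
    }
}
```

Keeping the decoding in a plain class, separate from the callback, makes the logic testable without a live WebSocket connection.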

Next steps