Qwen-TTS Realtime Java SDK

For models and voice options, see Realtime streaming TTS.

Prerequisites

DashScope Java SDK 2.22.7 or later.
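If you use Maven, the dependency is typically declared as follows (coordinates assumed from the DashScope Java SDK's published artifacts; verify the group and artifact IDs against the official repository):

```xml
<dependency>
    <groupId>com.alibaba</groupId>
    <artifactId>dashscope-sdk-java</artifactId>
    <version>2.22.7</version>
</dependency>
```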

How it works

Both interaction modes follow this pattern:
  1. Build a QwenTtsRealtimeParam with the model name, WebSocket endpoint, and API key.
  2. Create a QwenTtsRealtime instance with a callback for server events (session.created, response.audio.delta, response.done).
  3. Call connect() to open the WebSocket.
  4. Call updateSession() to set voice, audio format, and interaction mode.
  5. Send text with appendText() and receive Base64-encoded audio chunks through response.audio.delta.
  6. Call finish() and close() to end the session.
The two interaction modes differ in step 5:
  • Server commit mode (server_commit): The server decides when to synthesize buffered text. Best for most use cases.
  • Commit mode (commit): The client calls commit() to trigger synthesis. Lowest latency, but you must manage sentence boundaries.
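In commit mode you are responsible for sentence boundaries before each commit() call. One SDK-independent way to handle this is the JDK's BreakIterator; a minimal sketch (the class and method names here are illustrative, not part of the SDK):

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class SentenceSplit {
  // Split text into sentences so each commit() covers a natural boundary.
  static List<String> sentences(String text, Locale locale) {
    BreakIterator it = BreakIterator.getSentenceInstance(locale);
    it.setText(text);
    List<String> out = new ArrayList<>();
    int start = it.first();
    for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
      String s = text.substring(start, end).trim();
      if (!s.isEmpty()) out.add(s);
    }
    return out;
  }

  public static void main(String[] args) {
    // In commit mode you would call appendText(sentence) followed by commit() per element.
    for (String s : sentences("Hello there! How are you today? Fine.", Locale.ENGLISH)) {
      System.out.println(s);
    }
  }
}
```

Splitting on sentence boundaries keeps prosody natural while still letting the client control latency.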

Getting started

The following example uses server commit mode: the server synthesizes text as it accumulates. Append text with appendText() and let the server handle timing. For commit mode, set mode to "commit" in the session config and call commit() after appending each sentence.
import com.alibaba.dashscope.audio.qwen_tts_realtime.*;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.google.gson.JsonObject;
import javax.sound.sampled.LineUnavailableException;
import javax.sound.sampled.SourceDataLine;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.DataLine;
import javax.sound.sampled.AudioSystem;
import java.io.*;
import java.util.Base64;
import java.util.Queue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;

public class Main {
  static String[] textToSynthesize = {
      "Right? I really love this kind of supermarket.",
      "Especially during the Chinese New Year.",
      "Going to the supermarket.",
      "It just makes me feel.",
      "Super, super happy!",
      "I want to buy so many things!"
  };
  public static QwenTtsRealtimeAudioFormat ttsFormat = QwenTtsRealtimeAudioFormat.PCM_24000HZ_MONO_16BIT;

  // Real-time PCM audio player
  public static class RealtimePcmPlayer {
    private int sampleRate;
    private SourceDataLine line;
    private AudioFormat audioFormat;
    private Thread decoderThread;
    private Thread playerThread;
    private AtomicBoolean stopped = new AtomicBoolean(false);
    private Queue<String> b64AudioBuffer = new ConcurrentLinkedQueue<>();
    private Queue<byte[]> RawAudioBuffer = new ConcurrentLinkedQueue<>();
    private ByteArrayOutputStream totalAudioStream = new ByteArrayOutputStream();

    // Initialize the audio format and audio line.
    public RealtimePcmPlayer(int sampleRate) throws LineUnavailableException {
      this.sampleRate = sampleRate;
      this.audioFormat = new AudioFormat(this.sampleRate, 16, 1, true, false);
      DataLine.Info info = new DataLine.Info(SourceDataLine.class, audioFormat);
      line = (SourceDataLine) AudioSystem.getLine(info);
      line.open(audioFormat);
      line.start();
      decoderThread = new Thread(new Runnable() {
        @Override
        public void run() {
          while (!stopped.get()) {
            String b64Audio = b64AudioBuffer.poll();
            if (b64Audio != null) {
              byte[] rawAudio = Base64.getDecoder().decode(b64Audio);
              RawAudioBuffer.add(rawAudio);
              // Write audio data to totalAudioStream.
              try {
                totalAudioStream.write(rawAudio);
              } catch (IOException e) {
                throw new RuntimeException(e);
              }
            } else {
              try {
                Thread.sleep(100);
              } catch (InterruptedException e) {
                throw new RuntimeException(e);
              }
            }
          }
        }
      });
      playerThread = new Thread(new Runnable() {
        @Override
        public void run() {
          while (!stopped.get()) {
            byte[] rawAudio = RawAudioBuffer.poll();
            if (rawAudio != null) {
              try {
                playChunk(rawAudio);
              } catch (IOException e) {
                throw new RuntimeException(e);
              } catch (InterruptedException e) {
                throw new RuntimeException(e);
              }
            } else {
              try {
                Thread.sleep(100);
              } catch (InterruptedException e) {
                throw new RuntimeException(e);
              }
            }
          }
        }
      });
      decoderThread.start();
      playerThread.start();
    }

    // Play an audio chunk and block until playback completes.
    private void playChunk(byte[] chunk) throws IOException, InterruptedException {
      if (chunk == null || chunk.length == 0) return;

      int bytesWritten = 0;
      while (bytesWritten < chunk.length) {
        bytesWritten += line.write(chunk, bytesWritten, chunk.length - bytesWritten);
      }
      // 16-bit mono PCM: 2 bytes per sample, so sampleRate*2/1000 bytes per millisecond.
      int audioMs = chunk.length / (this.sampleRate * 2 / 1000);
      // Wait for the buffered audio to finish playing.
      Thread.sleep(Math.max(0, audioMs - 10));
    }

    public void write(String b64Audio) {
      b64AudioBuffer.add(b64Audio);
    }

    public void cancel() {
      b64AudioBuffer.clear();
      RawAudioBuffer.clear();
    }

    public void waitForComplete() throws InterruptedException {
      while (!b64AudioBuffer.isEmpty() || !RawAudioBuffer.isEmpty()) {
        Thread.sleep(100);
      }
      line.drain();
    }

    public void shutdown() throws InterruptedException, IOException {
      stopped.set(true);
      decoderThread.join();
      playerThread.join();

      // Save the complete audio file.
      File file = new File("TotalAudio_"+ttsFormat.getSampleRate()+"."+ttsFormat.getFormat());
      try (FileOutputStream fos = new FileOutputStream(file)) {
        fos.write(totalAudioStream.toByteArray());
      }

      if (line != null && line.isRunning()) {
        line.drain();
        line.close();
      }
    }
  }

  public static void main(String[] args) throws InterruptedException, LineUnavailableException, IOException {
    QwenTtsRealtimeParam param = QwenTtsRealtimeParam.builder()
        // To use instruction control, replace the model with qwen3-tts-instruct-flash-realtime.
        .model("qwen3-tts-flash-realtime")
        .url("wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime")
        .apikey(System.getenv("DASHSCOPE_API_KEY"))
        .build();
    AtomicReference<CountDownLatch> completeLatch = new AtomicReference<>(new CountDownLatch(1));
    final AtomicReference<QwenTtsRealtime> qwenTtsRef = new AtomicReference<>(null);

    // Create a real-time audio player instance.
    RealtimePcmPlayer audioPlayer = new RealtimePcmPlayer(24000);

    QwenTtsRealtime qwenTtsRealtime = new QwenTtsRealtime(param, new QwenTtsRealtimeCallback() {
      @Override
      public void onOpen() {
        // Handle connection establishment.
      }
      @Override
      public void onEvent(JsonObject message) {
        String type = message.get("type").getAsString();
        switch(type) {
          case "session.created":
            // Handle session creation.
            if (message.has("session")) {
              String eventId = message.get("event_id").getAsString();
              String sessionId = message.get("session").getAsJsonObject().get("id").getAsString();
              System.out.println("[onEvent] session.created, session_id: "
                  + sessionId + ", event_id: " + eventId);
            }
            break;
          case "response.audio.delta":
            String recvAudioB64 = message.get("delta").getAsString();
            // Play audio in real time.
            audioPlayer.write(recvAudioB64);
            break;
          case "response.done":
            // Handle response completion.
            break;
          case "session.finished":
            // Handle session termination.
            completeLatch.get().countDown();
            break;
          default:
            break;
        }
      }
      @Override
      public void onClose(int code, String reason) {
        // Handle connection closure.
      }
    });
    qwenTtsRef.set(qwenTtsRealtime);
    try {
      qwenTtsRealtime.connect();
    } catch (NoApiKeyException e) {
      throw new RuntimeException(e);
    }
    QwenTtsRealtimeConfig config = QwenTtsRealtimeConfig.builder()
        .voice("Cherry")
        .responseFormat(ttsFormat)
        .mode("server_commit")
        // To use instruction control, uncomment the following lines and replace the model with qwen3-tts-instruct-flash-realtime.
        // .instructions("")
        // .optimizeInstructions(true)
        .build();
    qwenTtsRealtime.updateSession(config);
    for (String text:textToSynthesize) {
      qwenTtsRealtime.appendText(text);
      Thread.sleep(100);
    }
    qwenTtsRealtime.finish();
    completeLatch.get().await();
    qwenTtsRealtime.close();

    // Wait for audio playback to complete, then shut down the player.
    audioPlayer.waitForComplete();
    audioPlayer.shutdown();
    System.exit(0);
  }
}
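The playback pacing in playChunk() depends on the bytes-to-milliseconds conversion for 16-bit mono PCM, and response.audio.delta payloads arrive Base64-encoded. A self-contained illustration of both (the class and method names are illustrative):

```java
import java.util.Base64;

public class PcmMath {
  // Milliseconds of audio in a Base64-encoded chunk of 16-bit mono PCM:
  // 2 bytes per sample, sampleRate samples per second.
  static long durationMs(String b64Audio, int sampleRate) {
    byte[] raw = Base64.getDecoder().decode(b64Audio);
    return raw.length * 1000L / (sampleRate * 2L);
  }

  public static void main(String[] args) {
    // One second of silence at 24 kHz, encoded the way the server delivers deltas.
    String b64 = Base64.getEncoder().encodeToString(new byte[24000 * 2]);
    System.out.println(durationMs(b64, 24000)); // prints 1000
  }
}
```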

Request parameters

QwenTtsRealtimeParam

Build a QwenTtsRealtimeParam and pass it to the QwenTtsRealtime constructor.
  • model (String, required): Model name. See Supported models.
  • url (String, required): WebSocket endpoint: wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime.
  • apikey (String): API key. The example above reads it from the DASHSCOPE_API_KEY environment variable.

QwenTtsRealtimeConfig

Build a QwenTtsRealtimeConfig and pass it to updateSession().
  • voice (String, required): Voice for synthesis. System voices: available for Qwen3-TTS-Instruct-Flash-Realtime, Qwen3-TTS-Flash-Realtime, and Qwen-TTS-Realtime; see Supported voices for samples. Custom voices: created through Voice cloning (Qwen) (Qwen3-TTS-VC-Realtime only) or Voice design (Qwen) (Qwen3-TTS-VD-Realtime only).
  • languageType (String, optional): Language of the output audio. Default: Auto (the model detects language per segment; accuracy is not guaranteed). Set a specific language to improve quality for single-language text. Valid values: Chinese, English, German, Italian, Portuguese, Spanish, Japanese, Korean, French, Russian.
  • mode (String, optional): Interaction mode. server_commit (default): the server decides when to synthesize buffered text, balancing latency and quality. commit: the client triggers synthesis for the lowest latency; requires manual sentence boundary management.
  • format (String, optional): Audio output format. Valid values: pcm (default), wav, mp3, opus. Qwen-TTS-Realtime supports only pcm. See Supported models.
  • sampleRate (int, optional): Sample rate in Hz. Valid values: 8000, 16000, 24000 (default), 48000. Qwen-TTS-Realtime supports only 24000. See Supported models.
  • speechRate (float, optional): Speech rate multiplier. Range: [0.5, 2.0]. Default: 1.0. Not supported by Qwen-TTS-Realtime.
  • volume (int, optional): Audio volume. Range: [0, 100]. Default: 50. Not supported by Qwen-TTS-Realtime.
  • pitchRate (float, optional): Pitch multiplier. Range: [0.5, 2.0]. Default: 1.0. Not supported by Qwen-TTS-Realtime.
  • bitRate (int, optional): Audio bitrate in kbps. Range: [6, 510]. Default: 128. Only for the opus format; higher values produce better quality at larger file sizes. Not supported by Qwen-TTS-Realtime.
  • instructions (String, optional): Instruction text for controlling speech style. Max 1,600 tokens. Chinese and English only. Only for Qwen3-TTS-Instruct-Flash-Realtime. See Instruction control.
  • optimizeInstructions (boolean, optional): Optimizes instructions for more natural synthesis. Default: false. No effect if instructions is not set. Only for Qwen3-TTS-Instruct-Flash-Realtime.
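The server validates these values and returns an error for out-of-range settings, but a client-side guard can fail fast before opening a session. A hypothetical helper (not part of the SDK) mirroring the documented ranges for speechRate, volume, and pitchRate:

```java
public class ConfigGuard {
  // Hypothetical client-side check mirroring the documented ranges;
  // the server performs the authoritative validation.
  static boolean isValid(float speechRate, int volume, float pitchRate) {
    return speechRate >= 0.5f && speechRate <= 2.0f
        && volume >= 0 && volume <= 100
        && pitchRate >= 0.5f && pitchRate <= 2.0f;
  }

  public static void main(String[] args) {
    System.out.println(isValid(1.0f, 50, 1.0f)); // defaults: prints true
    System.out.println(isValid(3.0f, 50, 1.0f)); // speechRate too high: prints false
  }
}
```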

API reference

QwenTtsRealtime class

import com.alibaba.dashscope.audio.qwen_tts_realtime.QwenTtsRealtime;
  • public void connect() throws NoApiKeyException, InterruptedException: Opens a WebSocket connection. Server events: session.created, session.updated.
  • public void updateSession(QwenTtsRealtimeConfig config): Updates the session configuration. Call immediately after connect() to override defaults. The server validates parameters and returns an error if invalid. Server events: session.updated.
  • public void appendText(String text): Appends text to the server-side input buffer. In server_commit mode, the server decides when to synthesize. In commit mode, call commit() to trigger synthesis. Server events: none.
  • public void clearAppendedText(): Clears all text in the server-side input buffer. Server events: input_text_buffer.cleared.
  • public void commit(): Commits buffered text and starts synthesis. Returns an error if the buffer is empty. In server_commit mode, the server commits automatically; in commit mode, call this to trigger synthesis. Server events: input_text_buffer.committed, response.output_item.added, response.content_part.added, response.audio.delta, response.audio.done, response.content_part.done, response.output_item.done, response.done.
  • public void finish(): Stops the current task. Server events: session.finished.
  • public void close(): Closes the WebSocket connection. Server events: none.
  • public String getSessionId(): Returns the current session ID. Server events: none.
  • public String getResponseId(): Returns the most recent response ID. Server events: none.
  • public long getFirstAudioDelay(): Returns the first audio packet latency in milliseconds. Server events: none.

QwenTtsRealtimeCallback interface

import com.alibaba.dashscope.audio.qwen_tts_realtime.QwenTtsRealtimeCallback;
  • onOpen(): Called when the WebSocket connection opens.
  • onEvent(JsonObject message): Called when a server event arrives; message is the server event payload. For event types, see Server-side events.
  • onClose(int code, String reason): Called when the server closes the connection; code is the WebSocket close status code and reason is the close reason.

Next steps