CosyVoice WebSocket API
Parameters and protocol for CosyVoice text to speech over WebSocket. The DashScope SDK supports Java and Python only -- use WebSocket for other languages.
User guide: For model overviews and voice selection, see Speech synthesis.
WebSocket enables full-duplex communication. The client and server establish a persistent connection with a single handshake, then push data to each other in real time.
Common WebSocket libraries:
- Go: gorilla/websocket
- PHP: Ratchet
- Node.js: ws
Prerequisites
Get an API key.
Models and pricing
See Speech synthesis.
Text and format limits
Text length limits
Send up to 20,000 characters per continue-task instruction. The total across all continue-task instructions must not exceed 200,000 characters.
Character counting rules
- Chinese characters (simplified, traditional, Japanese Kanji, Korean Hanja) count as two characters. All others (punctuation, letters, numbers, Kana/Hangul) count as one.
- SSML tags are excluded from the character count.
- Examples:
  - "你好" → 2 + 2 = 4 characters
  - "中A文123" → 2 + 1 + 2 + 1 + 1 + 1 = 8 characters
  - "中文。" → 2 + 2 + 1 = 5 characters
  - "中 文。" → 2 + 1 + 2 + 1 = 6 characters
  - "<speak>你好</speak>" → 2 + 2 = 4 characters
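These rules can be approximated in code. The following Python sketch assumes CJK ideographs are detected via common Unicode ranges; the service's exact counting may differ in edge cases:

```python
import re

def billed_characters(text: str) -> int:
    """Estimate billed characters: CJK ideographs count as 2; everything
    else (letters, digits, punctuation, spaces, Kana/Hangul) counts as 1.
    SSML tags are excluded from the count."""
    # Strip SSML/XML tags before counting.
    stripped = re.sub(r"<[^>]+>", "", text)
    total = 0
    for ch in stripped:
        cp = ord(ch)
        # CJK Unified Ideographs plus Extension A and compatibility blocks.
        if (0x4E00 <= cp <= 0x9FFF or 0x3400 <= cp <= 0x4DBF
                or 0xF900 <= cp <= 0xFAFF):
            total += 2
        else:
            total += 1
    return total
```

For example, `billed_characters("中A文123")` reproduces the 8-character count from the examples above.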
Encoding format
Use UTF-8 encoding.
Math expression support
Math expression parsing is available for cosyvoice-v3-flash and cosyvoice-v3-plus. It covers common primary and secondary school math including basic operations, algebra, and geometry.
This feature supports Chinese only. For usage details, see Convert LaTeX formulas to speech.
SSML support
SSML requires all of the following:
- Model: Only cosyvoice-v3-flash and cosyvoice-v3-plus support SSML.
- Voice: Use an SSML-enabled voice:
- All cloned voices (created through the Voice Cloning API).
- System voices marked as SSML-enabled in the voice list.
  System voices without SSML support (such as some basic voices) return the error "SSML text is not supported at the moment!" even with enable_ssml enabled.
- Parameter: Set enable_ssml to true in the run-task instruction.
Interaction flow
- Open a WebSocket connection.
- Send the run-task instruction to start a task.
- Wait for the task-started event before proceeding.
- Send text: Send one or more continue-task instructions in order. After receiving a complete sentence, the server returns a result-generated event and the audio stream. For text length constraints, see the text field in the continue-task instruction. The server segments text into sentences automatically:
  - Complete sentences are synthesized immediately.
  - Incomplete sentences are buffered until complete. No audio is returned for incomplete sentences.
- Receive the audio stream through the binary channel.
- After sending all text, send the finish-task instruction. Continue receiving the audio stream. Do not skip this step, or the ending portion of the audio may be lost.
- Receive the task-finished event from the server.
- Close the WebSocket connection.
Reusing a task_id across tasks, or mixing task_ids within one task, can cause:
- Disordered audio delivery.
- Misaligned speech content.
- Abnormal task state, possibly preventing receipt of the task-finished event.
- Billing failures or inaccurate usage statistics.
To manage task_ids correctly:
- Generate a unique task_id (for example, UUID) when sending run-task.
- Store the task_id in a variable.
- Use this task_id for all subsequent continue-task and finish-task instructions.
- After receiving task-finished, generate a new task_id for the next task.
Client implementation tips
Server and client responsibilities
Server responsibilities
The server delivers the complete audio stream in order. You do not need to handle audio ordering or completeness.
Client responsibilities
- Read and concatenate all audio chunks: The server delivers audio as multiple binary frames. Receive all frames and concatenate them in arrival order.
- Maintain a complete WebSocket lifecycle: Do not disconnect during the task, from sending run-task to receiving task-finished. Common mistakes:
  - Closing the connection before all audio chunks arrive, resulting in incomplete audio.
  - Forgetting to send finish-task, leaving text buffered and unprocessed.
  - Failing to handle WebSocket keepalive during page navigation or app backgrounding.
- Text integrity in ASR-to-LLM-to-TTS workflows: Ensure the text passed to TTS is complete:
  - Wait for the LLM to generate a full sentence before sending continue-task, rather than streaming character by character.
  - For streaming synthesis, send text at natural sentence boundaries (periods, question marks).
  - After the LLM finishes generating, always send finish-task to avoid missing trailing content.
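The receive side reduces to one decision per frame: binary frames are audio to append, text frames are JSON events. A minimal Python sketch (the helper name is illustrative):

```python
import json

def handle_frame(message, audio_buffer: bytearray):
    """Append binary audio frames to audio_buffer in arrival order;
    return the event name for JSON text frames (None for audio frames)."""
    if isinstance(message, (bytes, bytearray)):
        audio_buffer.extend(message)
        return None
    return json.loads(message)["header"]["event"]
```

A receive loop would call this for every frame and stop on task-finished or task-failed, then write audio_buffer to a file or hand it to a streaming player.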
Platform-specific tips
- Flutter: Close the connection in the dispose method to prevent memory leaks when using web_socket_channel. Handle app lifecycle events (such as AppLifecycleState.paused) for background transitions.
- Web (browser): Some browsers limit WebSocket connections. Reuse a single connection for multiple tasks. Use beforeunload to close the connection before the page closes.
- Mobile (iOS/Android native): The OS may pause or terminate network connections when the app enters the background. Use a background task or foreground service to keep the WebSocket active, or reinitialize the task on foreground return.
URL
- Wrong protocol: Use wss://, not http:// or https://.
- Auth in query string: Do not put Authorization in the URL (such as ?Authorization=bearer YOUR_API_KEY). Set it in the HTTP handshake headers. See Headers.
- Extra path segments: Do not append model names or other parameters to the URL. Specify the model in payload.model in the run-task instruction.
Headers
| Parameter | Type | Required | Description |
|---|---|---|---|
| Authorization | string | Yes | Authentication token. Format: Bearer $DASHSCOPE_API_KEY. |
| user-agent | string | No | Client identifier for source tracking. |
| X-DashScope-WorkSpace | string | No | Your Qwen Code workspace ID. |
| X-DashScope-DataInspection | string | No | Data compliance inspection. Default: enable. Do not set unless necessary. |
Troubleshoot authentication failures
Authentication occurs during the WebSocket handshake, not when sending run-task. If the Authorization header is missing or invalid, the server rejects the handshake with an HTTP 401 or 403 error; client libraries typically report this as a WebSocketBadStatus exception. If the WebSocket connection fails:
- Check API key format: Confirm the Authorization header uses bearer YOUR_API_KEY with a space between bearer and the key.
- Verify API key validity: Check your API keys page to confirm the key is active and authorized for CosyVoice models.
- Check header placement: Set the Authorization header during the WebSocket handshake. Examples by language:
  - Python (websockets): extra_headers={"Authorization": f"bearer {api_key}"}
  - JavaScript: The browser WebSocket API does not support custom headers. Use a server-side proxy or another library such as ws.
  - Go (gorilla/websocket): header.Add("Authorization", fmt.Sprintf("bearer %s", apiKey))
- Test network connectivity: Use curl or Postman to verify the API key by calling other HTTP-supported DashScope APIs.
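For example, in Python the handshake headers can be built and passed to the websockets library. The endpoint URL below is an assumption based on the dashscope-intl.aliyuncs.com domain used elsewhere on this page; confirm it against the official documentation before use:

```python
import os

def auth_headers(api_key: str) -> dict:
    # "bearer", a space, then the key -- sent in the handshake, never in the URL.
    return {"Authorization": f"bearer {api_key}"}

async def open_connection():
    import websockets  # third-party: pip install websockets
    url = "wss://dashscope-intl.aliyuncs.com/api-ws/v1/inference"  # assumed endpoint
    # websockets >= 14 takes `additional_headers`; older releases use `extra_headers`.
    return await websockets.connect(
        url, additional_headers=auth_headers(os.environ["DASHSCOPE_API_KEY"])
    )
```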
Using WebSocket in browsers
The browser new WebSocket(url) API does not support custom request headers (including Authorization) during the handshake. You cannot authenticate directly from frontend code.
Solution: Use a backend proxy
- Connect to CosyVoice from your backend (Node.js, Java, or Python), where you can set the Authorization header.
- Have the frontend connect to your backend via WebSocket, which forwards messages to CosyVoice.
- This keeps the API key hidden and lets you add authentication, logging, or rate limiting.
- Frontend (native web) + Backend (Node.js Express): cosyvoiceNodeJs_en.zip
- Frontend (native web) + Backend (Python Flask): cosyvoiceFlask_en.zip
Instructions (client to server)
Instructions are JSON messages sent as WebSocket text frames. They control the task lifecycle.
Send instructions in this order:
- Send run-task: Starts the task. Use the same task_id in all subsequent continue-task and finish-task instructions.
- Send continue-task: Sends text to synthesize. Send only after receiving task-started.
- Send finish-task: Ends the task. Send after all continue-task instructions are sent.
1. run-task instruction: Start a task
Starts a text to speech task. Configure voice, sample rate, and other parameters here.
- Timing: Send after the WebSocket connection is established.
- Do not send text here. Send text using continue-task instead.
- The input field is required but must be {}. Omitting the input field, or adding unexpected fields (such as mode or content), causes "InvalidParameter: task can not be null" or closes the connection (WebSocket code 1007).
header parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| header.action | string | Yes | Fixed value: "run-task". |
| header.task_id | string | Yes | A 32-character UUID. Hyphens are optional (such as "2bf83b9a-baeb-4fda-8d9a-xxxxxxxxxxxx" or "2bf83b9abaeb4fda8d9axxxxxxxxxxxx"). Most languages provide built-in UUID APIs. |
| header.streaming | string | Yes | Fixed value: "duplex". |
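A task ID can be generated with a standard UUID API; for example, in Python:

```python
import uuid

task_id = uuid.uuid4().hex  # 32 hexadecimal characters, no hyphens
```

Use the same task_id for all subsequent continue-task and finish-task instructions.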
payload parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| payload.task_group | string | Yes | Fixed value: "audio". |
| payload.task | string | Yes | Fixed value: "tts". |
| payload.function | string | Yes | Fixed value: "SpeechSynthesizer". |
| payload.model | string | Yes | The text to speech model. See Voice list. |
| payload.input | object | Yes | Required but must be empty ({}) in run-task. Send text using continue-task. |
payload.parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| text_type | string | Yes | Fixed value: "PlainText". |
| voice | string | Yes | Voice for synthesis. See Voice list for available system voices. |
| format | string | No | Audio format. Supports pcm, wav, mp3 (default), and opus. For opus, adjust bitrate with bit_rate. |
| sample_rate | integer | No | Sample rate in Hz. Default: 22050. Valid values: 8000, 16000, 22050, 24000, 44100, 48000. |
| volume | integer | No | Volume. Default: 50. Range: [0, 100]. Scales linearly. 0 is silent, 100 is maximum. |
| rate | float | No | Speech rate. Default: 1.0. Range: [0.5, 2.0]. Below 1.0 slows speech; above 1.0 speeds it up. |
| pitch | float | No | Pitch multiplier. Default: 1.0. Range: [0.5, 2.0]. The relationship with perceived pitch is not strictly linear. Test to find a suitable value. |
| enable_ssml | boolean | No | Enable SSML. When true, only one continue-task instruction is allowed. |
| bit_rate | int | No | Audio bitrate in kbps (for Opus format). Default: 32. Range: [6, 510]. |
| word_timestamp_enabled | boolean | No | Enable word-level timestamps. Default: false. Available for system voices marked as supported in the voice list. When enabled, timestamps appear in the result-generated event. |
| seed | int | No | Random seed for generation. Same seed with identical parameters reproduces the same output. Default: 0. Range: [0, 65535]. |
| language_hints | array[string] | No | Target language for synthesis. Valid values: zh, en, fr, de, ja, ko, ru, pt, th, id, vi. This is an array, but only the first element is processed. |
| instruction | string | No | Controls synthesis effects such as dialect, emotion, or speaking style. Available for system voices marked as supporting Instruct in the voice list. For cosyvoice-v3-flash with system voices, the instruction must use a fixed format (see the voice list). Max length: 100 characters. |
| enable_aigc_tag | boolean | No | Add an invisible AIGC identifier to generated audio. When true, an identifier is embedded in WAV, MP3, and Opus formats. Default: false. Supported by cosyvoice-v3-flash and cosyvoice-v3-plus. |
| aigc_propagator | string | No | Sets the ContentPropagator field in the AIGC identifier. Takes effect only when enable_aigc_tag is true. Default: UID. Supported by cosyvoice-v3-flash and cosyvoice-v3-plus. |
| aigc_propagate_id | string | No | Sets the PropagateID field in the AIGC identifier. Takes effect only when enable_aigc_tag is true. Default: the current request ID. Supported by cosyvoice-v3-flash and cosyvoice-v3-plus. |
| hot_fix | object | No | Text hotpatching configuration. Customize pronunciation or replace text before synthesis. Available only for cosyvoice-v3-flash. |
| enable_markdown_filter | boolean | No | Enable Markdown filtering. Removes Markdown symbols from input text before synthesis. Default: false. Available only for cosyvoice-v3-flash. |
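Putting the tables above together, a run-task message might be assembled as follows. The model and voice values are placeholders; pick real values from the voice list:

```python
import json
import uuid

task_id = uuid.uuid4().hex  # 32-char UUID, reused by continue-task/finish-task

run_task = {
    "header": {"action": "run-task", "task_id": task_id, "streaming": "duplex"},
    "payload": {
        "task_group": "audio",
        "task": "tts",
        "function": "SpeechSynthesizer",
        "model": "cosyvoice-v3-flash",
        "parameters": {
            "text_type": "PlainText",
            "voice": "<voice-name>",   # placeholder: pick one from the voice list
            "format": "mp3",
            "sample_rate": 22050,
        },
        "input": {},                   # required, must be empty in run-task
    },
}
message = json.dumps(run_task)  # send as a WebSocket text frame
```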
2. continue-task instruction
Sends text to synthesize. Send all text in one instruction, or split it across multiple instructions in order.
When to send: After receiving task-started.
Do not wait longer than 23 seconds between text fragments, or a "request timeout after 23 seconds" error occurs. If no more text remains, send finish-task to end the task. The 23-second timeout is server-enforced and cannot be modified.
header parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| header.action | string | Yes | Fixed value: "continue-task". |
| header.task_id | string | Yes | Must match the task_id from run-task. |
| header.streaming | string | Yes | Fixed value: "duplex". |
payload parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| input.text | string | Yes | Text to synthesize. Up to 20,000 characters per instruction and 200,000 characters total per task (see Text length limits). |
3. finish-task instruction: End task
Ends the task. Always send this instruction. Otherwise:
- Incomplete audio: The server won't force-synthesize cached sentences, causing missing endings.
- Connection timeout: Waiting more than 23 seconds after the last continue-task triggers a timeout.
- Billing issues: Usage information may be inaccurate.
header parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| header.action | string | Yes | Fixed value: "finish-task". |
| header.task_id | string | Yes | Must match the task_id from run-task. |
| header.streaming | string | Yes | Fixed value: "duplex". |
payload parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| payload.input | object | Yes | Fixed value: {}. |
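continue-task and finish-task reuse the header shape of run-task and differ only in payload. A sketch of both message builders:

```python
import json

def continue_task(task_id: str, text: str) -> str:
    # Text fragments are sent in order; the server segments them into sentences.
    return json.dumps({
        "header": {"action": "continue-task", "task_id": task_id,
                   "streaming": "duplex"},
        "payload": {"input": {"text": text}},
    })

def finish_task(task_id: str) -> str:
    # payload.input is a fixed empty object in finish-task.
    return json.dumps({
        "header": {"action": "finish-task", "task_id": task_id,
                   "streaming": "duplex"},
        "payload": {"input": {}},
    })
```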
Events (server to client)
Events are JSON messages from the server. Each marks a stage in the task lifecycle.
Binary audio is sent separately -- not included in any event.
1. task-started event: Task started
Confirms the task has started. Send continue-task or finish-task only after receiving this event. Otherwise, the task fails.
The task-started event's payload is empty.
header parameters:
| Parameter | Type | Description |
|---|---|---|
| header.event | string | Fixed value: "task-started". |
| header.task_id | string | Task ID generated by the client. |
2. result-generated event
While you send continue-task and finish-task instructions, the server returns result-generated events and binary audio frames.
Each result-generated event contains the current sentence index. Audio data arrives as binary frames between events. One sentence produces multiple binary audio frames. Receive frames in order and append to the same file.
header parameters:
| Parameter | Type | Description |
|---|---|---|
| header.event | string | Fixed value: "result-generated". |
| header.task_id | string | Task ID generated by the client. |
| header.attributes | object | Additional attributes -- usually empty. |
payload parameters:
| Parameter | Type | Description |
|---|---|---|
| payload.output.sentence.index | integer | Sentence number, starting from 0. |
| payload.output.sentence.words | array | Array of word information. |
| payload.output.sentence.words.text | string | Word text. |
| payload.output.sentence.words.begin_index | integer | Starting position of the word in the sentence, counting from 0. |
| payload.output.sentence.words.end_index | integer | Ending position of the word in the sentence, counting from 1. |
| payload.output.sentence.words.begin_time | integer | Start timestamp of the word's audio, in milliseconds. |
| payload.output.sentence.words.end_time | integer | End timestamp of the word's audio, in milliseconds. |
| payload.usage.characters | integer | Cumulative billed characters so far. The usage field appears in some result-generated events. Use the last occurrence. |
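When word_timestamp_enabled is true, word timings can be extracted from each result-generated event. A sketch against the schema above (the sample event in the test is illustrative, not server output):

```python
import json

def word_timestamps(event_json: str):
    """Return (text, begin_time_ms, end_time_ms) per word from a
    result-generated event; empty list when timestamps are absent."""
    sentence = (json.loads(event_json)
                .get("payload", {})
                .get("output", {})
                .get("sentence", {}) or {})
    return [(w["text"], w["begin_time"], w["end_time"])
            for w in sentence.get("words", []) or []]
```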
3. task-finished event: Task finished
Marks the end of the task.
After the task ends, close the WebSocket connection or reuse it to send a new run-task instruction (see Connection overhead and reuse).
header parameters:
| Parameter | Type | Description |
|---|---|---|
| header.event | string | Fixed value: "task-finished". |
| header.task_id | string | Task ID generated by the client. |
| header.attributes.request_uuid | string | Request ID. Provide this to CosyVoice developers for diagnosis. |
4. task-failed event: Task failed
Indicates the task has failed. Close the WebSocket connection and review the error message.
header parameters:
| Parameter | Type | Description |
|---|---|---|
| header.event | string | Fixed value: "task-failed". |
| header.task_id | string | Task ID generated by the client. |
| header.error_code | string | Error type. |
| header.error_message | string | Detailed error reason. |
Task interruption
During streaming synthesis, you can interrupt the current task early (for example, if the user cancels playback) using one of these methods:
| Interrupt method | Server behavior | Use case |
|---|---|---|
| Close the connection | Stops synthesis immediately. Discards unsent audio. No task-finished event. Connection cannot be reused. | Immediate stop: User cancels playback, switches content, or exits app. |
| Send finish-task | Forces synthesis of cached text. Returns remaining audio and task-finished event. Connection stays reusable. | Graceful end: Stop sending text but receive all cached audio. |
Connection overhead and reuse
The WebSocket service supports connection reuse.
Send run-task to start a task and finish-task to end it. After task-finished, reuse the same connection by sending a new run-task instruction.
- Send a new run-task only after receiving task-finished.
- Use different task_ids for different tasks on the same connection.
- Failed tasks trigger task-failed and close the connection (cannot reuse).
- Connections time out after 60 seconds of inactivity.
Performance and concurrency
Concurrency limits
See Rate limits.
To increase your concurrency quota, contact customer support. Quota adjustments require review and typically take 1 to 3 business days.
Best practice: Reuse a WebSocket connection for multiple tasks. See Connection overhead and reuse.
Connection latency
Typical connection time:
- Cross-border connections: 1 to 3 seconds. In rare cases, 10 to 30 seconds.
Common causes of slow connections:
- Network latency: Check cross-border connection quality or ISP performance.
- Slow DNS: Try public DNS (8.8.8.8) or configure a local hosts file for dashscope-intl.aliyuncs.com.
- TLS handshake: Update to TLS 1.2 or later.
- Proxy/firewall: Corporate networks may block or slow WebSocket connections.
To diagnose:
- Use Wireshark or tcpdump to analyze TCP handshake, TLS handshake, and WebSocket Upgrade timing.
- Test HTTP latency with curl: curl -w "@curl-format.txt" -o /dev/null -s https://dashscope-intl.aliyuncs.com
Audio generation speed
- Real-time factor (RTF): 0.1 to 0.5x real-time (1 second of audio takes 0.1 to 0.5 seconds to generate). Actual speed varies by model, text length, and server load.
- First packet latency: 200 to 800 ms from sending continue-task to receiving the first audio chunk.
Example code
Basic connectivity example. Implement production-ready logic for your use case. Use asynchronous programming to send and receive simultaneously:
- Connect: Call your WebSocket library's connect function with the headers and URL.
- Listen for messages: The server sends binary audio frames and JSON events.
  Events:
  - task-started: Task started. Send continue-task or finish-task only after this.
  - result-generated: Returned continuously after you send continue-task or finish-task.
  - task-finished: Task complete. Close the connection.
  - task-failed: Task failed. Close the connection and check the error.
  Audio handling:
  - For MP3/Opus streaming: Use a streaming player (FFmpeg, PyAudio, AudioFormat, MediaSource). Do not play frame by frame.
  - To save complete audio: Write frames to the same file in append mode.
  - For WAV/MP3: Only the first frame has header info; subsequent frames are audio data only.
- Send instructions: From a separate thread, send instructions to the server.
- Close connection: Close when done, on error, or after task-finished/task-failed.
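The steps above can be sketched end to end in Python with the third-party websockets library. The endpoint URL is an assumption (based on the dashscope-intl.aliyuncs.com domain used earlier on this page), and the model and voice are placeholders; treat this as a starting point, not a production client:

```python
import asyncio
import json
import os
import uuid

URL = "wss://dashscope-intl.aliyuncs.com/api-ws/v1/inference"  # assumed endpoint

def build(action: str, task_id: str, payload: dict) -> str:
    """Wrap a payload in the common instruction header."""
    return json.dumps({
        "header": {"action": action, "task_id": task_id, "streaming": "duplex"},
        "payload": payload,
    })

async def synthesize(text: str, out_path: str = "output.mp3"):
    import websockets  # third-party: pip install websockets
    headers = {"Authorization": f"bearer {os.environ['DASHSCOPE_API_KEY']}"}
    task_id = uuid.uuid4().hex
    async with websockets.connect(URL, additional_headers=headers) as ws:
        await ws.send(build("run-task", task_id, {
            "task_group": "audio", "task": "tts",
            "function": "SpeechSynthesizer",
            "model": "cosyvoice-v3-flash",
            "parameters": {"text_type": "PlainText",
                           "voice": "<voice-name>",  # placeholder
                           "format": "mp3"},
            "input": {},                              # must be empty in run-task
        }))
        with open(out_path, "wb") as f:
            async for message in ws:
                if isinstance(message, bytes):
                    f.write(message)                  # append audio frames in order
                    continue
                event = json.loads(message)["header"]["event"]
                if event == "task-started":
                    await ws.send(build("continue-task", task_id,
                                        {"input": {"text": text}}))
                    await ws.send(build("finish-task", task_id, {"input": {}}))
                elif event in ("task-finished", "task-failed"):
                    break

# asyncio.run(synthesize("Hello!"))
```

The loop sends continue-task and finish-task only after task-started, appends every binary frame to one file, and exits on the terminal event, mirroring the interaction flow described above.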