Control pronunciation (SSML)

SSML (Speech Synthesis Markup Language) is an XML-based markup language. In CosyVoice, it controls speech rate, pitch, pauses, volume, and background music.
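
Because SSML is XML, text placed inside the markup has to stay well-formed, so characters such as & and < need the standard XML escapes. The following is a minimal sketch of one way to build such a string with the Python standard library. The helper name build_speak is hypothetical and not part of the DashScope SDK. Only the <speak rate="..."> form is taken from the examples in this topic, and the service may impose further rules that this sketch does not check.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def build_speak(text: str, rate: float = 1) -> str:
    """Wrap plain text in a <speak> element (hypothetical helper).

    escape() replaces &, < and > in the text; quoteattr() quotes the
    attribute value and escapes it if needed.
    """
    return f"<speak rate={quoteattr(str(rate))}>{escape(text)}</speak>"


ssml = build_speak("Tom & Jerry run < 5 km", rate=2)

# Parsing the result is a cheap local check that it is well-formed XML;
# ET.fromstring raises ParseError otherwise.
root = ET.fromstring(ssml)

print(ssml)       # <speak rate="2">Tom &amp; Jerry run &lt; 5 km</speak>
print(root.text)  # Tom & Jerry run < 5 km
```

Running the well-formedness check locally catches malformed markup before a synthesis request is sent.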

Limitations

Models: cosyvoice-v3-flash, cosyvoice-v3-plus.
Voices: Cloned voices and system voices marked as SSML-enabled in the Voice list.
APIs:
- Java SDK (2.20.3+): Non-streaming and unidirectional streaming only. See the Java SDK docs.
- Python SDK (1.23.4+): Non-streaming and unidirectional streaming only. See the Python SDK docs.
- WebSocket API: Set enable_ssml to true in run-task and send continue-task only once. See the WebSocket API docs.
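
The model and SDK version requirements listed above can be checked before a request is made. The sketch below is one possible pre-flight check in Python. The names SSML_MODELS, MIN_PYTHON_SDK, parse_version, and check_ssml_support are hypothetical; the model names and the minimum Python SDK version come from this section. Voice support is not checked, because the Voice list is not reproduced here.

```python
import re

# Values taken from the Limitations section of this topic.
SSML_MODELS = {"cosyvoice-v3-flash", "cosyvoice-v3-plus"}
MIN_PYTHON_SDK = (1, 23, 4)


def parse_version(version: str) -> tuple:
    """Turn a version string such as '1.23.4' into (1, 23, 4).

    Only the first three numeric components are kept, so a suffix
    such as 'rc1' does not affect the comparison.
    """
    return tuple(int(n) for n in re.findall(r"\d+", version)[:3])


def check_ssml_support(model: str, sdk_version: str) -> list:
    """Return a list of problems; an empty list means both checks passed."""
    problems = []
    if model not in SSML_MODELS:
        problems.append(f"model {model!r} is not listed as SSML-capable")
    if parse_version(sdk_version) < MIN_PYTHON_SDK:
        problems.append(f"Python SDK {sdk_version} is older than 1.23.4")
    return problems


print(check_ssml_support("cosyvoice-v3-flash", "1.23.4"))  # []
print(len(check_ssml_support("some-other-model", "1.20.0")))  # 2
```

The installed SDK version can be read with importlib.metadata.version("dashscope") and passed in as sdk_version.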

Getting started

For prerequisites and tutorials, see Text-to-speech - CosyVoice. Check the Limitations section for supported models, voices, and APIs before using SSML.

Java SDK

import com.alibaba.dashscope.audio.ttsv2.SpeechSynthesisParam;
import com.alibaba.dashscope.audio.ttsv2.SpeechSynthesizer;
import com.alibaba.dashscope.utils.Constants;

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;

// See the Limitations section for SSML support requirements
public class Main {
  private static String model = "cosyvoice-v3-flash";
  private static String voice = "longanyang";

  public static void main(String[] args) {
    Constants.baseWebsocketApiUrl = "wss://dashscope-intl.aliyuncs.com/api-ws/v1/inference";
    streamAudioDataToSpeaker();
    System.exit(0);
  }

  public static void streamAudioDataToSpeaker() {
    SpeechSynthesisParam param =
        SpeechSynthesisParam.builder()
            // If you have not configured an environment variable, replace the following line with: .apiKey("sk-xxx")
            .apiKey(System.getenv("DASHSCOPE_API_KEY"))
            .model(model)
            .voice(voice)
            .build();

    SpeechSynthesizer synthesizer = new SpeechSynthesizer(param, null);
    ByteBuffer audio = null;
    try {
      // Non-streaming call; blocks until audio is returned
      // Escape special characters for the string literal (here, the double quotes around the attribute value)
      audio = synthesizer.call("<speak rate=\"2\">My speaking rate is faster than a normal person's.</speak>");
    } catch (Exception e) {
      throw new RuntimeException(e);
    } finally {
      // Close the WebSocket connection after the task ends
      synthesizer.getDuplexApi().close(1000, "bye");
    }
    if (audio != null) {
      // Save the audio data to a local file named "output.mp3"
      File file = new File("output.mp3");
      try (FileOutputStream fos = new FileOutputStream(file)) {
        fos.write(audio.array());
      } catch (IOException e) {
        throw new RuntimeException(e);
      }
    }

    // The first packet latency includes the time required to establish the WebSocket connection
    System.out.println(
        "[Metric] Request ID: "
            + synthesizer.getLastRequestId()
            + ", First packet latency (ms): "
            + synthesizer.getFirstPackageDelay());
  }
}

Python SDK

# coding=utf-8
# See the Limitations section for SSML support requirements

import os

import dashscope
from dashscope.audio.tts_v2 import SpeechSynthesizer

# If you have not configured an environment variable, replace the following line with: dashscope.api_key = "sk-xxx"
dashscope.api_key = os.environ.get('DASHSCOPE_API_KEY')

dashscope.base_websocket_api_url='wss://dashscope-intl.aliyuncs.com/api-ws/v1/inference'

# Model
model = "cosyvoice-v3-flash"
# Voice
voice = "longanyang"

# Instantiate SpeechSynthesizer and pass model, voice, and other request parameters to the constructor
synthesizer = SpeechSynthesizer(model=model, voice=voice)
# Non-streaming call; blocks until audio is returned
# Escape special characters for the string literal (here, the double quotes around the attribute value)
audio = synthesizer.call("<speak rate=\"2\">My speaking rate is faster than a normal person's.</speak>")

# Save the audio locally
with open('output.mp3', 'wb') as f:
  f.write(audio)

# The first packet latency includes the time required to establish the WebSocket connection
print('[Metric] Request ID: {}, First packet latency: {} ms'.format(
  synthesizer.get_last_request_id(),
  synthesizer.get_first_package_delay()))

WebSocket API

Go

// See the Limitations section for SSML support requirements

package main

import (
  "encoding/json"
  "fmt"
  "net/http"
  "os"
  "strings"
  "time"

  "github.com/google/uuid"
  "github.com/gorilla/websocket"
)

const (
  wsURL      = "wss://dashscope-intl.aliyuncs.com/api-ws/v1/inference/"
  outputFile = "output.mp3"
)

func main() {
  // If you have not configured an environment variable, replace the following line with: apiKey := "sk-xxx"
  apiKey := os.Getenv("DASHSCOPE_API_KEY")

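  // Create or truncate the output file, then release the file handle
  f, err := os.Create(outputFile)
  if err != nil {
    fmt.Println("Failed to create output file:", err)
    return
  }
  f.Close()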

  // Connect to WebSocket
  header := make(http.Header)
  header.Add("X-DashScope-DataInspection", "enable")
  header.Add("Authorization", fmt.Sprintf("bearer %s", apiKey))

  conn, resp, err := websocket.DefaultDialer.Dial(wsURL, header)
  if err != nil {
    if resp != nil {
      fmt.Printf("Connection failed. HTTP status code: %d\n", resp.StatusCode)
    }
    fmt.Println("Connection failed:", err)
    return
  }
  defer conn.Close()

  // Generate task ID
  taskID := uuid.New().String()
  fmt.Printf("Generated task ID: %s\n", taskID)

  // Send run-task command
  runTaskCmd := map[string]interface{}{
    "header": map[string]interface{}{
      "action":    "run-task",
      "task_id":   taskID,
      "streaming": "duplex",
    },
    "payload": map[string]interface{}{
      "task_group": "audio",
      "task":       "tts",
      "function":   "SpeechSynthesizer",
      "model":      "cosyvoice-v3-flash",
      "parameters": map[string]interface{}{
        "text_type":   "PlainText",
        "voice":       "longanyang",
        "format":      "mp3",
        "sample_rate": 22050,
        "volume":      50,
        "rate":        1,
        "pitch":       1,
        // With enable_ssml: true, send continue-task only once
        "enable_ssml": true,
      },
      "input": map[string]interface{}{},
    },
  }

  runTaskJSON, _ := json.Marshal(runTaskCmd)
  fmt.Printf("Sending run-task command: %s\n", string(runTaskJSON))

  err = conn.WriteMessage(websocket.TextMessage, runTaskJSON)
  if err != nil {
    fmt.Println("Failed to send run-task:", err)
    return
  }

  textSent := false

  // Process messages
  for {
    messageType, message, err := conn.ReadMessage()
    if err != nil {
      fmt.Println("Failed to read message:", err)
      break
    }

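    // Handle binary messages: append each audio chunk to the output file
    if messageType == websocket.BinaryMessage {
      fmt.Printf("Received binary message, length: %d\n", len(message))
      file, err := os.OpenFile(outputFile, os.O_APPEND|os.O_WRONLY|os.O_CREATE, 0644)
      if err != nil {
        fmt.Println("Failed to open output file:", err)
        return
      }
      if _, err := file.Write(message); err != nil {
        fmt.Println("Failed to write audio data:", err)
      }
      file.Close()
      continue
    }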

    // Handle text messages
    messageStr := string(message)
    fmt.Printf("Received text message: %s\n", strings.ReplaceAll(messageStr, "\n", ""))

    // Parse JSON to get event type
    var msgMap map[string]interface{}
    if json.Unmarshal(message, &msgMap) == nil {
      if header, ok := msgMap["header"].(map[string]interface{}); ok {
        if event, ok := header["event"].(string); ok {
          fmt.Printf("Event type: %s\n", event)

          switch event {
          case "task-started":
            fmt.Println("=== Received task-started event ===")

            if !textSent {
              // Send continue-task command; when using SSML, you can send this command only once
              continueTaskCmd := map[string]interface{}{
                "header": map[string]interface{}{
                  "action":    "continue-task",
                  "task_id":   taskID,
                  "streaming": "duplex",
                },
                "payload": map[string]interface{}{
                  "input": map[string]interface{}{
                    // Escape special characters
                    "text": "<speak rate=\"2\">My speaking rate is faster than a normal person's.</speak>",
                  },
                },
              }

              continueTaskJSON, _ := json.Marshal(continueTaskCmd)
              fmt.Printf("Sending continue-task command: %s\n", string(continueTaskJSON))

              err = conn.WriteMessage(websocket.TextMessage, continueTaskJSON)
              if err != nil {
                fmt.Println("Failed to send continue-task:", err)
                return
              }

              textSent = true

              // Delay sending finish-task
              time.Sleep(500 * time.Millisecond)

              // Send finish-task command
              finishTaskCmd := map[string]interface{}{
                "header": map[string]interface{}{
                  "action":    "finish-task",
                  "task_id":   taskID,
                  "streaming": "duplex",
                },
                "payload": map[string]interface{}{
                  "input": map[string]interface{}{},
                },
              }

              finishTaskJSON, _ := json.Marshal(finishTaskCmd)
              fmt.Printf("Sending finish-task command: %s\n", string(finishTaskJSON))

              err = conn.WriteMessage(websocket.TextMessage, finishTaskJSON)
              if err != nil {
                fmt.Println("Failed to send finish-task:", err)
                return
              }
            }

          case "task-finished":
            fmt.Println("=== Task finished ===")
            return

          case "task-failed":
            fmt.Println("=== Task failed ===")
            if header["error_message"] != nil {
              fmt.Printf("Error message: %s\n", header["error_message"])
            }
            return

          case "result-generated":
            fmt.Println("Received result-generated event")
          }
        }
      }
    }
  }
}
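
The message flow in the example above does not depend on a particular WebSocket library: send run-task, wait for task-started, send continue-task exactly once, send finish-task, and stop on task-finished or task-failed. The following Python sketch models only that decision logic, so it can be run without a network connection. The names command, run_task, and SsmlSession are hypothetical; the message fields are copied from the example above.

```python
import json
import uuid


def command(action: str, task_id: str, payload: dict) -> str:
    """Serialize one client command with the header used in this topic."""
    return json.dumps({
        "header": {"action": action, "task_id": task_id, "streaming": "duplex"},
        "payload": payload,
    })


def run_task(task_id: str) -> str:
    """Build the run-task command with SSML enabled."""
    return command("run-task", task_id, {
        "task_group": "audio",
        "task": "tts",
        "function": "SpeechSynthesizer",
        "model": "cosyvoice-v3-flash",
        "parameters": {
            "text_type": "PlainText",
            "voice": "longanyang",
            "format": "mp3",
            "sample_rate": 22050,
            "volume": 50,
            "rate": 1,
            "pitch": 1,
            "enable_ssml": True,
        },
        "input": {},
    })


class SsmlSession:
    """Tracks one task and decides what to send for each server event."""

    def __init__(self, ssml: str):
        self.task_id = str(uuid.uuid4())
        self.ssml = ssml
        self.text_sent = False
        self.done = False
        self.error = None

    def on_event(self, message: str) -> list:
        """Return the commands to send in response to one text frame."""
        header = json.loads(message).get("header", {})
        event = header.get("event")
        if event == "task-started" and not self.text_sent:
            # With SSML enabled, continue-task is sent only once.
            self.text_sent = True
            return [
                command("continue-task", self.task_id, {"input": {"text": self.ssml}}),
                command("finish-task", self.task_id, {"input": {}}),
            ]
        if event == "task-finished":
            self.done = True
        elif event == "task-failed":
            self.done = True
            self.error = header.get("error_message")
        return []


session = SsmlSession('<speak rate="2">Hello.</speak>')
started = json.dumps({"header": {"event": "task-started"}})
print(len(session.on_event(started)))  # 2
print(len(session.on_event(started)))  # 0
```

In a real client, the strings returned by on_event would be written to the socket as text frames, and binary frames would be appended to the output file as in the example above.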

C#

using System.Net.WebSockets;
using System.Text;
using System.Text.Json;

// See the Limitations section for SSML support requirements
class Program {
  // If you have not configured an environment variable, replace the following line with: private static readonly string ApiKey = "sk-xxx"
  private static readonly string ApiKey = Environment.GetEnvironmentVariable("DASHSCOPE_API_KEY") ?? throw new InvalidOperationException("DASHSCOPE_API_KEY environment variable is not set.");

  private const string WebSocketUrl = "wss://dashscope-intl.aliyuncs.com/api-ws/v1/inference/";
  // Output file path
  private const string OutputFilePath = "output.mp3";

  // WebSocket client
  private static ClientWebSocket _webSocket = new ClientWebSocket();
  // Cancellation token source
  private static CancellationTokenSource _cancellationTokenSource = new CancellationTokenSource();
  // Task ID
  private static string? _taskId;
  // Whether the task has started
  private static TaskCompletionSource<bool> _taskStartedTcs = new TaskCompletionSource<bool>();

  static async Task Main(string[] args) {
    try {
      // Clear the output file
      ClearOutputFile(OutputFilePath);

      // Connect to WebSocket service
      await ConnectToWebSocketAsync(WebSocketUrl);

      // Start the task to receive messages
      Task receiveTask = ReceiveMessagesAsync();

      // Send run-task command
      _taskId = GenerateTaskId();
      await SendRunTaskCommandAsync(_taskId);

      // Wait for the task-started event
      await _taskStartedTcs.Task;

      // Send the continue-task command. When using the SSML feature, this command can be sent only once.
      // Special characters need to be escaped.
      await SendContinueTaskCommandAsync("<speak rate=\"2\">My speaking rate is faster than a normal person's.</speak>");

      // Send the finish-task command
      await SendFinishTaskCommandAsync(_taskId);

      // Wait for the receive task to complete
      await receiveTask;

      Console.WriteLine("Task completed, connection closed.");
    } catch (OperationCanceledException) {
      Console.WriteLine("The task was canceled.");
    } catch (Exception ex) {
      Console.WriteLine($"An error occurred: {ex.Message}");
    } finally {
      _cancellationTokenSource.Cancel();
      _webSocket.Dispose();
    }
  }

  private static void ClearOutputFile(string filePath) {
    if (File.Exists(filePath)) {
      File.WriteAllText(filePath, string.Empty);
      Console.WriteLine("The output file has been cleared.");
    } else {
      Console.WriteLine("The output file does not exist and does not need to be cleared.");
    }
  }

  private static async Task ConnectToWebSocketAsync(string url) {
    var uri = new Uri(url);
    if (_webSocket.State == WebSocketState.Connecting || _webSocket.State == WebSocketState.Open) {
      return;
    }

    // Set headers for the WebSocket connection
    _webSocket.Options.SetRequestHeader("Authorization", $"bearer {ApiKey}");
    _webSocket.Options.SetRequestHeader("X-DashScope-DataInspection", "enable");

    try {
      await _webSocket.ConnectAsync(uri, _cancellationTokenSource.Token);
      Console.WriteLine("Successfully connected to the WebSocket service.");
    } catch (OperationCanceledException) {
      Console.WriteLine("WebSocket connection was canceled.");
    } catch (Exception ex) {
      Console.WriteLine($"WebSocket connection failed: {ex.Message}");
      throw;
    }
  }

  private static async Task SendRunTaskCommandAsync(string taskId) {
    var command = CreateCommand("run-task", taskId, "duplex", new {
      task_group = "audio",
      task = "tts",
      function = "SpeechSynthesizer",
      model = "cosyvoice-v3-flash",
      parameters = new
      {
        text_type = "PlainText",
        voice = "longanyang",
        format = "mp3",
        sample_rate = 22050,
        volume = 50,
        rate = 1,
        pitch = 1,
        // With enable_ssml: true, send continue-task only once
        enable_ssml = true
      },
      input = new { }
    });

    await SendJsonMessageAsync(command);
    Console.WriteLine("Sent run-task command.");
  }

  private static async Task SendContinueTaskCommandAsync(string text) {
    if (_taskId == null) {
      throw new InvalidOperationException("Task ID is not initialized.");
    }

    var command = CreateCommand("continue-task", _taskId, "duplex", new {
      input = new {
        text
      }
    });

    await SendJsonMessageAsync(command);
    Console.WriteLine("Sent continue-task command.");
  }

  private static async Task SendFinishTaskCommandAsync(string taskId) {
    var command = CreateCommand("finish-task", taskId, "duplex", new {
      input = new { }
    });

    await SendJsonMessageAsync(command);
    Console.WriteLine("Sent finish-task command.");
  }

  private static async Task SendJsonMessageAsync(string message) {
    var buffer = Encoding.UTF8.GetBytes(message);
    try {
      await _webSocket.SendAsync(new ArraySegment<byte>(buffer), WebSocketMessageType.Text, true, _cancellationTokenSource.Token);
    } catch (OperationCanceledException) {
      Console.WriteLine("Message sending was canceled.");
    }
  }

  private static async Task ReceiveMessagesAsync() {
    while (_webSocket.State == WebSocketState.Open) {
      var response = await ReceiveMessageAsync();
      if (response != null) {
        var eventStr = response.RootElement.GetProperty("header").GetProperty("event").GetString();
        switch (eventStr) {
          case "task-started":
            Console.WriteLine("Task started.");
            _taskStartedTcs.TrySetResult(true);
            break;
          case "task-finished":
            Console.WriteLine("Task finished.");
            _cancellationTokenSource.Cancel();
            break;
          case "task-failed":
            Console.WriteLine("Task failed: " + response.RootElement.GetProperty("header").GetProperty("error_message").GetString());
            _cancellationTokenSource.Cancel();
            break;
          default:
            // result-generated can be handled here
            break;
        }
      }
    }
  }

  private static async Task<JsonDocument?> ReceiveMessageAsync() {
    var buffer = new byte[1024 * 4];
    var segment = new ArraySegment<byte>(buffer);

    try {
      WebSocketReceiveResult result = await _webSocket.ReceiveAsync(segment, _cancellationTokenSource.Token);

      if (result.MessageType == WebSocketMessageType.Close) {
        await _webSocket.CloseAsync(WebSocketCloseStatus.NormalClosure, "Closing", _cancellationTokenSource.Token);
        return null;
      }

      if (result.MessageType == WebSocketMessageType.Binary) {
        // Handle binary data
        Console.WriteLine("Receiving binary data...");

        // Save binary data to a file
        using (var fileStream = new FileStream(OutputFilePath, FileMode.Append)) {
          fileStream.Write(buffer, 0, result.Count);
        }

        return null;
      }

      string message = Encoding.UTF8.GetString(buffer, 0, result.Count);
      return JsonDocument.Parse(message);
    } catch (OperationCanceledException) {
      Console.WriteLine("Message reception was canceled.");
      return null;
    }
  }

  private static string GenerateTaskId() {
    // The "N" format already yields 32 hexadecimal digits without hyphens
    return Guid.NewGuid().ToString("N");
  }

  private static string CreateCommand(string action, string taskId, string streaming, object payload) {
    var command = new {
      header = new {
        action,
        task_id = taskId,
        streaming
      },
      payload
    };

    return JsonSerializer.Serialize(command);
  }
}

PHP

The example uses this directory structure:

my-php-project/
├── composer.json
├── vendor/
└── index.php

Contents of composer.json (adjust dependency versions as needed):

{
  "require": {
    "react/event-loop": "^1.3",
    "react/socket": "^1.11",
    "react/stream": "^1.2",
    "react/http": "^1.1",
    "ratchet/pawl": "^0.4"
  },
  "autoload": {
    "psr-4": {
      "App\\": "src/"
    }
  }
}

Contents of index.php:

<?php

// See the Limitations section for SSML support requirements

require __DIR__ . '/vendor/autoload.php';

use Ratchet\Client\Connector;
use React\EventLoop\Loop;
use React\Socket\Connector as SocketConnector;

// If you have not configured an environment variable, replace the following line with: $api_key = "sk-xxx"
$api_key = getenv("DASHSCOPE_API_KEY");
$websocket_url = 'wss://dashscope-intl.aliyuncs.com/api-ws/v1/inference/'; // WebSocket server address
$output_file = 'output.mp3'; // Output file path

$loop = Loop::get();

if (file_exists($output_file)) {
    // Clear the file content
    file_put_contents($output_file, '');
}

// Create a custom connector
$socketConnector = new SocketConnector($loop, [
    'tcp' => [
        'bindto' => '0.0.0.0:0',
    ],
    'tls' => [
        // Keep certificate verification enabled so that the server's identity is checked
        'verify_peer' => true,
        'verify_peer_name' => true,
    ],
]);

$connector = new Connector($loop, $socketConnector);

$headers = [
    'Authorization' => 'bearer ' . $api_key,
    'X-DashScope-DataInspection' => 'enable'
];

$connector($websocket_url, [], $headers)->then(function ($conn) use ($loop, $output_file) {
    echo "Connected to WebSocket server\n";

    // Generate task ID
    $taskId = generateTaskId();

    // Send run-task command
    sendRunTaskMessage($conn, $taskId);

    // Define the function to send the continue-task command
    $sendContinueTask = function() use ($conn, $loop, $taskId) {
        // Send the continue-task command. When using the SSML feature, this command can be sent only once.
        $continueTaskMessage = json_encode([
            "header" => [
                "action" => "continue-task",
                "task_id" => $taskId,
                "streaming" => "duplex"
            ],
            "payload" => [
                "input" => [
                    // Special characters need to be escaped
                    "text" => "<speak rate=\"2\">My speaking rate is faster than a normal person's.</speak>"
                ]
            ]
        ]);
        $conn->send($continueTaskMessage);

        // Send the finish-task command
        sendFinishTaskMessage($conn, $taskId);
    };

    // Flag to check if the task-started event is received
    $taskStarted = false;

    // Listen for messages
    $conn->on('message', function($msg) use ($conn, $sendContinueTask, $loop, &$taskStarted, $taskId, $output_file) {
        if ($msg->isBinary()) {
            // Write binary data to the local file
            file_put_contents($output_file, $msg->getPayload(), FILE_APPEND);
        } else {
            // Handle non-binary messages
            $response = json_decode($msg, true);

            if (isset($response['header']['event'])) {
                handleEvent($conn, $response, $sendContinueTask, $loop, $taskId, $taskStarted);
            } else {
                echo "Unknown message format\n";
            }
        }
    });

    // Listen for connection close
    $conn->on('close', function($code = null, $reason = null) {
        echo "Connection closed\n";
        if ($code !== null) {
            echo "Close code: " . $code . "\n";
        }
        if ($reason !== null) {
            echo "Close reason: " . $reason . "\n";
        }
    });
}, function ($e) {
    echo "Could not connect: {$e->getMessage()}\n";
});

$loop->run();

/**
 * Generate task ID
 * @return string
 */
function generateTaskId(): string {
    return bin2hex(random_bytes(16));
}

/**
 * Send run-task command
 * @param $conn
 * @param $taskId
 */
function sendRunTaskMessage($conn, $taskId) {
    $runTaskMessage = json_encode([
        "header" => [
            "action" => "run-task",
            "task_id" => $taskId,
            "streaming" => "duplex"
        ],
        "payload" => [
            "task_group" => "audio",
            "task" => "tts",
            "function" => "SpeechSynthesizer",
            "model" => "cosyvoice-v3-flash",
            "parameters" => [
                "text_type" => "PlainText",
                "voice" => "longanyang",
                "format" => "mp3",
                "sample_rate" => 22050,
                "volume" => 50,
                "rate" => 1,
                "pitch" => 1,
                // With enable_ssml: true, send continue-task only once
                "enable_ssml" => true
            ],
            "input" => (object) []
        ]
    ]);
    echo "Preparing to send run-task command: " . $runTaskMessage . "\n";
    $conn->send($runTaskMessage);
    echo "run-task command sent\n";
}

/**
 * Send finish-task command
 * @param $conn
 * @param $taskId
 */
function sendFinishTaskMessage($conn, $taskId) {
    $finishTaskMessage = json_encode([
        "header" => [
            "action" => "finish-task",
            "task_id" => $taskId,
            "streaming" => "duplex"
        ],
        "payload" => [
            "input" => (object) []
        ]
    ]);
    echo "Preparing to send finish-task command: " . $finishTaskMessage . "\n";
    $conn->send($finishTaskMessage);
    echo "finish-task command sent\n";
}

/**
 * Handle events
 * @param $conn
 * @param $response
 * @param $sendContinueTask
 * @param $loop
 * @param $taskId
 * @param $taskStarted
 */
function handleEvent($conn, $response, $sendContinueTask, $loop, $taskId, &$taskStarted) {
    switch ($response['header']['event']) {
        case 'task-started':
            echo "Task started, sending continue-task command...\n";
            $taskStarted = true;
            // Send continue-task command
            $sendContinueTask();
            break;
        case 'result-generated':
            // Ignore result-generated event
            break;
        case 'task-finished':
            // The connection is closed after the switch, once the remaining data has arrived
            echo "Task finished\n";
            break;
        case 'task-failed':
            echo "Task failed\n";
            echo "Error code: " . $response['header']['error_code'] . "\n";
            echo "Error message: " . $response['header']['error_message'] . "\n";
            $conn->close();
            break;
        case 'error':
            echo "Error: " . $response['payload']['message'] . "\n";
            break;
        default:
            echo "Unknown event: " . $response['header']['event'] . "\n";
            break;
    }

    // If the task is finished, close the connection
    if ($response['header']['event'] == 'task-finished') {
        // Wait for 1 second to ensure all data is transferred
        $loop->addTimer(1, function() use ($conn) {
            $conn->close();
            echo "Client closes connection\n";
        });
    }

    // If task-started event is not received, close the connection
    if (!$taskStarted && in_array($response['header']['event'], ['task-failed', 'error'])) {
        $conn->close();
    }
}

Node.js

Install the required dependencies:

npm install ws
npm install uuid

Example code:

// See the Limitations section for SSML support requirements

import fs from 'fs';
import WebSocket from 'ws';
import { v4 as uuid } from 'uuid'; // Used to generate UUIDs

// If you have not configured an environment variable, replace the following line with: const apiKey = "sk-xxx"
const apiKey = process.env.DASHSCOPE_API_KEY;
const url = 'wss://dashscope-intl.aliyuncs.com/api-ws/v1/inference/';
// Output file path
const outputFilePath = 'output.mp3';

// Clear the output file
fs.writeFileSync(outputFilePath, '');

// Create a WebSocket client
const ws = new WebSocket(url, {
  headers: {
    Authorization: `bearer ${apiKey}`,
    'X-DashScope-DataInspection': 'enable'
  }
});

let taskStarted = false;
let taskId = uuid();

ws.on('open', () => {
  console.log('Connected to WebSocket server');

  // Send the run-task command
  const runTaskMessage = JSON.stringify({
    header: {
      action: 'run-task',
      task_id: taskId,
      streaming: 'duplex'
    },
    payload: {
      task_group: 'audio',
      task: 'tts',
      function: 'SpeechSynthesizer',
      model: 'cosyvoice-v3-flash',
      parameters: {
        text_type: 'PlainText',
        voice: 'longanyang', // Voice
        format: 'mp3', // Audio format
        sample_rate: 22050, // Sample rate
        volume: 50, // Volume
        rate: 1, // Speech rate
        pitch: 1, // Pitch
        enable_ssml: true // Whether to enable the SSML feature. If enable_ssml is set to true, you can send the continue-task command only once. Otherwise, the error "Text request limit violated, expected 1." is reported.
      },
      input: {}
    }
  });
  ws.send(runTaskMessage);
  console.log('Sent run-task message');
});

const fileStream = fs.createWriteStream(outputFilePath, { flags: 'a' });
ws.on('message', (data, isBinary) => {
  if (isBinary) {
    // Write binary data to the file
    fileStream.write(data);
  } else {
    const message = JSON.parse(data);

    switch (message.header.event) {
      case 'task-started':
        taskStarted = true;
        console.log('Task has started');
        // Send continue-task command
        sendContinueTasks(ws);
        break;
      case 'task-finished':
        console.log('Task has finished');
        ws.close();
        fileStream.end(() => {
          console.log('File stream has been closed');
        });
        break;
      case 'task-failed':
        console.error('Task failed: ', message.header.error_message);
        ws.close();
        fileStream.end(() => {
          console.log('File stream has been closed');
        });
        break;
      default:
        // You can handle result-generated here
        break;
    }
  }
});

function sendContinueTasks(ws) {
  
  if (taskStarted) {
    // Send the continue-task command. When using the SSML feature, this command can be sent only once.
    const continueTaskMessage = JSON.stringify({
      header: {
        action: 'continue-task',
        task_id: taskId,
        streaming: 'duplex'
      },
      payload: {
        input: {
          // Special characters need to be escaped
          text: '<speak rate="2">My speaking rate is faster than a normal person\'s.</speak>'
        }
      }
    });
    ws.send(continueTaskMessage);
    
    // Send the finish-task command
    const finishTaskMessage = JSON.stringify({
      header: {
        action: 'finish-task',
        task_id: taskId,
        streaming: 'duplex'
      },
      payload: {
        input: {}
      }
    });
    ws.send(finishTaskMessage);
  }
}

ws.on('close', () => {
  console.log('Disconnected from the WebSocket server');
});

Java

For Java, we recommend using the Java SDK instead. The following Java WebSocket example requires these dependencies:

- Java-WebSocket
- jackson-databind

Manage the dependencies with Maven or Gradle. The following Maven configuration declares both:

<dependencies>
  <!-- WebSocket Client -->
  <dependency>
    <groupId>org.java-websocket</groupId>
    <artifactId>Java-WebSocket</artifactId>
    <version>1.5.3</version>
  </dependency>

  <!-- JSON Processing -->
  <dependency>
    <groupId>com.fasterxml.jackson.core</groupId>
    <artifactId>jackson-databind</artifactId>
    <version>2.13.0</version>
  </dependency>
</dependencies>

Java code:

import com.fasterxml.jackson.databind.ObjectMapper;

import org.java_websocket.client.WebSocketClient;
import org.java_websocket.handshake.ServerHandshake;

import java.io.FileOutputStream;
import java.io.IOException;
import java.net.URI;
import java.nio.ByteBuffer;
import java.util.*;

/**
 * SSML feature notes:
 *     1. When sending the run-task command, set the enable_ssml parameter to true to enable SSML support.
 *     2. Send the text that contains SSML by using the continue-task command. You can send this command only once.
 *     3. SSML is supported only for cloned voices from cosyvoice-v3-flash and cosyvoice-v3-plus models, and system voices marked as SSML-supported in the voice list (for example, the longanyang voice for cosyvoice-v3-flash).
 */
public class TTSWebSocketClient extends WebSocketClient {
    private final String taskId = UUID.randomUUID().toString();
    private final String outputFile = "output_" + System.currentTimeMillis() + ".mp3";
    private boolean taskFinished = false;

    public TTSWebSocketClient(URI serverUri, Map<String, String> headers) {
        super(serverUri, headers);
    }

    @Override
    public void onOpen(ServerHandshake serverHandshake) {
        System.out.println("Connection successful");

        // Send run-task command
        // If enable_ssml is set to true, you can send the continue-task command only once.
        // Otherwise, you will get the error "Text request limit violated, expected 1."
        String runTaskCommand = "{ \"header\": { \"action\": \"run-task\", \"task_id\": \"" + taskId + "\", \"streaming\": \"duplex\" }, \"payload\": { \"task_group\": \"audio\", \"task\": \"tts\", \"function\": \"SpeechSynthesizer\", \"model\": \"cosyvoice-v3-flash\", \"parameters\": { \"text_type\": \"PlainText\", \"voice\": \"longanyang\", \"format\": \"mp3\", \"sample_rate\": 22050, \"volume\": 50, \"rate\": 1, \"pitch\": 1, \"enable_ssml\": true }, \"input\": {} }}";
        send(runTaskCommand);
    }

    @Override
    public void onMessage(String message) {
        System.out.println("Received message from server: " + message);
        try {
            // Parse JSON message
            Map<String, Object> messageMap = new ObjectMapper().readValue(message, Map.class);

            if (messageMap.containsKey("header")) {
                Map<String, Object> header = (Map<String, Object>) messageMap.get("header");

                if (header.containsKey("event")) {
                    String event = (String) header.get("event");

                    if ("task-started".equals(event)) {
                        System.out.println("Received task-started event from server");

                        // Send the continue-task command. When using the SSML feature, this command can be sent only once.
                        // Special characters need to be escaped.
                        sendContinueTask("<speak rate=\\\"2\\\">My speaking rate is faster than a normal person's.</speak>");

                        // Send the finish-task command
                        sendFinishTask();
                    } else if ("task-finished".equals(event)) {
                        System.out.println("Received task-finished event from server");
                        taskFinished = true;
                        closeConnection();
                    } else if ("task-failed".equals(event)) {
                        System.out.println("Task failed: " + message);
                        closeConnection();
                    }
                }
            }
        } catch (Exception e) {
            System.err.println("An exception occurred: " + e.getMessage());
        }
    }

    @Override
    public void onMessage(ByteBuffer message) {
        System.out.println("Size of received binary audio data: " + message.remaining());

        try (FileOutputStream fos = new FileOutputStream(outputFile, true)) {
            byte[] buffer = new byte[message.remaining()];
            message.get(buffer);
            fos.write(buffer);
            System.out.println("Audio data has been written to the local file " + outputFile);
        } catch (IOException e) {
            System.err.println("Failed to write audio data to local file: " + e.getMessage());
        }
    }

    @Override
    public void onClose(int code, String reason, boolean remote) {
        System.out.println("Connection closed: " + reason + " (" + code + ")");
    }

    @Override
    public void onError(Exception ex) {
        System.err.println("Error: " + ex.getMessage());
        ex.printStackTrace();
    }

    private void sendContinueTask(String text) {
        String command = "{ \"header\": { \"action\": \"continue-task\", \"task_id\": \"" + taskId + "\", \"streaming\": \"duplex\" }, \"payload\": { \"input\": { \"text\": \"" + text + "\" } }}";
        send(command);
    }

    private void sendFinishTask() {
        String command = "{ \"header\": { \"action\": \"finish-task\", \"task_id\": \"" + taskId + "\", \"streaming\": \"duplex\" }, \"payload\": { \"input\": {} }}";
        send(command);
    }

    private void closeConnection() {
        if (!isClosed()) {
            close();
        }
    }

    public static void main(String[] args) {
        try {
            // If you have not configured an environment variable, replace the following line with: String apiKey = "sk-xxx"
            String apiKey = System.getenv("DASHSCOPE_API_KEY");
            if (apiKey == null || apiKey.isEmpty()) {
                System.err.println("Please set the DASHSCOPE_API_KEY environment variable");
                return;
            }

            Map<String, String> headers = new HashMap<>();
            headers.put("Authorization", "bearer " + apiKey);
            TTSWebSocketClient client = new TTSWebSocketClient(new URI("wss://dashscope-intl.aliyuncs.com/api-ws/v1/inference/"), headers);

            client.connect();

            while (!client.isClosed() && !client.taskFinished) {
                Thread.sleep(1000);
            }
        } catch (Exception e) {
            System.err.println("Failed to connect to WebSocket service: " + e.getMessage());
            e.printStackTrace();
        }
    }
}

For Python, we recommend using the Python SDK instead. To run the WebSocket example that follows, install its dependencies first:

pip uninstall websocket-client
pip uninstall websocket
pip install websocket-client

Do not name your Python file websocket.py. This causes AttributeError: module 'websocket' has no attribute 'WebSocketApp'.

# SSML feature notes:
#     1. When sending the run-task command, set the enable_ssml parameter to true to enable SSML support.
#     2. Send the text that contains SSML by using the continue-task command. You can send this command only once.
#     3. SSML is supported only for cloned voices from cosyvoice-v3-flash and cosyvoice-v3-plus models, and system voices marked as SSML-supported in the voice list (for example, the longanyang voice for cosyvoice-v3-flash).

import websocket
import json
import uuid
import os
import time


class TTSClient:
  def __init__(self, api_key, uri):
    """
    Initializes the TTSClient instance.

    Parameters:
      api_key (str): The API key for authentication.
      uri (str): The WebSocket service address.
    """
    self.api_key = api_key  # Replace with your API key.
    self.uri = uri  # Replace with your WebSocket address.
    self.task_id = str(uuid.uuid4())  # Generate a unique task ID.
    self.output_file = f"output_{int(time.time())}.mp3"  # Output audio file path.
    self.ws = None  # WebSocketApp instance.
    self.task_started = False  # Whether task-started is received.
    self.task_finished = False  # Whether task-finished/task-failed is received.

  def on_open(self, ws):
    """
    Callback function when the WebSocket connection is established.
    Sends the run-task command to start the speech synthesis task.
    """
    print("WebSocket connection established")

    # Construct the run-task command.
    run_task_cmd = {
      "header": {
        "action": "run-task",
        "task_id": self.task_id,
        "streaming": "duplex"
      },
      "payload": {
        "task_group": "audio",
        "task": "tts",
        "function": "SpeechSynthesizer",
        "model": "cosyvoice-v3-flash",
        "parameters": {
          "text_type": "PlainText",
          "voice": "longanyang",
          "format": "mp3",
          "sample_rate": 22050,
          "volume": 50,
          "rate": 1,
          "pitch": 1,
          # With enable_ssml: true, send continue-task only once
          "enable_ssml": True
        },
        "input": {}
      }
    }

    # Send the run-task command.
    ws.send(json.dumps(run_task_cmd))
    print("run-task command sent")

  def on_message(self, ws, message):
    """
    Callback function when a message is received.
    Handles text and binary messages separately.
    """
    if isinstance(message, str):
      # Process JSON text messages.
      try:
        msg_json = json.loads(message)
        print(f"Received JSON message: {msg_json}")

        if "header" in msg_json:
          header = msg_json["header"]

          if "event" in header:
            event = header["event"]

            if event == "task-started":
              print("Task started")
              self.task_started = True

              # Send the continue-task command. When using the SSML feature, this command can be sent only once.
              # Special characters need to be escaped.
              self.send_continue_task("<speak rate=\"2\">My speaking rate is faster than a normal person's.</speak>")

              # Send finish-task after continue-task is sent.
              self.send_finish_task()

            elif event == "task-finished":
              print("Task finished")
              self.task_finished = True
              self.close(ws)

            elif event == "task-failed":
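              # Look for the error text in the header first, then at the top level.
              error_msg = header.get("error_message") or msg_json.get("error_message", "Unknown error")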
              print(f"Task failed: {error_msg}")
              self.task_finished = True
              self.close(ws)

      except json.JSONDecodeError as e:
        print(f"JSON parsing failed: {e}")
    else:
      # Process binary messages (audio data).
      print(f"Received binary message, size: {len(message)} bytes")
      with open(self.output_file, "ab") as f:
        f.write(message)
      print(f"Audio data has been written to the local file {self.output_file}")

  def on_error(self, ws, error):
    """Callback on error."""
    print(f"WebSocket error: {error}")

  def on_close(self, ws, close_status_code, close_msg):
    """Callback on close."""
    print(f"WebSocket closed: {close_msg} ({close_status_code})")

  def send_continue_task(self, text):
    """Sends the continue-task command with the text to be synthesized."""
    cmd = {
      "header": {
        "action": "continue-task",
        "task_id": self.task_id,
        "streaming": "duplex"
      },
      "payload": {
        "input": {
          "text": text
        }
      }
    }

    self.ws.send(json.dumps(cmd))
    print(f"Sent continue-task command, text content: {text}")

  def send_finish_task(self):
    """Sends the finish-task command to end the speech synthesis task."""
    cmd = {
      "header": {
        "action": "finish-task",
        "task_id": self.task_id,
        "streaming": "duplex"
      },
      "payload": {
        "input": {}
      }
    }

    self.ws.send(json.dumps(cmd))
    print("Sent finish-task command")

  def close(self, ws):
    """Actively closes the connection."""
    if ws and ws.sock and ws.sock.connected:
      ws.close()
      print("Connection actively closed")

  def run(self):
    """Starts the WebSocket client."""
    # Set request headers (authentication).
    header = {
      "Authorization": f"bearer {self.api_key}",
      "X-DashScope-DataInspection": "enable"
    }

    # Create a WebSocketApp instance.
    self.ws = websocket.WebSocketApp(
      self.uri,
      header=header,
      on_open=self.on_open,
      on_message=self.on_message,
      on_error=self.on_error,
      on_close=self.on_close
    )

    print("Listening for WebSocket messages...")
    self.ws.run_forever()  # Start the persistent connection listener.


# Example usage
if __name__ == "__main__":
  # If you have not configured an environment variable, replace the following line with: API_KEY = "sk-xxx"
  API_KEY = os.environ.get("DASHSCOPE_API_KEY")
  SERVER_URI = "wss://dashscope-intl.aliyuncs.com/api-ws/v1/inference/"

  client = TTSClient(API_KEY, SERVER_URI)
  client.run()
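
Because continue-task can be sent only once when enable_ssml is true, all SSML for a task has to travel in that single message. The following is a minimal sketch, assuming that consecutive <speak> elements can simply be concatenated into one text field (the syntax rules in the Tags section allow consecutive, non-nested <speak> tags). build_continue_task is a hypothetical helper, not part of any SDK.

```python
import json
import uuid


def build_continue_task(task_id, fragments):
    """Join SSML fragments into the single continue-task message (hypothetical helper)."""
    return json.dumps({
        "header": {
            "action": "continue-task",
            "task_id": task_id,
            "streaming": "duplex",
        },
        # json.dumps escapes the double quotes inside the SSML for JSON.
        "payload": {"input": {"text": "".join(fragments)}},
    })


message = build_continue_task(
    str(uuid.uuid4()),
    ['<speak rate="2">Fast.</speak>', '<speak pitch="0.5">Low.</speak>'],
)
print(message)
```

Serializing with json.dumps avoids the manual backslash escaping that the string-concatenation approach in the Java example needs.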

Tags

CosyVoice SSML is based on W3C SSML 1.0 but supports only a subset of tags.

Syntax rules:

Wrap all SSML content in <speak></speak> tags.
You can place multiple <speak> elements one after another, but do not nest them.
Escape XML special characters: " → &quot;, ' → &apos;, & → &amp;, < → &lt;, > → &gt;.
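
The escaping rule can be sketched in Python as follows. This is a minimal illustration that assumes you assemble the SSML as a string before sending it. build_speak is a hypothetical helper; escape and quoteattr come from the standard xml.sax.saxutils module.

```python
from xml.sax.saxutils import escape, quoteattr


def build_speak(text, **attrs):
    """Wrap plain text in one <speak> element (hypothetical helper)."""
    # quoteattr() escapes the value and adds the surrounding quotes.
    attr_str = "".join(
        f" {name}={quoteattr(str(value))}" for name, value in attrs.items()
    )
    # escape() handles &, < and > by default; the extra map covers quotes.
    body = escape(text, {'"': "&quot;", "'": "&apos;"})
    return f"<speak{attr_str}>{body}</speak>"


print(build_speak('Tom & Jerry say "2 < 3"', rate="1.5"))
# <speak rate="1.5">Tom &amp; Jerry say &quot;2 &lt; 3&quot;</speak>
```

The helper escapes everything it receives, so it suits plain text only. Child tags such as <break/> would have to be assembled separately.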

`<speak>`: Root tag

Description

Wrap all SSML content in <speak></speak> tags.

Syntax

<speak>Text that requires SSML features</speak>

Properties

Property	Type	Required	Description
voice	String	No	Voice name. Overrides the `voice` API parameter. See Voice list.
rate	String	No	Speech rate. Overrides the `speech_rate` API parameter. Range: 0.5 to 2. Default: 1. Values above 1 are faster; below 1 are slower.
pitch	String	No	Pitch. Overrides the `pitch_rate` API parameter. Range: 0.5 to 2. Default: 1. Values above 1 are higher; below 1 are lower.
volume	String	No	Volume. Overrides the `volume` API parameter. Range: 0 to 100. Default: 50.
effect	String	No	Sound effect. Values: `robot`, `lolita` (lively female voice), `lowpass`, `echo`, `eq` (equalizer, advanced), `lpfilter` (low-pass filter, advanced), `hpfilter` (high-pass filter, advanced). Use `effectValue` to customize `eq`, `lpfilter`, and `hpfilter`. Only one effect per tag. Sound effects increase latency.
effectValue	String	No	Customizes the `effect`. For `eq`: a string of 8 space-separated integers (-20 to 20) for gain at `["40 Hz", "100 Hz", "200 Hz", "400 Hz", "800 Hz", "1600 Hz", "4000 Hz", "12000 Hz"]`. Example: `"1 1 1 1 1 1 1 1"`. For `lpfilter`: integer frequency in (0, sample_rate/2]. Example: `"800"`. For `hpfilter`: integer frequency in (0, sample_rate/2]. Example: `"1200"`.
bgm	String	No	Background music URL. The file must be in OSS with at least public-read permissions. Escape XML special characters in the URL. Requirements: 16 kHz sample rate, mono, WAV, 16-bit. If the synthesized audio is longer than the music, the music loops.
backgroundMusicVolume	String	No	Background music volume.
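
The effectValue rules in the table can be checked before a request is sent. The following is a rough sketch only: validate_effect_value is a hypothetical helper, and the default sample_rate of 22050 is taken from the WebSocket examples in this document.

```python
def validate_effect_value(effect, effect_value, sample_rate=22050):
    """Check effectValue against the documented rules (hypothetical helper)."""
    if effect == "eq":
        parts = effect_value.split()
        if len(parts) != 8:
            return False  # eq needs exactly 8 gain values
        try:
            gains = [int(p) for p in parts]
        except ValueError:
            return False
        return all(-20 <= g <= 20 for g in gains)
    if effect in ("lpfilter", "hpfilter"):
        try:
            freq = int(effect_value)
        except ValueError:
            return False
        # Documented interval is (0, sample_rate/2].
        return 0 < freq <= sample_rate / 2
    # The table lists no effectValue for the other effects.
    return False


print(validate_effect_value("eq", "1 -20 1 1 1 1 20 1"))
print(validate_effect_value("hpfilter", "20000"))
```

With the assumed 22050 Hz sample rate, the upper bound for both filters is 11025 Hz, so the second call is rejected.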

Examples

Voice:

<speak voice="longcheng_v2">
  I am a male voice.
</speak>

Rate:

<speak rate="2">
  My speech rate is faster than normal.
</speak>

Pitch:

<speak pitch="0.5">
  However, my pitch is lower than others.
</speak>

Volume:

<speak volume="80">
  My volume is also very high.
</speak>

Effect:

<speak effect="robot">
  Do you like the robot WALL-E?
</speak>

Effect with effectValue:

<speak effect="eq" effectValue="1 -20 1 1 1 1 20 1">
  Do you like the robot WALL-E?
</speak>

<speak effect="lpfilter" effectValue="1200">
  Do you like the robot WALL-E?
</speak>

<speak effect="hpfilter" effectValue="1200">
  Do you like the robot WALL-E?
</speak>

If the background music is not a 16 kHz, mono, 16-bit WAV file, convert it with ffmpeg:

ffmpeg -i input_audio -acodec pcm_s16le -ac 1 -ar 16000 output.wav

Background music (bgm):

<speak bgm="http://nls.alicdn.com/bgm/2.wav" backgroundMusicVolume="30" rate="0.5" volume="40">
  <break time="2s"/>
  The old trees on the shady cliff are shrouded in mist
  <break time="700ms"/>
  The sound of rain is still in the bamboo forest
  <break time="700ms"/>
  I know that cotton contributes to the country's plan
  <break time="700ms"/>
  The scenery of Mianzhou is always pitiable
  <break time="2s"/>
</speak>

You are legally responsible for the copyright of the uploaded audio.

Combined properties:

<speak>
  Text that requires SSML tags
</speak>

<speak rate="2" pitch="0.5" volume="80">
  So when put together, my voice sounds like this.
</speak>

`<break>`: Pause

Description

Insert a pause. Set the duration in seconds (s) or milliseconds (ms).

Syntax

# Without attributes
<break/>
# With the time attribute
<break time="string"/>

Break tag behavior:

Without attributes, <break/> defaults to a 1-second pause.
Warning: Consecutive <break> tags are summed, but the total is capped at 10 seconds.

For example, these three tags total 15 seconds, but only the first 10 seconds take effect:

<speak>
  Please close your eyes and take a rest.<break time="5s"/><break time="5s"/><break time="5s"/>Okay, please open your eyes.
</speak>
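
The summing and capping behavior can be modeled with a few lines of Python. This is a sketch for estimating the resulting pause, not the service's implementation. effective_pause_ms is a hypothetical helper, and the regular expression assumes integer values followed by s or ms.

```python
import re

BREAK_RE = re.compile(r'<break(?:\s+time="(\d+)(ms|s)")?\s*/>')
MAX_PAUSE_MS = 10_000     # documented cap for consecutive breaks
DEFAULT_PAUSE_MS = 1_000  # documented default for <break/>


def effective_pause_ms(break_run):
    """Return the pause in ms that a run of consecutive <break> tags yields."""
    total = 0
    for value, unit in BREAK_RE.findall(break_run):
        if not value:
            total += DEFAULT_PAUSE_MS  # <break/> without attributes
        elif unit == "s":
            total += int(value) * 1000
        else:
            total += int(value)
    return min(total, MAX_PAUSE_MS)


run = '<break time="5s"/><break time="5s"/><break time="5s"/>'
print(effective_pause_ms(run))  # 10000
```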

Properties

Property	Type	Required	Description
time	String	No	Pause duration, such as `"2s"` or `"50ms"`. In seconds: 1 to 10. In milliseconds: 50 to 10000.

Example

<speak>
  Please close your eyes and take a rest.<break time="500ms"/>Okay, please open your eyes.
</speak>

`<sub>`: Replace text

Description

Read the text in the alias attribute in place of the enclosed text.

Syntax

<sub alias="string"></sub>

Properties

Property	Type	Required	Description
alias	String	Yes	The text to read instead.

Example

<speak>
  <sub alias="network protocol">W3C</sub>
</speak>
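
When many terms need substitutions, the <sub> tags can be generated from a lookup table. The following is a minimal sketch: apply_aliases and the aliases mapping are hypothetical, and the alias reuses the value from the example above.

```python
import re
from xml.sax.saxutils import escape, quoteattr


def apply_aliases(text, aliases):
    """Wrap each known term in a <sub> tag (hypothetical helper)."""
    if not aliases:
        return f"<speak>{escape(text)}</speak>"
    # Longest terms first so that overlapping terms prefer the longer match.
    terms = sorted(aliases, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(t) for t in terms))
    parts, last = [], 0
    for match in pattern.finditer(text):
        parts.append(escape(text[last:match.start()]))
        term = match.group(0)
        parts.append(f"<sub alias={quoteattr(aliases[term])}>{escape(term)}</sub>")
        last = match.end()
    parts.append(escape(text[last:]))
    return "<speak>" + "".join(parts) + "</speak>"


print(apply_aliases("W3C sets web standards", {"W3C": "network protocol"}))
# <speak><sub alias="network protocol">W3C</sub> sets web standards</speak>
```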

`<phoneme>`: Set pronunciation

Description

Specify pronunciation using Pinyin (Chinese) or the CMU phonetic alphabet (English).

Syntax

<phoneme alphabet="string" ph="string">text</phoneme>

Properties

Property	Type	Required	Description
alphabet	String	Yes	Pronunciation type: `"py"` (Pinyin) or `"cmu"` (phonetic alphabet). See The CMU Pronouncing Dictionary.
ph	String	Yes	The Pinyin or phonetic symbols. Separate each character's Pinyin with a space. The number of syllables must match the number of characters. Each syllable has a tone number (1 to 5, where 5 is neutral).
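
The Pinyin constraints on ph (one syllable per character, each ending in a tone number from 1 to 5) can be checked up front. This is a sketch under assumptions: check_pinyin is a hypothetical helper, and the pattern assumes syllables are written in lowercase ASCII letters.

```python
import re

# Assumption: a syllable is lowercase ASCII letters followed by a tone digit 1-5.
SYLLABLE_RE = re.compile(r"^[a-z]+[1-5]$")


def check_pinyin(text, ph):
    """Return True if ph has one toned syllable per character of text."""
    syllables = ph.split()
    if len(syllables) != len(text):
        return False  # syllable count must match character count
    return all(SYLLABLE_RE.match(s) is not None for s in syllables)


print(check_pinyin("典当行", "dian3 dang4 hang2"))  # True
print(check_pinyin("当掉", "dang4"))                # False
```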

Example

<speak>
  去<phoneme alphabet="py" ph="dian3 dang4 hang2">典当行</phoneme>把这个玩意<phoneme alphabet="py" ph="dang4 diao4">当掉</phoneme>
</speak>

<speak>
  How to spell <phoneme alphabet="cmu" ph="S AY N">sin</phoneme>?
</speak>

`<soundEvent>`: Insert a sound effect

Description

Insert an external sound file (prompt tones, ambient sounds) into synthesized speech.

Syntax

<soundEvent src="URL"/>

Properties

Property	Type	Required	Description
src	String	Yes	Audio URL. The file must be in OSS with at least public-read permissions. Escape XML special characters in the URL. Requirements: 16 kHz sample rate, mono, WAV, 16-bit, max 2 MB.

If the audio is not in WAV format, convert it with ffmpeg:

ffmpeg -i input_audio -acodec pcm_s16le -ac 1 -ar 16000 output.wav

You are legally responsible for the copyright of the uploaded audio.
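
Before uploading a file, you can check it against the stated requirements with Python's standard wave module. This is a minimal sketch: check_sound_file is a hypothetical helper, and it assumes that "2 MB" means 2 × 1024 × 1024 bytes.

```python
import os
import wave

MAX_BYTES = 2 * 1024 * 1024  # assumption: 2 MB counted in binary units


def check_sound_file(path):
    """Return a list of problems; an empty list means the file looks fine."""
    problems = []
    if os.path.getsize(path) > MAX_BYTES:
        problems.append("file is larger than 2 MB")
    try:
        with wave.open(path, "rb") as wav:
            if wav.getframerate() != 16000:
                problems.append(f"sample rate is {wav.getframerate()} Hz, expected 16000")
            if wav.getnchannels() != 1:
                problems.append(f"{wav.getnchannels()} channels, expected mono")
            if wav.getsampwidth() != 2:
                problems.append(f"{wav.getsampwidth() * 8}-bit samples, expected 16-bit")
    except wave.Error as exc:
        problems.append(f"not a readable PCM WAV file: {exc}")
    return problems
```

Calling check_sound_file("output.wav") on the ffmpeg output above should return an empty list if the conversion succeeded and the file is small enough.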

Example

<speak>
  A horse was frightened<soundEvent src="http://nls.alicdn.com/sound-event/horse-neigh.wav"/>and people scattered to avoid it.
</speak>

`<say-as>`: Set reading format

Description

Specify how text is read (as numbers, dates, phone numbers, etc.).

Syntax

<say-as interpret-as="string">text</say-as>

Properties

Property	Type	Required	Description
interpret-as	String	Yes	Text type. Values: `cardinal` (number), `digits` (individual digits), `telephone` (phone number), `name`, `address`, `id` (account name/nickname), `characters` (character by character), `punctuation`, `date`, `time`, `currency`, `measure` (unit of measure).
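
A small helper can guard against unsupported interpret-as values and escape the enclosed text in one step. This is a sketch, and say_as is a hypothetical name. The set of values is copied from the table above.

```python
from xml.sax.saxutils import escape

INTERPRET_AS = {
    "cardinal", "digits", "telephone", "name", "address", "id",
    "characters", "punctuation", "date", "time", "currency", "measure",
}


def say_as(kind, text):
    """Build a <say-as> element (hypothetical helper)."""
    if kind not in INTERPRET_AS:
        raise ValueError(f"unsupported interpret-as value: {kind}")
    # escape() converts &, < and > so the result stays well-formed XML.
    return f'<say-as interpret-as="{kind}">{escape(text)}</say-as>'


print(say_as("date", "sat&sun"))
# <say-as interpret-as="date">sat&amp;sun</say-as>
```

Escaping matters for values such as sat&sun, where the range separator is itself an XML special character.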

cardinal

Supported formats for cardinal:

Format	Example	English output	Description
Number string	145	one hundred forty five	Integer range: up to 13 digits, [-999999999999, 999999999999]. Decimal: up to 13-digit integer part, up to 10-digit decimal part.
Number string starting with zero	0145	one hundred forty five
Negative sign + number string	-145	minus one hundred forty five
Three-digit number string separated by commas	60,000	sixty thousand
Negative sign + three-digit number string separated by commas	-208,000	minus two hundred eight thousand
Number string + decimal point + zero	12.00	twelve
Number string + decimal point + number string	12.34	twelve point three four
Three-digit number string separated by commas + decimal point + number string	1,000.1	one thousand point one
Negative sign + number string + decimal point + number string	-12.34	minus twelve point three four
Negative sign + three-digit number string separated by commas + decimal point + number string	-1,000.1	minus one thousand point one
(Three-digit comma-separated) number string + hyphen + (three-digit comma-separated) number	1-1,000	one to one thousand
Other default readings	012.34	twelve point three four
	1/2	one half
	-3/4	minus three quarters
	5.1/6	five point one over six
	-3 1/2	minus three and a half
	1,000.3^3	one thousand point three to the power of three
	3e9.1	three times ten to the power of nine point one
	23.10%	twenty three point one percent

Example

<speak>
  <say-as interpret-as="cardinal">12345</say-as>
</speak>

<speak>
  <say-as interpret-as="cardinal">10234</say-as>
</speak>

digits

Supported formats for digits:

Format	Example	English output	Description
Number string	12034	one two zero three four	No strict length limit, but keep under 20 characters.
Number string + space or hyphen + number string + ...	1-23-456 7890	one, two three, four five six, seven eight nine zero

Example

<speak>
  <say-as interpret-as="digits">12345</say-as>
</speak>

<speak>
  <say-as interpret-as="digits">10234</say-as>
</speak>

telephone

Supported formats for telephone:

Format	Example	English output	Description
Number string	12034	one two oh three four	No strict length limit, but keep under 20 characters.
Number string + space or hyphen + number string + ...	1-23-456 7890	one, two three, four five six, seven eight nine oh
Plus sign + number string + space or hyphen + number string	+43-211-0567	plus four three, two one one, oh five six seven
Left parenthesis + number string + right parenthesis + space + number string + space or hyphen + number string	(21) 654-3210	(two one) six five four, three two one oh

Example

<speak>
  <say-as interpret-as="telephone">12345</say-as>
</speak>

<speak>
  <say-as interpret-as="telephone">10234</say-as>
</speak>

name

Example

<speak>
  Her former name is <say-as interpret-as="name">Zeng Xiaofan</say-as>
</speak>

address

Not supported for English text.

Example

<speak>
  <say-as interpret-as="address">Fulu International, Building 1, Unit 3, Room 304</say-as>
</speak>

id

For English text, this works the same as characters.

Example

<speak>
  <say-as interpret-as="id">myid_1998</say-as>
</speak>

characters

Supported formats for characters:

Format	Example	English output	Description
string	*b+3$.c-0'=α	asterisk B plus three dollar dot C dash zero apostrophe equals alpha	Supports Chinese characters, English letters, digits 0-9, and common symbols.

Example

<speak>
  <say-as interpret-as="characters">Greek letters αβ</say-as>
</speak>

<speak>
  <say-as interpret-as="characters">*b+3.c$=α</say-as>
</speak>

punctuation

For English text, this works the same as characters.

Example

<speak>
  <say-as interpret-as="punctuation"> -./:;</say-as>
</speak>

date

Supported formats for date:

Format	Example	English output	Description
Four digits/two digits or four digits-two digits	2000/01	two thousand, oh one	Year spans.
	1900-01	nineteen hundred, oh one
	2001-02	twenty oh one, oh two
	2019-20	twenty nineteen, twenty
	1998-99	nineteen ninety eight, ninety nine
	1999-00	nineteen ninety nine, oh oh
Four-digit number starting with 1 or 2	2000	two thousand	Four-digit year.
	1900	nineteen hundred
	1905	nineteen oh five
	2021	twenty twenty one
Day of the week-Day of the week or Day of the week~Day of the week or Day of the week&Day of the week	mon-wed	monday to wednesday	Escape XML special characters in range separators.
	tue~fri	tuesday to friday
	sat&sun	saturday and sunday
DD-DD MMM, YYYY or DD~DD MMM, YYYY or DD&DD MMM, YYYY	19-20 Jan, 2000	the nineteenth to the twentieth of january two thousand	DD = two-digit day. MMM = month abbreviation or full name. YYYY = four-digit year.
	01 ~ 10 Jul, 2020	the first to the tenth of july twenty twenty
	05&06 Apr, 2009	the fifth and the sixth of april two thousand nine
MMM DD-DD or MMM DD~DD or MMM DD&DD	Feb 01 - 03	february the first to the third	MMM = month. DD = day.
	Aug 10-20	august the tenth to the twentieth
	Dec 11&12	december the eleventh and the twelfth
MMM-MMM or MMM~MMM or MMM&MMM	Jan-Jun	january to june	MMM = month.
	Jul - Dec	july to december
	sep&oct	september and october
YYYY-YYYY or YYYY~YYYY	1990 - 2000	nineteen ninety to two thousand	YYYY = four-digit year starting with 1 or 2.
	2001-2021	two thousand one to twenty twenty one
WWW DD MMM YYYY	Sun 20 Nov 2011	sunday the twentieth of november twenty eleven	WWW = day of week (abbreviation or full). DD = day. MMM = month. YYYY = year.
WWW DD MMM	Sun 20 Nov	sunday the twentieth of november
WWW MMM DD YYYY	Sun Nov 20 2011	sunday november the twentieth twenty eleven
WWW MMM DD	Sun Nov 20	sunday november the twentieth
WWW YYYY-MM-DD	Sat 2010-10-01	saturday october the first twenty ten
WWW YYYY/MM/DD	Sat 2010/10/01	saturday october the first twenty ten
WWW MM/DD/YYYY	Sun 11/20/2011	sunday november the twentieth twenty eleven
MM/DD/YYYY	11/20/2011	november the twentieth twenty eleven
YYYY	1998	nineteen ninety eight
Other default readings	10 Mar, 2001	the tenth of march two thousand one
	10 Mar	the tenth of march
	Mar 2001	march two thousand one
	Fri. 10/Mar/2001	friday the tenth of march two thousand one
	Mar 10th, 2001	march the tenth two thousand one
	Mar 10	march the tenth
	2001/03/10	march the tenth two thousand one
	2001-03-10	march the tenth two thousand one
	2000s	two thousands
	2010's	twenty tens
	1900's	nineteen hundreds
	1990s	nineteen nineties

Example

<speak>
  <say-as interpret-as="date">1000-10-10</say-as>
</speak>

<speak>
  <say-as interpret-as="date">10-01-2020</say-as>
</speak>

time

Supported formats for time:

Format	Example	English output	Description
HH:MM AM or PM	09:00 AM	nine A M	HH = hour (1-2 digits). MM = minute (2 digits). AM/PM = morning or afternoon.
	09:03 PM	nine oh three P M
	09:13 p.m.	nine thirteen p m
HH:MM	21:00	twenty one hundred
HHMM	100	one oclock
Time point-Time point	8:00 am - 05:30 pm	eight a m to five thirty p m	Time range formats.
	7:05~10:15 AM	seven oh five to ten fifteen A M
	09:00-13:00	nine oclock to thirteen hundred

Example

<speak>
  <say-as interpret-as="time">5:00am</say-as>
</speak>

<speak>
  <say-as interpret-as="time">0500</say-as>
</speak>

currency

Supported formats for currency:

Format	Example	English output	Description
Number + Currency identifier	1.00 RMB	one yuan	Supports integers, decimals, and comma-separated thousands.
	2.02 CNY	two point zero two yuan
	1,000.23 CN¥	one thousand point two three yuan
	1.01 SGD	one singapore dollar and one cent
	2.01 CAD	two canadian dollars and one cent
	3.1 HKD	three hong kong dollars and ten cents
	1,000.00 EUR	one thousand euros
Currency identifier + Number	US$ 1.00	one US dollar	Supports integers, decimals, and comma-separated thousands.
	$0.01	one cent
	JPY 1.01	one japanese yen and one sen
	£1.1	one pound and ten pence
	€2.01	two euros and one cent
	USD 1,000	one thousand united states dollars
Number + Quantifier + Currency identifier or Currency identifier + Number + Quantifier	1.23 Tn RMB	one point two three trillion yuan	Quantifiers: thousand, million, billion, trillion, Mil, mil, K, k, Bn, bn, Tn, tn.
	$1.2 K	one point two thousand dollars

Example

<speak>
  <say-as interpret-as="currency">13,000,000.00RMB</say-as>
</speak>

<speak>
  <say-as interpret-as="currency">$1,000.01</say-as>
</speak>

measure

Supported formats for measure:

Format	Example	English output	Description
Number + Unit of measurement	1.0 kg	one kilogram	Supports integers, decimals, and comma-separated thousands. Supports common unit abbreviations.
	1,234.01 km	one thousand two hundred thirty-four point zero one kilometers
Unit of measurement	mm2	square millimeter

Example

<speak>
  <say-as interpret-as="measure">100m12cm6mm</say-as>
</speak>

<speak>
  <say-as interpret-as="measure">1,000.01kg</say-as>
</speak>

Symbol pronunciations

Common symbol pronunciations for <say-as>:

Symbol	English pronunciation
!	exclamation mark
"	double quote
#	pound
$	dollar
%	percent
&	and
'	left quote
(	left parenthesis
)	right parenthesis
*	asterisk
+	plus
,	comma
-	dash
.	dot
/	slash
:	colon
;	semicolon
<	less than
=	equals
>	greater than
?	question mark
@	at
[	left bracket
\	backslash
]	right bracket
^	caret
_	underscore
`	backtick
{	left brace
|	vertical bar
}	right brace
~	tilde

Full-width and special symbols:

Symbol	English pronunciation
！	exclamation mark
“	left double quote
”	right double quote
‘	left quote
’	right quote
（	left parenthesis
）	right parenthesis
，	comma
。	full stop
—	em dash
：	colon
；	semicolon
？	question mark
、	enumeration comma
…	ellipsis
……	ellipsis
《	left guillemet
》	right guillemet
￥	yuan
≥	greater than or equal to
≤	less than or equal to
≠	not equal
≈	approximately equal
±	plus or minus
×	times
π	pi

Greek letters (uppercase):

Symbol	English pronunciation
Α	alpha
Β	beta
Γ	gamma
Δ	delta
Ε	epsilon
Ζ	zeta
Θ	theta
Ι	iota
Κ	kappa
∧	lambda
Μ	mu
Ν	nu
Ξ	ksi
Ο	omicron
∏	pi
Ρ	rho
∑	sigma
Τ	tau
Υ	upsilon
Φ	phi
Χ	chi
Ψ	psi
Ω	omega

Greek letters (lowercase):

Symbol	English pronunciation
α	alpha
β	beta
γ	gamma
δ	delta
ε	epsilon
ζ	zeta
η	eta
θ	theta
ι	iota
κ	kappa
λ	lambda
μ	mu
ν	nu
ξ	ksi
ο	omicron
π	pi
ρ	rho
σ	sigma
τ	tau
υ	upsilon
φ	phi
χ	chi
ψ	psi
ω	omega

Common units of measurement

Common units for <say-as>:

Category	Units
Length	nm (nanometer), μm (micrometer), mm (millimeter), cm (centimeter), m (meter), km (kilometer), ft (foot), in (inch)
Area	cm² (square centimeter), m² (square meter), km² (square kilometer), SqFt (square foot)
Volume	cm³ (cubic centimeter), m³ (cubic meter), km³ (cubic kilometer), mL (milliliter), L (liter), gal (gallon)
Weight	μg (microgram), mg (milligram), g (gram), kg (kilogram)
Time	min (minute), sec (second), ms (millisecond)
Electromagnetism	μA (microamp), mA (milliamp), Hz (hertz), kHz (kilohertz), MHz (megahertz), GHz (gigahertz), V (volt), kV (kilovolt), kWh (kilowatt hour)
Sound	dB (decibel)
Atmospheric pressure	Pa (pascal), kPa (kilopascal), MPa (megapascal)
Other	tsp (teaspoon), rpm (revolutions per minute), KB (kilobyte), mmHg (millimeter of mercury), and others.
