CosyVoice server-side events

Reference for the events that the server sends over the WebSocket connection during CosyVoice real-time speech synthesis.

User guide: For an introduction to the models and guidance on selecting one, see Speech synthesis.

task-started

After the client sends the run-task command, the server returns a task-started event to signal that the task has started. The client can send subsequent commands only after receiving this event.
Example
{
  "header": {
    "task_id": "2bf83b9a-baeb-4fda-8d9a-xxxxxxxxxxxx",
    "event": "task-started",
    "attributes": {}
  },
  "payload": {}
}
Field | Type | Description
--- | --- | ---
header.task_id | string | The task ID generated by the client.
header.event | string | Event type. Fixed value: task-started.
payload | object | Empty object.
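
The rule that the client may send further commands only after task-started can be enforced with a small gate object. The following is a minimal sketch, not part of the service's SDK: the class name `TaskGate` and its methods are hypothetical, and it assumes only the `header.task_id` and `header.event` fields shown in the example above.

```python
import json


class TaskGate:
    """Hypothetical helper: tracks whether task-started has arrived for one task."""

    def __init__(self, task_id: str) -> None:
        self.task_id = task_id
        self.started = False

    def on_text_message(self, raw: str) -> None:
        """Inspect one JSON text message received from the server."""
        header = json.loads(raw).get("header", {})
        # Only a task-started event for our own task ID opens the gate.
        if header.get("task_id") == self.task_id and header.get("event") == "task-started":
            self.started = True

    def can_send(self) -> bool:
        """True once the server has acknowledged the run-task command."""
        return self.started
```

A client would call `on_text_message` for every text frame it receives and check `can_send()` before sending any command after run-task.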

result-generated

After the client sends text, the server continuously returns result-generated events. Each event carries sentence-level metadata, and its payload.output.type field identifies one of three sub-event types:
  • sentence-begin
  • sentence-synthesis
  • sentence-end
Example
{
  "header": {
    "task_id": "3f2d5c86-0550-45c0-801f-xxxxxxxxxx",
    "event": "result-generated",
    "attributes": {}
  },
  "payload": {
    "output": {
      "sentence": {
        "index": 0,
        "words": []
      },
      "type": "sentence-begin",
      "original_text": "Before my bed, moonlight shines bright,"
    }
  }
}
Field | Type | Description
--- | --- | ---
header.task_id | string | The task ID generated by the client.
header.event | string | Event type. Fixed value: result-generated.
payload.output.type | string | Sub-event type. Valid values: sentence-begin (start of a sentence; returns the text to be synthesized), sentence-synthesis (marks an audio frame; one audio frame is transmitted over the WebSocket binary channel immediately after each event), sentence-end (end of a sentence; returns the text content and the cumulative character count).
payload.output.sentence.index | integer | Sentence index, starting from 0.
payload.output.sentence.words | array | Word-level timestamp array.
payload.output.sentence.words[].text | string | Text content of the word.
payload.output.sentence.words[].begin_index | integer | Start character index of the word within the sentence. Starts at 0.
payload.output.sentence.words[].end_index | integer | End character index of the word within the sentence. Starts at 1.
payload.output.sentence.words[].begin_time | integer | Start time of the word's corresponding audio, in milliseconds.
payload.output.sentence.words[].end_time | integer | End time of the word's corresponding audio, in milliseconds.
payload.output.original_text | string | Text of the sentence as segmented for synthesis.
payload.usage.characters | integer | Cumulative number of billed characters (returned in the sentence-end event).
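
To illustrate how a client might branch on payload.output.type, here is a minimal sketch that works on the JSON text of a single event. The function names `describe_result` and `word_timings` are hypothetical, and the code assumes only the fields listed in the table above. It reads optional fields defensively because this reference does not state which fields appear in every sub-event.

```python
import json
from typing import Any, Dict, List, Tuple


def describe_result(raw: str) -> str:
    """Summarize one result-generated event by its sub-event type."""
    message = json.loads(raw)
    if message["header"]["event"] != "result-generated":
        raise ValueError("not a result-generated event")

    payload = message.get("payload", {})
    output = payload.get("output", {})
    kind = output.get("type")
    index = output.get("sentence", {}).get("index")

    if kind == "sentence-begin":
        return f"sentence {index} begins: {output.get('original_text', '')}"
    if kind == "sentence-synthesis":
        # The next binary WebSocket frame carries the audio for this event.
        return f"sentence {index}: one binary audio frame follows"
    if kind == "sentence-end":
        billed = payload.get("usage", {}).get("characters")
        return f"sentence {index} ends; billed characters so far: {billed}"
    return f"unrecognized sub-event type: {kind!r}"


def word_timings(output: Dict[str, Any]) -> List[Tuple[str, int, int]]:
    """Return (text, begin_time, end_time) in milliseconds for each word."""
    words = output.get("sentence", {}).get("words", [])
    return [(w["text"], w["begin_time"], w["end_time"]) for w in words]
```

Passing the example event above to `describe_result` yields a sentence-begin summary, and `word_timings` returns an empty list for it because its words array is empty.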

task-finished

The server returns a task-finished event when the task completes. The client can then close the WebSocket connection or reuse it to start a new task.
Example
{
  "header": {
    "task_id": "2bf83b9a-baeb-4fda-8d9a-xxxxxxxxxxxx",
    "event": "task-finished",
    "attributes": {
      "request_uuid": "0a9dba9e-d3a6-45a4-be6d-xxxxxxxxxxxx"
    }
  },
  "payload": {
    "usage": {
      "characters": 13
    }
  }
}
Field | Type | Description
--- | --- | ---
header.task_id | string | The task ID generated by the client.
header.event | string | Event type. Fixed value: task-finished.
payload.usage.characters | integer | Cumulative number of billed characters.
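
A client that tracks usage can read the final billed character count from this event. The sketch below is illustrative only: `billed_characters` is a hypothetical helper name, and it returns None for any message that is not a task-finished event or that lacks the usage field.

```python
import json
from typing import Optional


def billed_characters(raw: str) -> Optional[int]:
    """Return payload.usage.characters from a task-finished event, else None."""
    message = json.loads(raw)
    if message.get("header", {}).get("event") != "task-finished":
        return None
    return message.get("payload", {}).get("usage", {}).get("characters")
```

After this function returns a count, the client is free to close the connection or start a new task on it, as described above.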

task-failed

The server returns a task-failed event when the task fails. On receiving this event, the client must close the WebSocket connection and handle the error.
Example
{
  "header": {
    "task_id": "2bf83b9a-baeb-4fda-8d9a-xxxxxxxxxxxx",
    "event": "task-failed",
    "error_code": "InvalidParameter",
    "error_message": "[tts:]Engine return error code: 418",
    "attributes": {}
  },
  "payload": {}
}
Field | Type | Description
--- | --- | ---
header.task_id | string | The task ID generated by the client.
header.event | string | Event type. Fixed value: task-failed.
header.error_code | string | Error code.
header.error_message | string | Detailed error message.
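
One way to handle this event is to turn it into an exception so that the calling code can close the connection in a single place. This is a minimal sketch under that assumption: `TaskFailedError` and `raise_if_failed` are hypothetical names, and closing the WebSocket is left to the caller because no particular WebSocket library is assumed here.

```python
import json


class TaskFailedError(RuntimeError):
    """Hypothetical exception carrying the fields of a task-failed event."""

    def __init__(self, task_id: str, error_code: str, error_message: str) -> None:
        super().__init__(f"task {task_id} failed: {error_code}: {error_message}")
        self.task_id = task_id
        self.error_code = error_code
        self.error_message = error_message


def raise_if_failed(raw: str) -> None:
    """Raise TaskFailedError if the message is a task-failed event."""
    header = json.loads(raw).get("header", {})
    if header.get("event") == "task-failed":
        raise TaskFailedError(
            header.get("task_id", ""),
            header.get("error_code", ""),
            header.get("error_message", ""),
        )
```

A receive loop could call `raise_if_failed` on each text message inside a try block and close the connection in the matching finally clause.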