CosyVoice real-time speech synthesis WebSocket client event reference
User guide: For an overview of the models and guidance on choosing one, see Speech synthesis.
run-task
Description: Starts a speech synthesis task and configures the model, voice, sample rate, and other parameters.
When to send: Immediately after the WebSocket connection is established.
Response event: The server returns a task-started event. Wait for this event before sending subsequent commands.
Example
object, body, required: Message header.
object, body, required: Request body.
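As a rough illustration of the two-part structure above (a message header plus a request body), the following Python sketch assembles a run-task message. This section does not list the individual fields, so every key name here (`header`, `payload`, `action`, `task_id`, `model`, `parameters`, `voice`, `sample_rate`) and every sample value is an assumption made for illustration, as is the choice to serialize the message as JSON text. Check them against the full field reference before use.

```python
import json
import uuid


def build_run_task(model: str, voice: str, sample_rate: int) -> dict:
    """Assemble a run-task message as a plain dict.

    All key names below are assumptions for illustration only.
    """
    return {
        # Message header (assumed keys): names the command and identifies the task.
        "header": {
            "action": "run-task",
            "task_id": uuid.uuid4().hex,
        },
        # Request body (assumed keys): model, voice, sample rate, and other parameters.
        "payload": {
            "model": model,
            "parameters": {"voice": voice, "sample_rate": sample_rate},
        },
    }


# Placeholder model and voice names, not real identifiers.
message = build_run_task("example-model", "example-voice", 22050)
wire_text = json.dumps(message)  # text to send right after the connection opens
```

After sending `wire_text`, a client would wait for the task-started event before sending anything else.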
continue-task
Description: Sends the text to synthesize. The text can be sent all at once or in multiple segments.
When to send: After receiving the task-started event from the server.
Limits:
- Maximum of 20,000 characters per message
- Maximum of 200,000 characters in total across all messages in a task
- The interval between consecutive sends must not exceed 23 seconds; otherwise, the connection times out.
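The limits above can be checked on the client before anything is sent. The sketch below is one possible way to do that, not part of the service API: `split_text` and `interval_ok` are hypothetical helper names, and counting characters with Python's `len()` is an assumption, since the service may count some characters differently.

```python
MAX_CHARS_PER_MESSAGE = 20_000   # per-message limit from the list above
MAX_CHARS_PER_TASK = 200_000     # cumulative limit from the list above
MAX_SEND_INTERVAL_SECONDS = 23   # longest allowed gap between sends


def split_text(text: str, limit: int = MAX_CHARS_PER_MESSAGE) -> list[str]:
    """Split text into segments no longer than the per-message limit.

    Assumption: len() is an acceptable stand-in for the service's character count.
    """
    if len(text) > MAX_CHARS_PER_TASK:
        raise ValueError("text exceeds the cumulative limit for one task")
    return [text[i:i + limit] for i in range(0, len(text), limit)]


def interval_ok(last_sent: float, now: float) -> bool:
    """Return True if the gap since the previous send is within the limit."""
    return (now - last_sent) <= MAX_SEND_INTERVAL_SECONDS


segments = split_text("a" * 45_000)
```

Fixed-width slicing can cut a sentence in half, so a real client would probably prefer to split at sentence or punctuation boundaries while still respecting the same limits.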
Example
object, body, required: Message header.
object, body, required: Request body.
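To show how text might be sent in several segments, here is a minimal Python sketch that builds one continue-task message per segment. The key names (`header`, `payload`, `action`, `task_id`, `input`, `text`) and the JSON serialization are assumptions for illustration, since this section does not list the fields, and `build_continue_task` is a hypothetical helper name.

```python
import json

MAX_CHARS_PER_MESSAGE = 20_000  # per-message limit for continue-task


def build_continue_task(task_id: str, text: str) -> str:
    """Serialize one continue-task message carrying a single text segment.

    All key names below are assumptions for illustration only.
    """
    if len(text) > MAX_CHARS_PER_MESSAGE:
        raise ValueError("segment exceeds the per-message character limit")
    message = {
        # Message header (assumed keys): names the command and identifies the task.
        "header": {"action": "continue-task", "task_id": task_id},
        # Request body (assumed keys): the text segment to synthesize.
        "payload": {"input": {"text": text}},
    }
    # ensure_ascii=False keeps non-ASCII text readable in the serialized output.
    return json.dumps(message, ensure_ascii=False)


segments = ["Hello, ", "world."]
frames = [build_continue_task("example-task-id", s) for s in segments]
```

Each entry in `frames` would be sent as its own message, in order, once the task-started event has arrived.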
finish-task
Description: Notifies the server that all text has been sent and requests task completion.
When to send: Immediately after all text has been sent.
Response event: The server returns a task-finished event.
Example
object, body, required: Message header.
object, body, required: Request body.
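The ordering rules for the three commands (run-task first, text only after task-started, finish-task last, then task-finished) can be sketched as a small client-side state machine. This is an illustrative guard rather than part of the service API: the `State` and `TaskSequence` names are hypothetical, and the sketch performs no network I/O.

```python
from enum import Enum, auto


class State(Enum):
    CONNECTED = auto()   # WebSocket open, nothing sent yet
    STARTING = auto()    # run-task sent, waiting for task-started
    STARTED = auto()     # task-started received, text may be sent
    FINISHING = auto()   # finish-task sent, waiting for task-finished
    FINISHED = auto()    # task-finished received


class TaskSequence:
    """Tracks which command or event is valid next."""

    # (command, current state) -> next state
    _SEND = {
        ("run-task", State.CONNECTED): State.STARTING,
        ("continue-task", State.STARTED): State.STARTED,
        ("finish-task", State.STARTED): State.FINISHING,
    }
    # (server event, current state) -> next state
    _EVENT = {
        ("task-started", State.STARTING): State.STARTED,
        ("task-finished", State.FINISHING): State.FINISHED,
    }

    def __init__(self) -> None:
        self.state = State.CONNECTED

    def on_send(self, command: str) -> None:
        """Record an outgoing command, rejecting it if it is out of order."""
        key = (command, self.state)
        if key not in self._SEND:
            raise RuntimeError(f"cannot send {command} in state {self.state.name}")
        self.state = self._SEND[key]

    def on_event(self, event: str) -> None:
        """Record a server event, rejecting it if it is unexpected."""
        key = (event, self.state)
        if key not in self._EVENT:
            raise RuntimeError(f"unexpected {event} in state {self.state.name}")
        self.state = self._EVENT[key]


seq = TaskSequence()
seq.on_send("run-task")
seq.on_event("task-started")
seq.on_send("continue-task")
seq.on_send("finish-task")
seq.on_event("task-finished")
```

Calling `on_send("continue-task")` before `on_event("task-started")` raises `RuntimeError`, which mirrors the requirement to wait for task-started before sending subsequent commands.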