Rate, pitch, pauses, volume
SSML (Speech Synthesis Markup Language) is an XML markup language that controls speech rate, pitch, pauses, volume, and background music in CosyVoice.
Limitations
- Models: cosyvoice-v3-flash, cosyvoice-v3-plus.
- Voices: Cloned voices and system voices marked as SSML-enabled in the Voice list.
- APIs:
  - Java SDK (2.20.3+): Non-streaming and unidirectional streaming only. See the Java SDK docs.
  - Python SDK (1.23.4+): Non-streaming and unidirectional streaming only. See the Python SDK docs.
  - WebSocket API: Set enable_ssml to true in run-task and send continue-task only once. See the WebSocket API docs.
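The WebSocket constraint above can be sketched as two message builders: one run-task message that turns on enable_ssml, and a single continue-task message that carries the whole SSML string. The header and payload layout shown here is an assumption for illustration, as are the placeholder voice name and task ID. Confirm the exact fields against the WebSocket API docs before use.

```python
import json
import uuid

def run_task_message(task_id: str, model: str, voice: str) -> dict:
    """Build a run-task message with SSML enabled (field layout is assumed)."""
    return {
        "header": {"action": "run-task", "task_id": task_id, "streaming": "duplex"},
        "payload": {
            "task_group": "audio",
            "task": "tts",
            "function": "SpeechSynthesizer",
            "model": model,
            # enable_ssml must be true for the SSML text to be interpreted.
            "parameters": {"text_type": "PlainText", "voice": voice, "enable_ssml": True},
            "input": {},
        },
    }

def continue_task_message(task_id: str, ssml: str) -> dict:
    """Build the single continue-task message that carries the full SSML text."""
    return {
        "header": {"action": "continue-task", "task_id": task_id, "streaming": "duplex"},
        "payload": {"input": {"text": ssml}},
    }

task_id = uuid.uuid4().hex
messages = [
    run_task_message(task_id, "cosyvoice-v3-flash", "your-ssml-enabled-voice"),
    # Send continue-task exactly once when SSML is enabled.
    continue_task_message(task_id, "<speak>Hello</speak>"),
]
for message in messages:
    print(json.dumps(message))
```

Because continue-task is sent only once, assemble the complete SSML string before sending it instead of streaming it in fragments.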
Getting started
Check the Limitations section for supported models, voices, and APIs before using SSML.
- Get an API key
- Install the SDK (for Java/Python examples)
Java SDK
Python SDK
WebSocket API
- Go
- C#
- PHP
- Node.js
- Java (WebSocket)
- Python (WebSocket)
Tags
CosyVoice SSML is based on W3C SSML 1.0 but supports only a subset of tags.
Syntax rules:
- Wrap all SSML content in <speak></speak> tags.
- You can use multiple <speak> tags consecutively, but do not nest them.
- Escape XML special characters: " → &quot;, ' → &apos;, & → &amp;, < → &lt;, > → &gt;.
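The escaping rule above can be applied with the Python standard library. This is a minimal sketch, and the helper names escape_ssml_text and wrap_speak are hypothetical, not part of any SDK.

```python
from xml.sax.saxutils import escape

# escape() handles &, <, and > by default; the quote characters are added here.
_QUOTE_ENTITIES = {'"': "&quot;", "'": "&apos;"}

def escape_ssml_text(text: str) -> str:
    """Escape the five XML special characters in plain text."""
    return escape(text, _QUOTE_ENTITIES)

def wrap_speak(text: str) -> str:
    """Wrap plain text in a single, non-nested <speak> root tag."""
    return f"<speak>{escape_ssml_text(text)}</speak>"

print(wrap_speak('Tom & Jerry say "1 < 2"'))
# → <speak>Tom &amp; Jerry say &quot;1 &lt; 2&quot;</speak>
```

Escape only the text and attribute values, never the SSML tags themselves.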
<speak>: Root tag
Description
Wrap all SSML content in <speak></speak> tags.
Syntax
| Property | Type | Required | Description |
|---|---|---|---|
| voice | String | No | Voice name. Overrides the voice API parameter. See Voice list. |
| rate | String | No | Speech rate. Overrides the speech_rate API parameter. Range: 0.5 to 2. Default: 1. Values above 1 are faster; below 1 are slower. |
| pitch | String | No | Pitch. Overrides the pitch_rate API parameter. Range: 0.5 to 2. Default: 1. Values above 1 are higher; below 1 are lower. |
| volume | String | No | Volume. Overrides the volume API parameter. Range: 0 to 100. Default: 50. |
| effect | String | No | Sound effect. Values: robot, lolita (lively female voice), lowpass, echo, eq (equalizer, advanced), lpfilter (low-pass filter, advanced), hpfilter (high-pass filter, advanced). Use effectValue to customize eq, lpfilter, and hpfilter. Only one effect per tag. Sound effects increase latency. |
| effectValue | String | No | Customizes the effect. For eq: a string of 8 space-separated integers (-20 to 20) for gain at ["40 Hz", "100 Hz", "200 Hz", "400 Hz", "800 Hz", "1600 Hz", "4000 Hz", "12000 Hz"]. Example: "1 1 1 1 1 1 1 1". For lpfilter: integer frequency in (0, sample_rate/2]. Example: "800". For hpfilter: integer frequency in (0, sample_rate/2]. Example: "1200". |
| bgm | String | No | Background music URL. The file must be in OSS with at least public-read permissions. Escape XML special characters in the URL. Requirements: 16 kHz sample rate, mono, WAV, 16-bit. If the synthesized audio is longer than the music, the music loops. |
| backgroundMusicVolume | String | No | Background music volume. |
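As an illustration of the properties in the table, the following sketch builds a <speak> tag and checks rate, pitch, and volume against the documented ranges before sending anything to the service. The build_speak helper is hypothetical, and the service may apply further validation of its own.

```python
from xml.sax.saxutils import escape, quoteattr

def build_speak(text, voice=None, rate=None, pitch=None, volume=None):
    """Return a <speak> element after checking the documented ranges."""
    if rate is not None and not 0.5 <= rate <= 2:
        raise ValueError("rate must be between 0.5 and 2")
    if pitch is not None and not 0.5 <= pitch <= 2:
        raise ValueError("pitch must be between 0.5 and 2")
    if volume is not None and not 0 <= volume <= 100:
        raise ValueError("volume must be between 0 and 100")
    attrs = {"voice": voice, "rate": rate, "pitch": pitch, "volume": volume}
    # Only emit attributes that were set, so API-level defaults still apply.
    rendered = "".join(
        f" {name}={quoteattr(str(value))}"
        for name, value in attrs.items()
        if value is not None
    )
    return f"<speak{rendered}>{escape(text)}</speak>"

print(build_speak("Hello, world.", rate=1.5, volume=80))
# → <speak rate="1.5" volume="80">Hello, world.</speak>
```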
If the audio is not in WAV format, convert it with ffmpeg.
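One way to produce audio that meets the stated requirements (16 kHz, mono, 16-bit WAV) is sketched below. It builds a standard ffmpeg command line and assumes ffmpeg is installed and on PATH. The helper names and file names are hypothetical.

```python
import subprocess

def ffmpeg_wav_command(src: str, dst: str) -> list[str]:
    """Build an ffmpeg command that outputs 16 kHz, mono, 16-bit PCM WAV."""
    return [
        "ffmpeg",
        "-i", src,            # input file in any format ffmpeg can decode
        "-ar", "16000",       # sample rate: 16 kHz
        "-ac", "1",           # channels: mono
        "-c:a", "pcm_s16le",  # codec: 16-bit signed little-endian PCM
        dst,
    ]

def convert_to_wav(src: str, dst: str) -> None:
    """Run the conversion; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(ffmpeg_wav_command(src, dst), check=True)

print(" ".join(ffmpeg_wav_command("music.mp3", "music.wav")))
# → ffmpeg -i music.mp3 -ar 16000 -ac 1 -c:a pcm_s16le music.wav
```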
You are legally responsible for the copyright of the uploaded audio.
<break>: Pause
Description
Insert a pause. Set the duration in seconds (s) or milliseconds (ms).
Syntax
Break tag behavior:
- Without attributes, <break/> defaults to a 1-second pause.
- Warning: Consecutive <break> tags are summed, but the total is capped at 10 seconds.
| Property | Type | Required | Description |
|---|---|---|---|
| time | String | No | Pause duration, such as "2s" or "50ms". In seconds: 1 to 10. In milliseconds: 50 to 10000. |
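The time property can be checked against the documented limits before the tag is built. The following is a minimal sketch in which break_tag is a hypothetical helper and the accepted string pattern (an integer followed by s or ms) is an assumption based on the examples in the table.

```python
import re
from typing import Optional

def break_tag(time: Optional[str] = None) -> str:
    """Return a <break/> tag; without a time, the pause defaults to 1 second."""
    if time is None:
        return "<break/>"
    match = re.fullmatch(r"(\d+)(ms|s)", time)
    if not match:
        raise ValueError('time must look like "2s" or "50ms"')
    value, unit = int(match.group(1)), match.group(2)
    # Documented limits: 1 to 10 seconds, or 50 to 10000 milliseconds.
    low, high = (1, 10) if unit == "s" else (50, 10000)
    if not low <= value <= high:
        raise ValueError(f"{unit} values must be between {low} and {high}")
    return f'<break time="{time}"/>'

print(f"<speak>Please wait{break_tag('500ms')}then continue.</speak>")
# → <speak>Please wait<break time="500ms"/>then continue.</speak>
```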
<sub>: Replace text
Description
Replace displayed text with a different pronunciation.
Syntax
| Property | Type | Required | Description |
|---|---|---|---|
| alias | String | Yes | The text to read instead. |
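As a short illustration, the sketch below builds a <sub> tag in which the displayed text is an abbreviation and alias holds the text to read aloud. The sub_tag helper is hypothetical.

```python
from xml.sax.saxutils import escape, quoteattr

def sub_tag(text: str, alias: str) -> str:
    """Return a <sub> tag: `text` is displayed, `alias` is what gets read."""
    return f"<sub alias={quoteattr(alias)}>{escape(text)}</sub>"

print(sub_tag("W3C", "World Wide Web Consortium"))
# → <sub alias="World Wide Web Consortium">W3C</sub>
```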
<phoneme>: Set pronunciation
Description
Specify pronunciation using Pinyin (Chinese) or the CMU phonetic alphabet (English).
Syntax
| Property | Type | Required | Description |
|---|---|---|---|
| alphabet | String | Yes | Pronunciation type: "py" (Pinyin) or "cmu" (phonetic alphabet). See The CMU Pronouncing Dictionary. |
| ph | String | Yes | The Pinyin or phonetic symbols. Separate each character's Pinyin with a space. The number of syllables must match the number of characters. Each syllable has a tone number (1 to 5, where 5 is neutral). |
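The Pinyin rules in the table (one space-separated syllable per character, each ending in a tone number from 1 to 5) can be checked before the tag is built. This is a sketch, not the service's own validation. The pinyin_phoneme helper and the syllable pattern are assumptions for illustration.

```python
import re
from xml.sax.saxutils import escape, quoteattr

def pinyin_phoneme(text: str, ph: str) -> str:
    """Return a <phoneme alphabet="py"> tag after checking the ph string."""
    syllables = ph.split()
    if len(syllables) != len(text):
        raise ValueError("the number of syllables must match the number of characters")
    for syllable in syllables:
        # Assumed shape: lowercase letters followed by a tone digit 1-5.
        if not re.fullmatch(r"[a-zü]+[1-5]", syllable):
            raise ValueError(f"invalid Pinyin syllable: {syllable!r}")
    return f'<phoneme alphabet="py" ph={quoteattr(ph)}>{escape(text)}</phoneme>'

print(pinyin_phoneme("重庆", "chong2 qing4"))
# → <phoneme alphabet="py" ph="chong2 qing4">重庆</phoneme>
```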
<soundEvent>: Insert a sound effect
Description
Insert an external sound file (prompt tones, ambient sounds) into synthesized speech.
Syntax
| Property | Type | Required | Description |
|---|---|---|---|
| src | String | Yes | Audio URL. The file must be in OSS with at least public-read permissions. Escape XML special characters in the URL. Requirements: 16 kHz sample rate, mono, WAV, 16-bit, max 2 MB. |
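Because src is an XML attribute, characters such as & in the URL must be escaped. The sketch below assumes the tag is written in self-closing form and uses a placeholder URL. The sound_event_tag helper is hypothetical.

```python
from xml.sax.saxutils import quoteattr

def sound_event_tag(src: str) -> str:
    """Return a <soundEvent> tag with the URL escaped for use as an attribute."""
    # quoteattr() escapes &, <, and > and adds the surrounding quotes.
    return f"<soundEvent src={quoteattr(src)}/>"

print(sound_event_tag("https://example.com/bell.wav?a=1&b=2"))
# → <soundEvent src="https://example.com/bell.wav?a=1&amp;b=2"/>
```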
If the audio is not in WAV format, convert it with ffmpeg.
You are legally responsible for the copyright of the uploaded audio.
<say-as>: Set reading format
Description
Specify how text is read (as numbers, dates, phone numbers, etc.).
Syntax
| Property | Type | Required | Description |
|---|---|---|---|
| interpret-as | String | Yes | Text type. Values: cardinal (number), digits (individual digits), telephone (phone number), name, address, id (account name/nickname), characters (character by character), punctuation, date, time, currency, measure (unit of measure). |
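The interpret-as values listed above can be checked before the tag is built. This is a minimal sketch in which say_as is a hypothetical helper.

```python
from xml.sax.saxutils import escape

# The interpret-as values listed in the properties table.
INTERPRET_AS = {
    "cardinal", "digits", "telephone", "name", "address", "id",
    "characters", "punctuation", "date", "time", "currency", "measure",
}

def say_as(text: str, interpret_as: str) -> str:
    """Return a <say-as> tag, rejecting unsupported interpret-as values."""
    if interpret_as not in INTERPRET_AS:
        raise ValueError(f"unsupported interpret-as value: {interpret_as!r}")
    return f'<say-as interpret-as="{interpret_as}">{escape(text)}</say-as>'

print(say_as("12034", "digits"))
# → <say-as interpret-as="digits">12034</say-as>
```

The same string reads differently depending on the type: per the tables in this section, 12034 as digits is read "one two zero three four", and as telephone it is read "one two oh three four".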
cardinal
Supported formats for cardinal:
| Format | Example | English output | Description |
|---|---|---|---|
| Number string | 145 | one hundred forty five | Integer range: up to 13 digits, [-999999999999, 999999999999]. Decimal: up to 13-digit integer part, up to 10-digit decimal part. |
| Number string starting with zero | 0145 | one hundred forty five | |
| Negative sign + number string | -145 | minus one hundred forty five | |
| Three-digit number string separated by commas | 60,000 | sixty thousand | |
| Negative sign + three-digit number string separated by commas | -208,000 | minus two hundred eight thousand | |
| Number string + decimal point + zero | 12.00 | twelve | |
| Number string + decimal point + number string | 12.34 | twelve point three four | |
| Three-digit number string separated by commas + decimal point + number string | 1,000.1 | one thousand point one | |
| Negative sign + number string + decimal point + number string | -12.34 | minus twelve point three four | |
| Negative sign + three-digit number string separated by commas + decimal point + number string | -1,000.1 | minus one thousand point one | |
| (Three-digit comma-separated) number string + hyphen + (three-digit comma-separated) number | 1-1,000 | one to one thousand | |
| Other default readings | 012.34 | twelve point three four | |
| | 1/2 | one half | |
| | -3/4 | minus three quarters | |
| | 5.1/6 | five point one over six | |
| | -3 1/2 | minus three and a half | |
| | 1,000.3^3 | one thousand point three to the power of three | |
| | 3e9.1 | three times ten to the power of nine point one | |
| | 23.10% | twenty three point one percent | |
digits
Supported formats for digits:
| Format | Example | English output | Description |
|---|---|---|---|
| Number string | 12034 | one two zero three four | No strict length limit, but keep under 20 characters. |
| Number string + space or hyphen + number string + ... | 1-23-456 7890 | one, two three, four five six, seven eight nine zero | |
telephone
Supported formats for telephone:
| Format | Example | English output | Description |
|---|---|---|---|
| Number string | 12034 | one two oh three four | No strict length limit, but keep under 20 characters. |
| Number string + space or hyphen + number string + ... | 1-23-456 7890 | one, two three, four five six, seven eight nine oh | |
| Plus sign + number string + space or hyphen + number string | +43-211-0567 | plus four three, two one one, oh five six seven | |
| Left parenthesis + number string + right parenthesis + space + number string + space or hyphen + number string | (21) 654-3210 | (two one) six five four, three two one oh | |
name
Example
address
Not supported for English text.
id
For English text, this works the same as characters.
characters
Supported formats for characters:
| Format | Example | English output | Description |
|---|---|---|---|
| string | *b+3$.c-0'=α | asterisk B plus three dollar dot C dash zero apostrophe equals alpha | Supports Chinese characters, English letters, digits 0-9, and common symbols. |
punctuation
For English text, this works the same as characters.
date
Supported formats for date:
| Format | Example | English output | Description |
|---|---|---|---|
| Four digits/two digits or four digits-two digits | 2000/01 | two thousand, oh one | Year spans. |
| | 1900-01 | nineteen hundred, oh one | |
| | 2001-02 | twenty oh one, oh two | |
| | 2019-20 | twenty nineteen, twenty | |
| | 1998-99 | nineteen ninety eight, ninety nine | |
| | 1999-00 | nineteen ninety nine, oh oh | |
| Four-digit number starting with 1 or 2 | 2000 | two thousand | Four-digit year. |
| | 1900 | nineteen hundred | |
| | 1905 | nineteen oh five | |
| | 2021 | twenty twenty one | |
| Day of the week-Day of the week or Day of the week~Day of the week or Day of the week&Day of the week | mon-wed | monday to wednesday | Escape XML special characters in range separators. |
| | tue~fri | tuesday to friday | |
| | sat&sun | saturday and sunday | |
| DD-DD MMM, YYYY or DD~DD MMM, YYYY or DD&DD MMM, YYYY | 19-20 Jan, 2000 | the nineteenth to the twentieth of january two thousand | DD = two-digit day. MMM = month abbreviation or full name. YYYY = four-digit year. |
| | 01 ~ 10 Jul, 2020 | the first to the tenth of july twenty twenty | |
| | 05&06 Apr, 2009 | the fifth and the sixth of april two thousand nine | |
| MMM DD-DD or MMM DD~DD or MMM DD&DD | Feb 01 - 03 | february the first to the third | MMM = month. DD = day. |
| | Aug 10-20 | august the tenth to the twentieth | |
| | Dec 11&12 | december the eleventh and the twelfth | |
| MMM-MMM or MMM~MMM or MMM&MMM | Jan-Jun | january to june | MMM = month. |
| | Jul - Dec | july to december | |
| | sep&oct | september and october | |
| YYYY-YYYY or YYYY~YYYY | 1990 - 2000 | nineteen ninety to two thousand | YYYY = four-digit year starting with 1 or 2. |
| | 2001-2021 | two thousand one to twenty twenty one | |
| WWW DD MMM YYYY | Sun 20 Nov 2011 | sunday the twentieth of november twenty eleven | WWW = day of week (abbreviation or full). DD = day. MMM = month. YYYY = year. |
| WWW DD MMM | Sun 20 Nov | sunday the twentieth of november | |
| WWW MMM DD YYYY | Sun Nov 20 2011 | sunday november the twentieth twenty eleven | |
| WWW MMM DD | Sun Nov 20 | sunday november the twentieth | |
| WWW YYYY-MM-DD | Sat 2010-10-01 | saturday october the first twenty ten | |
| WWW YYYY/MM/DD | Sat 2010/10/01 | saturday october the first twenty ten | |
| WWW MM/DD/YYYY | Sun 11/20/2011 | sunday november the twentieth twenty eleven | |
| MM/DD/YYYY | 11/20/2011 | november the twentieth twenty eleven | |
| YYYY | 1998 | nineteen ninety eight | |
| Other default readings | 10 Mar, 2001 | the tenth of march two thousand one | |
| | 10 Mar | the tenth of march | |
| | Mar 2001 | march two thousand one | |
| | Fri. 10/Mar/2001 | friday the tenth of march two thousand one | |
| | Mar 10th, 2001 | march the tenth two thousand one | |
| | Mar 10 | march the tenth | |
| | 2001/03/10 | march the tenth two thousand one | |
| | 2001-03-10 | march the tenth two thousand one | |
| | 2000s | two thousands | |
| | 2010's | twenty tens | |
| | 1900's | nineteen hundreds | |
| | 1990s | nineteen nineties | |
time
Supported formats for time:
| Format | Example | English output | Description |
|---|---|---|---|
| HH:MM AM or PM | 09:00 AM | nine A M | HH = hour (1-2 digits). MM = minute (2 digits). AM/PM = morning or afternoon. |
| | 09:03 PM | nine oh three P M | |
| | 09:13 p.m. | nine thirteen p m | |
| HH:MM | 21:00 | twenty one hundred | |
| HHMM | 100 | one oclock | |
| Time point-Time point | 8:00 am - 05:30 pm | eight a m to five thirty p m | Time range formats. |
| | 7:05~10:15 AM | seven oh five to ten fifteen A M | |
| | 09:00-13:00 | nine oclock to thirteen hundred | |
currency
Supported formats for currency:
| Format | Example | English output | Description |
|---|---|---|---|
| Number + Currency identifier | 1.00 RMB | one yuan | Supports integers, decimals, and comma-separated thousands. |
| | 2.02 CNY | two point zero two yuan | |
| | 1,000.23 CN¥ | one thousand point two three yuan | |
| | 1.01 SGD | one singapore dollar and one cent | |
| | 2.01 CAD | two canadian dollars and one cent | |
| | 3.1 HKD | three hong kong dollars and ten cents | |
| | 1,000.00 EUR | one thousand euros | |
| Currency identifier + Number | US$ 1.00 | one US dollar | Supports integers, decimals, and comma-separated thousands. |
| | $0.01 | one cent | |
| | JPY 1.01 | one japanese yen and one sen | |
| | £1.1 | one pound and ten pence | |
| | €2.01 | two euros and one cent | |
| | USD 1,000 | one thousand united states dollars | |
| Number + Quantifier + Currency identifier or Currency identifier + Number + Quantifier | 1.23 Tn RMB | one point two three trillion yuan | Quantifiers: thousand, million, billion, trillion, Mil, mil, K, k, Bn, bn, Tn, tn. |
| | $1.2 K | one point two thousand dollars | |
measure
Supported formats for measure:
| Format | Example | English output | Description |
|---|---|---|---|
| Number + Unit of measurement | 1.0 kg | one kilogram | Supports integers, decimals, and comma-separated thousands. Supports common unit abbreviations. |
| | 1,234.01 km | one thousand two hundred thirty-four point zero one kilometers | |
| Unit of measurement | mm2 | square millimeter | |
Symbol pronunciations
Common symbol pronunciations for <say-as>:
| Symbol | English pronunciation |
|---|---|
| ! | exclamation mark |
| " | double quote |
| # | pound |
| $ | dollar |
| % | percent |
| & | and |
| ' | left quote |
| ( | left parenthesis |
| ) | right parenthesis |
| * | asterisk |
| + | plus |
| , | comma |
| - | dash |
| . | dot |
| / | slash |
| : | colon |
| ; | semicolon |
| < | less than |
| = | equals |
| > | greater than |
| ? | question mark |
| @ | at |
| [ | left bracket |
| \ | backslash |
| ] | right bracket |
| ^ | caret |
| _ | underscore |
| ` | backtick |
| { | left brace |
| \| | vertical bar |
| } | right brace |
| ~ | tilde |
Full-width and special symbols:
| Symbol | English pronunciation |
|---|---|
| ! | exclamation mark |
| “ | left double quote |
| ” | right double quote |
| ‘ | left quote |
| ’ | right quote |
| ( | left parenthesis |
| ) | right parenthesis |
| , | comma |
| 。 | full stop |
| — | em dash |
| : | colon |
| ; | semicolon |
| ? | question mark |
| 、 | enumeration comma |
| … | ellipsis |
| …… | ellipsis |
| 《 | left guillemet |
| 》 | right guillemet |
| ¥ | yuan |
| ≥ | greater than or equal to |
| ≤ | less than or equal to |
| ≠ | not equal |
| ≈ | approximately equal |
| ± | plus or minus |
| × | times |
| π | pi |
Greek letters (uppercase):
| Symbol | English pronunciation |
|---|---|
| Α | alpha |
| Β | beta |
| Γ | gamma |
| Δ | delta |
| Ε | epsilon |
| Ζ | zeta |
| Θ | theta |
| Ι | iota |
| Κ | kappa |
| Λ | lambda |
| Μ | mu |
| Ν | nu |
| Ξ | ksi |
| Ο | omicron |
| Π | pi |
| Ρ | rho |
| Σ | sigma |
| Τ | tau |
| Υ | upsilon |
| Φ | phi |
| Χ | chi |
| Ψ | psi |
| Ω | omega |
Greek letters (lowercase):
| Symbol | English pronunciation |
|---|---|
| α | alpha |
| β | beta |
| γ | gamma |
| δ | delta |
| ε | epsilon |
| ζ | zeta |
| η | eta |
| θ | theta |
| ι | iota |
| κ | kappa |
| λ | lambda |
| μ | mu |
| ν | nu |
| ξ | ksi |
| ο | omicron |
| π | pi |
| ρ | rho |
| σ | sigma |
| τ | tau |
| υ | upsilon |
| φ | phi |
| χ | chi |
| ψ | psi |
| ω | omega |
Common units of measurement
Common units for <say-as>:
| Category | Units |
|---|---|
| Length | nm (nanometer), μm (micrometer), mm (millimeter), cm (centimeter), m (meter), km (kilometer), ft (foot), in (inch) |
| Area | cm² (square centimeter), m² (square meter), km² (square kilometer), SqFt (square foot) |
| Volume | cm³ (cubic centimeter), m³ (cubic meter), km³ (cubic kilometer), mL (milliliter), L (liter), gal (gallon) |
| Weight | μg (microgram), mg (milligram), g (gram), kg (kilogram) |
| Time | min (minute), sec (second), ms (millisecond) |
| Electromagnetism | μA (microamp), mA (milliamp), Hz (hertz), kHz (kilohertz), MHz (megahertz), GHz (gigahertz), V (volt), kV (kilovolt), kWh (kilowatt hour) |
| Sound | dB (decibel) |
| Atmospheric pressure | Pa (pascal), kPa (kilopascal), MPa (megapascal) |
| Other | Also supports units like tsp (teaspoon), rpm (revolutions per minute), KB (kilobyte), mmHg (millimetre of mercury), and more. |