How does SSML work?
We show you how to use SSML to customize your voices
This article explains what you can do with Speech Synthesis Markup Language (SSML). SSML lets you customize the generated speech: for example, you can control pauses and specify how acronyms, dates, times, abbreviations, or text to be censored are spoken. To try this out, open VoiceOverMaker and the audio editor:
The <break> element
Enter the following text there, as shown in the screenshot:
This is a pause <break time="3s"/> and now I'll continue.
As you can see, the break element inserts a pause of 3 seconds. A pause can also be specified in milliseconds, e.g. 500ms.
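As a small sketch of the millisecond form mentioned above, the same sentence can be written with a half-second pause. The strength attribute in the second line comes from the W3C SSML specification; whether every VoiceOverMaker voice honors it is an assumption, so treat it as something to try rather than a guarantee.

```xml
This is a short pause <break time="500ms"/> and now I'll continue.
This is a weaker pause <break strength="weak"/> without a fixed duration.
```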
Normally, the `
The <say-as> element
Use this element to indicate the type of text construct it contains. It also lets you specify the level of detail with which the contained text is rendered.
The <say-as> element has the required interpret-as attribute, which determines how the value is pronounced. Depending on the value of interpret-as, you can also use the optional attributes format and detail.
The following example is spoken as an integer:
<say-as interpret-as="cardinal">12345</say-as>
The following example is spoken as "First":
<say-as interpret-as="ordinal">1</say-as>
The following example is spoken as "C A N" (English):
<say-as interpret-as="characters">can</say-as>
In the following example, a beep is played in place of the text, as if it had been censored:
<say-as interpret-as="expletive">censor this</say-as>
The unit value converts the unit to singular or plural depending on the number. The following example is spoken as "20 feet":
<say-as interpret-as="unit">20 foot</say-as>
The following example is spoken letter by letter (in English):
<say-as interpret-as="verbatim">abcdefg</say-as>
The following example is spoken as "The tenth of September, nineteen sixty":
<say-as interpret-as="date" format="yyyymmdd" detail="1"> 1960-09-10 </say-as>
The following example is spoken as "The tenth of September":
<say-as interpret-as="date" format="dm">10-9</say-as>
The following example is spoken as "Two thirty P.M.":
<say-as interpret-as="time" format="hms12">2:30pm</say-as>
These examples show how the same text can be pronounced in different ways. The interpret-as attribute accepts the following values:
cardinal
ordinal
characters
fraction
expletive / bleep
unit
verbatim / spell-out
date
time
telephone
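Two of the values in this list, fraction and telephone, have no example above. A minimal sketch of how they might be used follows. The mixed-number notation 5+1/2 is the form documented for Google Cloud Text-to-Speech, and assuming that VoiceOverMaker accepts it is exactly that, an assumption. The phone number is a made-up placeholder. Because the exact wording can differ between voices, no spoken form is given here.

```xml
<say-as interpret-as="fraction">5+1/2</say-as>
<say-as interpret-as="telephone">1-800-555-0199</say-as>
```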
The <audio> element
This element inserts recorded audio files, and other audio formats, alongside the synthesized speech output.
Attributes:
src
clipBegin
clipEnd
speed
repeatCount
repeatDur
soundLevel
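The following sketch shows how some of these attributes might be combined. The URL is a hypothetical placeholder, and the attribute values (a clip from second 1 to second 4, played 3 dB louder) are only illustrative. In standard SSML, the text inside the element is spoken as a fallback if the audio file cannot be loaded; whether VoiceOverMaker behaves the same way is an assumption.

```xml
<audio src="https://example.com/sounds/intro.ogg" clipBegin="1s" clipEnd="4s" soundLevel="+3dB">
  The intro sound could not be loaded.
</audio>
```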
The paragraph <p> and sentence <s> elements
Example:
<p><s>This is sentence one.</s><s>This is sentence two.</s></p>
If you want a pause in the speech to be long enough to be heard, wrap your sentences in <s></s> tags and insert the pause between the sentences.
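One way to apply this advice is to place a break element between two sentence elements, as sketched below. The one-second duration is an arbitrary choice for illustration.

```xml
<p>
  <s>This is sentence one.</s>
  <break time="1s"/>
  <s>This is sentence two.</s>
</p>
```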
The alias <sub> element
<sub alias="World Wide Web Consortium">W3C</sub>
Specifies that, when spoken, the contained text is replaced by the value of the alias attribute. In the example above, "W3C" is spoken as "World Wide Web Consortium".
The <prosody> element
This element adjusts the pitch, speaking rate, and volume of the text it contains. The attributes rate, pitch, and volume are currently supported.
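As an illustration, the sketch below combines all three attributes. The values follow the W3C SSML specification (a named rate, a pitch shift in semitones, and a named volume). How strongly each VoiceOverMaker voice reacts to them is not stated in this article, so expect some variation.

```xml
<prosody rate="slow" pitch="-2st" volume="loud">
  This sentence is spoken slowly, slightly lower, and louder.
</prosody>
```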
The <emphasis> element
This element adds emphasis to the text it contains, or removes it. The <emphasis> element changes the speech in a similar way to <prosody>, but without requiring you to set individual speech attributes.
The level attribute can have the following values:
strong
moderate
none
reduced
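A short sketch of two of these levels in use follows. The sentences are placeholders, and the audible difference between levels depends on the selected voice.

```xml
<emphasis level="strong">This is an important announcement.</emphasis>
<emphasis level="reduced">This remark is spoken with less emphasis.</emphasis>
```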
This was an excerpt of the most common SSML elements. Try it out now with VoiceOverMaker.