How does Text to Speech (TTS) work?

April 26, 2017 | Tags: speech, text, TTS


1 Answer


TTS engines generally work in three steps: normalization, phonetization and speech synthesis. In the normalization phase, special expressions such as abbreviations, e-mail addresses and URLs are written out in full as the words a speaker would say. In the phonetization phase, the phonetic transcription of each word is looked up in a pronunciation dictionary (a lexicon). For words missing from the dictionary, the pronunciation is computed automatically using letter-to-sound (spelling-to-pronunciation) rules. Homographs such as ‘record’ or ‘lives’ are disambiguated by their part of speech (verb vs. noun). Finally, the speech synthesis phase combines this phonetic information with prosodic cues (pitch, duration) to produce the actual audio. Odiogo’s linguistic team continues to improve the normalization and phonetization phases.
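
The normalization phase can be sketched in Python as a few regular-expression rewrites. This is a minimal illustration, not Odiogo's implementation: the abbreviation table, the spoken names for symbols, and the helper names `normalize` and `spell_symbols` are all hypothetical. Production engines handle many more cases (numbers, dates, currencies) with locale-specific rules.

```python
import re

# Hypothetical abbreviation table; real engines use much larger, locale-specific ones.
ABBREVIATIONS = {
    "Dr.": "Doctor",
    "St.": "Street",
    "etc.": "et cetera",
}

# Hypothetical spoken names for symbols found inside e-mail addresses and URLs.
SYMBOLS = {"@": " at ", ".": " dot ", "/": " slash ", ":": " colon ", "-": " dash "}

EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")
URL_RE = re.compile(r"\bhttps?://\S+")


def spell_symbols(token: str) -> str:
    """Replace each symbol with its spoken name and collapse repeated spaces."""
    spoken = "".join(SYMBOLS.get(ch, ch) for ch in token)
    return " ".join(spoken.split())


def normalize(text: str) -> str:
    """Expand URLs, e-mail addresses and abbreviations into speakable words."""
    # URLs and e-mails first, so their dots are spelled out before abbreviation matching.
    text = URL_RE.sub(lambda m: spell_symbols(m.group(0)), text)
    text = EMAIL_RE.sub(lambda m: spell_symbols(m.group(0)), text)
    for abbr, full in ABBREVIATIONS.items():
        text = text.replace(abbr, full)
    return text


print(normalize("Email Dr. Lee at lee@example.com"))
# → Email Doctor Lee at lee at example dot com
```

Plain string replacement is enough for a demo, but it cannot tell "St." (Street) from "St." (Saint); real normalizers use context to choose the expansion.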
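
The phonetization phase can likewise be sketched as a dictionary lookup with a rule-based fallback. Everything here is a hypothetical toy: the five-entry lexicon (written in ARPAbet symbols, as used by the CMU Pronouncing Dictionary), the one-letter-one-phoneme fallback, and the guess of part of speech from the preceding word stand in for the large lexicons, trained grapheme-to-phoneme models, and statistical taggers that real engines use.

```python
# Hypothetical mini-lexicon in ARPAbet. Homographs map a part-of-speech tag
# to a pronunciation; ordinary words have a single entry.
LEXICON = {
    "the": "DH AH0",
    "to": "T UW1",
    "she": "SH IY1",
    "record": {"NOUN": "R EH1 K ER0 D", "VERB": "R IH0 K AO1 R D"},
    "lives": {"NOUN": "L AY1 V Z", "VERB": "L IH1 V Z"},
}

# Deliberately naive letter-to-sound fallback (one letter -> one phoneme).
LETTER_TO_SOUND = {
    "a": "AE", "b": "B", "c": "K", "d": "D", "e": "EH", "f": "F", "g": "G",
    "h": "HH", "i": "IH", "j": "JH", "k": "K", "l": "L", "m": "M", "n": "N",
    "o": "AA", "p": "P", "q": "K", "r": "R", "s": "S", "t": "T", "u": "AH",
    "v": "V", "w": "W", "x": "K S", "y": "Y", "z": "Z",
}

# Hypothetical heuristic: these words usually precede a verb.
VERB_CUES = {"to", "i", "you", "we", "they", "he", "she", "it", "will"}


def guess_pos(prev_word: str) -> str:
    """Crude part-of-speech guess based only on the previous word."""
    return "VERB" if prev_word.lower() in VERB_CUES else "NOUN"


def phonetize(word: str, pos: str = "NOUN") -> str:
    """Look the word up in the lexicon, falling back to letter-to-sound rules."""
    entry = LEXICON.get(word.lower())
    if isinstance(entry, dict):  # homograph: pick pronunciation by part of speech
        return entry[pos]
    if entry is not None:
        return entry
    return " ".join(LETTER_TO_SOUND[ch] for ch in word.lower() if ch in LETTER_TO_SOUND)


def phonetize_sentence(words: list[str]) -> list[str]:
    """Phonetize each word, using the previous word to disambiguate homographs."""
    result = []
    prev = ""
    for word in words:
        result.append(phonetize(word, guess_pos(prev)))
        prev = word
    return result


print(phonetize_sentence(["to", "record"]))   # → ['T UW1', 'R IH0 K AO1 R D']
print(phonetize_sentence(["the", "record"]))  # → ['DH AH0', 'R EH1 K ER0 D']
```

The preceding-word heuristic is only enough to separate "to record" from "the record"; real text needs a proper part-of-speech tagger.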
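
Before the synthesis phase can produce sound, each phoneme needs prosodic targets. The sketch below, again purely illustrative, gives each ARPAbet phoneme a duration (longer for vowels and for primary-stressed syllables) and a pitch that falls linearly across the utterance, a crude stand-in for the declining pitch of natural speech. The `render` function is a placeholder, not a real synthesizer: it emits one sine tone per phoneme at the target pitch, so it shows how duration and pitch shape a waveform but does not produce intelligible speech. All durations, pitch values, the 16 kHz sample rate, and the function names are assumptions.

```python
import math

SAMPLE_RATE = 16_000  # samples per second (assumed)

VOWELS = {"AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER",
          "EY", "IH", "IY", "OW", "OY", "UH", "UW"}


def prosody_targets(phonemes, start_hz=180.0, end_hz=120.0):
    """Assign an assumed duration (seconds) and pitch (Hz) to each phoneme."""
    n = len(phonemes)
    targets = []
    for i, ph in enumerate(phonemes):
        base = ph.rstrip("012")  # drop the ARPAbet stress digit
        duration = 0.12 if base in VOWELS else 0.06
        if ph.endswith("1"):  # primary stress: lengthen the vowel
            duration *= 1.5
        # Linear pitch fall from start_hz to end_hz across the utterance.
        fraction = i / (n - 1) if n > 1 else 0.0
        pitch = start_hz + (end_hz - start_hz) * fraction
        targets.append((ph, duration, pitch))
    return targets


def render(targets, sample_rate=SAMPLE_RATE):
    """Placeholder synthesizer: one sine tone per phoneme, continuous phase."""
    samples = []
    phase = 0.0
    for _, duration, pitch in targets:
        for _ in range(round(duration * sample_rate)):
            samples.append(0.3 * math.sin(phase))
            phase += 2 * math.pi * pitch / sample_rate
    return samples


targets = prosody_targets("R IH0 K AO1 R D".split())  # 'record' as a verb
audio = render(targets)
print(len(audio) / SAMPLE_RATE)  # total length in seconds
```

Real engines replace `render` with, for example, concatenative unit selection or a neural vocoder driven by the same kind of phoneme, duration, and pitch information.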