How does Text-to-Speech (TTS) work?
TTS engines generally work in three steps: normalization, phonetization and speech synthesis. In the normalization phase, special expressions such as abbreviations, email addresses and URLs are written out in full. In the phonetization phase, the phonetic transcription of each word is looked up in a pronunciation dictionary (a glossary). For words absent from the dictionary, the pronunciation is computed automatically using spelling-to-pronunciation rules. Homographs like ‘record’ or ‘lives’, which are pronounced differently as a verb and as a noun, are disambiguated by their part of speech. Finally, the speech synthesis phase uses the phonetic information and prosodic cues (pitch, duration) to produce the actual audio. Odiogo’s linguistic team continually refines the normalization and phonetization processes.
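The first two steps can be illustrated with a minimal Python sketch. This is not Odiogo's implementation: the abbreviation table, the lexicon entries, the ARPAbet-style transcriptions, and the function names `normalize` and `phonetize` are all assumptions made for illustration, and the letter-by-letter fallback is a toy stand-in for real spelling-to-pronunciation rules. The part-of-speech tag is assumed to be supplied by the caller, and the synthesis step is left out because it needs an acoustic model.

```python
import re

# Hypothetical abbreviation table (illustrative entries only).
ABBREVIATIONS = {"dr.": "doctor", "etc.": "et cetera"}

# Hypothetical pronunciation dictionary with illustrative transcriptions.
# A homograph maps each part of speech to its own transcription.
LEXICON = {
    "the": "DH AH",
    "doctor": "D AA K T ER",
    "record": {"noun": "R EH K ER D", "verb": "R IH K AO R D"},
}

# Simplified email pattern, good enough for this sketch.
EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")


def _spell_out_email(match):
    """Write an email address out in words."""
    return match.group(0).replace("@", " at ").replace(".", " dot ")


def normalize(text):
    """Step 1: expand special expressions into plain lowercase words."""
    text = EMAIL.sub(_spell_out_email, text)
    words = [ABBREVIATIONS.get(w.lower(), w.lower()) for w in text.split()]
    return " ".join(words)


def phonetize(word, pos=None):
    """Step 2: look the word up, falling back to toy spelling rules."""
    entry = LEXICON.get(word)
    if isinstance(entry, dict):
        # Homograph: choose by part of speech, else the first listed reading.
        return entry.get(pos, next(iter(entry.values())))
    if entry is not None:
        return entry
    # Word absent from the dictionary: emit one symbol per letter (toy rule).
    return " ".join(ch.upper() for ch in word if ch.isalpha())
```

For example, `normalize("Dr. Smith")` yields `"doctor smith"`, and `phonetize("record", "verb")` selects a different transcription from `phonetize("record", "noun")`. A real engine would pass these transcriptions, together with prosodic cues, to the synthesis stage.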