What is Automatic Speech Recognition?
Automatic Speech Recognition (ASR) is a technology that allows a computer to identify the words that a person speaks into a microphone or telephone. The “holy grail” of ASR research is a computer that recognizes, in real time and with 100% accuracy, all words intelligibly spoken by any person, independent of vocabulary size, noise, speaker characteristics and accent, or channel conditions. Despite several decades of research in this area, accuracy greater than 90% is attained only when the task is constrained in some way. The level of performance depends on how the task is constrained; for example, recognition accuracy for continuous digits over a microphone channel (small vocabulary, no noise) can exceed 99%. If the system is trained to learn an individual speaker’s voice, then much larger vocabularies are possible, although accuracy drops to somewhere between 90% and 95% for commercially available systems. For large-vocabulary speech recognition of dif