Monday, August 26, 2013

Why Speech Recognition Falls Short

It's Not the Way the Brain Does It

Current speech recognition technology, while impressive, falls short of delivering on the promise of human-like performance. The biggest problem is that speech recognizers are sensitive to noise, which makes them pretty much useless when several voices are speaking at the same time. The reason, of course, is that they do not work like the human brain. We humans have no trouble listening to a friend in a noisy restaurant because, unlike speech recognizers, we have the ability to focus our attention on one voice at a time, and we can shift that focus in an instant if we wish. The human brain can also easily adapt to a given situation. A new voice may have an unfamiliar foreign accent, but the brain can quickly learn its peculiarities and do a good job of recognizing what is being said.

A New Approach, Rebel Speech

The main reason that current technology falls short is that speech recognizers, unlike the brain, do not learn to recognize speech. They are hand-programmed. In other words, their knowledge (phones, diphones, senones, syllables, words, and other speech patterns) is painstakingly compiled and coded by a programmer. This approach, while effective to an extent, is forever doomed to be incomplete. There are important subtleties in speech sounds that can only be captured through direct learning. If we are to make any significant progress in computer speech recognition, then learning and paying attention are key capabilities that we must incorporate into our future recognizers. My hope is that this autonomous ability to learn and to focus attention on a given voice is what will set Rebel Speech apart from the rest.

See Also:

The Holy Grail of Robotics
Goal Oriented Motor Learning
Raiders of the Holy Grail
Secrets of the Holy Grail
The Myth of the Bayesian Brain
