1990s: Automatic Speech Recognition Comes to the Masses
In 1990, Dragon launched the first consumer speech recognition product, Dragon Dictate, at the incredible price of $9,000. Seven years later, the much-improved Dragon NaturallySpeaking arrived. The application recognized continuous speech, so you could speak, well, naturally, at about 100 words per minute. However, you had to train the program for 45 minutes, and it was still expensive at $695.
The first voice portal, VAL from BellSouth, arrived in 1996. VAL was a dial-in interactive voice recognition system that was supposed to give you information based on what you said on the phone. It paved the way for all the inaccurate voice-activated menus that would plague callers for the next 15 years and beyond.
2000s: Speech Recognition Plateaus--Until Google Comes Along
By 2001, computer speech recognition had topped out at 80 percent accuracy, and near the end of the decade the technology's progress seemed to have stalled. Recognition systems did well when the language universe was limited, but they were still "guessing" among similar-sounding words with the help of statistical models, and the known language universe kept growing as the Internet grew.
Did you know speech recognition and voice commands were built into Windows Vista and Mac OS X? Many computer users weren't aware that those features existed. Windows Speech Recognition and OS X's voice commands were interesting, but not as accurate or as easy to use as a plain old keyboard and mouse.
In short, the bottleneck in speech recognition has always been the availability of data and the ability to process it efficiently. Google's Voice Search app adds the data from billions of search queries to its analysis, to better predict what you're probably saying.
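To make the idea concrete, here is a minimal sketch of how query data could help a recognizer choose between similar-sounding transcripts. It is not Google's actual method: the tiny `QUERY_LOG`, the function names, and the acoustic scores are all made up for illustration, and the model is a simple bigram count with add-one smoothing. The point is only that a phrase people actually type or say often gets a boost over one that merely sounds the same.

```python
import math
from collections import Counter

# Hypothetical query log (an assumption for this sketch); a real system
# would draw on billions of queries rather than four.
QUERY_LOG = [
    "how to recognize speech",
    "recognize speech software",
    "how to recognize speech on android",
    "wreck a nice beach",
]

def bigram_counts(queries):
    """Count single words and adjacent word pairs across all queries."""
    unigrams, bigrams = Counter(), Counter()
    for query in queries:
        words = ["<s>"] + query.split()  # "<s>" marks the start of a query
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    return unigrams, bigrams

def log_prob(sentence, unigrams, bigrams):
    """Log-probability of a sentence under an add-one-smoothed bigram model."""
    vocab_size = len(unigrams)
    words = ["<s>"] + sentence.split()
    total = 0.0
    for prev, cur in zip(words, words[1:]):
        total += math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size))
    return total

def pick_transcript(candidates, unigrams, bigrams):
    """Pick the candidate whose acoustic score plus language score is highest.

    candidates is a list of (text, acoustic_log_score) pairs; the acoustic
    scores here are invented numbers standing in for a real acoustic model.
    """
    best = max(candidates, key=lambda c: c[1] + log_prob(c[0], unigrams, bigrams))
    return best[0]

unigrams, bigrams = bigram_counts(QUERY_LOG)
candidates = [("wreck a nice beach", -0.5), ("recognize speech", -1.0)]
print(pick_transcript(candidates, unigrams, bigrams))
```

In this toy run the acoustic score slightly favors "wreck a nice beach," but the query data tips the choice to "recognize speech." Note that a model this simple also favors shorter phrases, so treat it as an illustration of the principle rather than a working recognizer.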
In 2010, Google added "personalized recognition" to Voice Search on Android phones, so that the software could record users' voice searches and produce a more accurate speech model. The company also added Voice Search to its Chrome browser in mid-2011. Remember how we started with 10 to 100 words, and then graduated to a few thousand? Google's English Voice Search system now incorporates 230 billion words from actual user queries.
Speech recognition has gone from utility to entertainment. The child seems all grown up.
The Future: Accurate, Ubiquitous Speech
The explosion of voice recognition apps indicates that speech recognition's time has come, and that you can expect plenty more apps in the future. These apps will not only let you control your PC by voice or convert voice to text--they'll also support multiple languages, offer assorted speaker voices for you to choose from, and integrate into every part of your mobile devices (that is, they'll overcome Siri's shortcomings).
The quality of speech recognition apps will improve, too. For instance, Sensory's TrulyHandsfree Voice Control can hear and understand you even in noisy environments.
As everyone starts becoming more comfortable speaking aloud to their mobile gadgets, speech recognition technology will likely spill over into other types of devices. It isn't hard to imagine a near future when we'll be commanding our coffee makers, talking to our printers, and telling the lights to turn themselves off.
Follow Melanie Pinola (@melaniepinola) and Today@PCWorld on Twitter.