1990s: Automatic Speech Recognition Comes to the Masses
In the '90s, computers with faster processors finally arrived, and speech recognition software became viable for ordinary people.
In 1990, Dragon launched the first consumer speech recognition product, Dragon Dictate, at the staggering price of $9,000. Seven years later, the much-improved Dragon NaturallySpeaking arrived. The application recognized continuous speech, so you could speak, well, naturally, at about 100 words per minute. However, you had to train the program for 45 minutes, and it was still expensive at $695.
BellSouth introduced the first voice portal, VAL, in 1996. VAL was a dial-in interactive voice recognition system that was supposed to give you information based on what you said on the phone. It paved the way for all the inaccurate voice-activated menus that would plague callers for the next 15 years and beyond.
2000s: Speech Recognition Plateaus--Until Google Comes Along
By 2001, computer speech recognition had topped out at 80 percent accuracy, and near the end of the decade the technology's progress seemed to have stalled. Recognition systems did well when the language universe was limited--but they were still "guessing" among similar-sounding words with the assistance of statistical models, and the known language universe continued to grow as the Internet grew.
Did you know speech recognition and voice commands were built into Windows Vista and Mac OS X? Many computer users weren't aware that those features existed. Windows Speech Recognition and OS X's voice commands were interesting, but not as accurate or as easy to use as a plain old keyboard and mouse.
Speech recognition technology development began to edge back into the forefront with one major event: the arrival of the Google Voice Search app for the iPhone. The impact of Google's app is significant for two reasons. First, cell phones and other mobile devices are ideal vehicles for speech recognition, as the desire to replace their tiny on-screen keyboards serves as an incentive to develop better, alternative input methods. Second, Google had the ability to offload the processing for its app to its cloud data centers, harnessing all that computing power to perform the large-scale data analysis necessary to make matches between the user's words and the enormous number of human-speech examples it gathered.
In short, the bottleneck with speech recognition has always been the availability of data and the ability to process it efficiently. Google's app adds the data from billions of search queries to its analysis, to better predict what you're probably saying.
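To make that idea concrete, here is a minimal sketch of how query data could tip the balance between two similar-sounding guesses. It is an illustration only, not Google's actual method: the function names, the toy acoustic scores, and the query counts are all invented for the example. The recognizer's score for how well each phrase matches the audio is combined with how often that phrase shows up in a (hypothetical) log of past queries.

```python
import math

def query_log_prob(phrase, query_counts, smoothing=1.0):
    """Log-probability of a phrase under smoothed counts from a query log."""
    total = sum(query_counts.values())
    # One extra slot stands in for any phrase never seen in the log.
    denom = total + smoothing * (len(query_counts) + 1)
    return math.log((query_counts.get(phrase, 0) + smoothing) / denom)

def pick_transcript(acoustic_log_probs, query_counts, prior_weight=1.0):
    """Return the candidate with the best combined acoustic + query-log score."""
    def combined(phrase):
        prior = query_log_prob(phrase, query_counts)
        return acoustic_log_probs[phrase] + prior_weight * prior
    return max(acoustic_log_probs, key=combined)

# Hypothetical acoustic scores: how well each phrase matches the audio.
acoustic = {
    "wreck a nice beach": math.log(0.55),
    "recognize speech": math.log(0.45),
}

# Hypothetical counts of past search queries (made-up numbers).
query_counts = {
    "recognize speech": 900,
    "weather today": 5000,
    "wreck a nice beach": 2,
}

best_by_sound = max(acoustic, key=acoustic.get)
best_with_queries = pick_transcript(acoustic, query_counts)

print("Sound alone:      ", best_by_sound)
print("With query data:  ", best_with_queries)
```

On sound alone, the toy recognizer slightly prefers "wreck a nice beach"; once the query counts are factored in, "recognize speech" wins because people search for it far more often. The more query data there is, the better those counts reflect what people actually say.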
In 2010, Google added "personalized recognition" to Voice Search on Android phones, so that the software could record users' voice searches and produce a more accurate speech model. The company also added Voice Search to its Chrome browser in mid-2011. Remember how we started with 10 to 100 words, and then graduated to a few thousand? Google's English Voice Search system now incorporates 230 billion words from actual user queries.
And now along comes Siri. Like Google's Voice Search, Siri relies on cloud-based processing. It draws on what it knows about you to generate a contextual reply, and it responds to your voice input with personality. (As my PCWorld colleague David Daw points out: "It's not just fun but funny. When you ask Siri the meaning of life, it tells you '42' or 'All evidence to date points to chocolate.' If you tell it you want to hide a body, it helpfully volunteers nearby dumps and metal foundries.")
Speech recognition has gone from utility to entertainment. The child seems all grown up.
The Future: Accurate, Ubiquitous Speech
The explosion of voice recognition apps indicates that speech recognition's time has come, and that you can expect plenty more apps in the future. These apps will not only let you control your PC by voice or convert voice to text--they'll also support multiple languages, offer assorted speaker voices for you to choose from, and integrate into every part of your mobile devices (that is, they'll overcome Siri's shortcomings).
The quality of speech recognition apps will improve, too. For instance, Sensory's TrulyHandsfree Voice Control can hear and understand you, even in noisy environments.
As people grow more comfortable speaking aloud to their mobile gadgets, speech recognition technology will likely spill over into other types of devices. It isn't hard to imagine a near future when we'll be commanding our coffee makers, talking to our printers, and telling the lights to turn themselves off.