If you ask Siri, the virtual personal assistant on the iPhone 4S, why it’s so great, it answers with disarming humility: “I am what I am.”
But industry insiders say there’s a little more to it than that. Siri goes well beyond voice recognition, they say, by applying powerful artificial intelligence and statistical analysis to decipher the meaning behind questioners’ sometimes jumbled sentences. Add to that Siri’s dry wit and you have the kind of breakout hit that will propel new uses of similar technology on your phone, tablet, and even your PC, experts say.
This is Siri’s moment because the complex technologies it uses are finally ready for consumers. When you ask Siri to find a nearby restaurant, it doesn’t just use speech recognition to handle the request; it sends the question to the cloud, where a powerful artificial intelligence algorithm can analyze the wording, figure out exactly what you want, and send the answer back to your phone. Then Siri dutifully follows through and searches for, say, a nearby Mexican restaurant. Even three years ago, this kind of cloud-based analysis wasn’t practical on a consumer phone. We’re just starting to explore what we can do with the technology.
More Than Just Speech Recognition
Services like Siri are “natural language processing” apps that use statistical models to figure out what you probably meant to say when your pronunciation or word choice is garbled. Natural language programs can tell, for instance, that a sentence that sounds like “I like two sailboats around eBay” is probably “I like to sail boats around the bay.”
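To make the “two sailboats around eBay” example concrete, here is a toy sketch in Python. It is not Siri’s actual method. It shows one classic statistical technique: a bigram language model with add-one smoothing. The model scores each candidate transcription by how often its word pairs appeared in some training text and keeps the most likely one. The corpus, the function names and the choice of a bigram model are all assumptions for illustration. A real recognizer would also weigh how closely each candidate matches the audio.

```python
import math
from collections import Counter


def train_bigrams(sentences):
    """Count word pairs (bigrams) and how often each word starts a pair."""
    context_counts, bigram_counts, vocab = Counter(), Counter(), set()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        vocab.update(tokens[1:])
        for prev, word in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, word)] += 1
    # Reserve one extra vocabulary slot for words never seen in training.
    return context_counts, bigram_counts, len(vocab) + 1


def log_prob(sentence, model):
    """Add-one smoothed bigram log-probability; higher means more plausible."""
    context_counts, bigram_counts, vocab_size = model
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    return sum(
        math.log((bigram_counts[(prev, word)] + 1) / (context_counts[prev] + vocab_size))
        for prev, word in zip(tokens, tokens[1:])
    )


def best_guess(candidates, model):
    """Keep the candidate transcription the language model finds most likely."""
    return max(candidates, key=lambda s: log_prob(s, model))


# A tiny, made-up training corpus (a real system learns from vast amounts of text).
corpus = [
    "i like to sail boats around the bay",
    "we like to sail around the bay",
    "i like to swim in the bay",
    "boats sail around the bay",
]
model = train_bigrams(corpus)

# Two transcriptions that sound alike; the recognizer must choose one.
candidates = ["i like two sailboats around ebay", "i like to sail boats around the bay"]
print(best_guess(candidates, model))
```

With this deliberately friendly corpus, the script picks “i like to sail boats around the bay.” Every word pair in that sentence was seen in training. Pairs such as “like two” and “around ebay” were not, so they receive only the small smoothed probability reserved for unseen combinations.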
This technology has been around for years. Every time you’ve “talked” to your bank’s robotic bill-paying system, you’ve been using natural language processing (though at many banks, it has historically been pretty bad).
Android phones have used cloud-based language processing for years, but Google’s Voice Actions app requires you to begin with one of a limited set of commands, such as “listen to…” or “note to self…”.
Siri uses a combination of artificial intelligence and its continually growing knowledge of you to understand not only what you say, but what you mean. As a result, you can ask for things in many different ways. Because Siri is tied into your iPhone 4S, it knows where you are and who you contact most often. That context helps it understand what you mean when you say “Find me a cab near here” or “Call my mother.” Siri doesn’t respond only to “Call Mark Smith”; it will also respond correctly to “call my best friend” or “I want to talk to Mark.”
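Here is a rough Python sketch of how personal context can turn “Call my mother” or “I want to talk to Mark” into a specific person. It is not Apple’s implementation. The `Contact` record, its relationship labels and the call counts are hypothetical stand-ins for the personal data the article describes, and the matching rules are deliberately simple. A relationship phrase is tried first, then a full name, then a bare first name. Any remaining ties go to the person the user calls most often.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Contact:
    name: str
    # Hypothetical relationship labels the user has taught the assistant.
    relations: set = field(default_factory=set)
    # How often the user calls this person (stand-in for "who you contact most").
    call_count: int = 0


def most_called(candidates):
    """Break ties by preferring the person the user calls most often."""
    return max(candidates, key=lambda c: c.call_count) if candidates else None


def resolve_contact(request, contacts):
    """Map a spoken request to a contact using relationships, names, and call history."""
    text = re.sub(r"[^a-z ]", "", request.lower()).strip()
    words = set(text.split())

    # 1. Relationship phrases such as "my mother" or "my best friend".
    relation = re.search(r"\bmy (.+)$", text)
    if relation:
        match = most_called([c for c in contacts if relation.group(1) in c.relations])
        if match is not None:
            return match

    # 2. A full name ("Mark Smith") is the strongest direct signal.
    full = [c for c in contacts
            if re.search(r"\b" + re.escape(c.name.lower()) + r"\b", text)]
    if full:
        return most_called(full)

    # 3. A first name alone ("Mark") may be ambiguous; use call history to choose.
    return most_called([c for c in contacts if c.name.split()[0].lower() in words])


contacts = [
    Contact("Mark Smith", {"best friend"}, call_count=42),
    Contact("Mark Jones", call_count=3),
    Contact("Linda Smith", {"mother"}, call_count=17),
]

for request in ["Call my mother.", "Call my best friend", "I want to talk to Mark", "Call Mark Jones"]:
    contact = resolve_contact(request, contacts)
    print(request, "->", contact.name if contact else None)
```

In this sketch, the bare “Mark” resolves to Mark Smith rather than Mark Jones only because the made-up call history says the user phones him far more often.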
Users can not only talk to Siri as if it were a person; they seem to want to. Beyond merely understanding what you say, Siri works because it has a personality.
Speech recognition programs are sometimes annoying because of the errors they stubbornly make in interpreting what we say. Even with all the advanced technology under the hood, Siri still makes mistakes. With Siri, however, speech recognition is a conversation, and people are accustomed to dealing with misunderstandings in conversation. That conversational interface lets Siri’s speech recognition fail gracefully.
Norman Winarsky, vice president of SRI Ventures, the venture capital arm of SRI International (originally the Stanford Research Institute), worked with the Siri team before Apple purchased the technology. Winarsky says that the team’s most difficult task may have been creating Siri’s “voice.”
“The personality that you’re starting to see become a phenomenon is a great way to enchant people without offending them,” Winarsky says. “We were very, very concerned that attitudes could turn people off. The team worked closely to create a dialogue that would respond to your needs, but not annoy you.”
Siri’s personality is one of its biggest draws. It’s not just fun but funny. When you ask Siri the meaning of life, it tells you “42” or “All Evidence to date points to Chocolate.” If you tell it you want to hide a body, it helpfully volunteers nearby dumps and metal foundries. There are whole blogs such as STSS that collect Siri’s funny sayings.
The original Siri team built the app with a personality, and Apple has likely emphasized it even more, Winarsky says. Dan Miller, senior analyst and founder of Opus Research, says that in some ways Siri’s interface is so much fun that it’s almost like a game. That engaging quality will make the remaining foibles of speech recognition forgivable for many users.
Where Is Speech Recognition Going?
The next few years are likely to see an explosion in natural language apps and services.
“The best way to summarize what’s happening now is that the industry is waking up to the fact that speech and language processing can serve as a basic building block of your user interface,” says Vlad Sejnoha, chief technology officer of Nuance, maker of the popular speech recognition software Dragon NaturallySpeaking.
Nuance recently released Dragon Go, its own mobile natural language app for iOS devices. Go is focused on helping users with specific tasks. A doctor might give it a complex patient history, for instance, and it would respond by summarizing the important symptoms.
You can expect Siri to add more functions to its repertoire soon. Apple developers have “actually taken some of the capabilities out currently, so I fully expect it to evolve quickly,” SRI’s Winarsky says. One of the features Apple removed was the ability to book a table at a restaurant automatically. It seems likely that Apple is inking deals and polishing features as fast as it can to restore that functionality to the app.
After that? Perhaps Siri will hook into your streaming music account or look up flights online. There are probably as many suggestions as there are Siri users.
Apple developers had better act quickly, though, because it won’t be long before they see a lot of competition. “What you’re going to see over the next few years is a lot of innovation around how to use this new building block,” says Sejnoha. “We’re really at this point of renaissance that isn’t the end, it’s really just the beginning of what we can do with this.”