“I want to learn everything about everything,” Scarlett Johansson murmurs.
Those aren’t the words of an ambitious Hollywood actress. They’re spoken by “Samantha,” the world’s “first intelligent operating system.” With nothing more than language, curiosity, and a zest for virtual life, Samantha entices her shy, awkward owner to fall in love.
Ridiculous? Not in the mind of Spike Jonze, who wrote Her, a movie opening this November and starring Johansson and Joaquin Phoenix. Completely coincidentally, forging emotional bonds with users is key to the strategy that Nuance Communications is employing to compete with Apple’s Siri and Google’s Google Now, today’s dominant digital assistants.
Apple, Google, Nuance, and other companies envision a service that “knows” the weather, your calendar, traffic conditions, and other information, and can deliver it to you across your phone, your computer, your TV, and eventually your car. At Apple and Google, the approach has focused on data: contributing it, collecting it, and collating it.
But at Nuance, they’re flipping that argument on its head. Data can wait. It’s the relationship that needs to be forged first. While Apple and Google are attempting to create intelligent agents, Nuance is aiming to build an intelligent persona. Its emphasis is on “person,” and the technology is powered by the speech-recognition and natural-language tools that Nuance has bought or developed over the years.
“For me to be in the car, listening to the 49ers game, it’s halftime, I arrive home, I tell my TV to ‘put on the game’—that shouldn’t be that big of a deal,” says Gary Clayton, the chief creative officer at speech pioneer TellMe Networks, who holds the same position at Nuance. “This notion [of] where the intelligence comes from: It’s a system we can interact with in a conversational way. Because once you start interacting with the system in a conversational way, there’s almost an understanding that there’s a sentient being on the other end of the conversation. And the closer you can get to that point, the deeper the faith, and the stronger the relationship.”
It’s a rather granola concept for Silicon Valley, and, to be honest, I didn’t get it at first. Google, with its army of Google Street View cars, Android phones, and Chrome browsers, has established that data rules—and if you don’t have it, you’re doomed to fail. That’s not a position that Nuance necessarily disagrees with. It just doesn’t believe that it has to master the data space itself.
ScanSoft, which had bought out speech pioneer Lernout & Hauspie and Dragon Systems, merged with Nuance Communications in 2005, taking its name. Since then, the new Nuance has made more than 40 acquisitions, many of them speech-related, but also including Swype and an early digital assistant, Vlingo.
Nuance now maintains divisions focusing on healthcare, mobile, enterprise, and imaging, with customers including Amazon, Apple, Ford, HTC, Samsung, and Subaru. All use Nuance for speech recognition. Even the “virtual sales representatives” that engage you in online chat on popular websites may draw from a Nuance technology called NINA. Its talents include interpreting all-caps communications as frustration and “representing the brand” through various personas.
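Nuance hasn’t described how NINA actually detects frustration, so the following is only an illustrative sketch of what an all-caps heuristic could look like. The function name `looks_frustrated` and both of its thresholds are assumptions made up for this example, not anything drawn from Nuance’s software.

```python
def looks_frustrated(message: str, min_letters: int = 4, threshold: float = 0.8) -> bool:
    """Hypothetical heuristic: treat a mostly upper-case message as a sign of frustration.

    min_letters and threshold are illustrative guesses, not values from any real product.
    """
    # Only alphabetic characters count; digits, spaces, and punctuation are ignored.
    letters = [ch for ch in message if ch.isalpha()]

    # Very short messages such as "OK" are too ambiguous to judge.
    if len(letters) < min_letters:
        return False

    # Flag the message when the share of upper-case letters meets the threshold.
    upper_count = sum(1 for ch in letters if ch.isupper())
    return upper_count / len(letters) >= threshold
```

In this sketch, a chat message typed entirely in capitals would be flagged, while an ordinary sentence with a single leading capital would not. A production system would presumably weigh many more signals than letter case.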
If you’d like to try Nuance’s first-generation digital assistant for Android, you can: Dragon Mobile Assistant is available for free on the Google Play Store. Using a trigger command (“Hey, Dragon”), you can ask for the weather; send texts, tweets, and status updates; and also share your location and find that of your friends.
The problem is that Google, of course, dominates its own Android platform, so Nuance will have to go a step further. Enter Wintermute.
Wintermute, the first Nuance AI
Nuance began showing off its intelligent-agent technology in an early form at this year’s Consumer Electronics Show in January. Wintermute, unfortunately named for the psychotic AI of William Gibson’s Neuromancer, combines the Dragon Mobile Assistant with the related Dragon TV service for televisions, and straddles the PC as well. In this sense, it has a leg up over Apple and Google in its hardware reach, although Wintermute is still in something akin to an alpha stage.
Regrettably, the demo doesn’t work all that well even in the artificial environment of Nuance’s labs. Yes, the concept is sound. In the demo, Tony Sheeder, a user experience designer for Nuance, began with a Windows computer, asking it to show off some basic capabilities, purely through his voice. One moment stood out when Sheeder asked the system to start playing The Rolling Stones: “It’s only rock and roll, but I like it,” Wintermute replied, before launching the song. The line is scripted, Sheeder admitted, but it serves a purpose.
“On the one hand, if deployed sensitively, it can be a delightful way to imply that the persona is intelligent enough to understand the content of the conversation, and to sketch a perspective relative to the content—a point of view,” Sheeder said in a followup email. “On the other hand, I think that this sort of thing helps to soften the edges of the application, as it were—both in the sense that it makes it feel less mechanical, and in that it helps to sort of camouflage the perimeter of the application, and make the experience feel a bit more expansive than it really is from a purely functional standpoint.”
In the future, Sheeder says, users will be able to log on just through their voice, which the system will recognize, with a personalized profile to go along with it. That’s not too hard on a smartphone, which is assumed to be owned by a single person. But on a public PC, the task is far more complicated.
Unfortunately, when asked in the demo to play back the Stones music playlist on another device, Wintermute stumbled. A couple of other data-driven queries did as well.
But data is just the product that Nuance is buying and selling. Marketers talk about customer service. Nuance wants to build a relationship with an intelligence on the other side.
According to Clayton, Nuance isn’t necessarily relying on its own technology to acquire the data it needs. It’s willing to tap into databases and public sources of knowledge, and to work with its existing customers. That puts its existing partnerships in the healthcare industries and car technology, for example, in a new light. One question is whether Nuance wants to take the lead in developing a framework that data could reside in, an issue that the company is in discussions about, Clayton says.
From a technical standpoint, what Nuance needs to integrate falls into four buckets, Clayton says: input/output, including conversational speech, touch, and gestures; channels, or routing the persona and the data it acts upon from device to device, through the cloud; the endpoints, such as medical equipment; and signal capture, such as GPS-based location.
“Do we have to become the lingua franca of all of these things? I don’t think that’s a decision that’s been made yet,” Clayton says.
The emotional connection
At this stage, Wintermute still “appears” as a spinning circle, much in the same way as Siri appears as a microphone. Victor Chen, Nuance’s vice president of strategy and design, who oversees the teams building out Wintermute and related technologies, says the company is sensitive to the “uncanny valley” phenomenon, in which people shy away from a construct that edges too close to appearing human.
According to Chen, the next step is emotion, both in interpreting the user’s emotional state and in being able to respond or sympathize in kind. That could be as simple as changing the font or background color of the persona’s interface to reflect a change in mood. In that case, Chen’s model might be virtual actors similar to Pixar’s Wall-E, who barely utters a word throughout his movie.
Granted, there’s a fine line between a personality a user can relate to and one that ends up as a punch line on a late-night talk show. Americans may just not get it. “In Asia, we expect they’ll be more receptive to something like this,” Chen says, referring to the land of digital pets.
But that doesn’t mean that a virtual assistant can’t begin to interject a personality, such as that of an affectionate mother. “In a mapping use case, we can add something like ‘Don’t forget money for a bridge toll,’” Chen says. Or, in another scenario, the assistant could say, “Don’t worry, you’ll get there in plenty of time.”
“If you do that more and more, the user may get the feeling that the assistant is looking out for them,” Chen says.
Nuance could stop there. But Sheeder says there’s a camp that believes a digital assistant should—believe it or not—disagree with you. Yes, a majority still think that assistants should be inoffensive, but that’s an inherently shallow experience, Sheeder says.
“For what it’s worth, I respectfully disagree. The pleasure we take in interacting with another person is, I’d submit, in significant measure a function of discovering and exploring the unique qualities and characteristics of that other person,” Sheeder says. “To the extent that we mold that other person—or persona—in an image of our own devising, we take away the capacity for delight. It’s the difference between playing poker and playing solitaire.”
“Giving that little persona a distinct and discrete point of view, I think, gives you a lot of power,” Sheeder says. “Not only does it help you strengthen that bond between the service and the user, it gives you a vector on the world.”
Will Nuance’s vision come to pass? The technology market has heard a number of boasts that never quite came to fruition, from how the Segway would take over the world, to the emergent gameplay of Black & White and Spore, to the promises of the Infinium team claiming that its Phantom console would make the PlayStation and Xbox irrelevant.
But Nuance isn’t boasting. Company executives seem well aware of the hard work ahead. Nuance can’t go it alone—it simply doesn’t have the resources. But whether it will successfully build out an interface to convince consumers and device makers to contribute data and dollars is an open question. For now, we’ll leave the final word to Joaquin Phoenix’s lovelorn character, Theodore:
“I love the way that you think about the world,” he says to his OS.
As PCWorld's senior editor, Mark focuses on Microsoft news and chip technology, among other beats. He has formerly written for PCMag, BYTE, Slashdot, eWEEK, and ReadWrite.