Craig Mundie, Microsoft’s chief research and strategy officer, doesn’t speak Chinese. But on Tuesday he did, via a life-like virtual avatar shown at the company’s offices in Beijing that can simulate his voice and speak in other languages.
“What was spoken in Mandarin today I never recorded,” he said after seeing the demonstration. “But it is my voice. They have a computer model of my voice box.”
The virtual avatar, which was designed for language translation, is just one of the new technologies the company has been working on at Microsoft Research.
The research group started on Microsoft’s Redmond campus and has since expanded to five other labs around the world; it now has more than 850 researchers with PhDs. It celebrated its 20th anniversary on Tuesday, beginning with an event at the Microsoft Research Asia facility in Beijing at which Mundie spoke.
Mundie oversees Microsoft’s research group, and is also charged with planning the company’s long-term strategy, as far as 20 years in the future. He has frequently promoted the idea of so-called natural user interfaces that operate via touch, voice commands, or even reading a person’s facial expressions.
“When we think of natural interaction, we think of emulating all of the human senses,” Mundie said in an interview. “Touch, vision, speech synthesis and recognition, the ability for all of these to be operated together, they will be the most important in the next few years.”
Microsoft’s Kinect, an add-on to the Xbox 360, is just the latest of the company’s products to adhere to this concept, allowing users to play games simply with body movements. While Microsoft intends for Kinect to move beyond gaming and PCs, other natural user interface technologies are also in the works. On Tuesday, Microsoft researchers showed off an image search tool that will allow users to find pictures by sketching the general outline of the object they are looking for. The tool then mines the Internet for the best matches.
But the natural user interfaces Microsoft is building also extend to virtual environments. Avatars that not only look like actual users, with photo-realistic effects, but can also mimic their voices and approximate the lip movements of speech could provide life-like exchanges among users without them needing to leave their computers, Mundie said.
“Another dream we have is that I should be able to sit in my office, send my avatar to meet somebody in Beijing, and I can speak in English and the avatar speaks in Mandarin in real-time,” he said. “We want the computer to be a simultaneous translator.”
Natural user interfaces will be a future game changer that will reshape the market, according to Mundie. But reaching that point means refining smaller-scale technologies while also finding new ways to apply them, he said. One of the best examples of this has been Kinect, which was the culmination of seven or eight research efforts at Microsoft Research.
“There’s no reason to believe we won’t see these natural user interface elements become a more integral part of the Windows experience,” he said.
On the other hand, it’s hard to predict where tablet devices will fit in the overall picture of computing, Mundie said. He noted that tablets occupy a niche of their own: they are too large to fit in one’s pocket, yet lack the complete computing features of a PC.
“I think there will be a whole range of products that fall in that gap. And that they will be applied to a whole range of tasks: reading, writing, annotation,” he said. “I think there will be demand for those things and that the tablet form factor may be with us for a long time. Or it might be displaced by some other more radical technology.”