Recently, Microsoft took the unusual step of placing its Cortana and Bing product teams inside the same organization as Microsoft Research. The new Microsoft AI and Research Group will be led by computer vision pioneer and executive vice president Harry Shum, whose 20-year Microsoft career includes leading Bing’s search efforts from 2007 through 2013 and helping launch Microsoft Research China.
We asked Shum how this new organization will benefit Microsoft’s digital assistant in the following interview, which has been edited for length and clarity. (If you want more, see our related stories on what Cortana 2.0 could look like, as well as a deeper dive on how speech could work within Windows.)
The language of the blog post announcing the formation of Microsoft’s new AI division, together with how Satya Nadella has characterized it, suggests that Microsoft thinks it's in a space race of sorts when it comes to artificial intelligence. Is that accurate?
I just feel that the timing’s right to go big on AI. It’s kind of a little bittersweet for me because I did my Ph.D. in robotics and AI, my area was actually computer vision. When I graduated a little while ago, I guess, it wasn’t the best time for AI graduates to find a good job. And now you look at those people and anyone that knows how to train a few layers of neural nets probably gets job offers from every company.
It is a very interesting time. Several technological factors have all converged, including the availability of a lot of data, big computing power, and in the last several years I would say incredible progress in machine learning, especially deep learning. I think people feel that it is the time, and we feel that way at Microsoft. That’s why we made this big announcement to form the Microsoft AI and Research Group.
By seating Cortana and Bing’s product teams next to Microsoft researchers, the implication is that these products will become the focal point for Microsoft’s intelligence initiatives. How do you see both products improving as a result?
First of all let me say that over the last 25 years everything we’ve established at Microsoft, MSR has contributed to products left and right. I would claim that almost every major product within the company benefited from some technology from MSR. The difference now is that to develop the type of AI products and services we actually need to get those latest technologies into users’ hands much faster, so we see the need to accelerate the cycle from research to product. Which is why we made this organization move to really get our researchers and developers together.
And these two products: If you look at Bing and you look at Cortana. I personally worked on Bing for almost seven years. People certainly deserve to have different opinions, but I would say that Bing is a credible search engine and that we offer people a credible alternative. We have about a third of the search traffic in the United States.
Cortana is something that we really really get very excited about. And you’re right, a product like Cortana symbolizes artificial intelligence. Today we actually have about 133 million Cortana users, and we actually have answered more than 10 billion questions. People start to use Cortana more and more. And with this joining forces with MSR and with more MSR AI researchers I suddenly have very high expectations in terms of the quality of Cortana, user experience of Cortana. All will continue to improve.
I think the number of people actively using Cortana is less than half of the devices running Windows 10. Are you satisfied with that?
There’s always room for improvement, if you work on these kind of products. And you are right, there are really two things that are most important when we design and ship this kind of product: the number of users, and the amount of user engagement. It’s a very interesting design decision: What is the product? What is this thing doing with, for the users? One of the things we certainly track is the number of conversations, with these agents, in every single session. We actually feel pretty good that Cortana compares well with other competitive products in the market.
I don’t know if you know we have another very exciting product called the XiaoIce chatbot. We shipped in China, and we shipped a Japanese version, also equally popular in Japan, and we’re still thinking about what to do for the U.S. market.
When you design a different kind of agent like a chatbot, the amount of user interaction is significantly higher and different. So it really depends on the product design.
Cortana right now is an assistant: She appears, and then disappears. Google is trying a chatbot approach with Assistant. Do you think we’ll see a Cortana chatbot in the near future?
Cortana is designed to help people to complete a task, whether it be a reminder that they should buy something for their mom’s birthday or that it is time to leave for home because the traffic is jammed. Or some of those knowledge kind of questions. This is the kind of design decision that we have to make today.
We will learn from the other chatbots we have already shipped and some we’re still developing. We have to see what kind of user scenarios are more important for the users. I don’t think that there will be only one intelligent agent in the world anytime soon. I think we will learn along the way.
Again, I want to emphasize the complexity and difficulty of this type of product. I think it’s still early.
You’re now Cortana’s boss, so to speak. Can you compare Cortana’s strengths and weaknesses with Siri and Google Assistant?
I would say it’s great that we have those competing products in the market. I think nothing is more exciting than looking at the other great engineers, and what they’re building.
I would say clearly we have the heritage from Bing, we understand the world’s knowledge, and what we can do there. And then we compare with other agents. We also understand about the users in different kinds of settings. So there’s old knowledge, as well as the other interesting related data like the calendar information, that the users are willing to share with us. There’s others, such as the email that they share with us. And so we can do a better job.
Another thing that we have been very clear [on] from the get-go is that we do want Cortana to have some form of personality. So that’s actually another kind of design choice. If you look at Siri and Cortana, they probably have a very similar philosophy there. And other agents may think differently.
Microsoft is encouraging me to have a conversation in German – a language I do not speak – with someone else using Skype Translator. But speech recognition is fundamental to other aspects of productivity, including dictation. Why isn’t dictation playing a more prominent role within Word and Office?
I will tell you that there is really no reason why it is not playing a much more prominent role yet. Rest assured that we are infusing AI technology into all Microsoft products. I do want to tell you that if you missed it, a couple of weeks ago we actually announced that we broke the world record in error rates in speech recognition in the Switchboard test data. IBM has always been on top, and as of now we have reached number one.
We are racing hard hopefully to become the first one to achieve human parity [recognition at the same level as a human]. Hopefully anytime soon. (Editor's Note: On Tuesday, Microsoft said its speech-recognition technology has now achieved parity with human beings.)
Windows 10’s speech technology dates back to Windows Vista, while Cortana’s speech technology was apparently developed more recently. When will the Cortana speech technology be merged into Windows?
The technology that we use in Cortana is not based on the Vista technology. I guess Vista is like a grandpa agent. That’s also why the Microsoft researchers are teaming up with those AI product teams to really accelerate the cycle from the technology we have in the lab to the products that we get into our users’ hands. We’re very excited about this and hope to show you more progress soon.
Updated on Oct. 18 to note that Microsoft's speech recognition has now achieved parity with human beings.