Microsoft's Mundie on the Next Big Thing

Craig Mundie, 59, is Microsoft Corp.'s chief research and strategy officer. He assumed his position as chief visionary in June 2008 after Bill Gates retired from day-to-day operations at the company.

In his current role, Mundie is responsible for the long-term strategic direction of the business and recently completed a technology tour of U.S. colleges. Microsoft Research has 800 researchers in six locations worldwide. The company plans to spend more than $9 billion on R&D this year, up from $8.2 billion in 2008. An edited transcript of Computerworld's recent meeting with him follows.

CW: Microsoft announced some belt-tightening recently. How will that affect the R&D budget and your priorities? Will you scale back on basic research, as some competitors have?

Mundie: No. We have an opposite view on that, which is the tighter the economic times, the more the focus has to be on maintaining your R&D investment broadly. You want the normal cycle to produce timely results, but we've always believed that the pure research component was critical to us for several reasons. It gives us the ability to continue to enhance the businesses we're in. It gives us the ability to disrupt certain industries that we choose to enter [and] it's a shock absorber that allows us to deal with the arrival of the unknown from competitive actions or other technology breakthroughs. In these uncertain times, all three of those things are important to us.

CW: So, no substantive changes despite the economy and staff cutbacks?

Mundie: Yes. We did our first cross-company layoff in January -- about 1,400 people -- and we'll continue to make adjustments up to a total of 5,000 people in the course of the next 18 months. But a lot of this is not strictly cost restructuring but a resource reallocation mechanism in order to fund the things that we think we need to grow.

We believe that the economy is heading for a reset, not something from which there's going to be a nice, quick, happy rebound. We've taken our cost structure down to a level that we think is sustainable against that kind of economic outlook.

There are several different ways we seek to insert innovation into the products. First, we develop new features for the products we have. Second, we create new products and put them alongside the products we already have.

For example, we created OneNote because we thought there was going to be a new requirement for this type of more preformed "notebook/handwriting/aggregate everything" type of capability that didn't fit naturally within the mission of Word, PowerPoint and the other products that are part of the Office suite. So we put a new product in there. That didn't create any new compatibility problems but produced a whole new capability.

As we look in the platform and tools areas toward the future, we expect that there's a lot of change coming in the underlying architectures -- for example, like virtualization and the fact that there will be many CPU elements that actually allow some of the side-by-side execution of things. That can provide perfect compatibility while still allowing the introduction of things that represent whole new capabilities.

CW: Have you ever been disappointed that a great technology didn't take off as well as you expected?

Mundie: One of the biggest areas I championed in my early years here in the early 1990s was interactive broadband television. We had many of the ideas, perfected a lot of the technology in the '90s, and we're sitting here in 2009 just starting to see a significant global ramp-up of that as a successor to traditional television.

It's certainly been a disappointment to me that the collective things necessary to make that happen -- for example, broadband network penetration and performance and things like that -- have lagged so far behind globally, and especially in the United States. Things that are very interesting technologically for the user are just not being brought to market at the speed I would have hoped for.

CW: But users are consuming more television programming over the Web these days. Are we really that far away from your vision?

Mundie: If you're looking at the big-screen experience, is it actually being delivered over a packet-switched network with a basis of interactivity and two-way communication? That's what I call full-tilt-boogie IP TV, and it's still ultimately the solution that people will come to use.

Given that that largely hasn't become available yet, and that we have so many people growing up with a lot of comfort in the use of PCs as media playback devices, we're starting to see people seeking that kind of network entertainment experience, delivered on demand and over the network. The business model there tends to be more ad-supported than subscription-based. For some people, that does represent access to content that historically they had to pay a [cable TV] network subscription fee for, and they're now able to get back into the ad-supported model.

In a way, it's a bit like the new world surrogate for over-the-air television. All the major networks in the United States are free over the air all the time, yet very few Americans watch them because the experience isn't very good, the shift to digital has not taken place, and there is limited content availability.

None of those [problems] exist in Internet access to that media. There's no limit on shelf space, you can have as much differentiation as you want, there's no rigorous timetable that you have to watch in prime time. So many of the things people covet in their entertainment experience, they can get today in what people call "over the top" or over the IP network access to the media. I think of it as the contemporary surrogate for free-to-air television.

CW: What is your proudest R&D achievement?

Mundie: I did a lot of the early work broadly in non-PC-based computing. All of the things that we have today that have matured into our game console business, our cell phone business and our Windows CE-based, Windows Embedded and Windows Mobile capabilities all started in the groups I formed here between 1992 and 1998.

I look at the progress we've made there and I take quite a bit of pride in the fact that we anticipated those things and we were able to get into so many of them. The company's strategy all along was to recognize that, ultimately, people would have many smart devices, and we wanted to be the company that would have some cohesive way of dealing with all of them. There's still work to be done, but nobody else has invested to have a position in so many of the devices that are now important to people.

The Next Wave

CW: You talk about technology waves. What will be the next big wave?

Mundie: What happens in waves is the shift from one generation of computing platform to the next. That platform gets established by a small number of killer apps. We've been through a number of these major platform shifts, from the mainframe to the minicomputer to the personal computer to adding the Internet as an adjunct platform. We're now trending to the next big platform, which I call "the client plus the cloud."

That's one thing, not two things. Today, we've got a broadening out of what people call the client. My 16 years here was in large measure about that. And then we introduced the network. The Internet was a place where you had Web content and Web publishing, but other than being delivered on some of those clients, the two things were somewhat divorced.

The next thing that will emerge is an architecture that allows the application developer to think of the cloud plus the client architecturally as a single thing. In a sense, it is like client/server computing in the enterprise. It was the homogeneity that existed between some of the facilities at the server and the client end that allowed people to build those applications. We've never had that kind of architectural homogeneity in this cloud-plus-client or Internet-plus-smart-devices world, and I'm predicting that will be the next big thing.

What the world is searching for now is the right combination of underlying technologies and some killer apps that will demonstrate that the capabilities of this integrated end-to-end view of the cloud-plus-client will enable things that the world hasn't seen yet. That's what we're focused on here.

CW: So, what technologies will drive this?

Mundie: The technologies come at this at two levels. What are the underlying shifts in the lower-level platform technologies that will allow that to happen? And what are the things that might change the user's experience in some fundamental way?

There are two big things that form the nucleus of those two big changes. The microprocessor itself is going to change to this heterogeneous, many-core capability over the next four or five years. We've been planning for it, we know it is coming, it's sort of on the rails, and yet most of the world hasn't come to grips with the implications of that in terms of the application model and programming tools. To get performance, you're going to have to write parallel applications, and if it's cloud-plus-client, you're going to have to write distributed parallel applications. Those have historically been viewed as hard problems, but they will have to become de rigueur in the future.

The second thing is that the technologies of man-machine interaction are evolving and will be aided by the quantum change in computational capabilities, so that for the first time client devices will be able to implement natural, more humanistic ways of dealing with people. We call that next era the natural user interface.

Think of it as the successor to the graphical user interface. Microsoft was the company that drove the broad adoption of the GUI by putting Word and Excel on the early version of Windows. That became the killer app that brought us personal computing. Now we can see the outline of the NUI, just as we could see the outline of Windows coming. And yet you have to figure out, what are the killer apps?

CW: And what will those killer apps be?

Mundie: We're working on some, [but] they're very hard to predict. You can't really gin up a killer app on demand. A certain serendipitous process has to take place for those things to emerge. There's an invention part of that, there's a technological part of that, there's a market readiness part of that, and none of those things are completely controllable. But that kind of [event] comes around every 15 years or so in our industry, and we're getting into that time zone. That's why we think it's going to happen.

CW: During presentations, you've shown Laura, a 3-D avatar you call the robotic receptionist. How does this tie into this wave?

Mundie: We can take the new technologies of robotics, which are designed for high-scale, highly concurrent, distributed application development, and use them as a vehicle to compose together many of the individual advanced technologies like speech synthesis, speech recognition, human-feature-based modeling, machine vision and machine learning. Is there a way to compose these things together such that the whole really is greater than the sum of its parts?

What Laura showed was that we're at the bleeding edge of being able to bring these together in such a way that there is a qualitative change in the way you can interact with a computer system. It really does become more like dealing [with the computer] on a person-to-person basis in a free-form way. The computer will move from being strictly a reactive tool in your hand to being a proactive partner in trying to solve problems. It's that change in the qualitative experience, by which the computer helps you get stuff done or does stuff for you, that I think will be the hallmark of this next era.

CW: How has Laura evolved since you took the demo on the road last year?

Mundie: We've been broadening out Laura, teaching her some other domains to learn more about how she interacts with people. We taught her recently how to play trivia [games] with people. All of these things are ways of finding out how people react to dealing with a lifelike avatar that really does interact with them just like a person.

CW: What are the challenges Laura faces before she can work in the lobby?

Mundie: The demo consumes an eight-core machine pretty much fully when it's interacting [with people]. Yet each element of it -- the vision system, the speech system, the reasoning system -- is running at a fairly coarse granularity.

But if you give us more horsepower, it will just get better. This is a precursor to a new class of applications that have an almost unlimited appetite for computational capability. That's a very different situation than we find ourselves in with most applications today. They barely utilize the capability of the machines we have.

As we get more horsepower, Laura's performance will be better in every dimension. The quality of her speech will be better, and we'll be able to move beyond a rough polygonal model of her face and features and the animation of her face and movement.

Going Virtual

CW: You've also demonstrated shopping in a three-dimensional virtual world that maps to a real-world location. What is the significance of that?

Mundie: That demo showed what it would be like to navigate through a model of the real world. I call it First Life because many people have experimented today with Second Life. That world is a synthetic one. It did not get a lot of broad usage because in reality, there are not that many people who want to build their own world or operate in some alter ego.

The technology allows us to build models of the real world at a resolution that allows them to become a navigational metaphor. You get another element of a natural user interface. In this case, it's not natural in that it is simulating human sensory input and output, but rather it uses those senses that you have to interact with a model that is a world you already understand. It gives us another natural way to get stuff done in cyberspace.

What I demonstrated was the idea that I could walk down a [virtual] street in the way I would normally navigate it in the real world, enter a shop and look around and interact with things much as I would if I were at the physical shop. The goal was to maintain as much fidelity between what the 3-D model of the world was and what the actual world was, such that the two became increasingly interchangeable.

If I know how to walk around in the real world, I know how to get around in cyberspace. If that lets people do a lot more things without having to perfect using an Xbox game controller to navigate in 3-D, then that will further expand what these cloud-plus-client applications will look like and how many people will be able to use them.

CW: What role will robotics play in this world of the future?

Mundie: People think of some anthropomorphized machine like Terminator 2. I don't think of robotics that way at all. Today, industrial robots perform dedicated tasks, and they're machines that are designed to optimize that [task]. We're moving to an environment where the kind of synthesis that goes on assembling a robotic system will be done whether the robot is a physical realization -- a machine -- or a virtual realization like Laura the robot receptionist.

The design and operation of these systems is virtually identical in the two environments. Building the tool kits that will allow these highly distributed, highly concurrent systems to be built is the next big thing in robotics. As we perfect that and people become more accustomed to having these artificial helpers around, either physically or logically, we will start to see more acceptance of [robots] performing tasks that today we associate only with people.

CW: What tasks do you see these robots taking on?

Mundie: My near-term dream is Laura the robot doctor or physician's assistant. If we had that humanlike ability to communicate with the computer in natural language and yet embody an expert system like what's required to do medical diagnostics and prescribe treatments, we'd be creating the only scalable solution I know of [that is capable of] delivering health care to another 5 [billion] to 7 billion people in the next few decades.

There's no way to scale the current model of health care to the people who don't have any today. Something's going to have to change, and I think information technology is it. Many times you say that and people think, "I want better electronic medical records." But I think that's the uninteresting part of the problem.

CW: The idea of electronic medical records is uninteresting?

Mundie: Yes, in the way that people have historically talked about them. In fact, this Amalga product that we've been building and deploying really moves beyond the idea that you need a prior, specified medical record. It just ingests all of the historical medical information, synthesizes metadata, flattens it out and creates a new abstraction for dealing with all of the medical information.

The technology we demonstrated bypassed the traditional primary view of what information technology should do next in medicine. But even if you solved all of those problems of medical record-keeping, you still don't have a way to scale health analysis, diagnosis and delivery to the billions of people who don't have any at all.

The same may hold true to some degree in education. Maybe if you had Laura the teaching assistant that complemented teachers in a rural village, as well as Laura the physician's assistant or doctor in a box coupled with sensing technologies, many forms of nonacute medical care could be rendered to people who today have none at all. So many of these things are important in dealing with some of the societal challenges we have on a global basis, and they ought to be big businesses to boot.

CW: In the era of client plus the cloud, what will be the role of Windows? Will the operating system be as relevant to the end user as it is today?

Mundie: How relevant was Windows when you thought the world was DOS? The answer is it became pretty relevant. That's the way I think about this problem now. We're going to move to a new platform with new models of human interaction solving new problems at a higher level of abstraction. The operating system will be the thing that creates the mapping between the physics of the computing environment and our ability to write these applications and portray them for people.

In some of the things I described, there's no shell, no graphical user interface model, [but] there's still an incredibly important role for the operating system.

You may not have the same direct association of the operating system as a part of the application. This doesn't mean you won't have clients in screens. That correlation remains. I'm describing a world where there's no less of a requirement for controlling complex hardware that arguably will get even more complicated. But the boundary between what the user associates as the app, what part lives in the cloud, what part lives on the device in their hand -- those boundaries will be blurred.

To make the machine work, there's going to have to be operating systems on the devices that make up the cloud and on the devices that make up the client. The world I'm describing is one where those two things will be operating in some very symbiotic relationship.

There will be a new class of apps, and I think that those will be as different as the difference when we moved from the command-line interface to the Windows model. That's the way I see the future.

CW: Microsoft has had some famous flops, such as Microsoft Bob and Clippy, the infamous animated paper clip in Office. Why is it so hard to make things easier?

Mundie: Why is it that we haven't been able to create the starship Enterprise yet? We see it on Star Trek, we have a fairly clear model of space travel. What's so hard about it? The answer is, it's just a big engineering problem and it takes time.

Whether you're a science fiction person, a visionary in computing or someone who's just trying to think about a practical way of making something like television better, it turns out that it's a lot easier to envision what that future thing should be like than it is to actually build it. Then, even after you build it, there's hysteresis in the system that resists the change. So you have to have some big thing, whether it's economics or whatever, that forces you past that hysteresis point where you have a stake in the new model.

We called [Microsoft Bob and Clippy] the social user interface. We were trying to make the machine add a more social view to its interaction. Those things may be anticipating something that we still ultimately want to get to, and the natural-user-interface things we discussed here may be a stepping stone to that day.

Those were a valiant attempt, at a time when the computers really weren't powerful enough, to introduce some of those types of contexts. But in both of those cases, in contrast to the [Microsoft Office] Ribbon, for example, what we were trying to do was introduce them as an adjunct thing without changing the primary model of the user's experience.

Both were intended to help you. It was more like saying, "We have this character-based social interaction model to help you." But in many cases, people said, "I don't really want the help. How do I make that annoying guy go away?"

That's very different from saying that ultimately, we're going to change the complete model of how people interact, not just in an adjunct function like help but in the fundamental way in which you interact with the machine. But that was certainly not possible at the time. So you could say it just failed because it was a dream for starship travel and all we could build was automobiles.

CW: Was the Microsoft Office Ribbon an adjunct technology as well?

Mundie: No. We realized that people were having a tough time discovering in a natural way what the system could do for them. We saw that [Clippy] didn't work. The Ribbon was a way of creating a user interface that in the natural course of usage would expose and let people trial, at zero risk, the rest of the features of the product.

It morphed in a task-oriented way to expose the features that were the most useful, even if you didn't know they existed. For example, as you roll across the formatting buttons, your document is immediately -- but only temporarily -- reformatted. You don't have to click it and then decide you didn't like it and then undo it. As you scroll across, you see the one you like, you stop, you click, it's done. It becomes a risk-free, low-cost way of learning in the process of doing.
