Earlier this month, Microsoft quietly appointed software architect Mark Russinovich as chief technology officer for its Azure cloud computing platform, formalizing a role he’s been executing for the past several years.
It was a smart appointment, not least because it may help ease any remaining concerns of system administrators reluctant to take on Microsoft’s cloud platform as part of their job duties. Among the Microsoft faithful, Russinovich has serious geek credibility. If Russinovich is behind the gears at Azure, it must be O.K.
Russinovich has long been one of the most popular speakers at Microsoft’s Build and TechEd technical conferences, thanks to his clear, cogent explanations of the company’s technologies. Russinovich joined the company in 2006, after Microsoft purchased his enterprise software company, Winternals Software, which offered a line of Windows repair tools that many found superior to Microsoft’s own.
Joining Microsoft led Russinovich to shift his ardent concentration from Windows to the company’s then-emerging cloud practice, Azure, which is now becoming a cornerstone in the company’s business strategy. Russinovich has experienced Azure’s growing pains first-hand, and has spoken about them candidly and in depth.
One high-profile growing pain happened in February 2013, when Azure blinked offline due to an expired digital security certificate. The company had updated the certificate but failed to install it on the servers in time, given that it was part of a larger update patch that got pushed back at the last minute.
The incident was an eye-opener for the company. “With services, it’s about the whole lifecycle. You have to think end-to-end when you are dealing with processes,” Russinovich said during his talk at the latest Build conference in April.
The IDG News Service sat down with Russinovich to learn more about how Azure is reshaping the way the company operates.
IDGNS: Why is running a cloud service different from shipping software products?
Russinovich: It’s a very different mindset. With the box-product development cycle, the planners plan it up front and hand off the plans to developers, to develop it. Once they feel they have something stable, then they hand it off to the testers, who test it for a while. You might have customers test it in beta, and once you decide it looks good, you ship it out to the outside world. The customers get it, operate it, report bugs back to you. Typically you have a separate team—we do at Microsoft—who takes a look at the bugs, fixes them and rolls out updates to the customers.
That’s the traditional process, and many Microsoft product teams are totally designed around that. You have these very different kinds of skillsets for each phase: product planners, developers, testers.
When you go to a services delivery model, it is much more continuous. Developers are developing, operating, and testing the software.
The other key difference is that we’re also the ones operating the software. Because we’re operating on a large scale, we have the benefit and the onus of developing deep monitoring of the software, so we can detect problems.
Those problems need to be fixed by the developers of the software itself, rather than the engineering team.
We release an update to a small slice of production, and look at the health of the updated code, comparing it with the existing code to see if there are any anomalies that would indicate that something has gone wrong. Once you have confidence with that, you can roll it out to larger and larger segments of the service.
At any time, you need to be able to detect that there is a problem, get a fix for it, and then push out the fix as quickly as possible. For as long as the problem is out there, you are impacting customers. If you haven’t built a system that supports [bug fixes] in a first-class manner, then you go into all sorts of heroics to get a patch out.
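The staged rollout Russinovich describes can be sketched in a few lines of Python. This is only an illustration, not Microsoft's tooling: the `Health` record, the error-rate comparison, the `tolerance` threshold and the stage fractions are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Health:
    """Hypothetical health summary for a group of servers."""
    requests: int
    errors: int

    @property
    def error_rate(self) -> float:
        # Avoid dividing by zero when a group has served no traffic yet.
        return self.errors / self.requests if self.requests else 0.0


def looks_anomalous(updated: Health, baseline: Health, tolerance: float = 0.01) -> bool:
    """Flag the update if its error rate exceeds the baseline's by more than `tolerance`."""
    return updated.error_rate > baseline.error_rate + tolerance


def staged_rollout(stages, measure, tolerance=0.01):
    """Widen the rollout stage by stage and stop at the first anomaly.

    stages  -- increasing fractions of production, e.g. [0.01, 0.1, 0.5, 1.0]
    measure -- callable(fraction) -> (Health of updated code, Health of existing code)
    Returns the largest fraction that passed the health check (0.0 if none did).
    """
    reached = 0.0
    for fraction in stages:
        updated, baseline = measure(fraction)
        if looks_anomalous(updated, baseline, tolerance):
            break  # halt here; a real system would also roll back and alert
        reached = fraction
    return reached
```

In this sketch the caller supplies `measure`, which stands in for whatever monitoring compares the updated slice with the existing code; the rollout only widens while that comparison stays clean.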
IDGNS: So devops is not just a buzzphrase, but the new way of doing things.
Russinovich: There is a lot hidden behind that word. A lot of people think it is just a developer who is pushing stuff to production.
It’s easy for a single developer, especially for a start-up. They are writing the code, pushing it out to production, watching it. They have a small customer base who can call and say it’s broken. So the risk is a lot lower there, and the scale is a lot smaller.
Once you get to something at the scale of an Azure, you cannot do it that way. You have to be automated.
The mind shift started a while back and segments of the company were already operating that way for our first services, such as Bing or Hotmail. So this has been spreading through the entire company and it takes a concerted effort, and takes learning, of what works and what doesn’t work. We can share best practices, but each team goes through a unique transformation. It’s been fascinating to watch.
A great example is Windows Server. Cloud first means that a lot of the things they are doing are for the benefit of Azure, and cloud deployments in server infrastructure in general. One of the obvious things is that Windows Server has a backup feature you can point to Azure, so you can use an Azure storage subscription to back up servers to the cloud.
IDGNS: How did you get involved in Azure?
Russinovich: I started at Microsoft in 2006 when my company was purchased, and I worked at the tail-end of Vista, and Windows 7. As Windows 7 was finished, and we were starting to work on Windows 8, I was looking around to do something different.
Some of this was spurred on by some of the people who started the Azure project, including one of my heroes, Dave Cutler, the guy who created Windows NT.
He started the Azure project right as I joined the Windows team, which was disappointing to me that he took off to go work on this. I was looking at his project and thinking “Yeah, knock yourself out over there, I’m still working on the cool stuff over here at Windows.”
A few years later, I talked with him, talked with [then Microsoft chief architect] Ray Ozzie, and I started to take a step back. I saw the transition that the industry was going through. I was helping Microsoft get ahead of the mobile transition from desktop-oriented computing. But I realized there is another side to that mobile disruption, which was the cloud part of it. All of those devices communicate with each other and store data in a central location. That made me realize that the cloud was probably way bigger and way more disruptive than even mobile.
The infrastructure, the software systems, and the application models for the cloud barely existed then. How do we operate at that scale? How do people write applications that operate at that scale? How do you develop an application that would work across the globe? This is like a brand new OS. I realized then Azure was the foundation for the future of Microsoft.
IDGNS: How will the job change for the system administrator whose workload is moving to the cloud?
Russinovich: If you look at what enterprises are doing in the cloud, they are really only getting started. As much as capacity has been built out, it is just the tip of the iceberg.
If an IT pro doesn’t want to be left on the sidelines, they need to figure out how to help their companies get to the cloud.
Shadow IT—I call it “bring-your-own-IT”—is just the business going around the central IT because the cloud is much more agile.
So if the central IT guys don’t want to be left behind managing the on-premise stuff, they have to figure out how to get in the position of helping their businesses get to the cloud. That means understanding the cost model of the cloud, the security model of the cloud, how to put in governance so people are doing the right thing, and making it so that business departments are incentivized to have you help them, rather than go around you as an impediment.
IDGNS: Microsoft has been pretty quiet on the OpenStack front, which has been generating a lot of publicity as a cloud platform. What potential value does Microsoft see in OpenStack?
Russinovich: From our perspective, OpenStack is, as something we’d adopt to run our own cloud, not mature enough, scalable enough or stable enough. The way we look at it is not how we could use it, but what our customers want from it.
What we find is that almost no one uses the OpenStack APIs [application programming interfaces] directly. When we talk to them, we hear that OpenStack is incredibly hard to set up and it is incredibly hard to maintain. So few people are actually deploying it and using it successfully.
Those who are using it successfully are not using the APIs. They are using an abstraction layer on top. So when we ask if it would be helpful to provide API support, typically the answer is no. And in fact, it would be difficult for us to do. If you look at the OpenStack API set, or the Amazon APIs, or our APIs, there are impedance mismatches between them. Creating an abstraction layer, you will end up with things that bump going through where there is not an exact match from one API to another. So it would be challenging for us to make a high-fidelity OpenStack API interface on top of Azure, because OpenStack is moving in possibly different directions and not necessarily at the same speed.
IDGNS: What were the lessons learned from the 2013 outage?
Russinovich: A key lesson has been configuration management, understanding what your desired configuration is, and what your current configuration is. That is the disconnect there. Our desired configuration was an updated certificate, our current configuration was old, and we didn’t connect the two.
I mean, it is obvious when you look at it, but you have to build up a culture to internalize that, and if you don’t get it right then something like the certificate thing can bite you.
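The gap between desired and current configuration that Russinovich describes can be illustrated with a short Python sketch. It is a simplified assumption of how such a check might look, not a description of Azure's systems; the `find_drift` helper and the setting names and values are hypothetical.

```python
def find_drift(desired: dict, current: dict) -> dict:
    """Return {setting: (desired_value, current_value)} for every setting that differs.

    A setting missing from `current` shows up with a current value of None.
    """
    return {
        key: (want, current.get(key))
        for key, want in desired.items()
        if current.get(key) != want
    }


# Hypothetical example: the new certificate exists, but a server still holds the old one.
desired = {"tls_cert": "cert-2014", "os_patch": "p42"}
current = {"tls_cert": "cert-2013", "os_patch": "p42"}
drift = find_drift(desired, current)
```

Here `drift` contains only the `tls_cert` entry, pairing the desired certificate with the stale one. Running a comparison like this continuously, rather than assuming an update has landed, is one way to connect the two configurations.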