Microsoft’s Windows Azure suffered from an issue on Wednesday that affected a management feature in the compute section of the public cloud. The issue was finally resolved Thursday morning.
Microsoft first updated the Windows Azure Service Dashboard at 2:35 AM UTC (Coordinated Universal Time) on Wednesday: “We are experiencing an issue with Compute in North Central US, South Central US, North Europe, Southeast Asia, West Europe, East Asia, East US and West US.”
About 17 hours later, the company posted a message saying that manual actions to perform so-called swap deployment operations might fail, and that users should therefore delay them. Microsoft was still struggling to solve the issue on Thursday morning, but the company seemed to be on the right track, saying that it “was continuing to validate and deploy mitigation for this issue.” At 10:45 AM it told users that compute service management functionality had been restored in all regions.
The swap deployment operation is related to how services are deployed on Microsoft’s cloud. Azure offers two deployment environments for cloud services: a staging environment in which users can test their system, and a production environment. The two are distinguished only by the VIP (virtual IP) addresses used to access them, and a swap deployment operation switches those addresses, turning the staging environment into the production environment.
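To make the idea concrete, here is a minimal toy model of such a swap in Python. It is only a sketch of the concept described above, not Microsoft's implementation or any Azure API: the `Deployment` and `CloudService` types, the `swap_deployments` method, and the IP addresses (taken from a range reserved for documentation) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Deployment:
    """A deployed build of a service (hypothetical toy type, not an Azure API)."""
    label: str
    vip: str  # virtual IP address through which this deployment is reached


@dataclass
class CloudService:
    """A service with one production and one staging deployment (toy model)."""
    production: Deployment
    staging: Deployment

    def swap_deployments(self) -> None:
        """Exchange the two VIPs, then exchange which deployment fills each role."""
        # The deployments swap addresses, so the build that was in staging
        # becomes reachable at the address users already know.
        self.production.vip, self.staging.vip = self.staging.vip, self.production.vip
        # The roles follow the addresses: the former staging build is now production.
        self.production, self.staging = self.staging, self.production


if __name__ == "__main__":
    # Made-up addresses from the 203.0.113.0/24 documentation range.
    service = CloudService(
        production=Deployment(label="v1", vip="203.0.113.10"),
        staging=Deployment(label="v2", vip="203.0.113.20"),
    )
    service.swap_deployments()
    print(service.production)
    print(service.staging)
```

In this model the production address never changes from the user's point of view; only the build behind it does, which is why a failing swap blocks new releases without touching applications that are already running.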
The company hasn’t elaborated on what caused the issue, but fortunately for Microsoft and its users, it hasn’t affected the ability to run applications on Azure. However, the fact that it affected all regions raises questions about how Microsoft has constructed the management portion of its cloud. The time it took Microsoft to fix the issue also puts the company in a less-than-favorable light.
Azure’s service management functionality has also suffered from multiple interruptions in October. Microsoft didn’t immediately reply to questions about why there have been so many performance problems.
Microsoft has apologized for any inconvenience this has caused its customers.
Updated at 8:53 a.m. PT to reflect that the issue has been resolved.