Microsoft Azure Stabilizes After Leap Year Glitch
Microsoft's Azure cloud infrastructure and development service appeared to be running nearly trouble-free on Thursday, following a series of outages on Wednesday that affected multiple parts of the system.
The Azure service health dashboard showed only one problem at 3 p.m. GMT, a "performance degradation" in the south-central U.S. Compute zone. "Our recovery efforts to restore compute service to impacted customers in this sub-region are complete," Microsoft said in a message on the site. However, "a small number of customers in this sub-region may face long delays during service management operations," it added.
Azure's service management component fared the worst during the outage, going down worldwide starting at 1:45 a.m. GMT on Wednesday. At 3 p.m. GMT on Thursday, the dashboard showed the service management system running normally, along with other previously affected pieces of the Azure platform, including Reporting, Marketplace and Access Control 2.0.
Microsoft provided some insight into the outage's root causes in an official blog post.
"Windows Azure operations became aware of an issue impacting the compute service in a number of regions," wrote Bill Laing, corporate vice president of server and cloud. "The issue was quickly triaged and it was determined to be caused by a software bug. While final root cause analysis is in progress, this issue appears to be due to a time calculation that was incorrect for the leap year."
"Once we discovered the issue we immediately took steps to protect customer services that were already up and running, and began creating a fix for the issue," he added. "The majority" of customers and services had been fully restored by 2:57 a.m. PST on Wednesday, according to Laing.
Microsoft is planning to provide an update that will include more details on the problem's root cause, he said. "We sincerely apologize for any inconvenience this has caused."
Azure users took to official forums during the outage, complaining of disruptions to their operations and a lack of concrete updates from Microsoft.
Frustration still lingered on Thursday for some users. "I think we could have lost 2 prospects who are testing our system currently," one wrote on the forum. "But I can't imagine the damage this has done to companies with large scale customers. I mean we have chosen Windows Azure due to the redundancy..... How can we explain this to our customers."
Chris Kanaracus covers enterprise software and general technology breaking news for The IDG News Service. Chris's e-mail address is Chris_Kanaracus@idg.com