Microsoft's Azure Cloud Suffers Serious Outage
Microsoft's Azure cloud infrastructure and development service experienced a serious outage on Wednesday, with the system's service management component going down worldwide starting at 1:45 a.m. GMT.
"We are experiencing an issue with Windows Azure service management. Customers will not be able to carry out service management operations," Microsoft said in an initial message on the outage on its Azure service dashboard.
The issue has been "mitigated and service management is restored for the majority of customers," Microsoft said in a message posted at 1:30 p.m. GMT. "We still need to work through some issues before we can completely restore service management."
The incident's root cause "has been traced back to a cert issue triggered on 2/29/2012 GMT," Microsoft said in a previous update.
At 5 a.m. GMT, Microsoft said less than 3.8 percent of hosted services had been affected, and measures had been taken to stop the problem "from spreading across the production environment."
In addition, Azure customers in the north and south central U.S. as well as northern Europe may be experiencing some performance problems, according to a message on the dashboard posted at 10:55 a.m. GMT.
"Incoming traffic may not go through for a subset of hosted services in this sub-region," it stated. "Deployed applications will continue to run. There is no impact to storage accounts either."
At 1:30 p.m. GMT, Microsoft said it was "still troubleshooting" the issues affecting these regions.
As of 9 p.m. GMT, the service management function was still experiencing a worldwide outage, according to the dashboard.
But in an update posted at 7:30 p.m. GMT, Microsoft said it was "actively recovering Windows Azure hosted services in the North Central US, South Central US and North Europe sub-regions," and that "more and more customers applications should be back up-and-running even if service management functionality is not yet restored."
Previously, as Wednesday wore on, the dashboard reported other outages affecting different aspects of the platform.
The SQL Azure Data Sync service was unavailable in six regions around the U.S., Europe and Asia, and various problems were also listed for some regions regarding Access Control 2.0, Azure Reporting, Azure Marketplace and Azure Service Bus.
The notifications promised regular updates on the work being done to fix the issues, but gave no concrete timetables.
Azure users posted a stream of critical comments about the outages to the service's official forums on Wednesday.
"The dashboard shows it's being worked on," one commenter said. "Since we rely heavily on Windows Azure, we've been monitoring the dashboard closely the entire day. What I've noticed is a complete lack of estimates on (when issues will be resolved. For the last 4 hours, the status has essentially been 'The restoration steps to mitigate the issue are underway.'"
"My company's website's have been down for the last hour (since approx 11am GMT)," another user wrote. "This is causing quite a problem for us as we accept online payment 24-7 across the UK and Europe. The service dashboard isn't telling me anything outside of what I already know, except it's saying "Deployed applications will continue to run." - I can tell you this isn't true as the deployed applications aren't responding."
A Microsoft spokesperson could not immediately provide further comment on the Azure service problems and their root cause.
This is far from the first outage to hit Azure since its launch in late 2009. Rival offerings such as Amazon Web Services have experienced their own share of uptime issues as well.
Chris Kanaracus covers enterprise software and general technology breaking news for The IDG News Service. Chris's e-mail address is Chris_Kanaracus@idg.com