Google vs. Microsoft: Lessons on handling a Cloud #Fail
This week's Gmail outage reminded us all that the cloud ain't perfect. But you may draw some inspiration from a look at how Google reacted.
InfoWorld's Leon Erlanger tells the tale of Google's response: the company's App Status Dashboard reported the outage shortly after it occurred on Sunday. At 3 p.m. Eastern time that day, Google told us, "We're investigating reports of an issue with Google Mail." Google continued to post updates to the Dashboard report every two or three hours through the night and into the next day. At Crash + 5 hours the Dashboard reported, "This issue affects less than 0.08% of the Google Mail userbase... Affected users may be temporarily unable to sign in while we repair their accounts." At Crash + 22 hours, Google revised its damage estimate to 0.02% of Google Mail users -- perhaps 40,000 accounts. At Crash + 32 hours, a Google VP posted a full explanation of the problem and details about what was being done to correct it.
Now consider how Microsoft handled a similar incident.
On Dec. 30 of last year, Microsoft suffered a massive SQL Server failure that affected 17,355 Hotmail accounts. As I reported at the time, Microsoft's response left much to be desired.
At Crash + 8 hours, I saw sporadic reports of Hotmail problems. It wasn't at all clear whether the problems were random or systemic. Hotmail, like Gmail and other email services, shuts users out for brief periods to perform system maintenance, so temporary minor outages are hard to distinguish from major ones. At Crash + 12 hours, we still had nothing official from Microsoft. The Windows Live Solution Center fielded hundreds of complaints. In the absence of an official statement, the beleaguered staff at the Solution Center resorted to answering each post with essentially the same cut-and-paste reply, even as criticism and angst poured in.
I didn't see any official acknowledgement of the problem -- much less a status report on the resolution -- until Jan. 3, when Chris Jones on the Inside Windows Live blog posted an explanation: "Beginning on December 30th we had an issue with Windows Live Hotmail that impacted 17,355 accounts. Customers impacted temporarily lost the contents of their mailbox through the course of mailbox load balancing between servers. We identified the root cause and restored mail to the impacted accounts as of yesterday evening, January 2nd."
That's how it played out. On the Microsoft side, at Crash + 4 days we received confirmation that the problem had been resolved at Crash + 3 days. At the time, many Hotmail users reported that they still didn't have their mail back.
While the sources of the problems were completely different, the net results were almost identical: some panic-stricken online email users thought their data was gone for good. In the end, it looks like everyone got their data back. But the roller coaster ride in between clearly differentiates the two companies.
It appears that Microsoft is trying to mend its incommunicative ways. While fact-checking this report, I ventured to the Windows Live Solution Center's Hotmail Portal. A yellow banner at the top of the page said, "Windows Live Hotmail is currently experiencing issues with inbound mail delivery. You may see a delay in receiving email into your inbox. For more information on this event, click here."
When I clicked on the indicated link, I was dumped onto the MSN home page. The page offered "Windows Live Hotmail: E-mail made simple. Fight spam with Microsoft SmartScreen technology" with no other references to Hotmail. There was no indication at all about any Hotmail problems.
The more things change, the more they stay the same.
This story, "Google vs. Microsoft: Lessons on handling a Cloud #Fail," was originally published at InfoWorld.com.