Gmail Outage Explanation Doesn't Wash
It's too bad the National Transportation Safety Board can't investigate Google to find out just why Gmail crashed Tuesday, because Google's explanations for its outages (via its dashboard) are short and kindergarten-like.
The NTSB would seek out the root cause of the outage, hold hearings and issue a report with recommendations for fixing the problem. But Google follows the standard operating practice of cloud and SaaS (Software-as-a-Service) providers, and that is to tell customers as little as possible about an outage. They treat their customers like dumb bunnies.
A Gmail outage isn't on the scale of a contaminated food supply incident, the discovery of lead paint on children's toys, or a plane crash -- all events that trigger a federal investigation and detailed reports that flesh out causes and remedies.
But what happens if Google wins contracts to provide applications and mail services for Los Angeles and other government entities?
Cloud and SaaS providers increasingly want to manage critical services for government. And in time, outages that are now annoyances may have critical implications for the governments that rely on them. Los Angeles' IT department is recommending the city move to Google Apps and says the company's services "often exceed the current city level."
That's a plus for Google. But if something goes wrong with LA's own IT systems, at least there is still a clear line of accountability to the managers responsible and an opportunity to probe.
But along with telling customers as little as possible, hosting, cloud and SaaS providers indemnify themselves as much as possible against any business losses resulting from an outage.
Who's Responsible for Critical Data?
In theory, the accountability is provided by the market: a customer can move to a new service provider. But a migration to the cloud may be a path of no return. LA, in its assessment of cloud services, said that if it ditches its current infrastructure, "it may be cost-prohibitive to return to the city-owned and operated structure."
Today, the harm is mostly economic. When eBay Inc.'s PayPal service crashed last month, it was just something customers had to deal with.
PayPal blamed the failure on a "back-end router" and some redundancy issues, and left it at that. That meant companies like Sailrite Enterprises Inc., a sailing supply company that relied exclusively on PayPal, were unlikely to learn what happened and had to suffer the loss.
But if cloud and SaaS providers manage government services, it's unlikely that an informed public will settle for incomplete explanations about outages.
If the service is critical, they will want to know what went wrong. Was the equipment upgraded, patched? Was staffing at proper levels? When was the last time someone tested the emergency generators? And so on.
Answers to fair and legitimate questions will be sought, and little "dashboards" aren't going to cut it.