Microsoft’s Bing site was offline Thursday night. Relatively speaking, the outage didn’t last very long, but any outage is a bad outage when you are introducing new features and trying to win market share from a dominant competitor like Google.
Users were unable to reach Bing from around 6:30pm to 7:00pm Pacific time (give or take 10 minutes on either side). Initially the site was simply unresponsive or returned partial search results. Eventually Microsoft put up an error page explaining that the site was unavailable.
In a blog post, Microsoft's Satya Nadella explained that "The cause of the outage was a configuration change during some internal testing that had unfortunate and unintended consequences."
Nadella also used the blog post to indicate Microsoft’s commitment to ensuring issues like this don’t occur again. “We strive to maintain a high standard of operational excellence at Bing. We are running a post mortem to find out how our software and processes need to be improved to prevent anything like this from happening again.”
Google has suffered a variety of outages and service interruptions, but not on its bread-and-butter search site. Gmail and Google News have had occasional issues, and public sentiment always seems split between "Google should never have outages. How can we trust Google or the cloud if Gmail goes down?" and "Google is providing all of these services at no cost. What do you expect for free?"
You may be familiar with the bumper-sticker wisdom that "stuff" happens. Or, to borrow a line from the cult classic movie The Breakfast Club, "screws just fall out all the time, the world is an imperfect place." Either way, although we expect sites like Bing and Google to be online 24/7/365, sometimes things go wrong.
Somehow, I actually find it more comforting that human error, a configuration flub, took Bing down. Had the outage resulted from usage overwhelming server capacity, or from a malicious attack of some sort, my faith in Bing would be more shaken.
That is not to suggest that outages resulting from human error are OK. I expect Microsoft to investigate the root cause and implement some additional change controls to ensure this sort of thing doesn’t happen again.
A simple mistake is a tidy problem that is easily addressed and has less impact on my overall opinion of the service. Human error is somehow more acceptable than an inability to withstand an attack or handle heavy traffic.