What Your Business Can Learn from the Amazon Cloud Outage
The servers are back up and users can once again check in on Foursquare and ask questions on Quora, but the legacy of last week's Amazon Elastic Compute Cloud (EC2) outage will live on and provide important lessons for businesses as they look to cloud computing for their IT future.
While there have been high-profile cloud outages before, the scale and length of Amazon's unexpected downtime, as well as the profile of some of the clients that were dragged down with it, make this one all the more significant. So while Amazon scrambles to find out what went wrong, here's how to make sure you're ready for turbulence on the way to the cloud.
Amazon structures its cloud data center into Availability Zones to provide a level of redundancy. It's like designing a ship with multiple water-tight compartments, so that if one or two are damaged, the ship remains afloat. However, history has shown us no "unsinkable" ship is truly unsinkable, and to believe so is folly. Trust in your design, but always have enough lifejackets on board.
Even with a major hole poked in its credibility by nearly two days of downtime, cloud computing in general and Amazon EC2 in particular still offer compelling benefits to the small business community--most notably, the capability to offload the management of complex compute demands.
There are ways to mitigate some of the potential challenges of an outage like Amazon's. With some care and forethought, small businesses can still turn to the cloud as a way to reduce the time and money they spend on the "keeping the lights on" part of IT management, and increase the amount of effort they spend on innovation through technology.
What Is Mission Critical?
Businesses that depend on Internet connectivity, like Foursquare and Quora, are more attracted than most to the value proposition of cloud computing. The capability to scale their environment (and their bill) up or down with usage is huge. However, these are also the companies that stand to lose the most when there's downtime, as the Internet-based service literally is the business.
But unless you're launching the next hot-button social media property, you're a little bit more fortunate. You can pick and choose the parts of your IT infrastructure you want to keep on-site, and outsource others to maximize profitability.
The cloud doesn't have to be all or nothing. Maybe e-mail and the Website are too important to your business to be under anyone else's watch. But maybe there are some test and development workloads that can happily live on the cloud. If you're concerned about cloud provider reliability, use the cloud for workloads that won't take the business down with them if they fail, or that can wait for a while if need be.
Diversify, Diversify, Diversify
If you are in a position where it makes sense to have even your most important IT infrastructure in the cloud for the sake of accessibility, flexibility or economics, there are ways to make it work for you. You're just going to have to make sure your cloud environment is at least as redundant and disaster-ready as your on-site network, server and storage infrastructure.
My PC World colleague Tony Bradley offers the example of SmugMug, the online photo-sharing Website that runs on Amazon's cloud, and yet survived the Amazonpocalypse with nary a scratch. In SmugMug's case, it was largely a matter of being in the right parts of the cloud at the right time, and not subscribing to the hardest-hit Amazon service, its Elastic Block Store (EBS) offering.
But just to be sure, if the workload is critical, it may be worth investigating entering relationships with multiple cloud providers, preventing your business from falling to a single point of failure even in the cloud.
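To make the multiple-provider idea concrete, here is a minimal sketch of how a failover choice might look in code. It assumes you can write some kind of health check for each provider; the provider names and check functions below are hypothetical stand-ins, not real services or APIs.

```python
from typing import Callable, Optional, Sequence, Tuple

# A provider is a (name, health check) pair. Both are hypothetical:
# a real check might probe a status page or a test endpoint instead.
Provider = Tuple[str, Callable[[], bool]]


def pick_provider(providers: Sequence[Provider]) -> Optional[str]:
    """Return the name of the first provider whose health check passes.

    Providers are tried in the order given, so list your preferred
    provider first. Returns None if every check fails.
    """
    for name, is_healthy in providers:
        try:
            if is_healthy():
                return name
        except Exception:
            # A check that raises is treated the same as a failed check.
            continue
    return None


def primary_check() -> bool:
    # Stand-in for a real probe; simulates an outage at the primary.
    raise ConnectionError("primary unreachable")


def secondary_check() -> bool:
    # Stand-in for a real probe; simulates a healthy secondary.
    return True


providers = [
    ("primary-cloud", primary_check),
    ("secondary-cloud", secondary_check),
]

chosen = pick_provider(providers)
print(chosen)
```

The sketch only covers choosing where to send traffic. Keeping data and configuration in sync between providers is the harder part, and it has to be in place before an outage for a switch like this to be useful.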
And while you're negotiating those deals with one or more cloud providers, take a minute to examine your service level agreements (SLAs) with any provider. SLAs should set out how your providers are rewarded when things go right, and how you're compensated when things go wrong.
Especially if you're working with a local service provider that in turn works with an Amazon, a Google, or another major public cloud infrastructure vendor, make sure those SLAs spell out who is responsible for what should things go awry. It's worth the extra time and effort early in the relationship to make sure those SLAs are clear, comprehensive and iron-clad.
If something goes wrong, you don't want your business to languish offline while your vendors pass the buck over who is responsible for the outage. This is the very definition of when you want one throat to choke, and you want to make sure it's clear to whom that throat belongs.