Master MySQL in the Amazon Cloud

For many MySQL database admins, Amazon Web Services represents the brave new world of cloud computing--one fraught with disappearing servers, disk I/O variability, and other shared resource challenges, not to mention questions regarding the security of data.

But for those seeking to tap the powerful flexibility that cloud computing affords MySQL database deployments, AWS should be viewed simply as the next natural step in data center evolution -- one in which virtualization and commodity hardware are a given, and scalability, performance, and high availability are readily attained.

To help DBAs take advantage of what AWS offers, we've compiled the following primer on managing MySQL databases in Amazon's cloud. Along the way, you'll find essential tools and techniques for migrating databases to AWS, tuning them for high performance and high availability, and avoiding the pitfalls of computing in the cloud.

Demystifying Deployment and Disappearing Servers

Of all the steps to ensure a successful AWS deployment, spinning up a basic instance is the simplest. More challenging is dealing with the new reality of disappearing servers.

To get started, download and install the Amazon API tools. Next, set up your environment variables (EC2_HOME, EC2_PRIVATE_KEY, EC2_CERT; you may also need to set JAVA_HOME and PATH), spin up your instance, fetch the name of your new instance, and connect:

$ ec2-run-instances ami-31814f58 -k my-keypair -t t1.micro
$ ec2-describe-instances
$ ssh -i my-keypair
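
Before running the commands above, it can help to confirm that the environment is actually set up. The following is a minimal sketch, not part of the Amazon tools: the function name check_ec2_env is hypothetical, and it checks only the three variables the text lists as required (JAVA_HOME and PATH are left out because they are only sometimes needed).

```shell
#!/bin/sh
# Hypothetical helper: report which required EC2 API tool variables are
# unset or empty. Prints one missing name per line and returns 0 only
# when nothing is missing.
check_ec2_env() {
    missing=0
    for name in EC2_HOME EC2_PRIVATE_KEY EC2_CERT; do
        # Indirect lookup of the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name"
            missing=1
        fi
    done
    return "$missing"
}
```

Calling check_ec2_env at the top of a provisioning script, and stopping when it returns nonzero, avoids a confusing failure from the tools themselves later on.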

Next you'll want to set up MySQL and a few base packages. Here we recommend the Percona free edition of MySQL:

$ rpm -Uhv
$ yum install -y Percona-Server-shared-compat
$ yum install -y Percona-Server-server-55
$ yum install -y Percona-Server-client-55
$ yum install -y libdbi-dbd-mysql

Set the root password on your new MySQL instance, and you're ready to begin.

Perhaps the most difficult shift you'll make in adapting your thinking to cloud computing is around virtual machines themselves. AWS instances are built on virtualization technology, and although they sit on top of physical hardware that behaves much like the servers you're used to, the virtual machines are not as reliable as physical ones. These machines can disappear out from under you and your application without notice. As such, redundancy, high availability, and scripted automation are key. Such pressures also put disaster recovery front and center. Now no longer relegated to a best practices list of tasks you'll get to when other pressing problems are resolved, disaster recovery becomes an urgent priority.
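
One small building block for that kind of scripted automation is a retry wrapper, so that a script does not fail outright the first time a server is briefly unreachable. This is a sketch under assumptions: the function name retry and its argument order are invented for illustration, and doubling the delay after each failure is one reasonable policy among many.

```shell
#!/bin/sh
# Hypothetical helper: retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_ATTEMPTS is reached, doubling the
# delay between attempts. Returns 0 on success, 1 after the final failure.
retry() {
    max=$1
    delay=$2
    shift 2
    attempt=1
    while :; do
        if "$@"; then
            return 0
        fi
        if [ "$attempt" -ge "$max" ]; then
            echo "giving up after $attempt attempts: $*" >&2
            return 1
        fi
        sleep "$delay"
        delay=$((delay * 2))
        attempt=$((attempt + 1))
    done
}
```

For example, retry 5 2 mysqladmin ping would check up to five times whether the local MySQL server is answering, waiting 2, 4, 8, and 16 seconds between attempts. A wrapper like this handles short blips only; an instance that is gone for good still calls for a failover or a rebuild.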

Take, for example, what the operations team at Netflix decided to do. They wanted to meet this server reliability question head on, so they built a piece of software that would play Russian roulette with their servers. The resulting Chaos Monkey randomly knocks out servers in their production environment in the middle of the day. What's more incredible is how this illustrates two sides to the AWS cloud coin. On one hand, the servers aren't as reliable; on the other, Amazon provides the tools with which to build in all the redundancy you need.
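
The core idea can be illustrated in a few lines of shell. To be clear, this is not Netflix's Chaos Monkey, only a dry-run sketch of the random selection step; the function name pick_victim and the instance IDs in the usage note are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: read instance IDs on stdin, one per line, and print
# one of them chosen at random. Blank lines are ignored; empty input prints
# nothing. awk's srand() seeds from the clock, so two calls within the same
# second will make the same choice.
pick_victim() {
    awk 'BEGIN { srand() }
         NF    { ids[++n] = $0 }
         END   { if (n) print ids[int(rand() * n) + 1] }'
}
```

A dry run might look like printf 'i-aaa\ni-bbb\ni-ccc\n' | pick_victim, which prints the ID that would be taken down. Only after the redundancy and rebuild automation has been proven in a test environment would you pass that ID to a real termination command.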

For example, Amazon makes using multiple data centers seamless. They organize the objects (AMIs, snapshots, instances, and so forth) around the availability zones and regions in the environment. There are currently seven regions to choose from outside of AWS GovCloud, including Virginia, Oregon, California, Ireland, Singapore, Japan, and Brazil. Each region includes multiple data centers. Replicate your database data between these regions, build and keep fresh your server images, and automate push-button rebuilds to run with the most robust and fault-tolerant infrastructure possible.
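
Scripts that work across regions refer to them by API code rather than by place name. The lookup below is a small sketch mapping the seven locations named above to their region codes; the function name region_code is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: translate a location name used in the text into the
# corresponding AWS region code. Returns 1 for names it does not know.
region_code() {
    case "$1" in
        Virginia)   echo us-east-1 ;;
        Oregon)     echo us-west-2 ;;
        California) echo us-west-1 ;;
        Ireland)    echo eu-west-1 ;;
        Singapore)  echo ap-southeast-1 ;;
        Japan)      echo ap-northeast-1 ;;
        Brazil)     echo sa-east-1 ;;
        *)          echo "unknown region: $1" >&2; return 1 ;;
    esac
}
```

With the EC2 API tools, a code can then be passed on the command line, as in ec2-describe-instances --region "$(region_code Oregon)", to list the instances running in that region.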

Beware Disk I/O Variability

Relational databases often appear to be unnecessarily complex beasts. But they've evolved the way they have to provide an array of great features. We can load them full of data, then mix and match that data, asking complicated questions and selecting slices based on an endless set of conditions.

Behind the SQL language we use to fetch data and make changes, the underlying engine of a relational database--whether it's MySQL, Oracle, or SQL Server--has three core jobs: reading and writing data to disk, keeping the hottest (most requested) bits in memory, and protecting against server failure.

That said, disk I/O--the speed with which you read and write to that underlying storage system--is crucial. In the early days of Oracle, for example, before RAID was available, the database engine offered ways to stripe data across many disks, and it emphasized placing redo logs, which protect against data loss, on disks of their own. When RAID became widely available, DBAs could simply place all their data files on one volume and the RAID array would take care of the rest.

Enter the present day, where Amazon's EBS (Elastic Block Store) is virtualized, allowing you to cut up a slice of a RAID array that sits somewhere on the network and attach it to any instance. This greatly enhances operational flexibility, allowing easy programmatic changes to hardware, but as with any new solution, there are challenges.

EBS grapples with the limitations of a shared network resource. Many servers and many customers will all be using that network storage, so their resource usage can potentially impact your server. Amazon's SLAs promise an average disk I/O throughput; however, that throughput can rise and fall dramatically in a given time period. This door swings both ways. When the disk subsystem is overused by multiple tenants, you'll receive less of the resource; when it is underutilized, you'll receive more.
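
Because an average hides this kind of swing, it is worth looking at the spread of repeated measurements rather than a single number. The sketch below assumes you have already collected throughput samples, one per line, from whatever benchmark you trust; the function name summarize_throughput is hypothetical, and the units are whatever the samples use.

```shell
#!/bin/sh
# Hypothetical helper: read one numeric throughput sample per line on stdin
# and print the minimum, maximum, and mean. Blank lines are skipped. With
# no samples it prints "no samples" and returns 1.
summarize_throughput() {
    awk 'NF {
             v = $1 + 0
             if (n == 0 || v < min) min = v
             if (n == 0 || v > max) max = v
             sum += v
             n++
         }
         END {
             if (n == 0) { print "no samples"; exit 1 }
             printf "min=%.1f max=%.1f mean=%.1f\n", min, max, sum / n
         }'
}
```

For instance, printf '40\n100\n70\n' | summarize_throughput prints min=40.0 max=100.0 mean=70.0. A wide gap between the minimum and maximum on the same volume is the variability described above, and it is the minimum, not the mean, that your database has to live with during a busy period.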
