How to Move a Data Center Without Having a Heart Attack
As the dust settles in the aftermath of a successful physical data center move, I'm nursing my bruised and cut hands, kicking back with a Scotch, and reflecting on what went right. I said "successful," but actually there's no such thing as a failed data center move: If something's going wrong, there's nothing you can do except keep working until everything's up and running.
But a successful data center move is no accident. Whether it's a data center relocation or a new data center build-out, detailed plans must be made months or even years in advance.
There are a number of ways to move a data center. If budgets and skill sets allow, the easiest method is to build a brand-new data center in the new location, drop high-bandwidth links between the two, and use virtualization tools to migrate all the virtual machines from one site to the other -- live.
This assumes a fully virtualized infrastructure and a massive budget since you're duplicating the whole thing at the new site. And although very expensive, this method offers low-to-zero downtime and a brand-new computing environment with new servers, storage, and core networking at the new site. Plus, time and scheduling considerations are far less stringent. However, it's also out of the budgetary ballpark for most organizations.
Then there's the hybrid method, where some portion of the new data center is built out before the relocation, such as racks and core networking. When the time comes to relocate, the old data center is shut down and the servers and storage are physically moved to the new location, reracked, recabled, and then brought back online.
This offers a far lower cost than the duplication method, but involves at least a day of downtime and the threat of data and service loss. It's also performed under the gun, as critical services and applications are down for the duration, and when a storage array stubbornly refuses to come up properly, that downtime grows as the new problem is dealt with.
Next there's the "kitchen sink" approach, where the new data center is provided with power and cooling only and everything is moved from one site to the other: racks, servers, network, storage, the whole shebang. This is the cheapest method, but is also necessarily the hardest and longest process of all.
Most companies involved in an office or data center relocation will opt for some blend of the last two methods. The hugely expensive first method is essentially guaranteed to succeed and offers plenty of time to get everything just right. But the way to ensure that the other approaches go smoothly is with copious amounts of planning, such as anticipating the worst possible scenarios prior to the actual move.
As an example, let's look at the data circuits. The existing data center may have a few fiber connections from a tier-1 carrier that connect it to the Internet and a WAN. Without those circuits, the data center is functionally useless, so they get priority. However, you can count on the carrier taking forever to build out the circuits to the new space, SLA be damned.
It's best to assume that, even given four or five months' notice, those circuits won't be in place when the relocation date arrives. Hedge your bets with one or two business-class cable circuits. These are generally installed much faster than dedicated fiber links or T1/T3 circuits, and they can get you by in a pinch. It's not ideal, but it's better than nothing when the carrier finally realizes it needs to run new fiber from the street to the new location, requiring city permits, traffic detours, and lengthy delays.
It's also a good idea to have plenty of hands available. While a few core admins will be responsible for actually bringing the site back online, a dozen or more people who can be trusted to securely transport and rack servers, storage, and network gear are invaluable. When the senior network admin is neck-deep in switching and routing reconfigurations, you don't want him distracted by quandaries like how the hell that blade chassis rail kit goes back together.
Also, draw up explicit plans specifying which servers and other gear will go where in the new site. Take this down to the rack-unit level, so there's no mistaking what system should reside in which rack. This will speed up the rebuild and make cabling much easier. Speaking of cabling, this might be time to take a good look at alternative solutions for rack access. If you've traditionally run copper to all racks from core switches, you might look for room in the budget for top-of-rack switching and bundled or 10G uplinks to the core.
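If you keep that rack-unit-level plan in machine-readable form, you can sanity-check it before moving day. Here's a minimal sketch of such a check; the device names, racks, and sizes are illustrative stand-ins, not details from any real plan:

```python
# Hypothetical sketch: validate a rack-unit-level placement plan so no
# two devices claim the same slots in the same rack. All entries below
# are made-up examples.

def find_ru_conflicts(plan):
    """plan: list of (device, rack, bottom_ru, height_ru) tuples.
    Returns a list of (existing_device, new_device, rack) conflicts."""
    conflicts = []
    occupied = {}  # (rack, ru) -> device already assigned there
    for device, rack, bottom, height in plan:
        for ru in range(bottom, bottom + height):
            key = (rack, ru)
            if key in occupied:
                conflicts.append((occupied[key], device, rack))
                break
            occupied[key] = device
    return conflicts

plan = [
    ("db01",    "R1", 30, 2),  # 2U database server at RU 30-31
    ("esx01",   "R1", 28, 2),  # 2U hypervisor at RU 28-29
    ("san-ctl", "R1", 29, 3),  # 3U array controller -- overlaps esx01
]
for existing, new, rack in find_ru_conflicts(plan):
    print(f"conflict in {rack}: {existing} and {new} overlap")
```

Catching a collision like this on paper costs seconds; catching it at 2 a.m. with the rack half populated costs much more.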
Label everything: the servers, the switches, the KVM dongles, and all the rail kits as they come out of the racks. Few things are more frustrating than spending an hour searching for the other rail for a mission-critical database server while the project comes to a standstill. Also, take tons of pictures of the old data center in situ and the new data center before, during, and especially after the relocation.
Though it may not need to be said, make sure the folks transporting your gear are good drivers. Smaller relocations may find servers and other gear riding in SUVs and small trucks, while larger moves may leverage entire racks rolling from loading dock to box truck to the new loading dock. The hardware and data contained within those vehicles is more important than anything else in the company, and when it's rolling down the highway at 70 mph, the risks of losing some or all of it are extremely high. Putting an eager intern behind the wheel of a rented truck might be a bad idea.
Finally, once all that gear arrives and is racked up and before powering anything on, take time to inspect the data and power cabling, cable pathing, power loads on new PDUs -- and go even further by reseating blades in chassis, modules in modular switches, and hot-swap power supplies. With all the jostling and bouncing these systems have just experienced, you don't know what might be floating around.
Be very aware of temperature differences between the data center and outside. Rolling a recently shut-down core switch, with an internal temperature still in the 90s from a 75-degree data center, out to a 20-degree loading dock can result in catastrophic problems, because circuit boards can rapidly contract in the cold and crack.
Also, if the relocation involves a full office with hundreds or thousands of switchports, make sure you have an elegant method to assign those ports to the right VLANs. Some infrastructures use dynamic VLAN assignments according to login credentials, but others use fixed assignments. In the past, I've written custom code leveraging wildcard DNS and locked-down VLANs to facilitate self-service VLAN assignment. When users arrive at the new site, they plug in their PC to a data jack, open a Web browser, and are presented with a Web application that allows them to choose the appropriate VLAN for their system.
The back-end code of this Web app makes an SNMP call to the appropriate switch and reassigns the VLAN for that port. A few seconds later, the user is ready to go. This is also handy when dealing with printers and other network devices, because admins can log into the tool and assign ports to VLANs that normal users should never see. Tools like this can save massive amounts of time and aggravation.
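The guts of such a tool are simpler than they sound. A minimal sketch of the back end follows, shelling out to Net-SNMP's `snmpset` to rewrite a port's access VLAN via the Cisco vmVlan object; the switch address, community string, and allowed-VLAN list are assumptions for illustration, not details from the tool described above:

```python
# Hypothetical back-end sketch for a self-service VLAN tool. It builds
# a Net-SNMP snmpset command targeting CISCO-VLAN-MEMBERSHIP-MIB::vmVlan,
# which controls the access VLAN on Cisco switch ports. The community
# string and the set of user-assignable VLANs are stand-in assumptions.
import subprocess

VM_VLAN_OID = "1.3.6.1.4.1.9.9.68.1.2.2.1.2"  # vmVlan, indexed by ifIndex
ALLOWED_VLANS = {10, 20, 30}  # VLANs users may pick; admins bypass this

def build_snmpset_cmd(switch, community, if_index, vlan):
    """Return the snmpset argv that assigns `vlan` to port `if_index`."""
    if vlan not in ALLOWED_VLANS:
        raise ValueError(f"VLAN {vlan} is not user-assignable")
    return [
        "snmpset", "-v2c", "-c", community, switch,
        f"{VM_VLAN_OID}.{if_index}", "i", str(vlan),
    ]

def assign_vlan(switch, community, if_index, vlan):
    # In the real tool, this fires when the user submits the web form.
    subprocess.run(build_snmpset_cmd(switch, community, if_index, vlan),
                   check=True)
```

Restricting the user-facing list to "safe" VLANs while letting admins hit any port directly is what makes the same tool work for PCs and for printers or other devices on locked-down segments.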
Once everything's in place, fire everything up and watch your monitoring systems to make sure that everything comes up normally. This is only one reason why an exhaustive network and service monitoring implementation is absolutely necessary: it takes almost all the guesswork out of the situation. But when the new site is up and running and everything goes according to plan, the Scotch tastes even better and the scratches and scrapes don't matter so much. Trust me. I'm feeling pretty good right now.
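Even without a full monitoring stack at the new site yet, a quick sweep over a list of critical services answers "did everything come back?" minutes after power-on. A minimal sketch, assuming you maintain such a list (the hosts and ports here are illustrative):

```python
# Hypothetical post-move sanity sweep: attempt a TCP connection to each
# critical (host, port) pair and report up/down. This supplements, not
# replaces, real monitoring; the CRITICAL list is a made-up example.
import socket

def check_service(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or unresolvable
        return False

CRITICAL = [
    ("dns1.example.com", 53),    # name resolution
    ("db01.example.com", 5432),  # primary database
    ("www.example.com", 443),    # public web front end
]

for host, port in CRITICAL:
    status = "up" if check_service(host, port) else "DOWN"
    print(f"{host}:{port} {status}")
```

Running this in a loop as racks power on gives you a live punch list of what still needs attention.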
This story, "How to move a data center without having a heart attack," was originally published at InfoWorld.com. Read more of Paul Venezia's The Deep End blog at InfoWorld.com. For the latest business technology news, follow InfoWorld.com on Twitter.