Data Center Don't: Unplug First, Ask Questions Later
It's easy to remember those users who call the help desk all the time complaining about silly things that turn out to be either nothing or problems of their own making. But what happens when someone from the IT ranks makes a huge mistake? That person becomes a legend -- and the error can unearth other problems.
This story happened while I was working at a large, non-U.S. government organization. An IT guy, let's call him "Robert," was asked to decommission servers in the main data center that were offline, unlabeled, and no longer in use. The idea was to clear up space, gain a more precise idea of all the hardware present in the building, and determine which team was using each piece and for what. If there was any doubt about a piece of equipment, Robert was to first figure out who was in charge of it, then wait until they gave the all-clear before proceeding.
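The rule Robert was given can be written down as a short check. What follows is only an illustrative sketch in Python, not anything the organization actually ran: the `Server` fields and the `decommission_decision` function are hypothetical names invented here, and how one would reliably determine "in use" is assumed rather than shown.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Server:
    # All fields are hypothetical; they mirror the criteria in the story.
    name: str
    powered_on: bool
    labeled: bool
    in_use: bool
    owner: Optional[str] = None   # team responsible, once identified
    owner_cleared: bool = False   # owner has given the all-clear


def decommission_decision(server: Server) -> str:
    """Return the next step for a server under the rule described above."""
    # Clear-cut case: offline, unlabeled, and no longer in use.
    if not server.powered_on and not server.labeled and not server.in_use:
        return "decommission"
    # Anything else counts as doubt: identify the owner first...
    if server.owner is None:
        return "find owner"
    # ...then wait for their explicit all-clear before touching it.
    if not server.owner_cleared:
        return "wait for all-clear"
    return "decommission"
```

Under this sketch, a machine that is powered on, labeled, and in use never reaches "decommission" until an owner has been found and has signed off.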
Like any data center, ours had many servers and additional equipment, some more important than others. Because we were a large government organization, the perimeter firewalls controlled a vast amount of Internet traffic: companies and organizations from both inside and outside the country, remote clients, all internal traffic, all queries to the databases, and more.
Those firewalls were controlled by a central management server, which sent them all the policies and configurations. Any change to the firewalls had to be made on this server. And according to the manufacturer, there was no way to restore the management server from the firewalls. If a firewall rebooted, it came up with no configuration and asked the management server for its config file. At that point, everything would come back up as usual.
Robert, wandering through the data center on his assignment and armed with his Excel spreadsheet, decided for some reason that this management server wasn't doing anything important. He turned it off, unplugged it from the network, took it to his desk, and started formatting the hard drive.
He wasn't a spy, nor a Communist trying to take down the capitalist world. He just wasn't careful enough in the production data center.
Of course, the network team started investigating -- and there were a lot of screams when they saw that the server wasn't physically in the data center.
When they finally tracked the server down at Robert's desk, they discovered that Robert hadn't seen the label on the server and had somehow failed to notice that it was powered on and in use. He'd decided -- without investigation -- that it was no longer needed.
The network team rushed to get everything restored, then ran into a bigger problem: There wasn't a backup of this server. It turned out that those managing it didn't realize that the database was saved there. They'd thought that if there was a problem they could just reinstall from the firewalls without losing anything. They had backups of the firewalls, but not of the server.
It took about five months to repair this damage, rebuilding almost everything from scratch.
As you may imagine, the managers were not happy with Robert, and he was transferred to another position in the IT department. The data center best practices got an overhaul and all the staff a refresher course. And that server got backed up regularly from then on.
This story, "Data center don't: Unplug first, ask questions later," was originally published at InfoWorld.com.