Guide to Data Backup and Replication

How do you back up virtual environments?

By Deni Connor

When it comes to backing up virtual servers, IT administrators have a lot of choices. But increasingly, they're finding one method of backup isn't enough to satisfy all the demands of a virtual environment.

Some are opting for a combination of methods, such as using agent-based and serverless backup for protecting data on virtual machines, then using cloning or snapshot technology for protecting and recovering server images in the event of a hardware failure.

For instance, Ben Moore, a systems engineer for Mission Hospitals in Asheville, N.C., uses a multilayered approach.

Moore has had VMware ESX Server installed since October 2006 on 10 physical servers with 60 virtual machines. His method of choice is VMware Consolidated Backup (VCB) in combination with Tivoli Storage Manager.

"We use VCB piggybacked on our existing Tivoli Storage Manager backup software," Moore says. "You put your backup software on a proxy server that is aware of the storage space for your virtual machines, and then the VCB integration scripts allow you to have visibility of your virtual machines and that storage."

Michael Passe, storage architect for Beth Israel Deaconess Medical Center in Boston, likewise uses a multilayered approach.

Passe has 10 VMware ESX servers hosting about 100 virtual machines. He uses Vizioncore's esxRanger once a week to capture Virtual Machine Disk (VMDK) image-level backups. (The VMDK describes the format and storage of the file containing the virtual machine instance.) He also uses Symantec Veritas NetBackup agents on Linux and Windows guest operating systems daily for file-level restore capability.

Passe is testing VCB.

For his part, Robert Berry, director of technical services for magazine publisher Homes & Land in Tallahassee, Fla., has been using SWsoft's Virtuozzo for a year to virtualize 15 physical servers into 39 virtual private servers.

"We span our virtual machines across two separate hardware nodes. If one was to crash, we just clone from the one and bring it over to another," Berry says.

"I do a normal backup of the Progress database with the Progress backup tool. I back up the Progress database to the virtual machine, and then back up the virtual machine using Virtuozzo's Management Console and vzbackup utility," Berry adds.

At Springfield Technical Community College in Springfield, Mass., backup technology is critical to providing a safe environment for students to learn about operating systems.

Sam Jamrog, an adjunct professor at the college, uses VMware Workstation to teach students about Linux and security. "We have about 25 machines with VMware Workstation installed," Jamrog says. "All my Linux students have a virtual Linux machine on their Windows XP host."

Jamrog says one of the biggest difficulties in teaching students about operating systems is the fear of ruining the system. Backup utilities can help minimize that fear.

"We use snapshots [a capability included in VMware Workstation] to keep several images of the OS on their system," Jamrog says. "It takes a few minutes to create a snapshot, but only seconds to restore a broken machine. That way they can fearlessly make changes to their virtual machines, and if they do have a problem they can simply revert back to a working system."

"For security my students will clone several machines and create a 'virtual network' of operating systems. They will also make snapshots just in case they do damage to a virtual machine. This way they can practice security scans on a virtual network locked off from the school network, without much fear of getting into the school network by mistake," Jamrog adds. "They can get as aggressive as they want with the security scans, as any problem they create is one click away from being fixed in seconds."

7 tips for better backups

By Deni Connor

When backing up and protecting your systems, don't just pay lip service to the best practices. When shopping for backup and recovery tools, make sure you choose products that help you perform these actions:

Just backing up your network isn't good enough. The reason you back up your network is so you can recover data if it is lost or corrupted, so you need to make sure the copies you make are valid. Errors can occur during backup: the backup window may not be large enough to accommodate the job, tapes can malfunction, the backup may not capture every bit of data it should, or orphan servers or volumes may exist that the backup process doesn't know about. Test the recoverability of your backups to ensure that business-critical data is actually protected.
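
One way to make that testing routine is to script it. The following is a minimal sketch, with hypothetical file paths and a stand-in restore step, that restores a sample of files to a scratch area and compares checksums against the originals; in practice the restore call would invoke your backup software.

```python
# Minimal sketch: verify a backup is actually restorable by restoring a sample
# file to a scratch area and comparing its checksum with the original. The
# restore step is a stand-in; substitute your backup tool's restore command.
import hashlib
import pathlib
import shutil

def sha256(path: pathlib.Path) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def restore_from_backup(original: pathlib.Path, target: pathlib.Path) -> None:
    # Stand-in for a real restore (e.g., calling your backup software's CLI).
    shutil.copy2(original, target)

def verify(sample_files, scratch_dir="restore-test"):
    scratch = pathlib.Path(scratch_dir)
    scratch.mkdir(exist_ok=True)
    failures = []
    for original in map(pathlib.Path, sample_files):
        restored = scratch / original.name
        restore_from_backup(original, restored)
        if sha256(original) != sha256(restored):
            failures.append(str(original))
    return failures

# Example (hypothetical files): print(verify(["/data/orders.db"]))
```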

To have an effective backup strategy, you need to define recovery time objectives and recovery point objectives for the different applications running on your network. The recovery time objective is the elapsed time from the occurrence of the disaster or data loss until business operations are restored. The recovery point objective is the point in time before the disaster to which data will be restored. Classify your applications based on these objectives: recovering a database may be first on your list, followed by e-mail and, finally, front-office document production.
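
A simple way to make those classifications concrete is to record them in a table and restore in priority order. The sketch below uses hypothetical applications and hour values purely for illustration.

```python
# Illustrative sketch: classify applications by recovery time objective (RTO)
# and recovery point objective (RPO). Applications and hour values are
# hypothetical examples, not recommendations.
apps = [
    # (application, RTO in hours, RPO in hours)
    ("customer database",    1,  0.25),
    ("e-mail",               4,  4),
    ("document production", 24, 24),
]

# Restore the tightest-RTO applications first.
for name, rto, rpo in sorted(apps, key=lambda a: a[1]):
    print(f"{name:22s} RTO <= {rto}h   RPO <= {rpo}h")
```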

Disk-based backup is growing in popularity because disk is faster to back up to and recover from than tape, but you should also consider backing up to tape or replicating data to an offsite location. Doing so extends your data protection beyond the primary site while preserving a copy there. If you don't back up to tape, consider some of the removable drives now coming to market: they serve as a reliable backup medium and are also portable. If you replicate data to another location, weigh the cost of that replication and the type of data being replicated. Synchronous replication, in which each data transfer is acknowledged, can be very expensive; asynchronous replication is less expensive but may not be reliable enough for business-critical applications.
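
The trade-off between the two replication styles can be sketched in a few lines. The snippet below is a toy model, not a real replication engine: the remote site is simulated with a delay, a synchronous write waits for the remote acknowledgment, and an asynchronous write returns immediately and queues the block for later shipment, which is the data that can be lost if the primary site fails.

```python
# Toy model of synchronous vs. asynchronous replication. The remote site is a
# simulated delay; real products trade remote round-trip latency (synchronous)
# against possible data loss from an unshipped queue (asynchronous).
import queue
import threading
import time

REMOTE_LATENCY = 0.05           # simulated WAN round trip, in seconds

def remote_write(block):
    time.sleep(REMOTE_LATENCY)  # pretend to send the block off-site

def synchronous_write(block):
    remote_write(block)         # wait for the remote acknowledgment ...
    return "acknowledged"       # ... before the local write completes

replication_queue = queue.Queue()

def asynchronous_write(block):
    replication_queue.put(block)  # local write completes immediately;
    return "queued"               # queued blocks are lost if the site fails now

def replicator():                 # background thread drains the queue
    while True:
        remote_write(replication_queue.get())
        replication_queue.task_done()

threading.Thread(target=replicator, daemon=True).start()
synchronous_write(b"journal page")    # slow but safe
asynchronous_write(b"journal page")   # fast, with a window of exposure
replication_queue.join()              # wait for the async copy to land
```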

Backup reporting software will help you identify volumes or servers that are orphaned and not being backed up. It will also let you evaluate your backup windows and determine whether all applications and servers can be protected within them. Backup reporting software can report on media health and capacity, as well as tape library volumes that need to be ejected and shipped off-site. It can also show drive utilization and help you improve load balancing.
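
At its core, the orphan check such tools perform is a set difference between what exists and what the backup catalog covers. A minimal sketch, with hypothetical server names:

```python
# Minimal sketch of an "orphan" check like those in backup-reporting tools:
# compare the asset inventory with what the backup catalog says was protected,
# and flag the difference. Both lists are hypothetical.
inventory = {"web01", "web02", "db01", "db02", "file01"}   # all known servers
backup_catalog = {"web01", "db01", "file01"}               # servers with recent backups

orphans = inventory - backup_catalog
if orphans:
    print("not backed up:", ", ".join(sorted(orphans)))    # db02, web02
```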

Compliance and data-leakage laws require encryption, whether that means encrypting backup tapes destined for off-site storage or encrypting data on disk. Encrypt USB keys traveling out of the organization, too. Both hardware-based encryption appliances and software-only packages are available. Also consider which data you encrypt: it may not be necessary to encrypt the applications themselves, just the customer data those applications generate.
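
For the software-only route, a minimal sketch of encrypting a backup file before it leaves the building might look like the following; it assumes the third-party Python cryptography package and deliberately glosses over key management, which in practice must guarantee the key is still available when a restore is needed years later.

```python
# Minimal sketch: encrypt a backup file before shipping it off-site, using the
# third-party "cryptography" package (pip install cryptography). The file name
# is hypothetical and key handling is deliberately simplified.
from cryptography.fernet import Fernet

key = Fernet.generate_key()     # store this securely; losing it loses the data
cipher = Fernet(key)

with open("customer_data.bak", "rb") as f:       # hypothetical backup file
    ciphertext = cipher.encrypt(f.read())

with open("customer_data.bak.enc", "wb") as f:   # ship only the encrypted copy
    f.write(ciphertext)
```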

Virtual tape library technology de-duplicates data across backups over time, often achieving data-reduction ratios of 20-to-1. Virtual tape libraries are becoming common because they emulate existing tape libraries and management processes. Consider the impact of virtual tape libraries on your network: some de-duplicate data as it is being backed up; others de-duplicate data after the backup process completes. Performance and the amount of disk space allocated to the backup are the primary concerns.

Back up everything: not only all the servers in your environment, but also the applications, laptops and desktops. In backing up servers, plan on protecting the configuration files, updates and security patches. When backing up applications, be sure to also back up the data associated with the application and any log or configuration files. Implement software that backs up the laptops and desktops in your environment, and make provisions for portable assets.

The new backup tools you can't live without

By Beth Schultz

Monitoring backups has always been one of those unglamorous IT chores. Over the last several years, however, numerous vendors have taken backups from boring to remarkable by rolling out heterogeneous backup-management tools that perform a host of new functions.

Spun off from the broader storage-resource management market, these tools monitor and report on backups across multiple vendors' backup products. In doing so, they can ease the auditing process. They create a way to implement chargeback programs for backups. They let network executives offer and verify service-level agreements for backups, and more.

Heterogeneous backup-management tools are available from various niche vendors including Aptare, Bocada, CommVault and WysDM Software, as well as such infrastructure vendors as EMC and Symantec. An enterprise might be running EMC's Legato Networker, IBM's Tivoli Storage Manager and Symantec's Veritas Backup Exec, but with backup-management software, an IT administrator can get an at-a-glance, big-picture look at what's happened with all those operations from a single console, in real time and historically.

Some vendors take a traditional client/server approach to backup management. An example is EMC's Backup Advisor, in which agents sit on production servers and backup hosts feed system information into the backup-management server residing on the network. More typical is the agentless approach, favored by Bocada, WysDM and others, in which backup-management software gathers statistics through scheduled polling.
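
In rough terms, the agentless model amounts to a central server polling each backup host on a schedule and recording the results. The sketch below uses hypothetical host names, and the poll function is a placeholder for whatever vendor-specific API or command a real product would call.

```python
# Minimal sketch of the agentless model: a central management server polls each
# backup host on a schedule rather than running an agent on it. Host names are
# hypothetical; poll() stands in for a vendor-specific API or CLI query.
import time

backup_hosts = ["tsm-srv01", "networker01", "backupexec01"]   # hypothetical hosts

def poll(host):
    # Placeholder: a real tool would query the host's backup catalog over the
    # network and return job statistics.
    return {"host": host, "jobs_failed": 0, "polled_at": time.time()}

def collect():
    # One polling pass across every vendor's backup server.
    return [poll(h) for h in backup_hosts]

while True:
    stats = collect()       # feed these into the central reporting database
    time.sleep(3600)        # scheduled polling, e.g. hourly
```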

These tools are getting ever more sophisticated. In February, start-up Illuminator released Virtual Recovery Engine (VRE), which coordinates reporting of backup applications and other data-protection technologies. In addition, the software associates that information with the application data, so IT executives get an easy view of the backups connected to every data set, says Yishay Yovel, a vice president with the vendor. The initial release provides interfaces to storage arrays and point-in-time copying, replication and backup applications from EMC and Network Appliance.

Given the rise of the dynamic, open New Data Center, products that provide a centralized, heterogeneous view of the data-protection infrastructure are a huge boon. Suddenly, monitoring SLAs gets a lot easier.

WysDM for Backups software, for example, uses a predictive-analysis engine to spot potential SLA problems. The engine learns the normal behavior patterns of the data-protection infrastructure, then flags discrepancies, says Jim McDonald, CTO and co-founder of WysDM Software, one of the pioneers in heterogeneous backup management.

For example, if the engine notices the backup of financial data is taking five minutes longer each night, WysDM for Backups could notify IT that if it doesn't address the situation, it will fall out of SLA compliance in X amount of time. Another example: The engine might notice the absence of a nightly 3GB-file-system backup. Because that doesn't fit normal behavior -- and could put IT out of SLA compliance -- the software would issue an alert, he says. "This is the difference between just having technical output of backup information and providing business protection."
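
The arithmetic behind that kind of warning is straightforward trend projection. A toy sketch with invented numbers:

```python
# Toy sketch of predictive SLA monitoring: if nightly backup durations keep
# creeping up, estimate when the job will exceed its window. Durations and the
# window below are invented for illustration.
durations_min = [120, 125, 130, 135, 140]   # last five nights, in minutes
sla_window_min = 180                        # the job must finish within 3 hours

# Average nightly growth over the observed period.
growth_per_night = (durations_min[-1] - durations_min[0]) / (len(durations_min) - 1)

if growth_per_night > 0:
    nights_left = (sla_window_min - durations_min[-1]) / growth_per_night
    print(f"at ~{growth_per_night:.0f} min/night growth, "
          f"SLA breach in about {nights_left:.0f} nights")
```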

Steve Frewin, storage administrator for TD Banknorth, a banking and financial-services company in Portland, Maine, and wholly owned subsidiary of Toronto-based TD Bank Financial Group, says he is using backup-management functions within Symantec's broader CommandCentral Service software for a daily health check. That differs from what he had to do in the old days when a server didn't complete its backup within SLA windows: gather the backup's parameters manually.

Having the historical perspective also helps him provide better advice when IT administrators ask how additional data volume would affect the backups. "I can now say, well, we're just barely meeting that SLA now, if you add [100 gigabytes], that's going to be a problem," he says. "Before, all I could say was, 'Well, I think we're out of room' -- that doesn't go far with management."

With these tools, auditing and compliance become painless, some users say. That seems to be the case at Catholic Healthcare Partners (CHP), which uses Bocada's Bocada Enterprise to manage its backup operations and ease audits. "We use Bocada to print reports and say, 'Here you go -- it's done.' We can customize reports to auditors' needs, and that's so much easier than writing long SQL statements to gather the information," says Brian Witsken, the lead storage-management systems engineer for the 30-hospital, nonprofit healthcare system in Cincinnati.

Bocada, one of the first vendors to offer a heterogeneous backup reporting tool, has more than 400 enterprises using its software. According to Nancy Hurley, a vice president with the company, 75% of those users say Bocada reports are essential in helping them pass audits.

Illuminator also pitches itself as an antidote for auditing headaches. With VRE, users can recover application data on request and show exactly what assets are in place and how the data is protected. It also shows the processes in place to fix any problems that occur in the data-protection environment. And if a company can show how well organized it is about its compliance processes, "who knows, then maybe next time the auditor won't even ask [about the processes] and instead just agree to look at the reports," Illuminator's Yovel says.

Illuminator's Virtual Recovery Engine provides an at-a-glance look at enterprise backup and recovery processes. In this case, it shows data-protection processes for a critical Oracle database used in a trading application.


Data de-duplication changes economics of backup

By Miklos Sandorfi

The ability to de-duplicate backup data - that is, back up or copy only unique blocks of data - is rapidly changing the economics of data protection.

Data volumes are growing exponentially. Companies are not only generating more primary data but also are required by government regulators to back up and retain that data many times over its life cycle. With a retention period of one year for weekly full backups and 10 days for daily incremental backups, a single terabyte of data requires 53TB of storage capacity for data protection over its life cycle. Backing up, managing and storing this data is driving up labor costs as well as power, cooling and floor space costs.
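
As a back-of-the-envelope check on that figure, roughly 52 weekly fulls retained for a year account for 52TB per terabyte of primary data, and a handful of retained daily incrementals makes up the rest; the 10% daily change rate below is an assumption for illustration.

```python
# Back-of-the-envelope check of the 53TB figure for 1TB of primary data:
# ~52 retained weekly fulls plus ~10 retained daily incrementals. The 10%
# daily change rate is an assumption for illustration.
primary_tb = 1.0
weekly_fulls = 52 * primary_tb                  # one year of weekly fulls
daily_incrementals = 10 * 0.10 * primary_tb     # 10 incrementals at ~10% change
total_tb = weekly_fulls + daily_incrementals
print(f"{total_tb:.0f} TB of backup capacity")  # ~53 TB
```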

That's the bad news. The good news is the cost of disk storage is decreasing, making it increasingly attractive for secondary storage.

And data de-duplication technology - typically found on disk-based virtual tape libraries (VTL) - can help control data growth by backing up and storing any given piece of data only one time.

VTLs are disk-based systems that emulate tape technology to enable enterprises to install them in existing environments with minimal disruption. De-duplication software (available on some VTLs) stores a baseline data set and then checks subsequent backup sets for duplicate data. When it finds a duplicate, it stores a small representation of it that enables the software to compile and restore complete files as needed.

There are two main data de-duplication methodologies: hash-based and byte-level comparison-based. The hash-based approach runs incoming data through an algorithm to create a small, unique identifier for the data called a hash. It then compares the hash with previous hashes stored in a look-up table. If a match is found, it replaces the redundant data with a pointer to the existing hash. If no match is found, the data is stored and its hash is added to the look-up table. But using a look-up table to identify duplicate hashes can put a significant strain on performance and may require several weeks to achieve optimal de-duplication efficiency.
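
A minimal sketch of the hash-based approach just described, using fixed-size chunks and an in-memory look-up table (real products use far more sophisticated chunking and indexing):

```python
# Minimal sketch of hash-based de-duplication: split the incoming stream into
# fixed-size chunks, hash each one, and store a chunk's data only the first
# time its hash is seen; later occurrences become references into the store.
import hashlib

CHUNK = 4096          # fixed chunk size in bytes (real systems vary)
store = {}            # hash -> chunk data: one stored copy per unique chunk

def dedupe(data: bytes):
    refs = []         # the backup becomes a list of hash references
    for i in range(0, len(data), CHUNK):
        chunk = data[i:i + CHUNK]
        h = hashlib.sha256(chunk).hexdigest()
        if h not in store:
            store[h] = chunk      # new data: store it once
        refs.append(h)            # duplicates cost only a reference
    return refs

def restore(refs):
    return b"".join(store[h] for h in refs)

backup = dedupe(b"A" * 8192 + b"B" * 4096)   # the two identical "A" chunks are stored once
assert restore(backup) == b"A" * 8192 + b"B" * 4096
```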

A more efficient method simply compares items on an object-by-object level; for example, comparing Word documents to other Word documents. Some technologies perform this comparison using a pattern-matching algorithm. However, a more efficient technology uses intelligent processes that analyze the backup files and the reference data set to identify files that are likely to be redundant before comparing the two files in more detail. By focusing its activity on suspected duplicates, it can de-duplicate more thoroughly and avoid processing new files unnecessarily.

Some technologies perform the de-duplication as the data is being backed up. This inline de-duplication slows backup performance and adds complexity to the backup. Other technologies perform out-of-band de-duplication in which they back up the data first at full wire speed and perform the de-duplication afterward.

Byte-level de-duplication can provide up to 25:1 data reduction ratios. When combined with compression technology - a typical VTL feature - enterprises can store 50 times more data in the same space without adding capacity. This dramatic reduction enables companies to store more data online and keep it online longer, leading to labor savings and the advantages of keeping data on disk.

Storing data on disk, for example, takes up less physical space than tape, and significantly reduces power, cooling, security and other operating and infrastructure costs (according to a recent Gartner report, by 2008 50% of current data centers will have insufficient power and cooling capacity to meet the demands of high-density equipment).

Other benefits include:

* Longer online data retention - A 50:1 capacity reduction for a typical mix of business data (e-mail and files) means data can be maintained online longer to meet increasingly stringent business/regulatory service-level agreements.

* Decreased workload, increased reliability - An enterprise with a 65TB data store that is growing at a typical rate of 56% annually and is backed up weekly would typically require two racks of disk storage using de-duplication vs. 49 racks without. By reducing the number of racks required and the number of disks spinning, the reliability of the overall system is increased; and the power, cooling and administration required is significantly reduced.

* Faster backups and restores - Appliance solutions that de-duplicate outside the primary data path can deliver unimpeded wire-speed Fibre Channel backup and restore performance in the many-TB/hour range.

* Fewer physical threats to data - Unlike physical tapes, which can be lost, stolen or damaged, data on disk is maintained in a secure, highly available environment.

Data de-duplication changes the economics of data protection by making backup to a VTL significantly less expensive than conventional disk-based data-protection solutions.

Data de-duplication is an important way for data center managers to address the spiraling cost of energy, labor and space, and to manage the impending shortage of power and cooling capacity.
