Guide to Storage Virtualization
How to do storage virtualization right
By Galen Gruman, CIO, 09/11/07
When Roland Etcheverry joined chemical company Champion Technologies two years ago, he looked around and realized he needed to remake the company's storage environment. He had done this twice before at other companies, so he knew he wanted a storage-area network (SAN) to tie the various locations to the corporate data center, as well as to a separate disaster recovery site, each with about 7TB of capacity. He also knew he wanted to utilize storage virtualization.
At its most basic, storage virtualization makes scores of separate hard drives look like one big storage pool. IT staffers spend less time managing storage devices, since some chores can be centralized. Virtualization also increases the efficiency of storage, letting files be stored wherever there is room rather than leaving some drives underutilized. And IT can add or replace drives without requiring downtime to reconfigure the network and affected servers: The virtualization software does that for you. Backup and mirroring are also much faster because only changed data needs to be copied; this eliminates the need for scheduled storage management downtime, Etcheverry notes.
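To make the "one big pool" idea concrete, here is a minimal sketch in Python of how a virtualization layer might translate a single logical block address into a location on one of several physical drives. It assumes the simplest possible layout, with drives concatenated end to end; the StoragePool class, the drive names and the block counts are all hypothetical and are not taken from any vendor's product.

```python
from bisect import bisect_right
from itertools import accumulate


class StoragePool:
    """Hypothetical pool that concatenates drives into one logical address space."""

    def __init__(self, drives):
        # drives: list of (drive_name, size_in_blocks) tuples
        self.names = [name for name, _ in drives]
        # Running totals mark where each drive ends in the logical space.
        self.ends = list(accumulate(size for _, size in drives))

    @property
    def capacity(self):
        """Total number of blocks the pool presents to its users."""
        return self.ends[-1] if self.ends else 0

    def locate(self, logical_block):
        """Map a logical block number to (drive_name, block_on_that_drive)."""
        if not 0 <= logical_block < self.capacity:
            raise IndexError("logical block outside the pool")
        drive_index = bisect_right(self.ends, logical_block)
        drive_start = self.ends[drive_index - 1] if drive_index else 0
        return self.names[drive_index], logical_block - drive_start


# Three drives of different sizes appear as one 350-block volume.
pool = StoragePool([("drive_a", 100), ("drive_b", 50), ("drive_c", 200)])
print(pool.capacity)
print(pool.locate(120))
```

Real products generally use more elaborate mappings (striping, extents, tiering), but the principle is the same: users address the pool, and the software decides which drive holds the data.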
Better yet, he will save money on future storage needs, because his FalconStor storage management software combines drives from multiple vendors as if they were one virtual drive, letting Etcheverry avoid getting locked in to the expensive, proprietary drives that array-based storage systems often require.
Although storage virtualization technology is fairly new, it's quickly gaining traction in the enterprise. In 2006, 20% of 1,017 companies surveyed by Forrester Research had adopted storage virtualization. By 2009, 50% of those enterprises expect to have done so. And the percentages are even higher for companies with 20,000 or more employees, the survey notes: 34% of such firms had deployed storage virtualization in 2006, and that will climb to 67% by 2009.
But storage virtualization requires a clear strategy, Etcheverry says. "A lot of people don't think much about storage, so they don't do the planning that can save costs," he says. Because storage virtualization is a very different approach to managing data, those who don't think it through may miss several of the technology's key productivity and cost-savings advantages, concurs Nik Simpson, a storage analyst at the Burton Group.
Strategically, storage virtualization brings the most value to resource-intensive storage management chores meant to protect data and keep it available in demanding environments. These chores include the following: replication to keep distributed databases synchronized; mirroring to keep a redundant copy of data available for use in case the primary copy becomes unavailable; backup to keep both current and historical data available in case it gets deleted but is needed later; and snapshots to copy the original portions of changed data and make it easier to go back to the original version. All these activities have become harder to accomplish using traditional storage management techniques as data volumes surge and time for backup chores decreases.
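The snapshot approach described above, which preserves the original portions of changed data, is commonly called copy-on-write. The following Python sketch shows the idea under simplifying assumptions: one snapshot at a time, fixed-size blocks held in memory, and a hypothetical Volume class that does not correspond to any product named in this article.

```python
class Volume:
    """Hypothetical block volume with a single copy-on-write snapshot."""

    def __init__(self, num_blocks):
        self.blocks = [b""] * num_blocks
        self.saved = None  # block index -> original contents, once a snapshot exists

    def take_snapshot(self):
        # Taking a snapshot copies nothing up front; it only starts tracking changes.
        self.saved = {}

    def write(self, index, data):
        # Before the first overwrite of a block, keep its original contents.
        if self.saved is not None and index not in self.saved:
            self.saved[index] = self.blocks[index]
        self.blocks[index] = data

    def rollback(self):
        """Restore every changed block to its state at snapshot time."""
        if self.saved is None:
            raise ValueError("no snapshot to roll back to")
        for index, original in self.saved.items():
            self.blocks[index] = original
        self.saved = None


vol = Volume(4)
vol.write(0, b"payroll-v1")
vol.take_snapshot()
vol.write(0, b"payroll-v2")   # original block is saved first
vol.rollback()
print(vol.blocks[0])
```

Because only overwritten blocks are saved, the snapshot's cost grows with the amount of change rather than with the size of the volume.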
Because storage virtualization technology used for these purposes copies just the individual parts of changed data, not entire files or even drive volumes as in traditional host-based storage architectures, these data-protection activities are faster and tax the network less. "You end up transferring 40% or 50% less, depending on the data you have," says Ashish Nadkarni, a principal consultant at the storage consultancy GlassHouse Technologies.
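As a rough illustration of copying only the changed parts of the data, the Python sketch below compares two versions of a volume block by block and ships only the blocks that differ. The 4KB block size, the use of SHA-256 digests to detect changes and the function names are assumptions made for the example; commercial products track changed blocks in their own ways. The sketch also assumes the volume never shrinks.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size, for illustration only


def split_blocks(data, block_size=BLOCK_SIZE):
    """Split a volume image into fixed-size blocks."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def changed_blocks(old, new, block_size=BLOCK_SIZE):
    """Return {block_index: block_bytes} for blocks that differ or are new."""
    old_digests = [hashlib.sha256(b).digest() for b in split_blocks(old, block_size)]
    delta = {}
    for index, block in enumerate(split_blocks(new, block_size)):
        is_new = index >= len(old_digests)
        if is_new or hashlib.sha256(block).digest() != old_digests[index]:
            delta[index] = block
    return delta


def apply_delta(replica, delta, block_size=BLOCK_SIZE):
    """Bring a replica up to date using only the changed blocks."""
    blocks = split_blocks(replica, block_size)
    for index, block in sorted(delta.items()):
        if index < len(blocks):
            blocks[index] = block
        else:
            blocks.append(block)
    return b"".join(blocks)


primary_before = bytes(BLOCK_SIZE * 4)      # four empty blocks
primary_after = bytearray(primary_before)
primary_after[5000] = 1                     # a one-byte change inside block 1
delta = changed_blocks(primary_before, bytes(primary_after))
print(sorted(delta))                        # indexes of blocks that must be sent
```

Here a single changed byte means one block crosses the network instead of the whole volume, which is where the savings in transfer time come from.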
This efficiency lets a CIO contemplate continuous backup and replication, and enables quick moves to new equipment in case of hardware failure. "We can add new storage as needed and have data transferred in the background, without the users even knowing," says Ryan Engh, IT infrastructure manager at the investment firm Wasatch Advisors, which uses DataCore's virtualization software.
Another advantage: "This prevents the states of the disaster recovery site and the production site from pulling apart," Engh says, a common problem in a traditional environment where the two data sets are usually out of sync because of the long replication times needed.
Moreover, the distributed nature of the data storage gives IT great flexibility in how data is stored, says Chris Walls, president of IT services at the healthcare data management firm PHNS, which uses IBM's virtualization controller. "That control layer gives you the flexibility to put your data in a remote site, or even in multiple sites," he says, all invisible to users.
Understanding these capabilities, a CIO could thus introduce 24/7 availability and disaster recovery, perhaps as part of a global expansion strategy. That is precisely what Etcheverry is doing at Champion. "We now have a zero-window backup, and I can rebuild a drive image in almost real-time," he says.
Some enterprises have gained additional advantage from storage virtualization by combining it with an older technology called thin provisioning, which presents a volume as having more capacity than is physically allocated to it. This is typically done to create one standard user volume configuration across all drives, so that when drives are replaced with larger ones, IT staff does not have to change the user-facing storage structure. By adding storage virtualization, these standardized, thin-provisioned volumes can exceed the physical limit of any drive; the excess is simply stored on another drive, without the user knowing. "This really eases configuration," says Wasatch's Engh. That also reduces IT's need to monitor individual drive usage; the virtualization software or appliance just gets more capacity where it can find it.
For example, Epilepsy Project, a research group at the University of California at San Francisco, uses thin provisioning, coupled with Network Appliance's storage virtualization appliance. The project's analysis applications generate hundreds of gigabytes of temporary data while crunching the numbers. Rather than give every researcher the Windows maximum of 2TB of storage capacity for this occasional use, CIO Michael Williams gives each one about a quarter of that physical space, then uses thin provisioning. The appliance allocates the extra space for the analysis applications' temporary data only when it's really needed, essentially juggling the storage space among the researchers.
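The juggling described here can be sketched in a few lines of Python. In this simplified model, each volume advertises a large size, but physical blocks are drawn from a shared pool only when a block is first written and are returned when released. The ThinPool class and the numbers are hypothetical; this is not a description of how Network Appliance's appliance works internally.

```python
class ThinPool:
    """Hypothetical shared pool backing several thin-provisioned volumes."""

    def __init__(self, physical_blocks):
        self.free = physical_blocks
        self.volumes = {}

    def create_volume(self, name, advertised_blocks):
        # The advertised size is a promise; no physical space is reserved yet.
        self.volumes[name] = {"advertised": advertised_blocks, "mapped": set()}

    def write(self, name, block_index):
        volume = self.volumes[name]
        if not 0 <= block_index < volume["advertised"]:
            raise IndexError("block outside the advertised volume size")
        if block_index not in volume["mapped"]:
            if self.free == 0:
                raise RuntimeError("pool exhausted: add physical capacity")
            self.free -= 1                      # allocate only on first write
            volume["mapped"].add(block_index)

    def release(self, name, block_index):
        # Returning a block (for example, deleted temporary data) frees it for others.
        volume = self.volumes[name]
        if block_index in volume["mapped"]:
            volume["mapped"].remove(block_index)
            self.free += 1


pool = ThinPool(physical_blocks=10)
pool.create_volume("researcher_a", advertised_blocks=8)
pool.create_volume("researcher_b", advertised_blocks=8)   # 16 promised, 10 real
for block in range(6):
    pool.write("researcher_a", block)
print(pool.free)
```

The trade-off is visible in the exhausted-pool error: oversubscription works only as long as the volumes do not all fill up at once, so the pool's real free space still has to be watched.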
Storage virtualization comes in several forms, starting with the most established, array-based virtualization. Here, a vendor provides an expandable array, to which that vendor's drives can be added; management software virtualizes the drives so they appear as a common pool of data. You're typically locked in to one vendor's hardware but don't have to worry about finger-pointing among vendors if something goes wrong, says Forrester Research analyst Andrew Reichman.
Providers of such arrays include Compellent, EMC, Hewlett-Packard, Hitachi Data Systems, Network Appliance (NetApp), Sun and Xiotech. Reichman notes that several such array-based virtualization products, including those from Hitachi (also sold by HP and Sun) and NetApp, also support third-party storage arrays. The Hitachi array is "the only option for the high end," he says, while the others are designed for relatively small storage systems of less than 75TB.
The newer option, network-based storage virtualization, uses software or a network appliance to manage a variety of disk drives and other storage media. The media can come from multiple vendors, typically allowing for the purchase of lower-cost drives than the all-from-one-vendor options. This lets you use cheaper drives for non-mission-critical storage needs and allows you to reuse at least some storage you've accumulated over the years through mergers and acquisitions, says GlassHouse's Nadkarni.
Providers of such network-based storage virtualization (often as a component of the SAN offering) include BlueArc, DataCore Software, EqualLogic, FalconStor Software, IBM, Incipient, iQstor and LSI. Current offerings tend to be for medium-size environments of less than 150TB, notes Forrester's Reichman.
Storage virtualization's newfound flexibility and control do have risks. "The flexibility can be your worst nightmare...it's like giving razor blades to a child," says Wasatch's Engh. The issue that storage virtualization introduces is complexity.
Although the tools keep track of where the files' various bits really are, IT staff not used to having the data distributed over various media might manage the disks the old-fashioned way, copying volumes with partial files rather than copying the files themselves for backup. Or when setting up virtualized storage networks, they might accidentally mix lower-performance drives into high-performance virtual servers, hindering overall performance in mission-critical applications, notes GlassHouse's Nadkarni.
Virtualization tools aren't hard to use, but it's hard for storage engineers to stop thinking about data from a physical point of view, says PHNS's Walls. "Everything you thought you knew about storage management you need to not bring to the party," he adds.
Another issue is choosing the right form of storage virtualization, network-based or array-based. The network-based virtualization technology is delivered via server-based software, a network appliance, or an intelligent Fibre Channel switch, and it comes in two flavors: block-level and file-level. Array-based virtualization is typically provided as part of the storage management software that comes with an array.
Array-based virtualization is mature, says Burton Group's Simpson. But it's limited to storage attached directly to the array or allocated just to that array via a SAN; IT usually must buy array storage from the array vendor, creating expensive vendor lock-in.
Network-based storage virtualization has been in existence just a few years and so has largely been offered by startups. It's the most flexible form of storage virtualization, says Forrester's Andrew Reichman, and lets you manage almost all your storage resources, even offsite, as long as they are available via the SAN. Although these tools can theoretically act as a choke point on your SAN, in practice the vendors are good at preventing that problem, he notes.
Most network-based storage virtualization products work at the block level, meaning they deal with groups of bits rather than whole files. While block-level network-based storage virtualization is the most flexible option, the technology typically requires that an enterprise change its storage network switches and other network devices to ones that are compatible, Nadkarni notes. "But no one wants to shut down their SAN to do so," he says. Although you can add the technology incrementally, that just raises the complexity, since you now have some virtualized storage and some nonvirtualized storage, all of which need to be managed in parallel.
Thus, most organizations should consider adopting network-based storage virtualization as part of a greater storage reengineering effort, he advises.
That's exactly what both Champion's Etcheverry and PHNS's Walls did. Etcheverry brought virtualization in as part of an enterprisewide storage redesign, while Walls brought it in as part of adding a new data center and disaster recovery site.
In both cases, all the setup work happened in a nonproduction environment and could be tested thoroughly without affecting users. Once the two IT leaders were happy with their new systems, they then transferred the data over and brought them online. That meant there was only a single disruption to the storage environment that users noticed. "This was a one-time event," Walls notes.