Whoop de Doop for De-Dupe

De-duplication started out as a way to do backups without storing mostly the same data over and over again. Companies like Data Domain, Diligent Technologies, and NetApp offered de-dupe for virtual tape libraries and direct-to-disk backup targets, delivering full backups that stored only the changes since the previous backup. The result: You could reap the same space savings you get with incremental backups, but without needing multiple restores to re-create an entire volume.

Now these same companies are advertising de-duplication of near-line storage, and even online storage in NetApp’s case, while other vendors are using de-duplication to reduce WAN traffic, shrink the size of databases, or compress e-mail archives. Yes, de-duping is going gangbusters. Heck, we might even dream of a day when you never need more than one copy of any file anywhere in the enterprise. Assuming it’s possible, is that something you’d want?

Currently, all storage de-duplication requires a gateway between the server and the storage. Methods of de-duplication vary widely. Some solutions function at the file level, some at the block level, and some work with units of storage even smaller than blocks, variously referred to as segments or chunklets. Processing for de-duplication can occur either "in-line" (i.e., before the data is written to storage) or "post process" (meaning after the data is initially written).
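To make the block-level, in-line approach concrete, here is a minimal sketch in Python. It is an illustration only, not any vendor's implementation: the `DedupStore` class, the fixed 4 KB chunk size, and the use of SHA-256 as the chunk fingerprint are all assumptions chosen for simplicity. Each incoming chunk is fingerprinted before it is written, and a chunk that has been seen before is stored only once.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed block size; real products vary


class DedupStore:
    """Toy in-line, block-level de-duplicating store (hypothetical)."""

    def __init__(self, chunk_size=CHUNK_SIZE):
        self.chunk_size = chunk_size
        self.chunks = {}  # fingerprint -> chunk bytes, stored once
        self.files = {}   # file name -> ordered list of fingerprints

    def write(self, name, data):
        """Split data into fixed-size chunks and store only new ones."""
        recipe = []
        for start in range(0, len(data), self.chunk_size):
            chunk = data[start:start + self.chunk_size]
            fingerprint = hashlib.sha256(chunk).hexdigest()
            # "In-line": the duplicate check happens before the write.
            self.chunks.setdefault(fingerprint, chunk)
            recipe.append(fingerprint)
        self.files[name] = recipe

    def read(self, name):
        """Reassemble a file from its list of fingerprints."""
        return b"".join(self.chunks[fp] for fp in self.files[name])

    def stored_bytes(self):
        """Bytes physically held after de-duplication."""
        return sum(len(chunk) for chunk in self.chunks.values())
```

Writing two "full backups" that differ in a single chunk stores the shared chunks once, yet either backup can be read back whole with a single call to `read`. A file-level scheme would fingerprint the entire file instead, and a post-process scheme would write the data first and run the same comparison later.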

There are applications where de-duplication is extremely effective, and others where it isn’t. If data is largely the same, such as multiple backups of the same volume or boot images for virtual servers, de-duplication can provide enormous reductions in the storage space required. However, dynamic data, such as transactional databases or swap files, will show very little reduction in size and may also be sensitive to the latency introduced by de-duplication processing. Databases are not a lost cause, though: in some cases de-duplication can actually improve I/O performance and speed up some queries (see "Oracle Database 11g Advanced Compression testbed, methodology, and results").
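The gap between repetitive and dynamic data is easy to see with a rough measurement. The sketch below is an assumption-laden illustration rather than a benchmark: `dedup_ratio` is a hypothetical helper that counts fixed-size chunks and divides by the number of distinct chunks, and random bytes stand in for data with no repetition.

```python
import hashlib
import os


def dedup_ratio(data: bytes, chunk_size: int = 4096) -> float:
    """Return total chunks divided by unique chunks (1.0 means no savings)."""
    unique = set()
    total = 0
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        unique.add(hashlib.sha256(chunk).digest())
        total += 1
    return total / len(unique) if unique else 1.0


if __name__ == "__main__":
    # Ten copies of the same "boot image": highly repetitive.
    image = os.urandom(64 * 4096)
    print("ten identical images:", dedup_ratio(image * 10))

    # The same volume of random bytes: stands in for dynamic data.
    print("random data:", dedup_ratio(os.urandom(640 * 4096)))
```

Ten identical images collapse to roughly a tenth of their size, while the random data barely shrinks at all, which mirrors the split between backup-style workloads and swap files or busy transactional databases.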

But the biggest issue with de-duplication is that it creates a choke point: All data to be de-duplicated must be saved and retrieved through the de-duplication gateway. This isn't much of an issue with backups or even near-line archives. But for applications where access to the data is critical, or usage is heavy, the gateway becomes a hot spot, requiring redundant gateways, dual-path SAN infrastructure, and redundant storage. Given the investment necessary to support live data, where even short interruptions to access would cause major problems, it is typically cheaper to live with multiple copies.

There’s no question that de-duplication can provide great benefits in specialized applications, including backups, e-mail archives, and other cases where data is largely repetitive, such as VMware boot images. However, a fully de-duplicated enterprise, even if feasible, would require a massive and expensive infrastructure. Given that disk capacity continues to grow by leaps and bounds, scaling out de-duplication will be difficult to justify. It’s cheaper to keep buying more local storage than to put all the eggs in one basket.
