Data Storage: Be Proactive, or Pay a High Price

In my last post, I talked about our increasing dependence on huge, rapidly growing piles of data that must be backed up and duplicated in turn -- compounding the data explosion problem. An astute reader posed a question I think all of us have asked in one way or another, but rarely tried to answer: Why is our data growing so rapidly?

It would be easy to respond to that question by trotting out the usual suspects. For example, high-definition technologies such as medical imaging, document imaging, and video certainly eat up lots of space. And who can deny that compliance regulations have forced us to retain more and more documents and messages?

Yet these easy explanations miss an obvious part of the equation: In large part, our data is growing so quickly because it is far, far easier to create data than it is to get rid of it. It takes work to police our data after it is created, and to be blunt, we've become too lazy or too busy to deal with it.

I will be the first to admit that I am an excellent example of the problem. I am an email pack rat. At the day job, my mailbox is larger than 2GB. My personal account is easily two or three times that size. If everyone were like me -- thankfully, most are not -- we'd be in serious trouble.

My excuse for this deplorable behavior is that I never know what I'm going to need again. I often find myself in the position of needing to remember what I did three or four years ago -- say, digging out a license key that a client has misplaced. If I took a hatchet to my email and ditched everything older than six months, I guarantee that in two weeks or less I'd be without something that I needed. The only realistic solution for me is to go back through my email, reread all of it, and delete the stuff I know won't have any future significance.

Technology can't do that for me. All of the archiving, deduplication, and compression in the world might shrink the data and make it easier to search, but it won't magically get rid of it for me. I have to do that. And you know what? I'm not going to, because doing it accurately would take a massive amount of time -- time I have too little of. The few hundred dollars' total cost of ownership I'd save by clearing that data off a server attached to a SAN somewhere simply isn't worth the time it would take me to free it up.

