Guide to Information Management: Data Classification, Search and Management

Six steps to ILM implementation

By Mike Karp, Network World, 10/28/06

Many things can spur a company to kick off an ILM project, but two reasons lead all the rest: a desire to implement storage tiers to reduce costs and the need to align corporate IT practices with regulatory compliance demands.

There is no need to do ILM if nothing compels you to, though the odds are that something does.

First, determine whether your company's data is answerable to regulatory demands. If you work for a U.S. company, this likely means checking out California's privacy law (SB 1), the Gramm-Leach-Bliley Act, the Health Insurance Portability and Accountability Act, the Real Estate Settlement Procedures Act or the Sarbanes-Oxley Act.

If you do business in Europe, also check out Basel II compliance. This is complicated stuff, and no one expects an IT manager to be an expert in it. Immediately set up a meeting with someone in your legal department to discuss it.

Large organizations have compliance officers who are there for this sort of discussion. Be prepared to learn that the list of regulatory requirements above does not begin to scratch the surface.


Second, determine whether your company uses its storage in an optimal manner. If you have lots of data online, some of which is quite old, and if you have only one kind of storage, there's a good chance that high-value and low-value data are intermixed at your site. That's a pretty good indicator that you need to re-examine your storage strategy. Keep in mind that if your shop is run according to service-level agreements (SLAs) and you frequently fall out of conformance with their objectives, something is very wrong.

If either of the above typifies your situation, go on to Stage 2.

Understanding the value of data lies at the heart of ILM, which raises the question: What do you need to know in order to value the data properly? At the very least, identify the following: file type, the users accessing the data and the keywords used. This can be accomplished only by meeting with the data owners.
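It helps to arrive at those meetings with a rough inventory already in hand. The following is a minimal sketch of gathering the three attributes above from a file share; the path, keyword list and ownership lookup are illustrative assumptions, not part of the article's method.

# Hypothetical sketch: collect file type, owners and keyword hits as raw
# input for a data-valuation meeting with the data owners.
from pathlib import Path
from collections import Counter

KEYWORDS = {"invoice", "contract", "patient", "ssn"}   # assumed terms of interest

def profile_share(root):
    by_type = Counter()
    owners = Counter()
    keyword_hits = Counter()
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        by_type[path.suffix.lower() or "<none>"] += 1
        owners[path.stat().st_uid] += 1               # map uid to a name out of band
        if path.suffix.lower() in {".txt", ".csv", ".log"}:
            try:
                text = path.read_text(errors="ignore").lower()
            except OSError:
                continue
            for kw in KEYWORDS:
                if kw in text:
                    keyword_hits[kw] += 1
    return by_type, owners, keyword_hits

if __name__ == "__main__":
    types, owners, hits = profile_share("/shares/finance")   # assumed path
    print(types.most_common(5), owners.most_common(3), hits)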

First, make a list of regulatory requirements that may apply. Get this from your legal department or compliance office. Don't assume that the people involved in the next step are aware of these requirements. In many cases, they are not. Bring this list with you during the next step.

Second, define stakeholder needs. You must understand what users need and what they consider to be nonnegotiable. Engaging early with the various lines of business helps focus everyone on what is really necessary and sufficient, and puts a human face on IT (sometimes a good idea). The result of such engagements is a set of SLAs that are driven by user requirements and give you a service-level objective, a targeted service level.

Third, verify the data life cycles. Everyone understands that data life cycles are a function of business value, but not everyone can take it from there. In some instances, key players can't even agree on when the data's value to the business shifts. Therefore, verify the value change for each life cycle with at least two other sources: a second source within the department that owns the data (if that is politically impossible, raise the issue through management) and someone familiar with the potential legal issues.

Fourth, define success criteria and get them widely accepted. Useful criteria are simple and easy to understand, and include cost savings, well-defined improvements in application or data availability, improved performance or recoverability, and fewer incidents of falling out of compliance with SLAs.

At this point it is time to identify the business value of each type of data object, which means understanding three things: what kind of data you are dealing with, who will be using it and what its keywords are. This stage is preliminary to doing the classifications. The strategy you deliver to your various organizations must emphasize mitigating risks in three areas: data security, data availability and data integrity. In the new nomenclature of IT, the way you go about this becomes your policies.

First, create classification rules. This means assigning value based on criteria such as business importance, availability and performance requirements, and legal/regulatory/corporate governance rules. By doing this, you will be creating guidance that indicates where data is to be stored. The chart below shows one way to go about this, and the sketch after it shows how such rules might be encoded. The classification scheme will be your own, reflecting local needs, but whatever terminology you use, it will be useful to identify at least three classes of data.

Data class | Description | Attributes
Mission-critical | The most valuable data, high access | High performance, highest possible availability/protection, may require continuous data protection
Business-critical | Valuable data, average access | Good performance, good availability, less than eight-hour recovery
Fixed content | Compliance or reference data | Good performance, high availability, recovery typically depends on regulatory issues
Nonsensitive | Rarely accessed, low-value data that still should be kept online | Low performance, low investment in hardware and protection services
Offline data | All remaining data |

SOURCE: ENTERPRISE MANAGEMENT ASSOCIATES
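As a minimal sketch of what table-driven classification rules might look like in practice: the class names follow the chart above, but the thresholds, field names and regulated-data test are assumptions to be replaced with your own criteria.

# Hypothetical, table-driven classification rules; every threshold here is a
# placeholder, not a recommendation.
from dataclasses import dataclass

@dataclass
class FileFacts:
    regulated: bool          # flagged by legal/compliance review
    accesses_per_month: int  # taken from access logs or an audit trail
    days_since_modified: int

def classify(facts):
    if facts.regulated:
        return "fixed-content"          # compliance or reference data
    if facts.accesses_per_month >= 100:
        return "mission-critical"       # most valuable, high access
    if facts.accesses_per_month >= 10:
        return "business-critical"      # valuable, average access
    if facts.days_since_modified < 365:
        return "nonsensitive"           # low value, still kept online
    return "offline"                    # everything else goes to archive

print(classify(FileFacts(regulated=False, accesses_per_month=3, days_since_modified=400)))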

Second, build retention policies. Establish linkage between each class of data and the hardware and services it requires, including rules governing when each data class should be moved to the next storage tier. At the very least, this means determining correct storage tiers, security levels, degree of data protection and migration strategies. For example, files answerable to Sarbanes-Oxley will typically require disk-based backup for rapid retrieval; for others, backup to tape is fine. Don't forget the lowest level of the hierarchy, the offline archives. You will find that in many cases a data class may really refer to only a single data set, and that archiving for one class of data may not be at all suitable for another class.
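One way to express that linkage is as a simple policy table. This is a hedged sketch: the tier names, backup media, retention periods and demotion thresholds below are placeholders for whatever your SLAs and regulatory retention periods actually dictate.

# Illustrative policy table linking each data class to a storage tier, a
# protection level and a move-to-next-tier trigger; all values are assumed.
RETENTION_POLICIES = {
    "mission-critical":  {"tier": "tier-1 SAN",   "backup": "CDP + disk", "retain_years": 7, "demote_after_days": 90},
    "business-critical": {"tier": "tier-2 SAN",   "backup": "disk",       "retain_years": 5, "demote_after_days": 180},
    "fixed-content":     {"tier": "CAS/WORM",     "backup": "disk",       "retain_years": 7, "demote_after_days": None},
    "nonsensitive":      {"tier": "SATA/NAS",     "backup": "tape",       "retain_years": 2, "demote_after_days": 365},
    "offline":           {"tier": "tape archive", "backup": "tape copy",  "retain_years": 2, "demote_after_days": None},
}

print(RETENTION_POLICIES["business-critical"]["tier"])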

Aligning data classes with the data life cycle predefines the events that will drive ILM policies, and plays a key role in enabling both automated resource provisioning and automated data migration. With few exceptions, this process will be the same at all sites - classification simply means aligning your stakeholders' business requirements to the IT infrastructure. Create a formal procedure that identifies each group's requirements, how it values its data and how satisfactory current performance levels are.
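To illustrate how a life-cycle event might drive an automated migration decision, here is another hedged sketch; the idle-day thresholds mirror the placeholder policy table above, and a real product would evaluate far richer criteria than last-access age.

# Sketch of a life-cycle check that turns a demotion threshold into a
# migration decision; thresholds are assumptions, not recommendations.
from datetime import date

DEMOTE_AFTER_DAYS = {"mission-critical": 90, "business-critical": 180,
                     "fixed-content": None, "nonsensitive": 365, "offline": None}

def migration_due(data_class, last_access, today):
    limit = DEMOTE_AFTER_DAYS[data_class]
    if limit is None:
        return False                     # this class never auto-demotes
    return (today - last_access).days > limit

print(migration_due("business-critical", date(2006, 1, 15), date(2006, 10, 28)))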

Talk is cheap, but working with vendors rarely stays that way. Most IT folks have been doing this for as long as they can remember, and the rules haven't changed just because they pertain to ILM products. In some cases, one vendor can supply all your needs. In most cases, however, that will not be true. Concentrate on vendors whose solutions can incorporate your legacy systems, and whose offerings include data classification capabilities. Don't be afraid to talk to consultants, read what the analysts have to say and compare notes with colleagues at other sites.

When you engage with the vendors, make sure to understand their products' capabilities in each of the following areas:

* Ability to tag files as compliant for each required regulation.

* Data classification.

* Data deduplication.

* Disaster recovery and business continuity.

* Discovery of compliance-answerable files across Windows, Linux, Unix and any other operating systems you may have.

* Fully automated file migration based on locally set migration policies.

* Integration with backup, recovery and archiving solutions already on-site.

* Searching (both tag-based and other metadata-based).

* Security (access control, identity management and encryption).

* Security (antivirus).

* Policy-based movement of files to appropriate storage devices (content-addressed storage, WORM tape).

* Finding and tagging outdated, unused and unwanted files for demotion to a lower storage tier.

* Tracking access to and lineage of objects through their life cycle.

The point in the process at which you bring a vendor's product on board will depend on the product's capabilities. If it does automatic data classification, a good rule of thumb is to inject it into the process sooner rather than later. The ability to install and manage all of this in a manner that is nondisruptive to the workers at your site will carry significant value.

Do not be misled by the idea of a "data half-life." The value of data is not like an isotope: it almost never decays at a constant rate throughout its life cycle. In fact, many data life cycles show that some data regains value after having lost it over time; financial records that lie dormant for years, for example, can become valuable again during an audit.

The data-classification stage must be revisited periodically for each set of data. A likely time for this: when the next year's set of SLAs is being written. If the SLAs don't change, there isn't likely to be a need to change the classification criteria. By doing this you ensure that the ILM policies you build today will continue to align with future application and data availability, performance and other requirements.

In theory, most information within organizations is easy to classify. In reality, classifying data can become quite subjective. Nowhere will this be more evident than with unstructured data.

Fortunately, several products from such vendors as Abrevity, Index Engines, Kazeon, Njini and StoredIQ, among others, are able to classify data. Just about every IT site will find that a product that classifies data and then automates its migration across the infrastructure provides excellent ROI. Once these steps have been completed you can take any actions necessary, and can look for solutions to automate the needed processes.

First comes the pilot project. In one sense at least, ILM projects are no different from any other: Test the waters before jumping in. A well-tested IT rule of thumb is as applicable here as it is with any other large IT initiative: Validate everything (strategy, procedures, the whole lot) with a pilot project before full corporate cutover.

A helpful hint: Identify an IT service that everyone interacts with (the most likely candidate at just about every site is e-mail) and begin there. Another way to view this is to choose a project that offers aggressive ROI, either because it is likely to reduce storage or management costs or because it will provide measurable improvements in performance or in meeting service levels.

Phase in the next data sets and move to full implementation. After incorporating what you learned during the previous step, things get much easier. Determine some appropriate order for the phase-in. Again, this will be site-dependent. An encouraging word: Successful early deployments will have made your team comfortable with the process, and will enable them to extend implementation to business-critical systems with greater ease. Success will have bred success.
