Guide to Information Management: Data Classification, Search and Management
How exactly does information management software work?
By Deni Connor
Information management software incorporates a variety of functions that make useful information out of the glut of data stored on heterogeneous network attached storage (NAS) appliances, file servers and storage area networks (SANs).
As much as 80% of this data -- intellectual property, product plans, customer information, sales forecasts and personnel records -- exists as unstructured and semi-structured content in the form of word processing documents, e-mails, Adobe PDF files, spreadsheets, images, video, audio and content from Web sites. The remainder of the data consists of structured transactional data from databases.
For this data to be useful to the organization for litigation discovery, archiving, lifecycle management or improved business intelligence, it must be discovered, indexed, classified, searched and reported on according to policies an IT administrator sets.
Discovery of data consists of electronically scanning for information on file servers, NAS and SAN devices and gathering information into a common repository that can be indexed, classified and searched.
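The discovery step can be pictured with a minimal Python sketch. It assumes the file servers and NAS shares are already mounted as local directories; the `FileRecord` type and the `discover` function are hypothetical names used for illustration, not part of any vendor's product.

```python
import os
from dataclasses import dataclass


@dataclass
class FileRecord:
    """Basic metadata gathered for one discovered file (illustrative only)."""
    path: str
    size: int        # bytes
    modified: float  # seconds since the epoch
    accessed: float  # seconds since the epoch


def discover(roots):
    """Walk each root directory and collect metadata for every file found."""
    records = []
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                full_path = os.path.join(dirpath, name)
                try:
                    st = os.stat(full_path)
                except OSError:
                    # Skip files that disappear or cannot be read mid-scan.
                    continue
                records.append(
                    FileRecord(full_path, st.st_size, st.st_mtime, st.st_atime)
                )
    return records
```

The returned list stands in for the common repository; a real product would persist these records so they can be indexed, classified and searched later.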
Classify and index
Once data has been discovered, it is classified and assigned a metadata reference that contains the name of the file, its size and other information that identifies it to the system. From there it can be acted on depending on rules IT sets. For medical images, for instance, IT would set a retention date for the data; for pediatric records, the retention period may be 21 years. Adult images that have not been accessed in more than a year may have a rule applied to them that specifies when they are migrated to secondary storage.
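The medical-image rules described above might be expressed roughly as follows. This is a sketch under stated assumptions: the `ImageRecord` fields, the function names and the 365-day idle threshold are illustrative choices, and the 21-year figure is simply the example retention period mentioned in the text.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class ImageRecord:
    """Metadata reference for one medical image (hypothetical fields)."""
    name: str
    size: int
    patient_is_minor: bool
    created: date
    last_accessed: date


def retention_date(rec: ImageRecord, pediatric_years: int = 21) -> Optional[date]:
    """Return the retain-until date for pediatric records, else None."""
    if not rec.patient_is_minor:
        return None  # adult retention is not specified in this sketch
    target_year = rec.created.year + pediatric_years
    try:
        return rec.created.replace(year=target_year)
    except ValueError:
        # Created on Feb 29 and the target year is not a leap year.
        return rec.created.replace(year=target_year, day=28)


def should_migrate(rec: ImageRecord, today: date, idle_days: int = 365) -> bool:
    """Adult images idle for more than idle_days move to secondary storage."""
    if rec.patient_is_minor:
        return False
    return (today - rec.last_accessed).days > idle_days
```

Keeping each rule as a small function with its threshold as a parameter lets an administrator change the policy values without touching the logic.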
Data is then indexed, and information such as the creator of the file, the date of last access and its format is stored. It is then grouped into categories, again based on rules the IT manager sets, so that it can be managed, moved, copied, deleted, encrypted or otherwise acted upon.
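As a rough illustration of indexing and grouping, the sketch below builds a simple lookup from attribute values to records and then assigns each record to the first category whose rule it satisfies. The record layout (plain dictionaries), the function names and the "first matching rule wins" choice are assumptions made for this example.

```python
from collections import defaultdict


def build_index(records, fields=("creator", "format")):
    """Map (field, value) pairs to the positions of matching records."""
    index = defaultdict(set)
    for pos, rec in enumerate(records):
        for field in fields:
            index[(field, rec[field])].add(pos)
    return index


def categorize(records, rules):
    """Group record names by category; the first matching rule wins.

    rules is a list of (category_name, predicate) pairs set by IT.
    """
    groups = defaultdict(list)
    for rec in records:
        for category, predicate in rules:
            if predicate(rec):
                groups[category].append(rec["name"])
                break
        else:
            # No rule matched this record.
            groups["uncategorized"].append(rec["name"])
    return dict(groups)
```

With the index in place, a question such as "which files did this person create?" becomes a dictionary lookup rather than a rescan of storage.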
With the classification groups created, the next step is creating policies that manage them. A policy consists of rules that define a filter on the characteristics of the data -- ownership, age or content -- and the actions that must be performed on data matching the filter. Actions might include copying the data to a different storage tier, copying or moving it to an archive device for compliance, or packaging it for e-discovery processing by corporate legal or human resources representatives.
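A policy of this kind, a filter plus a list of actions, could be modeled as in the following sketch. The `Policy` type, the action names and the record fields are hypothetical; the function only plans actions as a dry run and does not move or copy anything.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Policy:
    """A filter on data characteristics plus the actions to apply."""
    name: str
    matches: Callable[[Dict], bool]  # filter on ownership, age or content
    actions: List[str]               # labels for the actions to perform


def plan_actions(records: List[Dict], policies: List[Policy]) -> List[Tuple[str, str, str]]:
    """Return (path, policy name, action) for every record a policy matches."""
    planned = []
    for rec in records:
        for policy in policies:
            if policy.matches(rec):
                for action in policy.actions:
                    planned.append((rec["path"], policy.name, action))
    return planned
```

Producing a plan first, rather than acting immediately, gives an administrator a chance to review what a new policy would touch before it runs.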
Search and analyze
Searches of data can also be performed with either separate software or integrated search capabilities. These searches are based on parameters the IT administrator sets. In e-discovery, for instance, a search may look for all e-mails from 'Frank Green' to the manager of a hazardous waste disposal facility. For information lifecycle management, a search may look for any file that has not been accessed in 180 days and can be deleted. A search of structured database information may relate all invoices for a customer in the last year or establish links between customer data.
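Two of the searches mentioned above can be sketched as simple filters over already-collected metadata. The dictionary keys (`last_accessed`, `from`, `to`) and the function names are assumptions for illustration; a real product would query its own index instead of scanning lists.

```python
from datetime import datetime, timedelta


def stale_files(records, now, days=180):
    """Return records whose last access is more than `days` before `now`."""
    cutoff = now - timedelta(days=days)
    return [rec for rec in records if rec["last_accessed"] < cutoff]


def emails_between(messages, sender, recipient):
    """Return messages sent by `sender` with `recipient` among the addressees."""
    return [
        msg for msg in messages
        if msg["from"] == sender and recipient in msg["to"]
    ]
```

The 180-day threshold is passed as a parameter so that the same search can serve different lifecycle policies.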
Once data has been collected, classified and searched, detailed reports may be necessary. In the case of e-discovery or compliance these reports provide an auditable chain of custody for evidence. They also may highlight files that are not appropriately secured.
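One way to make such a report auditable is a hash-chained log, in which each entry records a digest of the file and the hash of the previous entry, so later tampering is detectable. This is a common general technique offered here as an illustration, not a description of how any particular product implements chain of custody; the function and field names are hypothetical.

```python
import hashlib
import json


def custody_entry(prev_hash, file_bytes, actor, action, timestamp):
    """Create a log entry linked to the previous entry by its hash."""
    entry = {
        "prev": prev_hash,
        "file_sha256": hashlib.sha256(file_bytes).hexdigest(),
        "actor": actor,
        "action": action,
        "timestamp": timestamp,
    }
    payload = json.dumps(entry, sort_keys=True).encode("utf-8")
    entry["entry_hash"] = hashlib.sha256(payload).hexdigest()
    return entry


def verify(log):
    """Check that every entry is intact and correctly linked to the one before."""
    prev = ""
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "entry_hash"}
        payload = json.dumps(body, sort_keys=True).encode("utf-8")
        recomputed = hashlib.sha256(payload).hexdigest()
        if entry["prev"] != prev or recomputed != entry["entry_hash"]:
            return False
        prev = entry["entry_hash"]
    return True
```

The first entry uses an empty string as its previous hash; each later entry passes in the `entry_hash` of the one before it.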
Most information management software is supplied as a software-based appliance that attaches to the network via Gigabit Ethernet. A Web console lets users define classifications, rules and the actions that will be taken on the collected information.
Most software packages from vendors such as StoredIQ, Kazeon and Njini are also tailored to a specific user benefit, whether it is e-discovery, information lifecycle management, business intelligence or information privacy.