At first glance, Microsoft’s acquisition of document e-discovery software maker Equivio looks like another key addition to Office 365’s growing portfolio of document management tools for major industry verticals. Equivio supports law firms with a set of tools that automatically generate relevance-based indexes from large streams of text, helping attorneys locate information vital to their cases.
But a key prize in today’s acquisition may have just been delivered to Equivio by the U.S. Patent and Trademark Office. Patent number 8,938,461, issued just today, describes a “Method for organizing large numbers of documents” that could readily apply to a future document management system for premium Office 365 subscribers.
Today’s patent, applied for in July 2010 by Equivio CEO Amir Milo and vice president for engineering Yiftach Ravid, describes a system for detecting and sorting near-duplicate documents, especially emails. Even in a “big data” store, it’s relatively easy to identify identical documents and eliminate duplicates. Emails, however, often contain fragmented quotes and partial excerpts, typically marked by “>” characters at the beginning of each quoted line. In the e-discovery process, attorneys need to connect threads of discussion compiled from multiple streams of emails, almost none of which contain exactly identical excerpts.
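To make the near-duplicate problem concrete, here is a minimal sketch of one textbook approach: strip the “>” quote markers, break each text into overlapping three-word “shingles,” and compare the shingle sets by Jaccard similarity. This is a generic technique swapped in for illustration, not the method claimed in Equivio’s patent; the function names, the shingle size, and the 0.5 threshold are all assumptions.

```python
import re

def normalize(text: str) -> list[str]:
    """Strip leading '>' quote markers from each line, then lowercase and split into words."""
    # Remove any run of '>' characters and whitespace at the start of each line.
    unquoted = [re.sub(r"^[\s>]+", "", line) for line in text.splitlines()]
    return re.findall(r"[a-z0-9']+", " ".join(unquoted).lower())

def shingles(words: list[str], k: int = 3) -> set[tuple[str, ...]]:
    """Return the set of overlapping k-word sequences ("shingles")."""
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: size of the intersection divided by size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def near_duplicate(doc_a: str, doc_b: str, threshold: float = 0.5) -> bool:
    """Hypothetical helper: flag two texts as near-duplicates above a similarity threshold."""
    return jaccard(shingles(normalize(doc_a)), shingles(normalize(doc_b))) >= threshold

original = "Can we move the board meeting to Thursday? The auditors need another day."
reply = ("> Can we move the board meeting to Thursday?\n"
         "> The auditors need another day.\n"
         "Thursday works for me.")
unrelated = "Quarterly sales figures are attached."

print(near_duplicate(original, reply))      # True (similarity 11/15)
print(near_duplicate(original, unrelated))  # False
```

The reply is not byte-for-byte identical to the original, so an exact-match deduplicator would miss it, but once the quote markers are stripped, 11 of its 15 shingles are shared with the original message.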
“Businesses and governments around the world generate enormous volumes of data every day,” reads a blog post this morning (http://blogs.microsoft.com/blog/2015/01/20/microsoft-acquires-equivio-provider-machine-learning-powered-compliance-solutions/) by Microsoft Outlook and Office 365 corporate vice president Rajesh Jha. “Sifting through that data to find what is relevant to a legal or compliance matter is costly and time consuming. Traditional techniques for finding relevant documents are falling behind as the growth of data outpaces people’s ability to manually process it.”
In theory, a cloud-based service incorporating this technology could draw these associations between discussions among related parties in real time. That could make it useful to many paying Office 365 subscribers beyond the legal profession, perhaps as part of a premium tier.
The “Method for organizing” patent explains how such a system analyzes streams of document data and compiles fingerprint information from them. Those fingerprints are continually checked for similarities, and when similarities turn up, the system constructs “presumed documents” that may later be matched to the original documents from which the fragments were excerpted. Each presumed document is an assembly of nodes, and each node is described by metadata. When several properties of this metadata appear to correspond, the system compares fingerprints to determine whether the separate fragments actually match. If they do, the fragments are incorporated into the presumed document.
It’s a way of compiling a multidimensional index from several huge streams of data without the system needing “root documents” (files known to be the originals from which the other copies were made) in advance.
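A rough sketch of the general idea as described above, not the patent’s actual claims: fragments arrive one at a time as nodes carrying metadata. If enough metadata properties agree with an existing presumed document, the node’s fingerprints are compared against that document; a sufficient match merges the node in, and otherwise the node seeds a new presumed document. The metadata fields (subject, participants, date), the “two of three properties must agree” rule, the 50% fingerprint-overlap threshold, and hashing normalized lines as fingerprints are all hypothetical choices made for illustration.

```python
import hashlib
from dataclasses import dataclass, field

def fingerprint(line: str) -> str:
    """Hypothetical fingerprint: SHA-1 of a line with quote markers, case, and spacing normalized."""
    cleaned = " ".join(line.lstrip("> ").lower().split())
    return hashlib.sha1(cleaned.encode("utf-8")).hexdigest()

def normalize_subject(subject: str) -> str:
    """Drop reply/forward prefixes so 'Re: Fwd: Budget' and 'Budget' compare equal."""
    s = subject.strip().lower()
    while s.startswith(("re:", "fw:", "fwd:")):
        s = s.split(":", 1)[1].strip()
    return s

@dataclass
class Node:
    """A fragment from the incoming stream plus its (hypothetical) metadata."""
    subject: str
    participants: frozenset[str]
    date: str
    text: str

    def fingerprints(self) -> set[str]:
        # Skip blank lines and lines consisting only of quote markers.
        return {fingerprint(line) for line in self.text.splitlines() if line.strip(" >")}

@dataclass
class PresumedDocument:
    """A reconstructed document assembled from matching nodes; no root document is required."""
    subject: str
    participants: frozenset[str]
    date: str
    prints: set[str] = field(default_factory=set)
    nodes: list[Node] = field(default_factory=list)

def assemble(stream: list[Node], min_agreeing: int = 2,
             min_overlap: float = 0.5) -> list[PresumedDocument]:
    docs: list[PresumedDocument] = []
    for node in stream:
        prints = node.fingerprints()
        home = None
        for doc in docs:
            # Cheap check first: how many metadata properties correspond?
            agreeing = sum([
                normalize_subject(node.subject) == doc.subject,
                bool(node.participants & doc.participants),
                node.date == doc.date,
            ])
            if agreeing < min_agreeing:
                continue
            # Costlier check: what share of this node's fingerprints does the document already hold?
            if len(prints & doc.prints) / max(len(prints), 1) >= min_overlap:
                home = doc
                break
        if home is None:
            # No match: start a new presumed document from this fragment.
            home = PresumedDocument(normalize_subject(node.subject), frozenset(), node.date)
            docs.append(home)
        home.prints |= prints
        home.participants |= node.participants
        home.nodes.append(node)
    return docs

stream = [
    Node("Board meeting", frozenset({"ann", "bob"}), "2010-07-01",
         "Can we move the board meeting to Thursday?\nThe auditors need another day."),
    Node("Re: Board meeting", frozenset({"bob", "ann"}), "2010-07-01",
         "> Can we move the board meeting to Thursday?\n"
         "> The auditors need another day.\nThursday works for me."),
    Node("Quarterly figures", frozenset({"carol"}), "2010-07-02", "Sales figures are attached."),
]
docs = assemble(stream)
print([len(d.nodes) for d in docs])  # [2, 1]
```

Checking metadata before fingerprints mirrors the order described in the article: the inexpensive comparison filters candidates so that fingerprint overlap is computed only where a match is plausible.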
Elsewhere in the Israel-based company’s portfolio are other patents similarly important to the e-discovery process, including methods for enhancing expert-system processes. While Rajesh Jha’s blog post focused on e-discovery, Equivio has in the past described to its own customers a sophisticated machine learning system that compiles relevance statistics for nodes accumulated during stream assessment; those statistics help the system judge the relevance of future documents as they are acquired.
That system involves predictive coding, which in this context is a way of fast-tracking the documents, or portions of documents, most likely to be relevant to a search. Imagine a client company building its own on-premises “local Google,” using automated processes that, Equivio says, save its clients millions of dollars compared with hiring data scientists to make the assessments themselves.
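Predictive coding generally works by training a classifier on a seed set of documents that reviewers have already coded as relevant or not relevant, then ranking the unreviewed documents so the likeliest-relevant ones are examined first. The sketch below uses multinomial naive Bayes as a simple stand-in to show that loop; Equivio’s actual model is not described here, and the function names and toy seed set are assumptions.

```python
import math
import re
from collections import Counter

def tokens(text: str) -> list[str]:
    return re.findall(r"[a-z0-9']+", text.lower())

def train(seed: list[tuple[str, bool]]):
    """Count words separately in documents coded relevant (True) and not relevant (False)."""
    counts = {True: Counter(), False: Counter()}
    n_docs = Counter()
    for text, relevant in seed:
        counts[relevant].update(tokens(text))
        n_docs[relevant] += 1
    vocab = set(counts[True]) | set(counts[False])
    return counts, n_docs, vocab

def relevance_score(model, text: str) -> float:
    """Naive Bayes log-odds of relevance with add-one (Laplace) smoothing; unseen words are ignored."""
    counts, n_docs, vocab = model
    total = n_docs[True] + n_docs[False]
    score = math.log((n_docs[True] + 1) / (total + 2)) - math.log((n_docs[False] + 1) / (total + 2))
    for word in tokens(text):
        if word not in vocab:
            continue
        for label, sign in ((True, 1), (False, -1)):
            denom = sum(counts[label].values()) + len(vocab)
            score += sign * math.log((counts[label][word] + 1) / denom)
    return score

def rank(model, unreviewed: list[str]) -> list[str]:
    """Put the documents most likely to be relevant at the front of the review queue."""
    return sorted(unreviewed, key=lambda doc: relevance_score(model, doc), reverse=True)

seed = [
    ("merger pricing discussion with acquirer counsel", True),
    ("draft merger agreement pricing terms", True),
    ("lunch order for friday team event", False),
    ("parking garage closed for maintenance", False),
]
model = train(seed)
print(rank(model, ["team lunch on friday", "revised merger pricing terms attached"]))
# ['revised merger pricing terms attached', 'team lunch on friday']
```

In a real review workflow, the reviewers’ decisions on the top-ranked documents would be added back to the seed set and the model retrained, which is how relevance statistics accumulate over time.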
Microsoft declined to go into further detail Tuesday about the relevance of Equivio’s patents to its future products and services, though a spokesperson did issue this statement: “We intend to incorporate Equivio’s technologies into products coming from Microsoft. We will share more on this in the near future. We are fully honoring Equivio’s existing agreements with current customers and partners, and look to continuing working with all these players moving forward.”