A number of the largest big data vendors, including IBM, Hortonworks and Pivotal, have banded together to specify a unified base platform for the open source Hadoop data processing software.
The Open Data Platform will identify the specific versions of Apache Hadoop and its supporting software that will run together as a seamless whole, potentially reducing the work required on the part of enterprises to build and maintain complex Hadoop-based data analysis systems.
“As the business value of Apache Hadoop is increasingly recognized by enterprises, the need grows for a rigorously tested, consistent, well-defined release of this ecosystem,” said Raymie Stata, CEO of Hadoop vendor Altiscale, in a statement.
The Open Data Platform will provide “a proven base against which product and service provider companies can certify enterprise-class solutions,” Stata said.
The code base for Hadoop is managed by the Apache Software Foundation. Like the Linux operating system kernel, Hadoop is packaged by multiple vendors into commercial distributions, not all of which are compatible with each other. Adding to the complexity are a number of companion programs, such as Hive, Ambari and ZooKeeper, all of which can take work to integrate with Hadoop.
The Open Data Platform's approach is similar to the one taken by the Linux Foundation with its Linux Standard Base, a core set of components that work together.
By establishing a common base library for Hadoop, the Open Data Platform will make it easier to understand which technologies, and which versions of those technologies, can be used together seamlessly. Organizations can then more easily integrate off-the-shelf software into their Hadoop systems, mixing and matching Hadoop components from different vendors.
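To make the idea of a version baseline concrete, the following is a minimal sketch of how an organization might compare the components on a cluster against a published baseline. It is an illustration only: the `BASELINE` manifest, the `check_against_baseline` function and every version string are hypothetical placeholders, not the versions or tooling the Open Data Platform actually specifies.

```python
# Hypothetical baseline manifest: component name -> required version.
# The names and version strings are illustrative placeholders, not the
# versions the Open Data Platform actually specifies.
BASELINE = {
    "hadoop": "1.0.0",
    "ambari": "1.0.0",
    "zookeeper": "1.0.0",
}


def check_against_baseline(installed, baseline=BASELINE):
    """Return the components that are missing or differ from the baseline.

    Each entry maps a component name to (expected_version, found_version),
    where found_version is None if the component is not installed.
    """
    mismatches = {}
    for name, expected in baseline.items():
        found = installed.get(name)
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    # A made-up cluster inventory with one outdated and one missing component.
    cluster = {"hadoop": "1.0.0", "ambari": "0.9.0"}
    for name, (expected, found) in check_against_baseline(cluster).items():
        print(f"{name}: expected {expected}, found {found or 'not installed'}")
```

The check here is a strict equality test on version strings, which is the simplest possible reading of "specific versions"; a real certification process would presumably involve compatibility testing rather than string comparison.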
In addition to IBM, Hortonworks and Pivotal, other companies that have signed on to the initiative include General Electric, Infosys, SAS, Altiscale, Capgemini, CenturyLink, EMC, Splunk, Verizon Enterprise Solutions, Teradata, and VMware.