Merging the worlds of big data and cloud computing, Red Hat, Hortonworks and Hadoop integrator Mirantis are jointly building a software program, called Savanna, that will make it easier to deploy Apache Hadoop on an OpenStack cloud service.
The software will "allow Hadoop to take advantage of the scale-out storage architecture that OpenStack offers," said Adrian Ionel, Mirantis CEO. "Enterprises will have a much easier way to deploy and use Hadoop at scale."
Mirantis launched the project earlier this month, donating the code to the OpenStack Foundation. OpenStack is a collection of open source software designed to offer shared compute, storage and networking services on an on-demand basis. And Apache Hadoop is a data processing framework for analyzing large amounts of data across multiple servers in a cluster. Both sets of software are increasingly being tested and deployed by organizations.
"The cloud provides an economic low-cost infrastructure that scales out easily. And that is something that is very important in the Hadoop world, as many of these projects are spinning up quickly inside of business units, and they don't necessarily talk with the IT folks," said Shaun Connolly, Hortonworks vice president of strategy. Savanna will work with any standard Hadoop distribution, not just Hortonworks' own distribution.
Savanna will provide an easy way to install a Hadoop cluster on an OpenStack cloud. Administrators can specify the cluster topology, the number of nodes, required hardware and other attributes. The project is preparing Savanna to be an element of the OpenStack suite, accessible either through an API (application programming interface) or through a GUI available for the OpenStack dashboard.
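To make the idea of a cluster specification concrete, here is a minimal Python sketch of how such a description might be modeled before being handed to a provisioning API. It is illustrative only: the class names, field names and values are assumptions made for this example and are not taken from Savanna's actual API.

```python
import json
from dataclasses import dataclass, asdict, field
from typing import List


@dataclass
class NodeGroup:
    """One group of identical nodes (hypothetical structure, not Savanna's schema)."""
    role: str      # e.g. "master" or "worker"
    count: int     # number of nodes in this group
    vcpus: int     # required virtual CPUs per node
    ram_mb: int    # required memory per node, in megabytes


@dataclass
class ClusterSpec:
    """A cluster description: a name plus its topology as a list of node groups."""
    name: str
    node_groups: List[NodeGroup] = field(default_factory=list)

    def total_nodes(self) -> int:
        # Sum the node counts across every group in the topology.
        return sum(group.count for group in self.node_groups)

    def to_json(self) -> str:
        # Serialize the whole spec, nested groups included, to JSON text.
        return json.dumps(asdict(self), indent=2)


# Example values are made up for illustration.
spec = ClusterSpec(
    name="demo-hadoop",
    node_groups=[
        NodeGroup(role="master", count=1, vcpus=4, ram_mb=8192),
        NodeGroup(role="worker", count=3, vcpus=2, ram_mb=4096),
    ],
)

print(spec.to_json())
```

A JSON document of this kind is the sort of payload an administrator might submit through an API or assemble through a dashboard form; the real request format would be defined by the project itself.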
Over time, the software will offer additional functionality, such as auto-scaling, the ability to schedule when a Hadoop deployment runs and the ability to manage multiple Hadoop clusters. Savanna will also be able to reallocate unused computational power on an OpenStack grid for Hadoop workloads. And Savanna will provide an integration point for third-party Hadoop provisioning and management software, notably Apache Ambari.
The team expects to have demonstrations of the software ready for the Hadoop Summit in June.
Beyond providing a potential time-saving tool for administrators, Savanna is notable in that it shows how enterprises are becoming more reliant on open source software. "We're starting to see major projects like Hadoop and OpenStack to integrate, because there is this huge drive in the enterprise to arrive at a unified open source infrastructure," Ionel said.