Now that many organizations see the utility in big data, BMC Software has provided a way to incorporate jobs from the Hadoop data processing platform into larger enterprise workflows.
“We provide a finer level of granularity for Hadoop workflows, and not just within Hadoop, but across the enterprise,” said Shamoun Murtza, BMC director in the company’s office of the chief technology officer.
BMC has released a new module for its Control-M job scheduling software that lets its users, typically large organizations, manage Hadoop jobs alongside the traditional IT jobs they already manage with Control-M.
Murtza explained that even as enterprises start to use Hadoop, they don’t have a lot of utilities to fit it into their existing IT operations.
Hadoop works mostly by batch processing, in that it ingests a chunk of data, analyzes it, and returns some output. This approach makes it well-suited for running in serial with other applications that can either feed Hadoop data, or use the results from Hadoop in some other computational operation.
Batch work “has to be coordinated not just within Hadoop, but across the enterprise. There are other workflows with results that can be pushed into Hadoop. Once you get the data out of Hadoop, you run some more [applications] to get value out of the data,” Murtza said.
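The kind of cross-system coordination Murtza describes can be pictured as a dependency graph in which the Hadoop job is one step among several. The following is a minimal sketch in Python, not BMC's implementation: the job names and the `run_order` helper are hypothetical, and a real scheduler would launch each job rather than only compute the order.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical workflow: each job maps to the set of jobs that must finish first.
WORKFLOW = {
    "export_erp_data": set(),                 # upstream enterprise job
    "load_into_hdfs": {"export_erp_data"},    # feed the results into Hadoop
    "run_hadoop_job": {"load_into_hdfs"},     # batch analysis inside Hadoop
    "build_report": {"run_hadoop_job"},       # downstream job using Hadoop output
}


def run_order(workflow):
    """Return a job order in which every job follows all of its prerequisites."""
    return list(TopologicalSorter(workflow).static_order())


if __name__ == "__main__":
    for job in run_order(WORKFLOW):
        print(job)
```

The point of the sketch is only that the Hadoop step sits in the middle of the chain, with enterprise jobs on either side of it.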
Originally designed for mainframe computers, BMC’s Control-M workload automation tool provides a way for administrators to build workflows by linking different applications together into one task, without having to write a lot of scripts. BMC now offers the software for most modern enterprise platforms, including Linux, Unix and Microsoft Windows, and it can work with most enterprise software, such as databases, enterprise resource planning (ERP) and ETL (extract, transform, load) software.
The new Control-M module recognizes commonly used Hadoop components such as HDFS (Hadoop Distributed File System), Pig, Sqoop, Hive and MapReduce, which eliminates the need for administrators to write scripts to wire these applications into their workflows. Hadoop has its own set of job scheduling tools, although they work mostly for managing jobs within the Hadoop environment, rather than for managing all the software being used in an organization.
Control-M offers a way to provide instructions to these Hadoop components from within a centralized console. “You don’t have to write shell scripts,” Murtza said.
One early user of Control-M for Hadoop has been Sears’ MetaScale business unit, which analyzes data on behalf of Sears and other retailers, as well as finance and health care companies.
Control-M allowed MetaScale to replace its whole ETL process with one using Hadoop, which can analyze data covering longer periods of time than could be captured by traditional data warehouses. It also provided an interface that was easier for MetaScale’s administrators and programmers to work with, Murtza said.
Control-M also includes predictive analysis that estimates how long a particular job will take, based on the other jobs running at the time and how long the job took to run in the past.
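The article does not say how Control-M computes this estimate. As an illustration only, an estimate built from those two inputs could look like the Python sketch below, which takes the average of past run times and scales it by a load factor. The function name, the `slowdown_per_job` parameter and its default value are assumptions made for the example, not anything BMC has described.

```python
from statistics import mean


def estimate_runtime(past_durations, concurrent_jobs, slowdown_per_job=0.1):
    """Estimate a job's run time in the same unit as past_durations.

    Assumed model (not BMC's): the baseline is the mean of earlier runs,
    and each concurrently running job adds a fixed fractional slowdown.
    """
    if not past_durations:
        raise ValueError("need at least one past run to estimate from")
    if concurrent_jobs < 0:
        raise ValueError("concurrent_jobs cannot be negative")
    baseline = mean(past_durations)
    return baseline * (1 + slowdown_per_job * concurrent_jobs)


if __name__ == "__main__":
    # Three earlier runs (in minutes) and two other jobs currently running.
    print(estimate_runtime([10, 20, 30], concurrent_jobs=2))
```

A production scheduler would likely weight recent runs more heavily and measure actual resource contention, but the inputs are the same two the article names.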
Consultancy firm Wikibon estimates that big data analysis is generating more than US$11 billion a year for software sellers, consultants, system integrators and other support providers.