While EMC and VMware spinoff Pivotal prepares to launch its business intelligence PaaS (platform-as-a-service), the new company has also been busy building its portfolio of data mining and analysis software.
The company is releasing two programs to help with data analysis. One is an in-memory data store for real-time analysis that works with the Hadoop data processing platform; the other is a data discovery tool for business analysts.
As part of an update to its Hadoop distribution, Pivotal HD 1.1, Pivotal will include GemFire HD, an in-memory transactional store that VMware acquired in 2010. By offering GemFire as part of its Hadoop package, Pivotal is hoping that organizations will use the software as a base for building OLTP (online transaction processing) systems that can use Hadoop for long-term storage. This approach will let organizations analyze both current data, while it is held in GemFire, and older transactional data that has been offloaded to Hadoop, said Abhishek Kashyap, Pivotal principal product manager.
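The split between a fast in-memory tier and a long-term archive can be illustrated with a minimal Python sketch. It is not GemFire's or Hadoop's API: the `TieredStore` class, its method names and the age-based offload rule are all hypothetical, with a plain dict standing in for the in-memory store and a list standing in for the archive.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Transaction:
    tx_id: int
    account: str
    amount: float
    timestamp: int  # seconds, arbitrary epoch


@dataclass
class TieredStore:
    """Hypothetical two-tier store (illustration only, not a real product API)."""
    max_age_seconds: int
    hot: dict = field(default_factory=dict)      # stand-in for the in-memory tier
    archive: list = field(default_factory=list)  # stand-in for long-term storage

    def put(self, tx: Transaction) -> None:
        # New transactions always land in the in-memory tier.
        self.hot[tx.tx_id] = tx

    def offload(self, now: int) -> int:
        # Move transactions older than max_age_seconds into the archive.
        cutoff = now - self.max_age_seconds
        expired = [tx_id for tx_id, tx in self.hot.items() if tx.timestamp < cutoff]
        for tx_id in expired:
            self.archive.append(self.hot.pop(tx_id))
        return len(expired)

    def query(self, predicate: Callable[[Transaction], bool]) -> list:
        # Scan both tiers so one query covers current and historical data.
        return [tx for tx in list(self.hot.values()) + self.archive if predicate(tx)]


store = TieredStore(max_age_seconds=3600)
store.put(Transaction(1, "acct-a", 120.0, timestamp=1_000))
store.put(Transaction(2, "acct-a", 75.5, timestamp=9_000))
store.put(Transaction(3, "acct-b", 10.0, timestamp=9_500))

moved = store.offload(now=10_000)
acct_a_total = sum(tx.amount for tx in store.query(lambda tx: tx.account == "acct-a"))
```

In this sketch the oldest transaction is moved to the archive, yet the final query still counts it alongside the newer one, which is the combined view of current and offloaded data that the approach is meant to provide.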
A GemFire instance is held entirely within the working memory of a server, which provides a speedy way to interrogate live operational data with SQL, useful for situational awareness and other forms of real-time analysis.
The company also unveiled Pivotal Data Dispatch, a tool to help data workers find and prepare data sets for analysis.
Data Dispatch was originally developed by the New York Stock Exchange, whose data analysts have used the software since 2007 to better understand the effects of regulatory requirements.
The software allows analysts to pick, filter and combine the data sets they need from different sources. The resulting analyst-generated data sets are stored in a “sandbox” that is available to business intelligence tools, such as those offered by Oracle, IBM and SAP, said Todd Paoletti, Pivotal vice president of product marketing.
Using traditional BI tools, analysts typically would have to ask IT staff to combine data sets and deploy them to a data warehouse. This software eliminates the need to consult with IT to generate each new data set, Paoletti said.
The system administrator initially defines and tags the data sources that are then made available to the analyst. Source data can come from database files, flat files, Hadoop files, Microsoft Project files or other commonly used formats. The resulting combined data sets can be stored either in Hadoop or in Pivotal’s Greenplum database.
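The workflow described for Data Dispatch (an administrator tags sources, then an analyst picks, filters and combines them into a sandbox) can be sketched in a few lines of Python. This is a rough illustration only and not Data Dispatch's actual interface: the `catalog` layout, the `pick`, `filter_rows` and `combine` helpers, and the sample rows are all made up for the example.

```python
# Hypothetical catalog an administrator might define: name -> (tags, rows).
catalog = {
    "trades": ({"market", "daily"}, [
        {"symbol": "AAA", "volume": 500},
        {"symbol": "BBB", "volume": 50},
    ]),
    "listings": ({"reference"}, [
        {"symbol": "AAA", "sector": "tech"},
        {"symbol": "BBB", "sector": "energy"},
    ]),
}


def pick(tag):
    # Find the sources an administrator has labeled with the given tag.
    return [name for name, (tags, _rows) in catalog.items() if tag in tags]


def filter_rows(name, predicate):
    # Keep only the rows of one source that satisfy the predicate.
    return [row for row in catalog[name][1] if predicate(row)]


def combine(left, right, key):
    # Inner join of two row lists on a shared key.
    index = {row[key]: row for row in right}
    return [{**row, **index[row[key]]} for row in left if row[key] in index]


# The "sandbox" here is just a dict of analyst-built data sets.
sandbox = {}
market_sources = pick("market")
busy = filter_rows("trades", lambda row: row["volume"] >= 100)
sandbox["busy_by_sector"] = combine(busy, catalog["listings"][1], "symbol")
```

Here the analyst ends up with a single combined row for the heavily traded symbol, built without a separate request to IT for each new data set.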
Pivotal plans to launch its cloud service Nov. 12. Pivotal HD 1.1 will be available Nov. 1.