Two Google big data toolsets have moved out of beta and into full commercial release, adding to the company’s cloud portfolio a data analysis framework and a service for managing data streams in real time.
Google Cloud Dataflow, which could serve as a replacement for Hadoop, provides a framework for fusing different sources of data within one processing pipeline. Google Cloud Pub/Sub is the company’s messaging service for delivering those data streams as they are generated.
The two services fill out Google’s roster of cloud-based data analysis tools, joining Google BigQuery, a commercial service for analyzing large sets of unstructured data.
These services require less maintenance and operational oversight than in-house data processing systems, Google said in a blog post Wednesday.
Both services were announced at the Google I/O 2014 conference and have been available in public beta for some time.
As full-fledged commercial offerings, these services are now fully integrated into the Google Cloud Platform, Google’s collection of tools for orchestrating cloud-based operations.
Customers have been using the Google Cloud Platform for tasks such as financial fraud detection, genomics analysis, inventory management, click-stream analysis, and user interaction testing.
Google Cloud Dataflow provides a unified programming model for handling different sources of data, including both batch and streaming data sources, eliminating the need for complex ETL (extract, transform, and load) software.
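The idea of a unified model can be illustrated with a toy sketch in plain Python. This is not the Dataflow SDK; the function `count_clicks`, the sample events, and the page names are all hypothetical. It only shows the same transform logic running unchanged over a bounded (batch) source and a generator standing in for a stream.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Tuple


def count_clicks(events: Iterable[Dict[str, str]]) -> Iterator[Tuple[str, int]]:
    """Yield a running (page, count) pair for every incoming event.

    The function only iterates, so it does not care whether the
    source is a finite list or a generator that keeps producing.
    """
    counts: Counter = Counter()
    for event in events:
        counts[event["page"]] += 1
        yield event["page"], counts[event["page"]]


# Bounded "batch" source: all records are available up front.
batch_source = [{"page": "/home"}, {"page": "/cart"}, {"page": "/home"}]


def stream_source() -> Iterator[Dict[str, str]]:
    """Generator standing in for a streaming source (hypothetical data)."""
    for page in ("/home", "/home", "/cart"):
        yield {"page": page}


# The same transform is applied to both kinds of source.
batch_result = list(count_clicks(batch_source))
stream_result = list(count_clicks(stream_source()))
```

A production system would also have to deal with windowing, late data, and distribution across machines, none of which this sketch attempts.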
Dataflow can also serve as a speedier alternative for crunching large amounts of unstructured data, compared to the batch-processing-oriented Hadoop, Google claimed.
Salesforce.com is using Dataflow to augment its Salesforce Wave business intelligence service, while digital marketing firm Qubit uses it to track customer web interactions in real time.
Google Cloud Pub/Sub can serve as a messaging system, providing a way for data analysis systems to work from a stream of fresh data as it is generated. It can handle up to a million messages a second, which it can push to other Google analysis services such as Dataflow.
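The publish/subscribe pattern behind such a messaging system can be sketched in a few lines of plain Python. This is an in-memory toy, not the Google Cloud Pub/Sub client API; the class `ToyBroker`, the topic names, and the messages are hypothetical, and unlike a real service it does not retain messages for subscribers that attach later.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


class ToyBroker:
    """Minimal in-memory broker: publishers and subscribers share only a topic name."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callable[[str], None]) -> None:
        """Register a callback to be invoked for each message on the topic."""
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, message: str) -> int:
        """Push the message to every subscriber; return how many received it."""
        callbacks = self._subscribers.get(topic, [])
        for callback in callbacks:
            callback(message)
        return len(callbacks)


received: List[str] = []
broker = ToyBroker()

# An analysis job subscribes, then fresh data is pushed to it as it is published.
broker.subscribe("clicks", received.append)
delivered = broker.publish("clicks", "user 42 viewed /home")

# Nobody is listening on this topic, so the toy broker delivers to no one.
unheard = broker.publish("orders", "order 7 placed")
```

The publisher never references the subscriber directly, which is what lets an analysis system be attached to a stream without changing the code that produces the data.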
The beta version of the service has already delivered over a trillion messages to users.
Pub/Sub pricing starts at $0.40 per million messages for the first 250 million messages, with the rate going down for greater usage. Cloud Dataflow is priced per job, depending on the time it takes to complete an operation and the amount of data that must be moved around.
Google also announced that it supports Cloudera Hadoop distributions in its cloud. Users can run copies of Cloudera Express and the Cloudera Enterprise Hadoop distributions on Google Cloud Platform.