Oracle is hoping to turn heads in the crowded data analysis market with Big Data SQL, a software tool that can run a single SQL query against Oracle’s own database as well as Hadoop and NoSQL data stores.
The software is an option for Oracle’s Big Data Appliance, which incorporates Cloudera’s Hadoop distribution, said Neil Mendelson, vice president of product development, big data and analytics.
Enterprises are experimenting heavily with so-called big data, but three factors are keeping customers from moving these projects into production: a lack of integration between Hadoop and other systems, difficulty obtaining the right talent, and concerns about security, Mendelson said.
Big Data SQL takes advantage of the core skills any Oracle database administrator has, he added. “You get to use the full dialect of SQL.”
Customers also have to buy heavily into Oracle’s technology stack, however.
Big Data SQL’s full benefits require an Oracle database to be installed and running on the software company’s Exadata database machine. In an implementation, Exadata and the Big Data Appliance would share an interconnect for data exchange, Mendelson said.
In addition, Big Data SQL is only compatible with version 12c of the Oracle database, which was released last year. Most Oracle database customers are still running versions 11g and earlier.
But customers get benefits in exchange for the investments Big Data SQL requires, particularly the ability to use the Oracle database’s advanced security features within Hadoop and NoSQL stores, Mendelson said. Security rules set for data in 12c are simply “pushed” into those other environments, he added.
Oracle over time will add support for using Big Data SQL with other hardware systems it sells, according to Mendelson. The software is set for general availability within the next couple of months, with pricing to be announced at that time.
Big Data SQL isn’t an attempt to replace the SQL engines already created for Hadoop, such as Hive and Impala, which Oracle will continue to ship with the Big Data Appliance, he said. “We’re really solving a wider problem.”
One big challenge facing data scientists is simply the overhead of moving data among systems, he said. Big Data SQL allows various information stores to be queried in place with minimal data movement, and queries are made more efficient using Smart Scan technology from Exadata’s software stack.
At a quick glance, Big Data SQL might appear to be simply another take on federated querying, which has been around for quite some time. Federated querying also has its disadvantages, said analyst Curt Monash of Monash Research.
“Federating query across systems involves a network cost, always,” he said. “Often it also leads to a query being planned by an optimizer that isn’t ideal for all parts of the query. If the performance advantages of moving the data are large enough to outweigh those considerations, it usually would be even better to move the data before you start.”
But Big Data SQL “is data federation with some predicate pushdown,” Monash said. “A predicate is, for this purpose, part of a SQL statement. Rather than do everything at the central processing location, which can be a cluster itself, you push down some of the predicates to where the data is stored.”
“That’s the whole point of Exadata,” Monash added. “A lot of the filtering is done locally, so that the network impact isn’t as miserable as it otherwise could be. This reduces the objections to data federation. It’s a good idea, just as Exadata is a good idea.”
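The effect Monash describes can be illustrated with a small simulation. The sketch below is not Oracle's implementation: `RemoteStore`, `query_without_pushdown` and `query_with_pushdown` are hypothetical names invented for illustration, and the "network" is just a counter of rows handed back to the central engine. It assumes a store of 1,000 rows of which one in ten matches the filter.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Row = Dict[str, object]
Predicate = Callable[[Row], bool]


@dataclass
class RemoteStore:
    """Hypothetical stand-in for a Hadoop or NoSQL store reached over a network."""
    rows: List[Row]
    rows_shipped: int = 0  # rows sent across the simulated network so far

    def scan(self, predicate: Optional[Predicate] = None) -> List[Row]:
        # When a predicate is supplied, filtering happens here, next to the data.
        result = [r for r in self.rows if predicate is None or predicate(r)]
        self.rows_shipped += len(result)
        return result


def query_without_pushdown(store: RemoteStore, predicate: Predicate) -> List[Row]:
    # Plain federation: ship every row to the central engine, then filter there.
    return [r for r in store.scan() if predicate(r)]


def query_with_pushdown(store: RemoteStore, predicate: Predicate) -> List[Row]:
    # Predicate pushdown: ship the filter to the store; only matches come back.
    return store.scan(predicate)


def make_store() -> RemoteStore:
    # Assumed sample data: every tenth row belongs to region "EU".
    return RemoteStore(
        [{"id": i, "region": "EU" if i % 10 == 0 else "US"} for i in range(1000)]
    )


def is_eu(row: Row) -> bool:
    return row["region"] == "EU"


naive = make_store()
pushed = make_store()
a = query_without_pushdown(naive, is_eu)
b = query_with_pushdown(pushed, is_eu)

# Same answer either way; only the amount of data moved differs.
print(naive.rows_shipped, pushed.rows_shipped)  # prints: 1000 100
```

Both calls return the same 100 rows, but the pushdown version moves a tenth of the data in this example. The saving depends entirely on how selective the predicate is; a filter that matches most rows gains little.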
But it would be an overstatement to call Big Data SQL a breakthrough, Monash said.
“These are all well-known ideas, which Oracle seems to have now implemented for its own particular walled-garden environment.”
Oracle is expected to discuss Big Data SQL further during a webcast on Wednesday featuring Andrew Mendelsohn, executive vice president of database server technologies.