Data warehousing vendor Greenplum on Monday rolled out its “Enterprise Data Cloud” initiative, a long-term product and methodology strategy meant to further the idea of “self-service” data warehousing.
Traditional enterprise data-warehousing practices, where a company tries to gain “a single version of the truth” by combining various data sources into a single store, have proven too clunky and slow for modern businesses, said Greenplum’s president, Scott Yara.
The company’s EDC approach will instead see companies use a general pool of underlying infrastructure — be it an in-house physical server farm, virtualized machines or public clouds like Amazon Web Services — to create and manage a range of data warehouses and data marts, Yara said.
In fact, maybe only 10 percent or 20 percent of a company’s data is actually stored in the main enterprise data warehouse, Yara said. “There’s always an explosion of marts in the shadow of the EDW,” he said. “You have legitimate needs for hundreds and thousands of databases for departmental work.”
But these “shadow data marts” should be viewed as a powerful tool for businesses, not something that goes against data-governance and management policies, he said.
Greenplum plans to build out a software stack that includes a Web-based front end with which database administrators can easily spool up new data marts, users can search for data, and system managers can set policies and manage users. Customers would continue plugging in third-party BI (business intelligence), data mining and other tools.
Greenplum’s plans also include a middle tier of platform services for handling tasks like identity management, disaster recovery, policy enforcement, storage management and performance diagnostics.
Some of the new features are now available in the 3.3 version of Greenplum’s database, which was also announced Monday. Greenplum plans to fill out the remaining capabilities over time.
Greenplum is still mulling over whether it will charge separately for the Web-based management interface, which will be released along with Greenplum 3.4 later this year, Yara said.
The Greenplum database itself is priced in a number of ways. List pricing for perpetual licenses is US$16,000 per core or $70,000 per terabyte of data, with 22 percent annual maintenance, while subscription pricing is $8,000 per year per core or $35,000 per year for each terabyte.
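To see how the two pricing models compare over time, the list figures above can be run through a rough calculation. The sketch below is illustrative only: the function names are hypothetical, and it assumes the 22 percent maintenance fee is charged every year on the perpetual list price, starting in the first year, which the article does not specify. Discounts and other terms are ignored.

```python
# Rough cost comparison using the list prices quoted in the article.
# Assumption (not from the article): maintenance is 22% of the perpetual
# list price, billed every year beginning in year one.

PERPETUAL_PER_CORE = 16_000             # US$, one-time license fee per core
SUBSCRIPTION_PER_CORE_PER_YEAR = 8_000  # US$ per core per year
PERPETUAL_PER_TB = 70_000               # US$, one-time license fee per terabyte
SUBSCRIPTION_PER_TB_PER_YEAR = 35_000   # US$ per terabyte per year
MAINTENANCE_RATE = 0.22                 # annual maintenance on perpetual licenses


def perpetual_cost(units, years, unit_price=PERPETUAL_PER_CORE,
                   rate=MAINTENANCE_RATE):
    """Total cost of a perpetual license plus maintenance over `years`."""
    license_fee = units * unit_price
    return license_fee + license_fee * rate * years


def subscription_cost(units, years,
                      unit_price=SUBSCRIPTION_PER_CORE_PER_YEAR):
    """Total cost of a subscription over `years`."""
    return units * unit_price * years


def breakeven_years(perpetual_price, subscription_price,
                    rate=MAINTENANCE_RATE):
    """Years after which the perpetual license becomes the cheaper option."""
    return perpetual_price / (subscription_price - perpetual_price * rate)


if __name__ == "__main__":
    cores, years = 8, 3
    print("perpetual:   ", perpetual_cost(cores, years))
    print("subscription:", subscription_cost(cores, years))
    print("break-even (per core):",
          breakeven_years(PERPETUAL_PER_CORE, SUBSCRIPTION_PER_CORE_PER_YEAR))
    print("break-even (per TB):  ",
          breakeven_years(PERPETUAL_PER_TB, SUBSCRIPTION_PER_TB_PER_YEAR))
```

Under those assumptions, the perpetual license overtakes the subscription on cost after roughly 3.6 years, whether priced per core or per terabyte, since both subscription rates are exactly half the corresponding perpetual price.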
The vendor is “doing the right thing” by pushing the EDC concept, analyst Curt Monash of Monash Research said in an interview.
A rigidly formal, centralized data warehouse may make sense for a few large enterprises, but the approach has severe limitations because big companies are constantly taking in new third-party data, for example from the businesses and new applications they buy, according to Monash.
Greenplum’s notion of self-service data marts has merit, but with certain caveats, Monash said in a blog posting Monday. “Suppose users could order up the data mart they want, perhaps test it at a very low processing priority (if they choose), and then send the completed request to IT for approval and provisioning. That would have some value.”
It also would be a good thing for certain users to be able to manage data marts on their own once they’ve been generated, he added in the post: “That’s a great idea, full of agility and don’t-make-IT-a-roadblock goodness. Data miners and similar analytic professionals commonly have the technical ability to manage a simple database, and should be allowed to do so if it’s ensured that they don’t break anything for anybody else.”
But Monash also noted on his blog that Greenplum’s vision needs “sophisticated data movement and synchronization” capabilities in order to come to fruition.