Oracle Pushes Compression to Scale Databases

Oracle's powerful new HP Oracle Database Machine comes with 168TB of storage, a new method of retrieving data more quickly and intelligently, and -- wait for it -- a $2.33 million price tag.

It's the turbocharged option for the database administrator with money to burn and a need for speed.

But most DBAs don't get to drive in the fast lane -- especially not with IT budgets the way they are. So as a less lavish option for enterprise users, Oracle is touting another approach.

That one involves data compression, which has long been a popular way to save storage space and money. Traditionally, though, the cost has been steep: Gobs of memory and processing power typically are needed to compress data and write it to disks. Even more is needed when the information is later extracted.

Now Oracle claims to have solved this thorny problem with a feature it first introduced in its Oracle 11g database, which was released last year.

By using the Advanced Compression option in 11g, Oracle says, DBAs can shrink database sizes by as much as three-fourths and boost read/write speeds by three to four times, no matter whether they're running a data warehouse or a transaction-processing database -- all while incurring little in the way of processor utilization penalties.

Oracle claims the storage and speed gains are so dramatic that companies using Advanced Compression will no longer need to move old, seldom-used or unused data to archives. Instead, they can keep it all in the same production database, even as the amount of data stored there grows into the hundreds of terabytes or even the petabyte range.

"This works completely transparently to your applications," Juan Loaiza, Oracle's senior vice president of systems technologies, said during a session at the company's OpenWorld conference in San Francisco last week. "It increases CPU usage by just 5%, while cutting your [database] table sizes by half."

Oracle says it's responding to the demands of enterprise customers with fast-growing databases. "The envelope is always being pushed," Loaiza said. "Unstructured data is growing very quickly. We expect someone to be running a one-petabyte, 1,000-CPU-core database by 2010."

It's also responding to the fact that storage technology, one of the keys to database performance, has made little progress from a speed standpoint, according to Loaiza. "Disks are getting bigger, but they're not getting a whole lot faster," he said.

Taking Data Compression Down to the Block Level

Oracle has offered simple index-level compression since the 8i version of its database was introduced in 1999. That improved several years later with the introduction of table-level compression in Oracle 9i Release 2, which helped data warehousing users compress data for faster bulk loads, according to Sushil Kumar, senior director of product management for database manageability, high availability and performance at Oracle.

Advanced Compression provides even finer capabilities, letting the database compress data down to the disk-block level. The algorithm used in the new feature compresses data while keeping track of exactly where information is stored, Kumar said. The result, he claimed, is that when data is extracted by users, the database can focus in like a laser on the exact block on the disk where the information is located, instead of pulling whole tables and sifting through unwanted data.

Other compression schemes "have no idea what's on the disk," Kumar contended. "They can't read part of a document without opening up the entire one."
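The idea Kumar describes, compressing each block independently so that one block can be read without unpacking everything else, can be illustrated with a minimal Python sketch. This is not Oracle's algorithm: the zlib codec, the rows-per-block setting and the function names are all assumptions chosen for illustration.

```python
import zlib

# Hypothetical setting: rows per block. Real database blocks are sized in bytes.
BLOCK_SIZE = 4

def compress_blocks(rows):
    """Compress rows in independent fixed-size blocks."""
    blocks = []
    for start in range(0, len(rows), BLOCK_SIZE):
        chunk = "\n".join(rows[start:start + BLOCK_SIZE]).encode("utf-8")
        blocks.append(zlib.compress(chunk))
    return blocks

def read_row(blocks, row_number):
    """Fetch one row by decompressing only the block that holds it."""
    block_index, offset = divmod(row_number, BLOCK_SIZE)
    chunk = zlib.decompress(blocks[block_index]).decode("utf-8")
    return chunk.split("\n")[offset]

# Sample rows are made up; they must not contain newlines in this toy layout.
rows = [f"customer-{i},ATLANTA,GA" for i in range(10)]
blocks = compress_blocks(rows)
print(read_row(blocks, 6))
```

Because every block is a self-contained compressed unit, a lookup touches one block rather than the whole table, which is the contrast Kumar draws with schemes that must open an entire document to read part of it.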

According to Oracle officials, Advanced Compression is also smart enough not to compress data with every single change to a database, but to instead let the changes accumulate and then compress them in batches. That is efficient enough to enable Advanced Compression to work with OLTP databases, which tend to have heavy read/write volumes, said Vineet Marwah, a principal member of the Oracle database staff.
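A rough way to picture the batching approach is a block that accepts new rows uncompressed and compresses only once enough of them have piled up. The Python sketch below is a toy model under stated assumptions, not Oracle's implementation: the `Block` class, the row-count threshold and the use of zlib are all hypothetical.

```python
import zlib

class Block:
    """A toy data block that compresses only when a fill threshold is crossed."""

    def __init__(self, threshold=4):
        self.threshold = threshold   # hypothetical: uncompressed rows to accumulate
        self.compressed = b""        # rows already compressed, stored as one zlib blob
        self.pending = []            # recent rows, still uncompressed
        self.compress_runs = 0       # how many times compression actually ran

    def insert(self, row):
        """Add a row cheaply; compress only when the pending batch is full."""
        self.pending.append(row)
        if len(self.pending) >= self.threshold:
            self._compress_batch()

    def _compress_batch(self):
        """Recompress the whole block, folding pending rows into the blob."""
        rows = self.rows()
        self.compressed = zlib.compress("\n".join(rows).encode("utf-8"))
        self.pending = []
        self.compress_runs += 1

    def rows(self):
        """Return all rows, decompressing the stored blob if there is one."""
        old = []
        if self.compressed:
            old = zlib.decompress(self.compressed).decode("utf-8").split("\n")
        return old + self.pending

block = Block(threshold=4)
for i in range(10):
    block.insert(f"order-{i}")
print(block.compress_runs, len(block.pending))
```

In this toy model, ten inserts trigger only two compression passes, with two rows still waiting uncompressed. Most writes therefore pay no compression cost, which is the property that would matter for a write-heavy OLTP workload.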

Another component of Advanced Compression, called SecureFiles, can automatically detect, index and compress non-relational data such as Word documents, PDFs or XML files, Marwah said. Oracle also has enhanced its backup compression performance so that it is 40% faster in 11g than in the previous version of the database, while not degrading the performance of other database functions, he said.

And because a compressed database is generally much smaller, it shrinks the flow of data between the storage server and database, where bottlenecks tend to occur, Kumar said. The gains are so dramatic that DBAs can dump their complicated partitioning and archiving schemes, he claimed. "A lot of people archive data because they have to, not because they want to," he said. "So if you see a business value in keeping data around, compression is a useful way to not let resource constraints dictate your architecture."

Advanced Compression: Not a Cure-All

Oracle acknowledges that Advanced Compression isn't a cure-all. For instance, while large table scans "are a whole lot faster, compression doesn't make random-access reads that much faster," Loaiza said. Also, data that has already been compressed, such as a JPEG image, can't be compressed further, according to Kumar.

Oracle's claim of 4:1 compression also isn't the highest level in the database industry. Database analyst Curt Monash pointed out in an online post this week that analytic database start-up Vertica Inc. claims compression ratios from 5:1 to as much as 60:1, depending on the type of data.
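For readers comparing these figures, a compression ratio of N:1 translates into a storage saving of 1 - 1/N. The short Python sketch below does only that arithmetic on the ratios quoted in this article; the function name is made up for illustration.

```python
def space_saved(ratio):
    """Fraction of storage saved at a compression ratio of ratio:1."""
    return 1 - 1 / ratio

# Ratios quoted in the article: Loaiza's "half" (2:1), Oracle's 4:1,
# and Vertica's claimed range of 5:1 to 60:1.
for ratio in (2, 4, 5, 60):
    print(f"{ratio}:1 -> {space_saved(ratio):.1%} saved")
```

A 4:1 ratio corresponds to the three-fourths reduction Oracle cites, while going from 5:1 to 60:1 moves the saving only from 80% to a little over 98%, so very high ratios yield diminishing returns in raw disk space.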

Kumar declined to comment about Vertica. But during his OpenWorld presentation, he claimed that Oracle's variable-length, block-level compression is more efficient than what is offered in IBM's rival DB2 9 database, not to mention faster. "Because DB2 is so inefficient to begin with, Oracle is the winner any day," Kumar said. He also called the compression offered by data warehousing database vendor Teradata Corp. "very primitive."

But users haven't flocked to Advanced Compression yet. One reason is that it's a paid add-on. A license costs $11,500 per processor, with updates and support adding an additional $2,530 per CPU. Also, it's available only to users of 11g Enterprise Edition, and Oracle hasn't seen much adoption of 11g thus far. According to Andrew Mendelsohn, Oracle's senior vice president of server technologies, 75% of its customers are running 10g, and another 20% are still running 9i.

Take what is likely Oracle's biggest customer, LGR Telecommunications, which develops data warehousing systems for telecommunications companies. LGR has built two 300TB data warehouses for AT&T Inc. for use in storing and managing the carrier's caller data records, according to Paul Hartley, general manager of LGR's North American operations in Atlanta. The databases, which run concurrently with one another, can scale up to a total of 1.2PB, Hartley said during a presentation at OpenWorld.

But the two data warehouses are based on Oracle 10g, so they can't take advantage of Advanced Compression. LGR does "use compression to some extent today, but we plan to use it extensively in the future," Hannes van Rooven, a manager at LGR, said during the same presentation.

Another Oracle customer, Intermap Technologies Corp., is using the spatial-data version of 11g for its 11TB database of digital mapping and imagery data, which is expected to grow to 40TB by the first quarter of 2010, according to Sue Merrigan, senior director of information management at the Englewood, Colo., company. Intermap isn't in the compression camp now. "We don't compress the data because we are concerned it would lose its accuracy," Merrigan said.

That isn't true, responded Kumar, who said that Advanced Compression is a so-called lossless compression scheme.

Rivals such as John Bantleman, CEO of archiving software vendor Clearpace Software Ltd., argue that sending old data to archives will continue to boost database performance more than compressing information. Moreover, it isn't much more complicated to do so, Bantleman claims. And using tools such as Clearpace, users can search and extract data archived outside of the database as quickly and conveniently as if the information were stored in it, according to Bantleman.

"A telco might need to maintain its caller data records for years," Bantleman said. "But does it really make sense to keep all of that in your database if regulations only require you to keep access to it for 90 days?" He added that it might seem better "emotionally" to maintain a single data storage environment. "But I think you want to segment the live part of your data for OLTP performance from your highly compressed historical data. These two schemas don't meld well in the same box."
