Over the past two months, InfoWorld has been researching a flaw in Oracle's flagship database software that could have serious repercussions for Oracle database customers, potentially compromising the security and stability of Oracle database systems.
Typically, when a bug results in a database outage, affected systems can simply be recovered from backups. But as InfoWorld has learned, this particular collection of Oracle issues could incur database outages that take considerable time and effort to correct.
According to a source who asked not to be named, "This is a very real problem for us. We're spending considerable time and money to monitor, plan, and address it where we can."
In the process of reporting this story, we've conducted our own tests, verified information with sources we believe to be reliable, and consulted extensively with Oracle itself, which has given credit to InfoWorld for bringing the security aspect of the problem to the company's attention.
After we notified Oracle of our discoveries and following several technical discussions, the company requested that we hold this story until it had time to develop and test patches that addressed the flaw. In the interest of security for Oracle users, we agreed. Those patches are available at 1 p.m. PT today as part of the Oracle Critical Patch Update for January 2012.
To be clear, the security aspect of the flaw could make any unpatched Oracle Database customer vulnerable to malicious attack. Yet another, more fundamental aspect could pose a special risk only to large Oracle customers with interconnected databases. Both stem from a mechanism deep in the database engine, one that most Oracle DBAs seldom deal with day to day.
The heart of the matter
At the core of the issue is the System Change Number (SCN) in Oracle. This is a number that increments sequentially with every database commit: inserts, updates, and deletes. The SCN is also incremented through linked database interactions.
The SCN is crucial to normal Oracle database operation. The SCN "time stamp" is the key to maintaining data consistency in Oracle, allowing the database to respond to every user's query with the appropriate version of data at every point in time. It functions as the database's clock, so to speak. And like time, the SCN cannot decrement. It must always tick forward.
When Oracle databases link to each other, maintaining data consistency requires them to synchronize to a common SCN. This is necessarily the highest SCN carried by any participating Oracle database instance because the SCN clock cannot run backward -- so database linking causes the SCN in many databases to jump during normal operations. And only very basic permissions are required to make a connection that can cause one database to increment the SCN on another.
The architects of Oracle's flagship database application must have been well aware the SCN needed to be a massive integer. It is: a 48-bit number (281,474,976,710,656). It would take eons for an Oracle database to eclipse that number of transactions and cause problems -- or so you might think.
In addition to that hard limit, there's a soft limit imposed by Oracle itself to ensure the SCN value at any given moment is not unreasonably high, which would indicate a database malfunction. If the soft limit is exceeded, the database can become unstable and/or unavailable. And because the SCN cannot be decremented or otherwise reset, the database cannot simply be restored from a backup and replaying the logs, since they will necessarily carry the SCN elevation along with those transactions. An analogy can be made to the effect of the Y2K bug on an unpatched system as the clock struck 12 a.m. on Jan. 1, 2000.
The soft limit derives from a very simple calculation anchored to a point in time 24 years ago: Take the number of seconds since 00:00:00 01/01/1988 and multiply that figure by 16,384. If the current SCN value is below that, then all is well and processing continues as normal. To put this in simple terms, the calculation assumes that a database running constantly since 01/01/1988, processing 16,384 transactions per second, cannot exist in reality.
In reality, it doesn't. But if you alter reality, it does -- and there are several ways a breach of the SCN soft limit may occur.