Watson Teaches 'Big Analytics'
This vendor-written tech primer has been edited by Network World to eliminate product promotion, but readers should note it will likely favor the submitter's approach.
IBM Watson's impressive "Jeopardy!" win demonstrated awesome strides in computing power and ingenuity, but just as impressive was the way Watson's creators attacked an avalanche of information to come out victorious. Notably, Watson wasn't concerned with big data alone.
"Big data" is often cited as the core problem holding companies back from gaining a competitive advantage in this age of information overflow. Most organizations are fairly adept at capturing that information, but what ultimately matters is what they do with it and how quickly they use it to glean value. This is "big analytics." And though Watson is clearly a different animal from database analytics solutions for business, fundamentally, Watson is big analytics.
Working from just a single terabyte of data, Watson performed complex analyses at incredibly high speeds to come up with correct answers. For those of us in the business of data storage and analytics -- in fact, most companies -- this illustrated the power and challenge of big analytics, not just big data.
A Combination Problem
For years, big data was considered a critical problem for businesses trying to capture information and then deliver new products or solutions to customers based on that knowledge. Initially, the costs of storage alone could quickly get out of hand and, admittedly, the numbers associated with data collection look and sound daunting.
Retailers regularly collect massive amounts of information about customers from online, in-store and even social media sources. Financial institutions gather millions of daily credit card and bank transactions, and rely on multiple terabytes of historical data to create new business insights. A recent IDC report predicts data will grow some 44-fold over the course of the next decade.
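To put the 44-fold figure in yearly terms, the short sketch below works out the constant annual growth rate that would compound to that multiple over ten years. The steady-compounding assumption and the function name `implied_annual_growth` are ours for illustration; the IDC report is not quoted here as describing growth that way.

```python
def implied_annual_growth(total_multiple: float, years: int) -> float:
    """Return the constant yearly growth rate that compounds to total_multiple.

    Assumption (ours, for illustration): growth is spread evenly across the years.
    """
    return total_multiple ** (1.0 / years) - 1.0


# 44-fold growth over a ten-year span, as cited in the text above.
rate = implied_annual_growth(44.0, 10)
print(f"Implied annual growth: {rate:.1%}")
```

Under that assumption the rate comes out at roughly 46 percent a year, which means data volumes would more than double every two years.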
Too often, the industry focuses its attention primarily on this piece of the data problem. But today, those are simply big numbers. The second piece, often ignored or pushed aside, is the problem of big analytics: even 100 terabytes of data is entirely useless if a company hasn't solved the big analytics problem.
This of course includes the aforementioned problems of scale. But modern analytic platforms must also be extremely fast in answering creative, often difficult questions drawn from multiple sources in a variety of programming languages. That is, these platforms require velocity, agility and the capacity to deal with complexity.
Velocity, first and foremost, is about brute speed and power. Watson was able not only to come up with answers with a required level of confidence but also to physically buzz in before its human competitors. In business, vast stores of data -- customer information, social media feeds, financial records -- have diminishing returns as time goes by. If the information is not acted on immediately, its value plummets. For instance, financial institutions attempt to identify trades just 30 seconds ahead of the competition to maximize returns, or attempt to identify fraudulent patterns as they occur. They can't predict an event, however, if they must wait for big analytics to come back with an answer. Critically, businesses must now get from problem to question to answer in a drastically reduced timeframe.
Agility is the capacity to have a "conversation" with data. Watson was in a sense having a conversation with "Jeopardy!" host Alex Trebek. But the computer was also having a conversation with its data store, creating a series of answers with varying degrees of certainty.
Businesses successfully utilizing big analytics can take this process of knowledge discovery even further, identifying questions, exploring the answers and asking new questions based on those answers. This iterative quality of data analysis, rather than incremental exploration, can lead to a deeper understanding of business and markets, and begin to answer questions never before considered.
Watson was also able to understand the intricacies of human language, in many cases even the semantics of puns and wordplay. While database analytics solutions of course can't understand language, the ability to understand complex questions, and explore gargantuan data stores, is indeed critical.
Bringing these big analytics traits together for enterprise risk management (ERM), now a central focus for companies, is one example. Exploring risk across the organization, companies glean the "risk web" that shows causality and not simply correlation among various risky actions. Analyzing this risk web in order to make sound decisions, often in a short timeframe, requires an analytic platform delivering on the promise of big analytics.
Performing Big Analytics
To enable this knowledge discovery, a novel approach to big data is needed. Legacy analytic database solutions, as well as many modern offerings, aren't meeting the big analytics challenge.
For one, many organizations have created expensive static workarounds to the dual problems of big data and big analytics -- from laborious database tuning with armies of DBAs, to proprietary hardware -- which are not conducive to this new era of intense analytics.
Many companies also rely on databases originally designed for transaction processing in the 1980s, not for the big analytics of today's hyper-fast business landscape. These databases can pull from structured data sources but not from unstructured sources such as the Internet, social media or even satellite imagery. Finally, old and slow solutions such as these are combined with today's complex data warehousing solutions, and are unable to draw out critical business insights buried by this complexity.
By contrast, the new generation of analytic platforms solves the big analytics problem by integrating on two broad levels. On the infrastructure level, these platforms integrate and leverage existing hardware and software technologies while satisfying the most demanding analytic requirements. On the data level, they seamlessly integrate analytic algorithms written in any language and run them fully in parallel inside the platform, next to the data. The modern analytic platform is also able to integrate with and consume data, whether structured or unstructured, from multiple data sources inside and outside the enterprise.
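The idea of running an algorithm in parallel "next to the data" can be sketched in a few lines. This is only an illustration under our own assumptions, not any vendor's implementation: the `partitions` list stands in for data slices held on separate nodes, Python threads stand in for those nodes, and the names `local_summary` and `parallel_mean` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data slices, standing in for rows stored on separate nodes.
partitions = [
    [12.0, 15.5, 9.25],
    [20.0, 4.75],
    [8.0, 8.0, 8.0, 16.0],
]


def local_summary(rows):
    """Runs where the slice lives: reduce it to a small (sum, count) pair."""
    return sum(rows), len(rows)


def parallel_mean(slices):
    """Summarize every slice concurrently, then merge the small partial results."""
    with ThreadPoolExecutor(max_workers=len(slices)) as pool:
        partials = list(pool.map(local_summary, slices))
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count


mean_amount = parallel_mean(partitions)
print(mean_amount)
```

The design point is that only the compact partial results travel to the merge step; the raw rows never leave their slice, which is what keeps this pattern workable as data volumes grow.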
Big Analytics in Practice
We witnessed what Watson could accomplish on a game show with big analytics, but organizations across the globe are also benefiting from this new focus that goes beyond big data.
Government agencies are examining patterns, relationships and correlations to more accurately identify security threats or predict the impact of geopolitical events -- revolutions and protests in Libya, Egypt, Tunisia and elsewhere come to mind.
Retailers analyze data from a nearly unlimited number of sources. Structured information from in-store customer data and live click-streams is one source, while Twitter, Facebook feeds and market news provide a rich store for unstructured "sentiment" analysis. By analyzing these complex data iteratively, rather than incrementally, retailers rapidly identify product affinities and up-sell opportunities, and optimize pricing to drive profitability.
Financial firms, meanwhile, use big analytics for trading, fraud detection and risk assessment, as well as to meet new financial regulations that require firms to complete standardized portfolio stress testing. Big analytics drills into complexities of investments never previously explored.
Typically identified as the core challenge facing information-intense businesses, big data is only part of the puzzle. And Watson isn't the only one proving companies should pay just as much attention to solving big analytics as they do to solving big data. New analytic platforms prove it as well: the ability to rapidly analyze fresh data from multiple sources, ask creative, complex and iterative questions of that data and get useful answers in a timely fashion is the crux of big analytics, creating new opportunities and giving organizations true competitive advantage.
ParAccel is the developer of ParAccel Analytic Database (PADB), the world's fastest, most cost-effective platform for empowering analytics-driven businesses. ParAccel enables organizations to tackle the most complex analytic challenges and glean ultra-fast deep insights from vast volumes of data. Data-driven businesses in financial services, retail, healthcare, government and more are taking advantage of ParAccel to tackle critical, time-sensitive questions outside the scope of conventional data warehouses and existing analytic tools. For more information contact email@example.com or visit http://www.paraccel.com.