Companies Offer Services to Crunch Gov't Raw Data

What if a U.S. president called for a bunch of government data to be released, but the raw numbers were difficult to make sense of?

A handful of companies and an open-source development project are trying to make sure that doesn't happen as U.S. President Barack Obama pushes for open government in the early days of his administration.

The business models differ, but some companies are using the raw data released on Data.gov and elsewhere to demonstrate the power of their data-publishing and number-crunching services.

The release of all this data is a good move, but much of it is in a raw format that is difficult to present in a way people can understand, said Kevin Merritt, CEO and founder of Socrata, a two-year-old company focused on helping government agencies and other users of the data reorganize and republish it on the Web.

Socrata calls its service of reorganizing the data into easy-to-read, interactive charts and graphs "social data discovery."

"The data is valuable, but the social data is valuable as well," said Merritt, a former Microsoft executive. "It's one thing to put the data online, but it's another thing to actually get some civic feedback loop."

On Data.gov alone, there were nearly 400 raw data sets available as of Wednesday morning.

There's a database of people, reported by country and region, granted asylum in the U.S. between 1998 and 2008; there's data on toxic chemicals released in Guam in 2005; there's a database of tornadoes, large hail and damaging wind reports from 1950 to 2006; there's data on the geochemistry of water samples in the U.S.; and there's a database of copper smelters around the world.

There's also data about patent applications, workplace fatalities, federal IT spending and migratory bird flyways. There are an additional 109,000 geographical data sets.

Vivek Kundra, federal CIO, was asked at a recent U.S. Federal Communications Commission forum about the Obama administration's philosophy on releasing data. "We don't really know which data feeds are going to lead to better analysis," he said. "What we are doing is, we're trying to release as much data as possible. As a result of that, we're finding a lot of innovation happening out there."

As the data is released, many U.S. residents are spotting trends that government workers hadn't seen before, he added. For example, based on data on which airline flights are typically late, fliers are starting to avoid flights from some airlines at certain times of the day, he said.

Socrata aims its products at government agencies, as well as journalists, researchers and other people wanting to make sense of the raw data. It offers a handful of products, including a free entry-level offering that allows anyone to host data on Socrata.com and a hosted, branded data site for large organizations. Socrata can help government agencies cut costs for storing and delivering data, Merritt said.

Socrata converts data from a variety of formats and allows users to share the data on Twitter, Digg, Facebook and other Web sites. The company operates from a philosophy that data is meant to be shared, Merritt said.

"Data becomes more valuable as it propagates away from its source," he said. "The theory there is, the farther it gets away from its source, the more people have made use of it, and therefore, that data must be intrinsically more valuable."

Other companies, including iCharts and Visual i/o, are using the government data that continues to be released to demonstrate the power of their visual-analysis or chart-publishing products.

In addition, Sunlight Labs, started by the Sunlight Foundation, is developing open-source software that makes use of government data, and is encouraging other developers to do the same. This year, Sunlight Labs has hosted two contests to encourage application development based on government data, and it has gotten more than 90 submissions.

Part of the reason that the Sunlight Foundation started Sunlight Labs was to assist traditional and citizen journalists with investigative reporting, said Clay Johnson, director of Sunlight Labs.

"As the Obama administration begins to release more data, there aren't enough fingers on keyboards here in Sunlight Labs to handle all this," Johnson said. "Has the Obama administration succeeded in making more government data available? You're talking to the guy with the most unquenchable thirst for that, who will never say that they're successful."

ICharts doesn't focus exclusively on government data, but works to help Web site publishers present information in a searchable, easy-to-digest format, said Seymour Duncker, iCharts' founder and CEO. "There's a huge abundance of open data, for example, produced through government and through universities," he said. There's an opportunity to make that data accessible to everybody, he added.

Obama's push for government transparency gives iCharts a lot of new raw data to work with, he said. "We see that providing the raw data is not sufficient," Duncker added. "You need to provide context. I see a new value chain emerging here."

For example, one of iCharts' featured charts on its front page tracks the U.S. gross domestic product from 1948 to 2009. This year, as should be expected in a recession, the GDP is down significantly.

Visual i/o calls its products visual analysis software. Users can create interactive charts and graphs and share them with others using Visual i/o's Web-based tools, said Angela Shen-Hsieh, Visual i/o president and CEO. Users of the charts can, with a few clicks, create different views based on different parameters.

"When you look at the Data.gov data sets, they're going to become interesting when you mash them up together," Shen-Hsieh said. The data gets more useful as users overlay data sets such as chronic diseases with geographical information, she added.

Two IT analysts said they see a growing trend of companies like iCharts and Visual i/o using government data to demonstrate their products and services. But both Shawn McCarthy, a government vendor analyst with IDC, and David Curle, lead analyst with Outsell, questioned whether a company could build a business model entirely on repurposing government data.

There would seem to be limited sales opportunities outside of selling the repurposed data back to government agencies, and with the data being available to anyone, the potential for competition would be great, Curle said.

Many public interest groups already create charts and graphs with government data, McCarthy added.

"What I have found is that most data I've seen needs further manipulation to make it truly valuable," he said. "Anybody that's grabbing the data and doing something with it is most likely adding value to it."
