Finding that aggregate Web searches alone cannot provide an accurate assessment of where flu has struck and how severe it is, Google has decided to take into account data from the U.S. Centers for Disease Control and Prevention in its Google Flu Trends model.
The Google tool is based on the premise that people turn to the Web when they are searching for information on the flu, making certain search terms good indicators of flu levels. Google also delivered these indicators of flu activity faster than the CDC's weekly reports.
During the 2012-2013 flu season in the U.S., the model performed well in estimating the start and duration of the season, according to Google. But it overestimated the severity of the flu, leading to a gap between Google’s estimates and the percentage of healthcare visits for influenza-like illnesses reported by the CDC.
“We found that heightened media coverage on the severity of the flu season resulted in an extended period in which users were searching for terms we’ve identified as correlated with flu levels,” Google said in October last year.
The overestimation also led to skepticism about whether big data can really deliver the benefits expected from its analysis. In an article earlier this year in Science magazine, titled “The Parable of Google Flu: Traps in Big Data Analysis,” four researchers argued that traditional “small data,” such as data from the CDC, often offer information that is not contained, or containable, in big data.
Google appears to have taken their advice and decided to use CDC data as well in its model. The company’s Christian Stefansen, a senior software engineer, wrote Friday that Google had investigated the overestimation in the 2012-13 season and launched a retrained model for the following season, which performed within the historic range. But Google decided it could perhaps improve the accuracy significantly with a model that learns continuously from official flu data, Stefansen wrote.
For the 2014-2015 flu season, Google is now using a new Flu Trends model in the U.S. “that takes official CDC flu data into account as the flu season progresses.”
Flu Trends now covers 29 countries, and Google is exploring whether the new model can be extended to other countries, depending on its performance in the U.S.