Could abusive practices and a lack of privacy regulations in the U.S. lead to a backlash that harms scientific research for the greater social good? That worries Tom Mitchell, a Carnegie Mellon professor and machine learning researcher, whose profile appears this week in the pages of Computerworld.
As smartphones diligently record people's locations, movements and other activities, machine learning and real-time data mining can be used for the greater good. For example, real-time positioning and movement data from your smartphone is already being used to track traffic congestion. Soon it could be used to change traffic light patterns to optimize traffic flow.
Machine learning algorithms feed on such data to make predictions for good -- or ill. Patient data could be analyzed to inform you that yesterday you came in contact with someone who has a contagious disease. But if you have the disease, do you want that information made public? What about entities that might use machine learning tools to identify you in random groups of photos that you or others have posted on the Web? How about identifying your mother or your child?
The current approach to privacy needs to be turned on its head, Mitchell argues. "I would like to see a shift from the current idea that the data capturer owns the data to see the person that the data is about have final control over it."
But that won't happen without an informed debate on how data mining goals can be achieved while preserving anonymity. Right now, he says, policymakers are not getting that information. They may be led to believe that it's all or nothing, and that any attempt to provide guidelines through privacy regulation will shut the door on innovation.
"Even though there are guidelines out there about how to deal with privacy issues, a lot of those guidelines and a lot of the debates I've heard are under informed about what the available technology is for assisting in protecting privacy," he says.
Mining data while protecting patients
For example, says Mitchell, take all of the patient data from all of the hospitals in the U.S.
If researchers could collect all of the data from all H1N1 patients in hospitals in the U.S., they would have a tremendous data set. Machine learning algorithms, which work best with very large data sets, could then be used to determine which treatments allow patients to recover quickly for a given pattern of disease symptoms. In a rapidly moving disease outbreak, such as a pandemic, machine learning algorithms could help doctors figure out what works for very specific patient profiles -- and what doesn't. What works best when the patient is under 12 and has a high fever but doesn't work so well if no fever is present? "Those patterns exist out there but we can't discover them quickly if we don't have the data," Mitchell says.
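To make the idea of profile-specific patterns concrete, here is a minimal sketch in Python. It is not Mitchell's method and it is not a real machine learning algorithm: it simply averages recovery times per patient profile and treatment, which is the kind of pattern a learning algorithm would search for at far larger scale. The records, the treatment names "A" and "B", and the function names are all hypothetical and invented for illustration only.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical, made-up records for illustration only (not real patient data):
# (age, has_high_fever, treatment, recovery_days)
RECORDS = [
    (8, True, "A", 4),
    (10, True, "A", 5),
    (9, True, "B", 8),
    (11, True, "B", 7),
    (7, False, "A", 9),
    (6, False, "A", 8),
    (10, False, "B", 5),
    (9, False, "B", 6),
]


def profile(age, has_high_fever):
    """Reduce a patient to a coarse profile: age group plus fever status."""
    age_group = "under 12" if age < 12 else "12 and over"
    fever = "high fever" if has_high_fever else "no fever"
    return (age_group, fever)


def best_treatment_by_profile(records):
    """Return, for each profile, the treatment with the lowest mean recovery time."""
    # Collect recovery times for every (profile, treatment) pair.
    days = defaultdict(list)
    for age, has_high_fever, treatment, recovery_days in records:
        days[(profile(age, has_high_fever), treatment)].append(recovery_days)

    # Average the recovery times per treatment within each profile.
    averages = defaultdict(dict)
    for (prof, treatment), values in days.items():
        averages[prof][treatment] = mean(values)

    # Pick the treatment with the shortest average recovery for each profile.
    return {prof: min(by_t, key=by_t.get) for prof, by_t in averages.items()}


if __name__ == "__main__":
    for prof, treatment in best_treatment_by_profile(RECORDS).items():
        print(prof, "->", treatment)
```

With only a handful of records per profile, averages like these would be statistically meaningless, which is the point Mitchell makes: such patterns become discoverable only when the data set is very large.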