Turn Privacy Debate on Its Head, Says Researcher
But if hospitals were willing to participate, researchers could do that work without copying or directly accessing patient data.
"Most people think that we'd have to send all of that data to one place to run all of our data mining algorithms on it. The truth is there are privacy-enhancing data mining methods that allow people to leave the data there. There are techniques to collect the statistics you need through a combination of cryptographic methods and passing around the statistics rather than the data," Mitchell says.
"Suppose we had six hospitals and we wanted to run some data mining algorithms to see which symptoms tend to lead to a particular treatment being successful. In the inner loop of that algorithm there's a repeating set of questions being asked, like how many patients are under 12. You can answer those questions without sharing the data," he says.
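Mitchell's point is that the inner loop of such an algorithm only needs answers to counting questions. The minimal Python sketch below illustrates this under stated assumptions. The `Patient` record, its fields, and the sample data are hypothetical, invented for illustration. The sketch also assumes each hospital can evaluate a query against its own records and report only the resulting number. It shows that summing per-hospital counts gives the same answer as pooling the records, so an estimate such as a treatment success rate can be computed from statistics alone. In this sketch, each hospital's individual count is still visible to whoever adds the counts up; the protocol Mitchell goes on to describe hides even that.

```python
from dataclasses import dataclass

@dataclass
class Patient:
    # Hypothetical record layout, invented for illustration.
    age: int
    has_fever: bool
    treatment_succeeded: bool

def local_count(records, predicate):
    """Answer one counting query against a single hospital's records."""
    return sum(1 for r in records if predicate(r))

def federated_count(hospitals, predicate):
    """Ask every hospital the same question; only the counts are combined."""
    return sum(local_count(records, predicate) for records in hospitals)

def success_rate_given_fever(hospitals):
    """Estimate P(treatment succeeded | fever) from aggregated counts only."""
    with_fever = federated_count(hospitals, lambda r: r.has_fever)
    successes = federated_count(
        hospitals, lambda r: r.has_fever and r.treatment_succeeded
    )
    return successes / with_fever if with_fever else None

# Hypothetical data: three hospitals, each holding its own records.
hospitals = [
    [Patient(8, True, True), Patient(40, True, False)],
    [Patient(10, False, True), Patient(5, True, True)],
    [Patient(70, True, True)],
]

under_12 = federated_count(hospitals, lambda r: r.age < 12)
pooled = [r for records in hospitals for r in records]  # what centralizing would do
assert under_12 == local_count(pooled, lambda r: r.age < 12)
print(under_12)                             # 3
print(success_rate_given_fever(hospitals))  # 0.75
```

Algorithms such as naive Bayes or decision-tree induction repeatedly ask counting questions of this kind. That is what makes the statistics-not-data approach workable for them.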
"You make up a random number, a really big number like 6453522, and you give it to hospital one and ask, 'How many of your H1N1 patients are under 12?' They pass that number plus that random number (the total) to the next hospital. The next hospital gets that [running total] and they add in theirs and so forth. When it comes back to you, you subtract the random number you started out with and you've got the total. No hospital had any information about any other hospitals' data. There was no information shared at all, even at the level of how many patients satisfied the criteria, and yet at the end, because you knew the random number, you could figure out the total. You don't know anything about the individual hospitals," says Mitchell.
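The ring protocol in the quote is commonly called a secure sum. Below is a minimal Python sketch that simulates it in a single process. The list of six counts is hypothetical; in a real deployment each count would stay on its own hospital's machine. One deliberate change from the quote: instead of ordinary addition with a "really big number," the sketch adds modulo a large modulus, which is the standard textbook variant. The choice of 2**64 is an assumption that every true total is smaller than the modulus. Working modulo this number means each running total a hospital receives is uniformly random and reveals nothing about the counts added before it.

```python
import secrets

MODULUS = 2**64  # assumption: every real total is far smaller than this

def secure_sum(local_counts, modulus=MODULUS):
    """Simulate a ring-based secure sum over per-hospital counts.

    Returns the recovered total and the running totals each hospital
    received, to show what a participant actually gets to see.
    """
    mask = secrets.randbelow(modulus)  # the coordinator's secret random number
    running_total = mask               # handed to hospital one
    seen_by_hospital = []
    for count in local_counts:
        # Each hospital sees only a masked running total...
        seen_by_hospital.append(running_total)
        # ...adds its own count, and passes the result along the ring.
        running_total = (running_total + count) % modulus
    # Back at the coordinator: subtract the mask to recover the true total.
    total = (running_total - mask) % modulus
    return total, seen_by_hospital

# Hypothetical answers from six hospitals to "how many H1N1 patients are under 12?"
counts = [12, 7, 0, 25, 3, 9]
total, seen = secure_sum(counts)
print(total)  # 56
```

The sketch assumes participants follow the protocol and do not collude. Suppose the parties immediately before and after one hospital in the ring compared notes. Subtracting the values they saw would reveal that hospital's count. More elaborate cryptographic schemes, such as secret sharing, are designed to close that gap.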
"That's an example of privacy-preserving ways of collecting statistics," according to Mitchell.
Ten years ago, no one even thought about this kind of approach, Mitchell says. "But because of the increasing interest in privacy, people are trying to design algorithms like that to achieve the same result without the privacy implications."
It takes a technologist to explain such techniques to policymakers.
"Most of the discussions I've heard in Washington about privacy and mining personal data are just not informed about the existence of these techniques. So, it's really important that technologists insert themselves into this discussion to make sure that, when we're weighing all of the tradeoffs, we're informed about what the options really are," Mitchell says.