Just three small clues—receipts for a pizza, a coffee and a pair of jeans—are enough information to identify a person’s credit card transactions from among those of a million people, according to a new study.
The findings, published in the journal Science, add to other research showing that seemingly anonymous data sets may not protect people’s privacy under rigorous analysis.
“The fact that a few data points are enough to uniquely identify an individual was true in credit card metadata,” said Yves-Alexandre de Montjoye, an MIT graduate student and one of the study’s authors.
De Montjoye and his colleagues analyzed three months of credit card transactions made by 1.1 million people in some 10,000 stores. The records were provided by an unnamed major bank.
They were trying to see how much data they needed to identify a person’s transactions from a larger set of transaction records. Absent from the data were names, addresses, email addresses and other personal information.
Ninety percent of the time, the researchers could pick out an individual using just four pieces of data, such as the locations where four purchases were made. Adding price information, for example from purchase receipts, allowed them to identify a person with just three transactions.
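To make the measurement concrete, here is a minimal sketch of how such a test could be run on a toy data set. It is an illustration under stated assumptions, not the researchers' actual code: the function name `estimate_uniqueness`, the (shop, day) representation of a purchase and the three made-up shoppers are all hypothetical.

```python
import random

def estimate_uniqueness(traces, k, trials=1000, seed=0):
    """Estimate how often k known purchases single out one person.

    traces: dict mapping a person ID to a set of (shop, day) tuples.
    This is a hypothetical helper, not the method used in the study.
    """
    rng = random.Random(seed)
    # Only people with at least k recorded purchases can be tested.
    eligible = sorted(p for p, pts in traces.items() if len(pts) >= k)
    if not eligible:
        return 0.0
    unique = 0
    for _ in range(trials):
        person = rng.choice(eligible)
        # Pretend an outsider knows k of this person's purchases.
        known = set(rng.sample(sorted(traces[person]), k))
        # Count every record in the data set that contains all of them.
        matches = sum(1 for pts in traces.values() if known <= pts)
        if matches == 1:
            unique += 1
    return unique / trials

# Made-up toy data: three shoppers, three purchases each.
traces = {
    "a": {("bakery", 1), ("cafe", 2), ("shoes", 3)},
    "b": {("bakery", 1), ("cafe", 2), ("books", 3)},
    "c": {("bakery", 1), ("pizza", 2), ("books", 3)},
}

for k in (1, 2, 3):
    print(k, estimate_uniqueness(traces, k))
```

In this toy set, a single known purchase rarely pins down one shopper, while all three always do. That is the same pattern the study reports at a far larger scale.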
They could also identify individuals from “one receipt, one Instagram photo of you having coffee with friends, and one tweet about the phone you just bought,” they said.
“The fundamental scientific question is one of our human behavior,” de Montjoye said. “It’s really how our behavior compares with that of others and eventually makes us unique and identifiable.”
The researchers didn’t try to actually identify particular individuals, but instead to figure out on average how much data would be needed to narrow transactions down to a person.
“We did not try to find a specific person on purpose,” he said.
The latest research adds to a 2013 study de Montjoye co-authored that showed that four data points, such as place and time, were enough to identify a person from a mass of mobile phone records 95 percent of the time.
The research highlights the regulatory and policy challenges around anonymity, de Montjoye said. Legally, society relies on a definition of anonymity—such as removing names and email addresses from records—that is widely believed to provide protection.
“What our study shows is that this is not enough to prevent identification,” he said.
The other way to define anonymity, endorsed by the European Union, is that data must be “provably” anonymous, meaning it must be impossible to identify an individual under any circumstances.
Verifying that condition is difficult, de Montjoye said. In addition, scrambling the data too much may prevent novel and legitimate uses, such as studying consumption patterns or inflation. But people should be aware of the potential risks of identification.
“I don’t think it’s ever going to be 100 percent safe, but there are steps that can be taken,” he said.