Fingerprinting Tech: Data Aggregators’ BFF
Using cookies to recognize people online and sync up data about them isn’t ideal, however. A cookie associated with a particular IP address might contain the browsing histories of several people in a household who share the same PC. And cookies may not last very long in the browser: Security software is often set to delete cookies once a week. People in the online advertising industry call such deletions “cookie erosion.”
Naturally, companies are springing up with technologies that resolve these issues. New “fingerprinting” technologies rely on some highly sophisticated means to verify that the personal data collected at different sites at different times, and for different reasons, are all from the same consumer.
BlueCava, based in Irvine, California, has developed a “device ID” technology that identifies site visitors based on the unique combination of settings in their Web browser. The company then buys demographic, preference, and Web-tracking data from site publishers all over the Web, and matches and adds that data to the identified users’ profiles in its database. It can then sell all that profile data to advertisers and marketers. BlueCava CEO David Norris says that his company’s technology can identify devices with 99.7 percent accuracy, and that it has already identified roughly 10 percent of the 10 billion Internet-connected devices in the world.
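To make the idea concrete, here is a minimal Python sketch of how a combination of browser settings could be reduced to a single identifier. It is an illustration only, not BlueCava’s actual method, which the company has not described here; the setting names (`user_agent`, `screen`, `timezone_offset`, `fonts`) and the `device_id` function are hypothetical.

```python
import hashlib
import json


def device_id(settings: dict) -> str:
    """Derive a short, stable ID from a set of browser settings.

    Hypothetical helper: real fingerprinting systems use many more
    signals and tolerate small changes; this one needs an exact match.
    """
    # Serialize with sorted keys so the same settings always produce
    # the same string, regardless of the order they were collected in.
    canonical = json.dumps(settings, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


# Assumed example settings; none of these values come from the article.
visit_a = {
    "user_agent": "ExampleBrowser/1.0",
    "screen": "1920x1080x24",
    "timezone_offset": -480,
    "fonts": ["Arial", "Courier New", "Verdana"],
}
# The same settings seen on another site, collected in a different order.
visit_b = dict(reversed(list(visit_a.items())))
# A different device that differs in a single setting.
other_device = {**visit_a, "timezone_offset": -300}

same_device = device_id(visit_a) == device_id(visit_b)
different_device = device_id(visit_a) != device_id(other_device)
```

Because the ID is computed from settings the browser reports on every visit, nothing has to be stored on the user’s machine, which is why deleting cookies does not affect it.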
Fingerprinting Challenges Anonymity Online
Fingerprinting technologies like BlueCava’s give some in the privacy community serious pause. “I think device ID is really unethical,” says Kaliya Hamlin of the Personal Data Ecosystem Consortium. “It’s one thing to put cookies in your browser, because you can throw them out; but a device ID is permanent, and takes away your means of defining context in your digital life.”
Hamlin believes that device ID degrades privacy by taking away our ability to use alternate identities online to keep assorted aspects of our digital lives separate.
In the physical world, Hamlin points out, we can use physical distance and time to separate the various contexts in which we operate. We can get in the car and drive to our kids’ school for a teacher conference, then drive across town to an AA meeting, and maybe participate in a hobby on the weekends. The info we give out in each of these contexts stays separate because we give it to different people at different places at different times.
But online, Hamlin notes, those firewalls just don’t exist. Instead, to stay anonymous, people rely on various nicknames and avatars at the sites they frequent. But device ID defeats this practice. Device ID concerns itself with the device and the browser people use to access websites, not the identities they set up there. It ties all those identities together into one big profile.
“Device ID is almost like the police putting GPS trackers on cars, which the Supreme Court just ruled illegal [in United States v. Jones],” Hamlin says. The one difference is that a driver can remove a GPS tracker, but a device ID is established far away, so a computer user can’t easily remove it.
BlueCava’s Norris counters that his company will remove a device ID from its system if a consumer requests it at the company’s website. Norris argues that this accommodation protects privacy better than Do Not Track for cookies, because Do Not Track cookies can easily be deleted in the browser (by the user or by antivirus software), whereas the deletion of a device ID is permanent.
The problem, however, is that most people will never even know that a device ID exists for them.
‘Big Data’ Analysis Infers a Lot From a Little
So-called Big Data is one of the few big concepts that will define technology and culture in the first part of the 21st century. The term refers to the capture, storage, and analysis of large amounts of data. This can mean any kind of data, but the term often refers to the collection and analysis of personal data.
Running deep analysis of terabytes of data was perhaps pioneered by Google, but Big Data practices are now in place at all kinds of organizations, from law enforcement to dating sites to UPS to Major League Baseball. IDC (owned by the same parent company as PCWorld) says that the $3.2 billion that companies spent on Big Data in 2010 will grow to $16.9 billion in 2015.
Among people involved in the personal data economy in one way or another, one anecdote comes up over and over again, and beautifully demonstrates both the possibilities and the dangers of Big Data.
A story by Charles Duhigg in the New York Times Magazine in February described how analysts in the predictive data department of Target developed a way to use the company’s customer data to predict the pregnancies (and future baby product needs) of its female customers, sometimes even before the woman’s family knew she was pregnant.
This was an extremely important discovery for Target because it allowed the company to show the women ads for various baby products timed to each phase of the pregnancy. There was an even bigger bonus. During the stressful months of pregnancy, future moms’ and dads’ normal buying habits frequently go out the window, and they look for the most convenient place to buy everything. If Target could get the women into its stores to buy baby products, it might become their go-to source for all sorts of products.
The Target analysts got their breakthrough by looking at the buying histories of women who had signed up for new baby registries at Target. The analysts noticed that pregnant women often bought large amounts of unscented lotion around the start of their second trimester, and that sometime during the first 20 weeks of their pregnancies they bought lots of supplements like calcium, magnesium, and zinc.
The analysts then searched for these same “markers” in the buying histories of all female customers of childbearing age, found the likely moms-to-be, and sent them offers and coupons for baby products carefully timed to the various stages of pregnancy. Ka-ching.
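The marker search described above can be sketched in a few lines of Python. This is a simplified illustration, not Target’s actual model: the marker list is taken from the products named in the story, while the scoring rule, the 0.75 threshold, the function names, and the sample customers are all assumptions made for the example.

```python
# Products the article says showed up in pregnant customers' histories.
MARKERS = {"unscented lotion", "calcium", "magnesium", "zinc"}


def marker_score(purchases):
    """Return the fraction of marker products found in a purchase history."""
    bought = {item.lower() for item in purchases}
    return len(bought & MARKERS) / len(MARKERS)


def likely_candidates(histories, threshold=0.75):
    """List customer IDs whose score meets the (assumed) threshold."""
    return sorted(
        customer
        for customer, purchases in histories.items()
        if marker_score(purchases) >= threshold
    )


# Made-up purchase histories, for illustration only.
histories = {
    "customer_1": ["Unscented lotion", "Calcium", "Zinc", "Bread"],
    "customer_2": ["Scented candle", "Zinc"],
}

flagged = likely_candidates(histories)
```

In this sample, `customer_1` matches three of the four markers and is flagged, while `customer_2` matches only one and is not. A real predictive model would weight each marker and account for timing rather than simply counting matches.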
This is a relatively simple example, and one that happened to be reported in the media. But, as the Duhigg article points out, most large companies in America now have “predictive analysis” departments and are learning to look for the kind of markers that Target discovered hidden in its data.