Let's hope the NSA hasn't actually used this machine-learning model to target drone strikes
The data set used to train it was 'totally inadequate,' one expert says
The U.S. National Security Agency could be relying on a seriously flawed machine-learning model to target drone strikes in Pakistan, according to a new analysis of slides leaked by whistleblower Edward Snowden and published last year.
Published last May by The Intercept, the slides detail the NSA's so-called Skynet program, in which machine learning is apparently used to identify likely terrorists in Pakistan. It's unclear whether the model has been used in the NSA's real-world efforts, but it has serious problems that could put lives at risk if it were, according to Patrick Ball, director of research at the Human Rights Data Analysis Group.
"I have no idea if any of this was ever used in actual strikes or even made it into a meeting," Ball said Monday. But "nobody rational would use an analysis this crappy for making decisions about who to kill."
The slides, which date back to 2012, describe the use of GSM metadata to build behavioral profiles of 55 million cellphone users, drawing on factors such as travel behavior and social networks. Using that data, the model aims to predict which people are likely to be terrorists.
It's no secret that the United States has been using unmanned drones to attack militants in Pakistan over the past decade. Between 2,500 and 4,000 Pakistanis have been killed by drones since 2004, according to the Bureau of Investigative Journalism, a nonprofit news organization. Many of those killed were members of groups such as al Qaeda, the organization said.
General Michael Hayden, former director of the NSA and the CIA, has stated the connection explicitly: “We kill people based on metadata.”
Particularly troubling, however, is that drones have reportedly killed more than 400 civilians -- possibly more than 900 -- along the way.
That's where the model's specific failings become relevant. First and foremost is that the NSA didn't use nearly enough data about known terrorists to be able to train the model to distinguish terrorists from other people with any reasonable level of accuracy, Ball explained.
In fact, the model was trained using data about just seven known terrorists, according to the slides. "That's totally inadequate," Ball said.
The algorithm itself is fine, he said, but the paucity of data used to train it leads to an unacceptably high chance of "false positives," or innocent people classified as terrorists. If the model were actually used to direct drone attacks, that would mean the loss of innocent lives.
The NSA is "not stupid, and this is a stupid piece of analysis," Ball said. "My guess is that this was someone in technical management at NSA selling it up the chain, but it didn't really work -- it's a failed experiment."
None of this means drone strikes aren't happening, or that the prospect of a model like this being used to direct them isn't concerning.
"Yes, there are drone strikes in Pakistan, and yes, they kill innocent people -- these things are not in dispute," Ball said. But in the case of this model, "all we know is what's on a few slides, and that's worrisome."
The NSA did not respond to a request for comment.