File this under "funny, but not really": Last week, a company working with Microsoft to combat copyright piracy asked Google to remove multiple Microsoft web pages from its search results for infringing Microsoft copyrights.
This wasn't a case of internal idiocy or revenge, and it's also not quite as amusing as it may appear at first glance. Instead, it highlights the harmful way copyright holders use automatically generated DMCA takedown requests to try to scrub the net of pirated content, casting a wide net that often ensnares innocent webmasters with false infringement claims.
If a copyright holder believes that a particular website is ripping off its work, it can send Google a DMCA takedown request asking for the allegedly infringing pages to be removed from the search engine. If Google determines that the site does indeed stomp on the copyright holder's intellectual property rights, the site's links disappear from Google's search results. So far, so good, right?
Copyright holders and the companies they hire to manage DMCA takedown requests—in Microsoft's case, a third party called LeakID—frequently automate the process, resulting in a flood of requests that are sometimes erroneous and aren't always checked for accuracy before filing.
These false requests are far from rare. Consider past Microsoft DMCA takedown requests that accidentally targeted the U.S. Environmental Protection Agency, the Department of Health and Human Services, the National Institutes of Health, TechCrunch, Wikipedia, BBC News, Bing.com, Google.com, and many others. Or HBO's attempt to remove links to the open-source VLC media player, or this big list of "DMCA notices so stupid it hurts," or Google's examples of the "inaccurate" DMCA takedown requests it has received over the years, or…
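Some of these mistakes, such as a request that targets the rights holder's own pages, are the kind a simple pre-filing check could flag. What follows is only a minimal sketch of such a check in Python, not a description of how LeakID, Microsoft, or Google actually process requests. The function names and the example.com and example.net addresses are hypothetical and used purely for illustration.

```python
from urllib.parse import urlsplit


def is_own_domain(url, owner_domains):
    """Return True if the URL's host is one of the owner's domains or a subdomain of one."""
    host = (urlsplit(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in owner_domains)


def flag_self_targeting(reported_urls, owner_domains):
    """Return the reported URLs that point at the rights holder's own domains.

    These are candidates for human review before a takedown request is filed.
    """
    return [u for u in reported_urls if is_own_domain(u, owner_domains)]


# Hypothetical example: the rights holder owns example.com.
reported = [
    "http://office.example.com/solutions",      # the holder's own page
    "http://downloads.example.net/office.zip",  # a third-party host
]
flagged = flag_self_targeting(reported, {"example.com"})
print(flagged)  # ['http://office.example.com/solutions']
```

A check like this only catches self-targeting requests. It would do nothing for the unrelated third-party sites caught by mistake, which still need a person to look at them.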
Over the past year, copyright holders such as Microsoft, the Recording Industry Association of America, NBC, Walt Disney, and others have started blasting Google with vast numbers of takedown requests. While Google used to receive around 225,000 DMCA requests per week, according to the company's own Transparency Report, copyright holders now hit the search engine with 3.5 to 4.5 million takedown requests each and every week.
Around the time of the ramp-up—August 2012—Google announced it would start penalizing sites that are repeatedly accused of copyright infringement, ranking them lower in search results.
Between January and July 2013, Google removed more than 100,000,000 (that's 100 million) links from its search results in response to DMCA takedown requests. TorrentFreak reports that the figure is already more than twice the total number of links Google removed in all of 2012.
For its part, Google does appear to actively police the DMCA takedown requests it receives. Around three percent of DMCA takedown requests the company receives are rejected, and rejected URLs are listed on the Transparency Report's main copyright page. And yes, the folks in the Googleplex caught LeakID's attempts to scrub the Microsoft.com links before the six Office solutions pages disappeared from search results.
But few companies have Google's resources. The safe harbor provision of the DMCA rewards websites that "take down first and ask questions later," and for every amusing story like this one, there are dozens of other, more harmful false takedown requests. Also consider that if even just 1 percent of the 100 million-plus requests for URL removals catches an innocent page in the automated crossfire, that's already 1 million pages affected.
The Electronic Frontier Foundation filed a court brief in 2012 arguing that automated DMCA requests that aren't reviewed by actual humans should be considered negligent, which would open the requester to sanctions. Nothing ever came of the attempt, however, and automated, unreviewed requests generated by Microsoft contractors are still trying to erase parts of the Microsoft.com website to this very day.
Update (7/30/13): A spokesperson for Microsoft sent us the following statement:
"We believe strongly in the effectiveness and the need for accuracy in the use of notice and takedown to address online infringement. To explain what happened here, Google’s online form requires identification of both the copyrighted content being infringed and the website address of the infringement. A vendor properly listed those six URLs as Microsoft copyrighted content that was being infringed, but then inadvertently copied and pasted those same six URLs in the field to identify the locations of infringement. This simple clerical error was identified and corrected right away, and we have taken steps to address the process to avoid it being repeated."