In Search of the Perfect Spam Filter
BOSTON--The roughly 500 programmers, researchers, hackers, and IT administrators who gathered in a chilly classroom on the campus of the Massachusetts Institute of Technology Friday aren't just looking to slow the relentless onslaught of spam--they want to completely destroy its business model.
Their aim is to find a
"Spamming is a business, and the theft efficiency ratio is the same as
stealing hubcaps," said programmer William Yerazunis, speaking at what is
thought to be the first
Yerazunis wrote a language for writing filters based on the Bayesian system, which assigns a statistical probability that an e-mail message is spam. The language is called CRM114, and he used it to write a filter program called MailFilter.
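As a rough illustration of how a Bayesian filter can score a message, here is a generic naive Bayes sketch in Python. It is not CRM114 or MailFilter, whose internals are not described here; the function names, the whitespace tokenizer, and the add-one smoothing are all assumptions made for the example.

```python
import math
from collections import Counter


def train(messages):
    """Count word occurrences per class.

    `messages` is a list of (text, is_spam) pairs (hypothetical input format).
    """
    counts = {True: Counter(), False: Counter()}
    totals = {True: 0, False: 0}
    for text, is_spam in messages:
        counts[is_spam].update(text.lower().split())
        totals[is_spam] += 1
    return counts, totals


def spam_probability(text, counts, totals):
    """Return the estimated probability that `text` is spam (naive Bayes).

    Assumes the training set held at least one spam and one nonspam message.
    """
    vocab_size = len(set(counts[True]) | set(counts[False]))
    n_spam = sum(counts[True].values())
    n_ham = sum(counts[False].values())

    # Start from the prior odds, then add each word's evidence in log space.
    log_odds = math.log(totals[True] / totals[False])
    for word in text.lower().split():
        # Add-one smoothing so unseen words never give a zero probability.
        p_word_spam = (counts[True][word] + 1) / (n_spam + vocab_size)
        p_word_ham = (counts[False][word] + 1) / (n_ham + vocab_size)
        log_odds += math.log(p_word_spam / p_word_ham)

    # Clamp to keep math.exp from overflowing on very long messages.
    log_odds = max(min(log_odds, 700.0), -700.0)
    return 1.0 / (1.0 + math.exp(-log_odds))


# Toy training data, invented for this example.
training = [
    ("buy cheap pills now", True),
    ("cheap pills free offer", True),
    ("meeting agenda for monday", False),
    ("lunch on monday", False),
]
counts, totals = train(training)
print(spam_probability("cheap pills", counts, totals))
print(spam_probability("monday meeting", counts, totals))
```

A real filter would train on thousands of messages per user, which is how such a system can adapt to each person's own definition of spam.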
At the conference at least, MailFilter was being seen as a potential
weapon for battling the
In tests Yerazunis performed, MailFilter was 99.915 percent accurate in identifying spam.
"I'm only 99.84 percent accurate at identifying spam, so this is much more accurate than I am," Yerazunis quipped.
MailFilter is still in alpha testing, however.
Still, Spam Conference organizer Paul Graham said he is extremely excited about Yerazunis' solution.
"Bill's filter looks like the most promising," Graham said.
Graham himself is a big proponent of filters based on the Bayesian system and has written his own research report on the subject, called "A Plan for Spam."
His paper, released last August and
Graham has also written his own Bayes-based filter.
"I believe in filters because I personally do not have a spam problem," he said.
Graham added that the idea that filters alone could thwart spam did not get serious discussion until about a year ago. However, both Graham and Yerazunis believe that if there is widespread adoption of filters that are accurate enough to make spamming economically prohibitive, the problem will cease without the need for legislation or other measures.
According to Yerazunis, spam filters need to be at least 99.5 percent accurate to push the cost of sending bulk unsolicited e-mail to roughly that of direct snail mail, making it a far less attractive method for sending solicitations.
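The arithmetic behind such a threshold can be sketched as follows, under the assumption that "accuracy" means the share of spam that filters block, so a spammer must send 1 / (1 - accuracy) messages for each one that reaches an inbox. The per-message cost below is a placeholder chosen for illustration, not a figure from the conference.

```python
def sends_per_delivery(accuracy):
    """Messages a spammer must send for one to get past the filter.

    `accuracy` is the fraction of spam blocked, between 0 and 1 (assumed meaning).
    """
    if not 0.0 <= accuracy < 1.0:
        raise ValueError("accuracy must be in [0, 1)")
    return 1.0 / (1.0 - accuracy)


def cost_per_delivered(cost_per_sent, accuracy):
    """Effective cost of each message that actually reaches a reader."""
    return cost_per_sent * sends_per_delivery(accuracy)


# At 99.5 percent, roughly 200 messages must be sent per delivery.
print(round(sends_per_delivery(0.995)))

# Hypothetical cost of 0.01 cents per sent message (placeholder value).
print(cost_per_delivered(0.01, 0.995))
```

Under this reading, a 99.5 percent filter multiplies the spammer's cost per delivered message by about 200, whatever the starting cost is.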
The problem, of course, is getting large ISPs like Yahoo, America
Online, and Microsoft to adopt the filters. As it stands now, each ISP is
taking its own approach. Most are applying technological solutions, while
others are fighting spammers
The Federal Trade Commission has
Still, representatives from all three companies registered for the conference and showed interest in hearing what new ideas were being batted around.
One of the perennial problems with any antispam system is deciding what is and what isn't spam. Whether something should be considered spam is often up to the user, and this makes building and employing filters especially tricky.
"The definition of spam is personal and spam is constantly changing," said Jason Rennie, an MIT student doing research on adaptive spam filtering.
Spam-fighters are hoping to collect as much spam as possible so they can perform analysis and research on the features that make up spam.
Paul Judge, a representative for e-mail security firm CipherTrust, said his company is collecting a spam archive for this purpose. Over the last two months the company has collected 250,000 pieces of spam, and is on track to have 1.5 million pieces within the first year, he said.
"Spam messages are starting to look more and more like nonspam messages," Judge said, adding that analysis is becoming even more important.
While CipherTrust is building its spam archive, Chicago-based programmer Philip Tom was at the conference handing out what he called "a day of spam"--a disk containing 250,000 spam e-mail messages.
Tom said he has an archive of over 50 million spam messages, and receives 250,000 a day from an undisclosed source.
Most people don't understand why he is collecting and analyzing spam, but it provides an interesting project for him, he added. While he might sell the archive for research purposes, he also thinks he might just hand it over "for the greater good" of eliminating spam.
"One thing I can tell you is that spam is growing exponentially," he said, noting that when he started his archive two years ago he received 10,000 messages daily, compared to the quarter million spam messages he receives each day now.
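For a sense of what those two figures imply, the short calculation below assumes steady compound growth between the two data points. Tom gave only the start and end volumes, so the constant growth rate is an assumption, and the function names are made up for the example.

```python
import math


def annual_growth_factor(start, end, years):
    """Yearly multiplier that takes `start` to `end` over `years` years."""
    return (end / start) ** (1.0 / years)


def doubling_time_months(start, end, years):
    """Months needed for the volume to double at that steady rate."""
    return 12.0 * years * math.log(2) / math.log(end / start)


# Tom's figures: 10,000 messages a day two years ago, 250,000 a day now.
print(annual_growth_factor(10_000, 250_000, 2))
print(doubling_time_months(10_000, 250_000, 2))
```

On that assumption, the daily volume grew about fivefold each year, doubling a little more often than every six months.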
The sheer amount of spam has recently made fighting unsolicited commercial e-mail one of the technology industry's top priorities.
But when Graham was asked whether he was planning another conference on spam, given the success of this one, he said no.
"Hopefully we will solve this problem and we won't need another conference," Graham said. "I don't want to be working on the spam problem ten years from now!"