In Search of the Perfect Spam Filter

Techies immerse themselves in spam to craft a filter that renders mass e-mail marketing ineffective (and undesirable).

Scarlet Pruitt, IDG News Service

BOSTON--The roughly 500 programmers, researchers, hackers, and IT administrators who gathered in a chilly classroom on the campus of the Massachusetts Institute of Technology Friday aren't just looking to slow the relentless onslaught of spam--they want to completely destroy its business model.

Their aim is to find a spam filter so effective that spammers would receive few, if any, responses, making sending unsolicited bulk e-mail a financially prohibitive task.

"Spamming is a business, and the theft efficiency ratio is the same as stealing hubcaps," said programmer William Yerazunis, speaking at the Spam Conference, thought to be the first conference ever focused on spam filters.

But the high payoff for sending spam could change if an e-mail filter like the one Yerazunis pioneered becomes widely adopted by large ISPs.

Work in Progress

Yerazunis wrote a language for writing filters based on the Bayesian system, which assigns each e-mail message a statistical probability of being spam. The language is called CRM114, and he used it to write a filter program called MailFilter.
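To illustrate the general idea of a Bayesian filter, here is a minimal naive Bayes sketch in Python. It is not CRM114, MailFilter, or Graham's filter; the function names, the add-one smoothing, and the four-message training set are all assumptions made for illustration.

```python
import math
from collections import Counter

def train(messages):
    """Count how many messages in one class contain each word."""
    counts = Counter()
    for text in messages:
        counts.update(set(text.lower().split()))
    return counts

def spam_probability(text, spam_counts, ham_counts, n_spam, n_ham):
    """Estimate P(spam | words) with naive Bayes and add-one smoothing."""
    # Start from the prior odds of spam versus legitimate mail.
    log_odds = math.log(n_spam / n_ham)
    for word in set(text.lower().split()):
        # Smoothed fraction of each class's messages that contain the word.
        p_word_spam = (spam_counts[word] + 1) / (n_spam + 2)
        p_word_ham = (ham_counts[word] + 1) / (n_ham + 2)
        log_odds += math.log(p_word_spam / p_word_ham)
    # Convert log-odds back to a probability.
    return 1 / (1 + math.exp(-log_odds))

# Hypothetical training data, invented for this example.
spam = ["buy cheap pills now", "cheap loans apply now"]
ham = ["meeting moved to friday", "lunch on friday"]
spam_counts, ham_counts = train(spam), train(ham)

p_spam = spam_probability("cheap pills", spam_counts, ham_counts, len(spam), len(ham))
p_ham = spam_probability("friday meeting", spam_counts, ham_counts, len(spam), len(ham))
print(p_spam, p_ham)
```

Words seen mostly in spam push the probability toward 1, and words seen mostly in legitimate mail push it toward 0. A real filter would train on many thousands of messages and tune the cutoff above which a message is treated as spam.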

At the conference, at least, MailFilter was seen as a potential weapon against the escalating spam problem.

In tests Yerazunis performed, MailFilter was 99.915 percent accurate in identifying spam.

"I'm only 99.84 percent accurate at identifying spam, so this is much more accurate than I am," Yerazunis quipped.

MailFilter is still in alpha testing, however.

Still, Spam Conference organizer Paul Graham said he is extremely excited about Yerazunis' solution.

"Bill's filter looks like the most promising," Graham said.

Graham himself is a big proponent of filters based on the Bayesian system and he has written his own research report on the subject called "A Plan for Spam."

His paper, released last August and available online, has generated a lot of discussion within the spam-fighting community.

Graham has also written his own Bayes-based filter.

"I believe in filters because I personally do not have a spam problem," he said.

Graham added that the idea that filters alone could thwart spam did not get serious discussion until about a year ago. However, both Graham and Yerazunis believe that if there is widespread adoption of filters that are accurate enough to make spamming economically prohibitive, the problem will cease without the need for legislation or other measures.

According to Yerazunis, spam filters need to be at least 99.5 percent accurate to push the cost of sending bulk unsolicited e-mail to about the same level as the cost of sending direct mail by post, making it a far less attractive method for sending solicitations.
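The arithmetic behind that threshold can be sketched as follows. The sketch assumes that "accuracy" means the fraction of spam a filter blocks and that every message costs the same to send; the function name is hypothetical, and no actual per-message costs are given in the article.

```python
def cost_multiplier(filter_accuracy):
    """Factor by which the cost per delivered message rises when a filter
    blocks the given fraction of spam (assumes a uniform cost per message sent)."""
    if not 0 <= filter_accuracy < 1:
        raise ValueError("filter_accuracy must be in the range [0, 1)")
    # Only (1 - accuracy) of the messages get through, so each delivered
    # message carries the sending cost of 1 / (1 - accuracy) messages.
    return 1 / (1 - filter_accuracy)

threshold = cost_multiplier(0.995)     # the 99.5 percent figure Yerazunis cites
mailfilter = cost_multiplier(0.99915)  # MailFilter's accuracy in his tests
print(round(threshold), round(mailfilter))
```

Under these assumptions, a 99.5 percent filter makes each delivered message roughly 200 times as expensive for the spammer, and a filter at MailFilter's reported accuracy raises that factor to more than 1,000.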

Cooperation Needed

The problem, of course, is getting large ISPs like Yahoo, America Online, and Microsoft to adopt the filters. As it stands now, each ISP is taking its own approach. Most are applying technological solutions, while others are fighting spammers in court.

The Federal Trade Commission has launched a campaign against fraudulent business promotions distributed through spam, and Congress is considering antispam legislation that would impose penalties on spammers.

Still, representatives from all three companies registered for the conference and showed interest in hearing what new ideas were being batted around.

One of the perennial problems with any antispam system is deciding what is and what isn't spam. Whether something should be considered spam is often up to the user, and this makes building and employing filters especially tricky.

"The definition of spam is personal and spam is constantly changing," said Jason Rennie, an MIT student doing research on adaptive spam filtering.

Spam-fighters are hoping to collect as much spam as possible so they can perform analysis and research on the features that make up spam.

Paul Judge, a representative for e-mail security firm CipherTrust, said his company is collecting a spam archive for this purpose. Over the last two months the company has collected 250,000 pieces of spam, and is on track to have 1.5 million pieces within the first year, he said.

"Spam messages are starting to look more and more like nonspam messages," Judge said, adding that analysis is becoming even more important.

Messages in the Millions

While CipherTrust is building its spam archive, Chicago-based programmer Philip Tom was at the conference handing out what he called "a day of spam"--a disk containing 250,000 spam e-mail messages.

Tom said he has an archive of over 50 million spam messages, and receives 250,000 a day from an undisclosed source.

Most people don't understand why he is collecting and analyzing spam, but it provides an interesting project for him, he added. While he might sell the archive for research purposes, he also thinks he might just hand it over "for the greater good" of eliminating spam.

"One thing I can tell you is that spam is growing exponentially," he said, noting that when he started his archive two years ago he received 10,000 messages a day, compared to the quarter million he receives each day now.

The sheer volume of spam has recently made fighting unsolicited commercial e-mail one of the technology industry's top goals.

But when Graham was asked whether he was planning another conference on spam, given the success of this one, he said no.

"Hopefully we will solve this problem and we won't need another conference," Graham said. "I don't want to be working on the spam problem ten years from now!"
