Spam not only clogs inboxes and wastes users' time; it often slows the delivery of legitimate e-mail due to the sheer volume of junk passing through corporate servers. And as the "arms race" between spammers and spam filters keeps ratcheting up, mail delays are likely to worsen as e-mail scanning becomes more complex.
As a result, some companies could be forced to invest in more robust hardware to keep mail flowing at acceptable speeds, researchers from HP Labs in Bristol, England, told the 2004 Usenix Annual Technical Conference in Boston.
The Good and the Junk
But there's a less expensive way to speed legitimate messages through the system: by classifying them as probable "good" or "junk" mail before they're sent to be scanned, according to a paper presented Monday afternoon titled "E-mail Prioritization: Reducing Delays on Legitimate Mail Caused By Junk Mail."
After analyzing weeks of incoming e-mail, researchers discovered that servers tend to be "faithful," said Matthew M. Williamson, formerly with Hewlett-Packard and now at San Mateo, California-based Sana Security. In other words, if a server sent a good message before, it's likely to send a good one next time, too; if it sent junk before, chances are high that it's sending junk again. "New" servers that have no history of sending mail to your system are probably sending spam.
Servers can detect the sending IP address from a message header, before the full message is scanned for viruses or spam content and sent for delivery, so those likely to be deemed "good" can go to the head of the queue for processing.
By keeping data on just 10 prior messages per e-mail-sending server and setting a threshold of at least 50 percent good messages from a server, the researchers were able to correctly predict junk mail 95 percent of the time and good messages 74 percent of the time. While not accurate enough for a spam filter, said HP's Dan Twining, "it's good enough for what we want to do"--assign messages for high- or low-priority delivery.
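The history-and-threshold rule described above can be sketched in a few lines of Python. This is a minimal illustration and not the researchers' implementation: the class name, method names and the idea of feeding each message's final scan verdict back into the history are assumptions. Only the 10-message window, the 50 percent threshold and the treatment of unknown servers as probable junk come from the article.

```python
from collections import defaultdict, deque

HISTORY_SIZE = 10      # prior messages remembered per sending server (from the article)
GOOD_THRESHOLD = 0.5   # at least 50 percent good messages (from the article)


class ServerReputation:
    """Hypothetical per-server history used to guess 'good' or 'junk'."""

    def __init__(self, history_size=HISTORY_SIZE, threshold=GOOD_THRESHOLD):
        self.threshold = threshold
        # One bounded history per sending IP; the oldest verdict drops off
        # automatically once the window is full.
        self.history = defaultdict(lambda: deque(maxlen=history_size))

    def record(self, server_ip, was_good):
        """Store the verdict of the full scan (assumed feedback step)."""
        self.history[server_ip].append(bool(was_good))

    def predict_good(self, server_ip):
        """Return True if the next message from this server is probably good."""
        seen = self.history.get(server_ip)
        if not seen:
            # A server with no history is treated as a probable spam source.
            return False
        return sum(seen) / len(seen) >= self.threshold
```

In this sketch a server must earn its way into the "good" class: it starts out as probable junk and is promoted only once at least half of its remembered messages were scanned as legitimate.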
The "junk" messages aren't spiked; rather, they're sent to the end of the queue for processing.
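One way to picture that queueing behavior is a two-level priority queue in front of the scanner. The sketch below is an assumption about how such a queue might be built, not a description of the HP system; the class and method names are hypothetical. Nothing is ever dropped, and messages keep their arrival order within each priority level.

```python
import heapq
import itertools

HIGH, LOW = 0, 1  # lower number is scanned first


class ScanQueue:
    """Hypothetical queue that feeds messages to the virus/spam scanner."""

    def __init__(self):
        self._heap = []
        # Arrival counter keeps first-in, first-out order within a priority.
        self._counter = itertools.count()

    def enqueue(self, message, predicted_good):
        priority = HIGH if predicted_good else LOW
        heapq.heappush(self._heap, (priority, next(self._counter), message))

    def next_message(self):
        """Return the next message to scan, or None if the queue is empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


queue = ScanQueue()
queue.enqueue("junk-1", predicted_good=False)
queue.enqueue("good-1", predicted_good=True)
queue.enqueue("junk-2", predicted_good=False)
print(queue.next_message())  # good-1
```

Probable junk still reaches the scanner in this sketch, but only after every waiting message predicted to be good has been processed.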
After testing their system at HP in Bristol, researchers found minor delays for good messages of perhaps two to three minutes at high loads, while all messages were delayed for more than 10 minutes without the preprocessing. In the real world, Williamson noted, e-mail was delayed for four hours during the worst of the Sobig worm attacks.
While one audience member questioned whether data from a single lab server could be used to create algorithms that apply to numerous other enterprises, attendees seemed interested in the concept as another possible weapon in the battle against spam. Researchers said the preprocessing was meant to work with existing filtering technologies, not to replace them.
The paper was one of several presented on issues surrounding the topic "Swimming in a Sea of Data," with several others focusing on ways to maximize storage efficiency by detecting redundancies in stored data. The conference, which runs all week, will host sessions on a variety of technical issues such as network performance, security, privacy and free desktop software.
This story, "HP Labs Develops Spam ID System," was originally published by Computerworld.