MACHINE SHOP
How to Filter with Finesse
How do you keep legitimate messages from getting swept into the spam box?
By Simson Garfinkel
With somewhere between 80 percent and 95 percent of all Internet messages
now consisting of spam, phishing attacks and e-mail-based worms,
organizations have been forced to filter their incoming mail more
aggressively than ever before.
E-mail filtering systems are
faced with the Herculean task of separating out the bad mail from the
good—and doing it fast enough so that the mail doesn't back up.
Computer scientists call this a "recognition task" and say that e-mail
filtering is one of the most challenging such tasks ever devised. The
job keeps getting harder every day, as the
bad guys adapt their messages to make them more closely resemble
legitimate mail. The cost of letting through a bad message can be high:
A single phishing message, successfully passed to one of your users,
can compromise your internal systems. Even ordinary spam annoys your
users and can cause employees to miss important messages. Spam must be
stopped.
But as a CSO, you have another job as well: You need
to make sure that your organization's e-mail filtering system isn't
filtering out the wheat with the chaff. That's because the cost of a
legitimate mail message gone missing can be equally high. More and
more, I hear of business opportunities that fall through because an
unexpected e-mail message was delivered to a junk mail box or silently
dropped. The senders of these messages thought that the recipients
weren't interested, but in fact the messages simply never arrived.
Legitimate mail must not be stopped. So even as CSOs filter
aggressively, the trick is to find the right combination of tools and
techniques to keep the real messages from getting swept into the spam
box. While many options are available today, a closer examination leads
me to think that ultimately digital signatures will prove necessary if
we are going to keep spam from turning e-mail into a nonviable
communication medium.
Rejection Rates
There are
several technical metrics that a CSO can use when trying to evaluate
mail-filtering systems. Two of the most common are the false-negative
and false-positive rates. A false negative is a piece of spam that the
system lets through when it should have blocked it; a false positive is
a legitimate message that the system blocks.
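As a rough illustration, both rates can be computed from a hand-labeled sample of mail. The function name and the counts below are made up for the example; they do not come from any study.

```python
def filter_error_rates(spam_blocked, spam_delivered, legit_blocked, legit_delivered):
    """Return (false_negative_rate, false_positive_rate) for a labeled mail sample.

    False-negative rate: share of spam that reached the inbox.
    False-positive rate: share of legitimate mail that was blocked.
    """
    total_spam = spam_blocked + spam_delivered
    total_legit = legit_blocked + legit_delivered
    fn_rate = spam_delivered / total_spam if total_spam else 0.0
    fp_rate = legit_blocked / total_legit if total_legit else 0.0
    return fn_rate, fp_rate


# Hypothetical sample: 900 spam messages (45 got through) and
# 100 legitimate messages (2 were blocked).
fn, fp = filter_error_rates(spam_blocked=855, spam_delivered=45,
                            legit_blocked=2, legit_delivered=98)
print(f"false-negative rate: {fn:.1%}, false-positive rate: {fp:.1%}")
```

Because the two rates have different denominators (spam received versus legitimate mail received), a filter can look excellent on one and poor on the other.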
Unfortunately,
there aren't a lot of reliable sources for these metrics. In part,
that's because the metrics are different for every user of a given
filtering system. A user who has posted her e-mail address to a popular
online discussion will receive a lot of spam, and probably a greater
variety of spam, than a user who has been circumspect with his online
identity.
As a result, that first user will probably see more
messages sneaking through the filters and therefore experience a higher
false-negative rate. Also, filtering systems do not behave in a
consistent fashion. In one case I am familiar with, a researcher at MIT
sent two separate one-line text messages to a collaborator in Europe.
One of those messages ended up in the collaborator's spam box; the
other went to the inbox. How do you debug randomness like that?
Earlier
this year, a company called Pivotal Veracity published a 33-page study
on the prevalence of false positives in antispam systems operated by
three of the largest Web mail providers. Pivotal Veracity is itself an
e-mail consulting service (more on it later), which might have led some
people to dismiss the research. But the company's methodology was very
good and the results were interesting and important.
To conduct
its study, the researchers at Pivotal created Web mail accounts at
Google, Microsoft and Yahoo. They then signed up for e-mail newsletters
from 100 randomly chosen corporations, nonprofits and governmental
agencies. Then, over the next six weeks, the researchers checked the
mailboxes to see if mail from those senders got delivered to the inbox
or the spam box.
What makes this kind of study challenging is
that the researchers didn't know if the organizations were actually
sending mail. In fact, mail was received from only 90 of 100 purported
sending organizations. This could be because the other 10 never sent
mail at all. Alternatively, it could be because the incoming mail was
deemed so offensive that it was just dropped, rather than put in a spam
box. For example, some e-mail providers will automatically drop
phishing attacks directed against PayPal or Citibank customers: The
attacks seem so authentic that users will see the messages in their
spam folders, assume that the antispam software has made a mistake and
click on the links! So the e-mail providers prevent such messages from
even getting to the spam box. It's conceivable (however likely or
unlikely) that this kind of aggressive filtering could explain some of
the 10 missing newsletters.
Ignoring
that missing 10 percent, Pivotal found that e-mails from 54 percent of
the legitimate mail senders—real companies from whom mail had been
requested—had at least some of their messages identified as spam and
delivered to the Web mail provider's junk mail box. This is an
astoundingly high false-positive rate! Even more troubling was how
those numbers break down.
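A minimal sketch of how a per-sender figure like this could be tallied is shown here. The data layout, the function name and the sample observations are assumptions made for illustration; they are not Pivotal's data or tooling.

```python
def share_of_senders_flagged(observations):
    """Fraction of responding senders with at least one message filed as spam.

    observations maps a sender name to the list of folders ("inbox" or "spam")
    in which its messages were found. Senders with no messages are left out,
    mirroring the way the missing senders are set aside in the text.
    """
    responding = [folders for folders in observations.values() if folders]
    if not responding:
        return 0.0
    flagged = sum(1 for folders in responding if "spam" in folders)
    return flagged / len(responding)


# Made-up observations for five hypothetical senders.
sample = {
    "retailer": ["inbox", "inbox", "spam"],
    "charity": ["inbox", "inbox"],
    "agency": ["spam"],
    "airline": ["inbox"],
    "silent-sender": [],
}
print(share_of_senders_flagged(sample))  # 0.5
```

This counts senders, not messages: a sender with one newsletter in the spam box and twenty in the inbox counts the same as one whose mail is always flagged.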
Pivotal evaluated the effectiveness
of the sender policy framework (SPF) e-mail sender authentication
system. SPF allows organizations to electronically publish the IP
addresses of their e-mail servers: Antispam systems can then reject
incoming mail if it doesn't come from the correct server. Citibank, for
example, publishes an SPF record for its domain Citibank.com, which
states that mail from that domain should come from an IP address
beginning with 192.193.195 or 192.193.210, or else that e-mail should be
discarded. In theory, this prevents a hacker in Yemen from sending
e-mail that claims to be from Citibank—at least, with SPF it is
possible for a mail filter to automatically detect this mail and
discard it.
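The core of that check can be sketched in a few lines. This is a simplified illustration, not a real SPF implementation: an actual SPF record is published in DNS and supports several other mechanisms. The table name, the function name and the reading of the two prefixes as /24 networks are all assumptions.

```python
import ipaddress

# Assumption: the two prefixes quoted in the text are treated as /24 networks.
# A real filter would look the record up in DNS rather than hard-code it.
PUBLISHED_NETWORKS = {
    "citibank.com": [
        ipaddress.ip_network("192.193.195.0/24"),
        ipaddress.ip_network("192.193.210.0/24"),
    ],
}


def sender_ip_permitted(domain, connecting_ip):
    """Return True or False if the domain publishes networks, None if it publishes nothing."""
    networks = PUBLISHED_NETWORKS.get(domain.lower())
    if networks is None:
        return None  # no published policy, so this check says nothing either way
    address = ipaddress.ip_address(connecting_ip)
    return any(address in network for network in networks)


print(sender_ip_permitted("Citibank.com", "192.193.195.17"))  # True
print(sender_ip_permitted("citibank.com", "203.0.113.9"))     # False
print(sender_ip_permitted("example.org", "203.0.113.9"))      # None
```

The third case matters in practice: a domain that publishes nothing gives the filter no basis for rejecting mail on these grounds.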
But what Pivotal discovered is that SPF had no
impact on whether e-mail was identified as spam. Roughly 75 percent of
companies in the survey implemented SPF, but there was no significant
difference in the false-positive rates of those companies sending mail
with SPF and those that did not.
Another troubling finding came from
Pivotal's evaluation of the so-called e-mail accreditation programs,
such as those offered by Truste and Bonded Sender. These programs are
designed to help businesses improve the chances that their mail will
actually get delivered. But Pivotal found that companies that used
Truste or Bonded Sender actually had a higher incidence of false
positives than those that did not subscribe—57 percent of Truste
members suffered false positives, as did 55 percent of Bonded Sender
users, versus 53 percent of nonusers.
Pivotal's assessment of
Bonded Sender and Truste has come under some criticism. For example,
only 13 of the 100 companies that Pivotal tested were members of Bonded
Sender. Also, the Bonded Sender seal is ignored by Gmail and Yahoo—only
Hotmail honors it. While these might be valid criticisms of the study,
the criticism itself seems to imply that Bonded Sender and other seal
programs are not tremendously significant in today's e-mail landscape.
"Accreditation is intended for those senders who have no reputation,"
says Deirdre Baird, Pivotal's president. But today, most antispam
systems base their decisions not on whether the sender is accredited
but on that sender's reputation. A sender's reputation is determined by
how users tend to respond to messages from that sender. That is, most
Web mail companies have a button that allows the recipient to "report
spam." If a lot of users report that messages sent by a particular
company are spam, then future messages from that company are more
likely to be identified as spam.
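A reputation score of this kind might be sketched as a simple complaint rate per sender. The class name, the 1 percent threshold and the 100-message minimum are illustrative assumptions, not values used by any Web mail provider.

```python
from collections import defaultdict


class SenderReputation:
    """Tracks delivered messages and "report spam" clicks per sender."""

    def __init__(self, complaint_threshold=0.01, min_delivered=100):
        # Both defaults are illustrative assumptions.
        self.complaint_threshold = complaint_threshold
        self.min_delivered = min_delivered
        self.delivered = defaultdict(int)
        self.complaints = defaultdict(int)

    def record_delivery(self, sender, count=1):
        self.delivered[sender] += count

    def record_spam_report(self, sender, count=1):
        self.complaints[sender] += count

    def complaint_rate(self, sender):
        delivered = self.delivered[sender]
        return self.complaints[sender] / delivered if delivered else 0.0

    def looks_like_spammer(self, sender):
        # With too little history, give no negative verdict.
        if self.delivered[sender] < self.min_delivered:
            return False
        return self.complaint_rate(sender) > self.complaint_threshold


reputation = SenderReputation()
reputation.record_delivery("news@example.com", 1000)
reputation.record_spam_report("news@example.com", 25)
print(reputation.looks_like_spammer("news@example.com"))  # True: 2.5% exceeds 1%
```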
The problem with using this
kind of collaborative filtering approach to fighting spam is that
consumers click that "report spam" button for a variety of reasons. A
message might not be truly spam, but consumers might click "report
spam" because they want to unsubscribe from a mailing list for which
they've previously signed up. And since consumers repeatedly have been
told not to reply to spam or click "unsubscribe" links, really the only
thing left for them to do is click the "report spam" button.
Pivotal
calls itself a Deliverability Service Provider, working with
organizations that send e-mail to tailor messages and mailing practices
to increase the chances that mail will successfully get through. For
example, says Baird, Pivotal can work with e-mail designers to make
sure that HTML messages actually validate according to the standards
specified by the World Wide Web Consortium (e-mail with HTML that
doesn't validate has a higher chance of being marked as spam). Other
companies in this space include eDiagnostix, EnhanceRate, Piper
Software and Return Path.
The Way Forward
While many
choices are now available to help companies fight incorrect spam
filtering, the complexity and difficulty surrounding this issue are bad
news for CSOs. Spam is almost certain to increase in coming years, and
the hostility of e-mail-borne threats is sure to increase as well. But
unless our systems for filtering mail get dramatically better, we face
the very real possibility that e-mail might become a lost communication
medium—much in the way that CB radio was lost in the 1970s. But unlike
CB radio, businesses and individuals now depend on e-mail. Every time
e-mail becomes less effective, it costs us all dearly.
My
suspicion is that the only way to satisfactorily solve the problem of
e-mail reliability is for companies sending mail to sign their messages
with the S/MIME digital signature standard. Once that starts happening,
antispam systems can be modified to let through mail that is signed by
senders known to be legitimate. Spammers can make their content as
close to the content of a legitimate sender as they want, but they
can't fake a digital signature. As a result, digitally signed mail
probably represents our last, best chance for saving e-mail.
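The filtering change described here can be sketched at the policy level. The sketch assumes that a separate S/MIME verifier has already checked the signature and reported the signer's certificate fingerprint; the function name, the allowlist and the placeholder fingerprint are hypothetical.

```python
from typing import Optional

# Hypothetical allowlist: fingerprints of signing certificates that belong to
# senders already known to be legitimate. The value here is a placeholder.
TRUSTED_SIGNER_FINGERPRINTS = {"placeholder-fingerprint-of-known-sender"}


def delivery_decision(verified_signer: Optional[str], content_looks_spammy: bool) -> str:
    """Choose a folder for one message.

    verified_signer is the certificate fingerprint reported by an S/MIME
    verifier that has already checked the signature, or None if the message
    was unsigned or its signature failed to verify.
    """
    if verified_signer in TRUSTED_SIGNER_FINGERPRINTS:
        # A valid signature from a known sender overrides content scoring.
        return "inbox"
    return "spam" if content_looks_spammy else "inbox"


print(delivery_decision("placeholder-fingerprint-of-known-sender", True))  # inbox
print(delivery_decision(None, True))                                       # spam
```

In this sketch a signature only ever helps a message: unsigned mail is still judged on its content, so senders that have not adopted signing are treated as they are today.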
Simson
Garfinkel, PhD, CISSP, is spending the year at Harvard University
researching computer forensics and human thought. He can be reached at machineshop@cxo.com.
Most Recent Responses:
While I see how digital signatures would be a simple way out, I believe that the easiest way to not only filter spam, but eliminate (most of) it altogether would be to simply put a small charge on every e-mail sent. It would be small enough that any one user would pay maybe a dime per month (depending on how many messages they sent, of course) but still high enough to discourage spam from being sent because it would cost so much to send thousands of e-mails. That would make e-mail better for everyone. With less spam, it would be significantly easier to filter what little was left, so the false-negative rate would surely decrease. A combination of this, signature recognition, and individual users' own filters would almost guarantee that users would only receive mail that they want.
Mike
Thanks for a great article. I just want to talk about two fundamental issues around spam that the article did not cover. One is that spam exists because it works; spamming is an extremely inexpensive way to get your message out to a wide audience and enough people respond to make the effort worthwhile. Please do what you can to persuade all your end users to simply say ‘no’ to spam. We can beat them if we work together. The other point I want to make is about the old SMTP protocol which harks back to a simpler more trusting time. We could control email messages so much more easily if the protocol were altered to include sender verification etc…
Student
St. Laurence Catholic School
Tom Hartley
Corporate Information Security
Mellon Analytical Solutions