KDnuggets : News : 2003 : n16 : item18


Bayesian filtering against spam?

A Bayesian filter uses statistics and probability theory to analyze the entire message instead of focusing on a few key terms, and it does not rely on a predefined scoring system. The user trains the filter by marking emails as spam or legitimate, and the filter derives from those classifications the patterns it uses to evaluate new messages. Self-employed software engineer Paul Graham, who developed a practical open-source implementation of the Bayesian filter, says its accuracy improves because it takes into account not only words that frequently appear in spam but also words that rarely do. Bayesian filtering has also been incorporated into Microsoft's MSN 8 Internet software and will be included in the upcoming version 11 of Microsoft Outlook.
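To make the description above concrete, here is a minimal Python sketch of a token-based Bayesian spam filter. It loosely follows the per-token scheme Graham popularized (estimate how likely each word is to signal spam, then combine the most decisive words with Bayes' rule under an independence assumption), but it is not his code: the tokenizer, the 15-token cutoff, the 0.01/0.99 clamp and the 0.4 value for never-seen words are illustrative assumptions, and the names `BayesianSpamFilter`, `train`, `token_prob` and `spam_probability` are hypothetical. `train` stands in for the user's classifications; `spam_probability` scores a new message.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the message and split it into crude word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z0-9$'-]+", text.lower())


class BayesianSpamFilter:
    """Toy token-probability spam filter (hypothetical names and constants)."""

    def __init__(self, interesting=15, unknown_prob=0.4):
        self.spam_counts = Counter()  # number of spam messages containing each token
        self.ham_counts = Counter()   # number of legitimate messages containing each token
        self.n_spam = 0
        self.n_ham = 0
        self.interesting = interesting    # how many tokens are combined per message
        self.unknown_prob = unknown_prob  # assumed spamminess of never-seen tokens

    def train(self, text, is_spam):
        """Record the user's verdict on one message."""
        tokens = set(tokenize(text))  # count each token at most once per message
        if is_spam:
            self.spam_counts.update(tokens)
            self.n_spam += 1
        else:
            self.ham_counts.update(tokens)
            self.n_ham += 1

    def token_prob(self, token):
        """Estimate P(spam | token) from how often the token occurs in each class."""
        s, h = self.spam_counts[token], self.ham_counts[token]
        if s + h == 0:
            return self.unknown_prob
        spam_freq = s / max(self.n_spam, 1)
        ham_freq = h / max(self.n_ham, 1)
        p = spam_freq / (spam_freq + ham_freq)
        # Clamp so that no single token can force a certain verdict.
        return min(0.99, max(0.01, p))

    def spam_probability(self, text):
        """Combine the most decisive token probabilities with Bayes' rule."""
        probs = [self.token_prob(t) for t in set(tokenize(text))]
        if not probs:
            return 0.5  # no evidence either way
        # Rank by distance from 0.5: strongly "legitimate" words count as much
        # as strongly spammy ones.
        probs.sort(key=lambda p: abs(p - 0.5), reverse=True)
        prod_spam, prod_ham = 1.0, 1.0
        for p in probs[: self.interesting]:
            prod_spam *= p
            prod_ham *= 1.0 - p
        # Naive-Bayes combination, assuming independent tokens and equal priors.
        return prod_spam / (prod_spam + prod_ham)


if __name__ == "__main__":
    f = BayesianSpamFilter()
    f.train("cheap meds buy now limited offer", is_spam=True)
    f.train("win cash now click here", is_spam=True)
    f.train("meeting notes for the project review", is_spam=False)
    f.train("lunch tomorrow with the project team", is_spam=False)
    print(f.spam_probability("buy cheap meds now"))       # close to 1
    print(f.spam_probability("project review tomorrow"))  # close to 0
```

Ranking tokens by their distance from 0.5, rather than keeping only the spammiest ones, is what lets words that rarely appear in spam pull a message toward "legitimate", which is the effect Graham credits for the accuracy gain.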

Steven Curry of EarthLink says that false positives are more likely to be eliminated if humans are kept in the loop, and he advocates an approach in which people first review email and confirm whether it is spam, with each confirmation then fed back into the filter. Alternative strategies for controlling spam, such as anti-spam legislation, are hampered by the lack of a clear definition of what constitutes spam, and Jupiter Research analyst Jared Blank argues, "The true problem is that spam is effective."

See IEEE Spectrum (08/03); Vaughan-Nichols, Steven J.

Copyright © 2003 KDnuggets.