iPedia.net

Stop Spam With A Bayesian Filter

What is a Bayesian filter and how does it work? Learn how a Bayesian Filter can help you stop spam.


One of the most effective ways to stop spam emails from filling up your inbox is to use some form of Bayesian filter. Bayesian filtering (pronounced Bays - ee - en) has become a popular method of stopping spam, or of separating the 'spam' from the 'ham' (legitimate mail). So how does it work?

Without knowing it, I had built a crude forerunner of the Bayesian filter long before I ever became active in my spam crusade. I was being bombarded with emails about financing my house. At the time, I was 17 and a long way from thinking about purchasing a house. I couldn't think of a single reason why any email I received would contain the word 'mortgage', so I created a filter that sent every email with 'mortgage' in it to my trash. Later I would use the same filtering technique on other triggers such as 'Viagra'. For a while the filters did the trick.

But spammers aren't stupid. Viagra became V1agra, mortgage became m0rtgage, and my simple filters quickly lost much of their effectiveness. (They still stop hundreds of messages, but they miss hundreds more.)

Another problem with this method was that an email I did want might contain a blacklisted word. As my friends began getting married and buying houses, the chance of their emails being deleted by my 'mortgage' filter increased.
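Both weaknesses of the single-word approach can be seen in a few lines of Python. This is only a minimal sketch of a keyword blacklist, not the filter from any real mail client; the name `is_blacklisted` and the sample messages are made up for illustration.

```python
# Hypothetical blacklist, mirroring the 'mortgage' and 'Viagra' filters described above.
BLACKLIST = {"mortgage", "viagra"}


def is_blacklisted(message: str) -> bool:
    """Return True if any blacklisted word appears in the message."""
    words = message.lower().split()
    # Strip common punctuation so that 'mortgage!' still matches 'mortgage'.
    return any(word.strip(".,!?'\"") in BLACKLIST for word in words)


# An ordinary spam message is caught...
print(is_blacklisted("Refinance your mortgage today!"))
# ...but simple misspellings slip straight through...
print(is_blacklisted("Cheap V1agra and m0rtgage deals"))
# ...and a friend's genuine email is trashed (a false positive).
print(is_blacklisted("We finally got our mortgage approved!"))
```

The filter has no notion of degree: one word decides everything, whatever else the message says.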

Bayesian filters take this basic filtering concept a step further. Rather than trashing a message on the strength of a single word, they assign a score to each word (and to combinations of words) and combine those scores across the entire email. This of course only works if the filter knows which words appear regularly in spam and which don't. Filters therefore need to be 'trained': the user marks a number of messages as spam or non-spam, and the filter assigns ratings to the words in those emails accordingly.

As more and more emails containing the word 'Viagra' are marked as spam, the filter will assign a higher and higher 'spam value' to it. As more and more emails that contain the words 'internet marketing' are marked as non-spam, those words will get a progressively lower score, to the point that 'internet marketing' appearing in an email is as good an indication that the email is not spam as 'Viagra' is that it is.
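The training step can be sketched roughly as follows. This is an illustrative assumption about how per-word 'spam values' might be computed (counting how often each word appears in messages marked spam versus non-spam, with add-one smoothing), not the scheme of any particular product. The function name `train` and the sample messages are hypothetical.

```python
from collections import Counter


def train(messages):
    """Compute a 'spam value' between 0 and 1 for every word seen in training.

    messages: iterable of (text, is_spam) pairs marked by the user.
    """
    spam_counts, ham_counts = Counter(), Counter()
    n_spam = n_ham = 0
    for text, is_spam in messages:
        words = set(text.lower().split())  # count each word once per message
        if is_spam:
            spam_counts.update(words)
            n_spam += 1
        else:
            ham_counts.update(words)
            n_ham += 1

    scores = {}
    for word in set(spam_counts) | set(ham_counts):
        # Smoothed fraction of spam / non-spam messages containing the word.
        p_spam = (spam_counts[word] + 1) / (n_spam + 2)
        p_ham = (ham_counts[word] + 1) / (n_ham + 2)
        scores[word] = p_spam / (p_spam + p_ham)
    return scores


# Made-up training data for illustration only.
training = [
    ("buy viagra now", True),
    ("cheap viagra online", True),
    ("internet marketing report attached", False),
    ("internet marketing meeting notes", False),
]
scores = train(training)
print(scores["viagra"], scores["marketing"])
```

With this toy data 'viagra' ends up well above 0.5 and 'marketing' well below it, and each further marked message pushes the scores further apart.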

After the training period the Bayesian filter will usually catch around 99% of spam effortlessly. But it has another advantage beyond the raw efficiency of its spam filtering. Many methods of filtering spam result in 'false positives', such as the example I cite above of my friends buying houses and mentioning banned words such as 'mortgage'. The Bayesian filter combats this in two ways. Firstly, the more training you give a Bayesian filter, the more it becomes individualised to the mail you want to receive. While the word 'breasts' would usually attract a fairly high 'spam value', for a doctor specialising in breast enhancement or breast cancer it would be quite a common feature of legitimate emails.

Secondly, the end result of a Bayesian filter analysis is not a pass or fail; it is a 'likelihood of spam'. The filter does not say 'this is spam', but rather 'this is 98% likely to be spam'. The distinction is important when dealing with false positives. If a user is experiencing false positives they can lower the sensitivity of their filter, meaning that it will only treat emails with a 90% chance of being spam as spam, rather than those with a 70% chance. Along with avoiding false positives this will of course let more spam through, but even this has its advantages. Each spam message that slips through and is marked as spam by the user gives the filter more training, so it becomes better at recognising similar messages.
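To make the 'likelihood of spam' idea concrete, here is a rough sketch of one common way to combine per-word scores into a single probability and compare it with an adjustable threshold. The word scores, the neutral value of 0.5 for unknown words, and the names `spam_likelihood` and `is_spam` are all assumptions made for this example; real filters differ in the details.

```python
import math

# Hypothetical per-word 'spam values' that a trained filter might hold.
WORD_SCORES = {
    "viagra": 0.95,
    "cheap": 0.80,
    "mortgage": 0.70,
    "house": 0.30,
    "wedding": 0.10,
}


def spam_likelihood(message, scores=WORD_SCORES, unknown=0.5):
    """Combine per-word scores into one probability that the message is spam."""
    log_spam = log_ham = 0.0
    for word in set(message.lower().split()):
        p = scores.get(word, unknown)  # unseen words count as neutral
        log_spam += math.log(p)
        log_ham += math.log(1.0 - p)
    diff = log_ham - log_spam
    if diff > 700:  # guard against overflow in math.exp
        return 0.0
    return 1.0 / (1.0 + math.exp(diff))


def is_spam(message, threshold=0.9):
    """Treat the message as spam only if its likelihood reaches the threshold."""
    return spam_likelihood(message) >= threshold


borderline = "cheap mortgage house"
print(round(spam_likelihood(borderline), 2))
print(is_spam(borderline, threshold=0.7))  # aggressive setting
print(is_spam(borderline, threshold=0.9))  # less sensitive setting
```

Raising the threshold from 0.7 to 0.9 makes the filter less aggressive: a borderline message like the one above is delivered rather than trashed, at the cost of letting some spam through.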

Overall, the Bayesian filter is probably the best single tool we have in the fight against spam.

With an estimated 70-90 billion spam messages sent every day, the problem is not going away. Don't wait for someone to solve the problem for you; visit The Stop Spam Now Site and review the very best methods of stopping spam.

