It's Spam, but Is it Normal?

After looking at the relationship between spam rules and spam scores in my private spam collection, I decided it was time to look at how spam rules and spam scores are distributed.

The first thing I did was create a histogram of spam scores. It looks a bit like a normal distribution, but it's hard to tell: the collection only contains messages that scored above the spam threshold, so the lower end of the distribution is cut off.
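Here is a minimal sketch of that kind of histogram, written in Python with only the standard library. The scores, the threshold of 5.0, and the helper names `bin_counts` and `text_histogram` are all made up for illustration; none of them come from my actual spam collection.

```python
from collections import Counter
import math

# Hypothetical spam scores; every value is at or above an assumed threshold
# of 5.0, mimicking a collection that only keeps messages flagged as spam.
THRESHOLD = 5.0
scores = [5.1, 5.4, 6.2, 6.8, 7.0, 7.3, 7.9, 8.1, 8.4, 8.8,
          9.0, 9.5, 10.2, 10.9, 11.6, 12.4, 13.8, 15.1, 17.3, 21.0]

def bin_counts(values, start, width):
    """Count values per bin [start + k*width, start + (k+1)*width)."""
    counts = Counter(math.floor((v - start) / width) for v in values)
    return {start + k * width: counts[k] for k in sorted(counts)}

def text_histogram(values, start, width):
    """Render one line of '#' marks per non-empty bin."""
    lines = []
    for left, n in bin_counts(values, start, width).items():
        lines.append(f"{left:5.1f} - {left + width:5.1f} | {'#' * n}")
    return "\n".join(lines)

print(text_histogram(scores, THRESHOLD, 2.0))
```

Because nothing below the threshold is ever recorded, the left edge of the picture is a hard cut rather than a tail, which is what makes the overall shape hard to judge.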

But what about the number of rules triggered? Does it conform to the normal distribution? Again, it's close, but it is somewhat skewed to the right.
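One rough way to put a number on "skewed to the right" is the moment-based skewness coefficient, which is positive when the long tail is on the right. The sketch below uses invented rule counts, not my data, and `sample_skewness` is a hypothetical helper name.

```python
import statistics

# Hypothetical counts of rules triggered per message (illustrative only).
rules_triggered = [3, 4, 4, 5, 5, 5, 6, 6, 6, 6,
                   7, 7, 7, 8, 8, 9, 10, 12, 14, 17]

def sample_skewness(values):
    """Moment-based skewness m3 / m2**1.5, with no small-sample correction."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5

skew = sample_skewness(rules_triggered)
print(f"mean={statistics.mean(rules_triggered):.2f}  "
      f"median={statistics.median(rules_triggered):.2f}  "
      f"skewness={skew:.2f}")
```

A mean that sits above the median, together with a positive skewness value, points the same way as the histogram: a few messages trigger many more rules than the typical one.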

But how much does it depart from the normal distribution? And just what kind of distribution is it, anyhow?
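One informal way to approach the first question is a quantile comparison: sort the observations and set each one against the value a normal distribution with the same mean and standard deviation would predict for that rank. This is only a sketch under assumptions. The rule counts are invented, `qq_pairs` is a hypothetical helper, and the `(i + 0.5) / n` plotting position is just one common choice. It relies on `statistics.NormalDist`, which needs Python 3.8 or later.

```python
from statistics import NormalDist, mean, pstdev

# Hypothetical counts of rules triggered per message (illustrative only).
rules_triggered = [3, 4, 4, 5, 5, 5, 6, 6, 6, 6,
                   7, 7, 7, 8, 8, 9, 10, 12, 14, 17]

def qq_pairs(values):
    """Pair each sorted observation with the fitted normal's quantile for its rank."""
    n = len(values)
    fitted = NormalDist(mean(values), pstdev(values))
    # (i + 0.5) / n is an assumed plotting position; other conventions exist.
    theoretical = [fitted.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, sorted(values)))

pairs = qq_pairs(rules_triggered)
for expected, observed in pairs:
    print(f"normal predicts {expected:6.2f}   observed {observed:3d}")

# Largest gap between observed and predicted quantiles: a crude summary only.
worst = max(abs(observed - expected) for expected, observed in pairs)
print(f"largest quantile gap: {worst:.2f}")
```

If the data were close to normal, the two columns would track each other. With right-skewed data, the observed values at both ends tend to sit above what the fitted normal predicts. This says nothing yet about which distribution would fit better.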

About

This page contains a single entry from the blog posted on August 29, 2007 8:24 AM.
