A few weeks ago I published a post about upgrading my blog to BlogEngine 1.6 and running it on .NET 4.0. The main reason was to get control over comment spam, and BE 1.6 has some improved comment spam controls. Let’s be honest, if it weren’t for comment spam, I might not have any comments. But I would at least like to stop blatant spam. The comments that say what a great writer I am, or that I am intelligent and handsome? Well, those are probably legitimate.
My intent with this post was simply to compare the Akismet spam report to what I see on the site. Did Akismet stop most of the spam? Did it flag legitimate comments as spam? However, this trial turned out to be a little more interesting than I expected. I was forced to consider what defines spam. With exceptions, spam tends to ramp up from 1-3 after posting. If you are Guthrie, Hanselman or one of the other intellectual celebrities, you probably don’t need to worry as much about spam. Within the first week, they have filled several pages with comments and could close the post to comments, yet they still keep comments open without spam (I hate them for that). That’s not where I am, and I want to keep my posts open for comments so I may be able to connect with someone with similar interests or challenges. Before I can evaluate Akismet, I need to know what spam is.
Comment spam is not like pornography. I may not know it when I see it. In fact, comment spam seems to be getting smarter. Spam bots used to be unimaginative, clearly promoting links to their target site and mentioning nothing about the posting. Now spam comments will reference your blog engine (BlogEngine.Net in my case), but still not the posting. Many of the spam comments during the period of this experiment were of higher quality. Comments sometimes seem legitimate until you start getting the exact same comment a week or two later, or you start to see multiple comments posted by different people from the same IP address. So how did Akismet do? Let’s look at the numbers.
It has been just 3 1/2 weeks since I upgraded my blog and published my posting about the upgrade. During that period I received 330 comments on the various postings. Akismet flagged 255 comments as spam and passed 75 as legitimate. If Akismet is accurate, that is about 255 more spam comments than would have been caught prior to the upgrade. On the other hand, 75 comments in 3 1/2 weeks tells me that either Akismet missed some spam or my audience has grown significantly over the last couple of months.
Legitimate Comments
Let’s consider how Akismet and BlogEngine fared in identifying legitimate comments. The first thing I am going to look for is obvious spam. For example, one recent post has an author name of “dating affiliates”, a website name of “howtoattractwomentips.com/how-to-get-your-ex-back-fast/” and a comment that relates to making money with affiliate programs rather than anything in the posting. While I can easily believe that people who offer tips on how to attract women could be software engineers, this smells like spam. Going through the legitimate comments, I identified 27 of the 75 (48 remaining) as blatant spam. I could have flagged several more, but we have more work to do. Link services that flood us all with comment spam identify a unique author and website, typically their customer, but the IP addresses repeat over and over again. When 10 comments from unrelated authors, emails and websites have the same IP address, it’s spam. Also, link services tend to use legitimate email addresses so they can monitor feedback. Therefore, email addresses often repeat themselves just like IPs. Clearing out the duplicate IPs and emails, I take out 29 of the 48 remaining. That leaves 19 comments.
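The duplicate IP and email check described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how BlogEngine or Akismet work: the `Comment` record, the `flag_repeat_senders` function and the `min_authors` threshold are all hypothetical names and values made up for this example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Comment:
    # Hypothetical record shape; not BlogEngine's actual comment model.
    author: str
    email: str
    website: str
    ip: str


def flag_repeat_senders(comments, min_authors=2):
    """Return comments whose IP or email is shared by several distinct authors.

    min_authors is an assumed threshold: how many different author names
    must share one IP or one email before those comments are flagged.
    """
    authors_by_ip = defaultdict(set)
    authors_by_email = defaultdict(set)
    for c in comments:
        authors_by_ip[c.ip].add(c.author.lower())
        authors_by_email[c.email.lower()].add(c.author.lower())

    flagged = []
    for c in comments:
        shared_ip = len(authors_by_ip[c.ip]) >= min_authors
        shared_email = len(authors_by_email[c.email.lower()]) >= min_authors
        if shared_ip or shared_email:
            flagged.append(c)
    return flagged


# Made-up sample data, for illustration only.
sample = [
    Comment("dating affiliates", "a@example.com", "one.example", "10.0.0.1"),
    Comment("cheap watches", "b@example.com", "two.example", "10.0.0.1"),
    Comment("Jane", "jane@example.com", "jane.example", "10.0.0.2"),
]
for c in flag_repeat_senders(sample):
    print(c.author, c.ip)
```

A check like this will also catch honest commenters who happen to share an office or ISP address, so a higher `min_authors` value (I used 10 as my own rule of thumb above) is the safer setting.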
Trust me, it gets pretty drafty when the kimono is this open. In 3 1/2 weeks, I have 19 not-blatant comments, but I’m not ready to call them all legitimate. This is where we need to decide what constitutes spam and what is legitimate. If a comment is not obviously spam but mentions nothing from your posting, is it spam? “Good to know… thanks for sharing”. “This is a great blog I really loved it. I am bookmarking and definitely come back”. Well, it is a great blog. What makes this spam? I believe these comments to be from spam bots, and that makes them spam. Two of the comments directly complain about the spam, and they should. I believe these comments are legit, although even these comments are not a stretch for a spam engine. Several of the others are likely spam, but I think I would rather have a few soft spams than risk deleting a comment from a legitimate commenter. I only have so many friends.
Spam Comments
After reviewing the hundreds of comments flagged as spam by Akismet, I did not find any that convinced me they were legit. There were a few that I wish were legit; they were very complimentary. But I was schooled in Georgia. When a posting compliments my grammar, I know it’s spam.
Grading Akismet and BlogEngine.Net Spam Controls
Akismet did a good job on the comments flagged as spam. I estimate that 100% of the comments flagged as spam were spam. Grade: A+
The track record on legit comments is not as good. Yes, if in doubt, a comment should not be flagged as spam. However, with 56 of 75 “legit” comments clearly spam and the remaining 19 comments iffy, the spam filters could do better. This is still not good enough. I still have a spammed blog, and I am still spending a couple of hours per week cleaning out spam that could be spent blogging. Grade: D
Overall, I’ll give the spam filters a C. They did nothing wrong, but they did not do enough right.
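For anyone who wants to check my math, the grades rest on simple arithmetic from the counts reported earlier. Here is a small sketch that derives the totals; the `summarize` function and its field names are made up for this example, and the "catch rate" is only an upper bound because some of the 19 undecided comments may be spam too.

```python
def summarize(flagged_as_spam, passed_as_legit, blatant, repeat_ip_or_email):
    """Derive the totals behind the grades from the counts in this post."""
    missed_spam = blatant + repeat_ip_or_email      # spam that got through
    undecided = passed_as_legit - missed_spam       # not-blatant comments
    known_spam = flagged_as_spam + missed_spam      # spam I am sure about
    return {
        "missed_spam": missed_spam,
        "undecided": undecided,
        "known_spam": known_spam,
        # Share of known spam that the filter caught (upper bound).
        "catch_rate": flagged_as_spam / known_spam,
        # Share of the passed comments that were clearly spam.
        "spam_share_of_passed": missed_spam / passed_as_legit,
    }


report = summarize(flagged_as_spam=255, passed_as_legit=75,
                   blatant=27, repeat_ip_or_email=29)
print(report["missed_spam"], report["undecided"], report["known_spam"])
print(f"{report['catch_rate']:.0%} caught, "
      f"{report['spam_share_of_passed']:.0%} of passed comments were spam")
```

By this count the filters caught at most about 82% of the spam (255 of 311), while roughly 75% of what they let through (56 of 75) was still spam.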
This is Not Good Enough
I don’t mind maintaining my blog, but several hours of spam maintenance per week is not what I want. There are some alternatives.
- Captcha is an obvious control to put in place. BlogEngine provides captcha, but I have chosen not to use it. As an end user, I do not like using captcha, and the blogs I admire the most do not use captcha.
- Manual moderation is an option. I can review every comment and decide if it is spam. I can then be sure that only legit (or at least soft spam) comments are visible on my site. However, my couple of hours of maintenance per week would become a couple of hours per day. Did I mention I have a day job? Manual moderation is not an option.
- Turn off comments. Many people I know have chosen to turn comments off, but I blog because I hope to inspire conversation from time to time. Even if I get no legit comments, I will stop blogging before I stop accepting comments.
- I can change blog engines. I have noticed that BlogEngine and blogspot.com typically use captcha to block spam. DasBlog and SubText bloggers typically do not. In my post about upgrading to BlogEngine and starting this comment spam experiment, I stated that the upgrade was one last effort to get control over comment spam. This last effort was not good enough.
I must now decide if I plug in captcha (yech!) or change blog engines (double yech!).
Questioning Spam Engines
At the risk of speaking blasphemy, why aren’t comment spam engines smarter? Although comment spam seems to be getting smarter, it is still not smart. If link services are such a lucrative business, I would think they would have better spamming engines. As they crawl blogs, many are smart enough to identify the site as a BlogEngine blog, so why can’t they identify the gist of the post? How hard would it be to scan the content of the blog to pick up some key words and form a comment that at least partially relates to the story? Automated real estate classified ads were doing this 15 years ago. Even the 2 comments mentioned above that complained of spam could be spam. A spam engine could post a “why all the spam” comment on any posting with more than 40 comments and be on the mark 90% of the time. Is this a business opportunity, or are the margins so low in link sales that the development is not worthwhile? This is not a business that I would enter. I’m just saying, why aren’t they smarter?