The latest spam

Submitted by Larry on 23 May 2008 - 7:54pm

I have been running Mollom as my spam-fighter on this site for not quite two months now. It's been fairly effective overall. The nifty flash meter shows me just how bad the spam problem is (good grief, 593 blocked spam messages just on 15 May!), and I haven't gotten any spam in my comment list yet.

That is, until today, when a new form appeared.

You won't see them, because I already deleted them. However, earlier today I noticed a few comments that didn't quite make sense in context. They were on blog posts that were already several months old (not itself unusual). They were well-written, on-topic, and had no links in them. But they seemed, well, familiar.

They were, in fact, duplicates of previous comments on the same story; in one case, it was the first paragraph of the blog post on which it was submitted. So I checked the home page link for the submitter. It wasn't your typical spam site, at least I don't think so, but it did look, well, overly commercial. Lots of banner ads (no porn, fortunately), links to some sort of YouTube service... Hm. Still, they added nothing of value so I simply canned them.

Is this the next form of comment spam? Filtering it, even with a system like Mollom that supports a "sausage" status for questionable content, will be quite difficult, if it's possible at all. Perhaps it's time for a reputation system? Hmmm...

I've seen this method used to spam Drupal.org quite frequently. Some of the more sophisticated bots will actually meld several posts together to form something new. It can be very hard to detect, especially when a site has a lot of users who speak English as a second language.
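One possible way to flag recycled comments, including the melded kind, is to measure how much of a new comment already appears on the same page. The sketch below is a hypothetical illustration, not something Mollom or Drupal provides. It splits text into overlapping three-word "shingles" and reports the fraction of the new comment's shingles found in the post or its earlier comments. The function names, the shingle size, and the 0.8 threshold are all assumptions chosen for the example.

```python
import re


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Return the set of overlapping k-word windows ("shingles") in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        # Very short texts become a single (possibly shorter) shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def recycled_fraction(new_comment: str, existing_texts: list[str], k: int = 3) -> float:
    """Fraction of the new comment's shingles already present on the page."""
    new = shingles(new_comment, k)
    if not new:
        return 0.0
    seen: set[tuple[str, ...]] = set()
    for text in existing_texts:
        seen |= shingles(text, k)
    return len(new & seen) / len(new)


def looks_recycled(new_comment: str, existing_texts: list[str],
                   threshold: float = 0.8, k: int = 3) -> bool:
    """Hypothetical rule: flag comments that are mostly copied from the page."""
    return recycled_fraction(new_comment, existing_texts, k) >= threshold


if __name__ == "__main__":
    post = "The nifty flash meter shows me just how bad the spam problem is"
    earlier = "They were well written on topic and had no links in them"
    melded = ("The nifty flash meter shows me just how bad "
              "they were well written on topic and had no links")
    print(looks_recycled(melded, [post, earlier]))  # True
    print(looks_recycled("Great post, thanks for sharing your experience.", [post, earlier]))  # False
```

Because the check pools shingles from every existing text, a comment stitched together from two earlier comments still scores high; only the few shingles spanning the seam are new. A rule like this would also flag legitimate replies that quote heavily, so it is better suited to sending a comment to moderation than to rejecting it outright.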

I've been fighting the same problem on my blog for a couple months now and (just this weekend) finally had the free time to write a similar post/rant -- but now I can just write this comment. ;) I am getting the same three or four "spam" sites as the user's homepage link which makes me wonder if it may be a few human "bots" spamming me.

CAPTCHAs, first of all, are bad for usability. That's well documented in hundreds of places. And the more "secure" the CAPTCHA, the harder it is for real people to use.

Also, many, many CAPTCHAs have already been broken. It's just image matching, and not very complex image matching at that. OCR software has been doing it for years. I saw a demonstration a while back of a relatively simple CAPTCHA breaker written in PHP. It's not a conceptually difficult problem to solve.

Mollom also falls back to a CAPTCHA for sausage, when it isn't sure whether something is spam or not. The problem with this new breed is that it looks like real content because it is real content, so content analyzers don't realize that it's actually just an excuse for a link in the "posted by" field.

Addendum: Because there's a link in this comment, Mollom flagged it as sausage. Ironically, the link is to Dries' web site, talking about Mollom. Awesome. :-)

I have been running Mollom as my spam-fighter on this site for not quite two months now. It's been fairly effective overall. The nifty flash meter shows me just how bad the spam problem is (good grief, 593 blocked spam messages just on 15 May!), and I haven't gotten any spam in my comment list yet.

. . .

OK... I've had my fun. I'm just happy to see you like Lyle's term for questionable posts. ;)