I hate spam. Of course, I imagine the overworked, underpaid dupes in Pakistan dishing it out at 5¢ per hundred comments don't particularly like it much either. It's just their job.
So anyway, about a year ago, the spam on this site was getting a bit out of control. Fortunately, Mollom had just whipped out their new, free spam-blocking service about the same time, so I gladly installed it. As you can see in the graph below (the orange being 'Spam attempts blocked'), this has been a fantastic boon for the site, with over 700,000 spam attempts blocked in the past year.
Looking at that graph, you can see the spam attempts really dropped off sometime in April or May. I really don't know why; if anything, the traffic to this site has steadily increased over the year. I suspect that whatever methods spammers were using were not paying off as well, perhaps in part due to the diligence of the great folks over at Mollom?
How it's been faring lately...
The purpose of this post is not to speculate about such things. The purpose is to zoom in a bit to the past two weeks. Let's do that now:
Here you can see recent activity in more detail. Notice the little green lumps at the bottom, which represent "Ham (not spam) operations accepted".
Well, not quite. Each of those little green lumps actually represents about 20 minutes a night filtering through all my comments and deleting a bunch of new spam that's managed to bypass their filters. Not really how I want to spend my free time.
I don't know what Mollom thinks about it all, or what new things they have planned (I suspect they have some). But I did decide to review my options and add a new line of defense.
First, how does Mollom work? For the end user, nothing unusual seems to happen at first. They simply submit a comment as normal. But before publication, the comment is sent to Mollom's servers, which check it against a bunch of arcane things: what kind of text is there, are there suspicious links, does it come from a known spamming IP? If it fails any of these tests, back it comes to the user with a CAPTCHA challenge, similar to the following:

Then the user (or intelligent program) tries to figure out what those letters and numbers say and enters them: a little Turing test to separate the humans from the robots.
Mollom is (for now) letting a few more slip through. It used to be about 4-5 a week, then it was about 25-50 a night, and it has been up to about 100 over the past two nights. So I decided to add a new line of defense.
Hashcash to the Rescue (I hope)!
I remembered a demo of WebHashcash written by David Schneider-Joseph, a former student of mine at a Sudbury School (though I can take no credit for his mad computer skills, and he can also play a mean game of Go). This, in turn, was based on the Hashcash algorithm by Adam Back, which was originally developed to fight e-mail spam.
The Hashcash algorithm, when used on the Web, is a JavaScript function that's run to fill a hidden text field, to be validated after submitting the form. The catch is that it can take a few seconds for most computers to compute the challenge (which might look something like '1:20:090723010931:example_hash::5OSqavyzeco:2z2O' in the end). The end user won't even notice it. Sadly, neither will some poor chap in Pakistan.
However, the automated spam servers will notice it. They'll be sending out a million pieces of spam, and suddenly, *blip*, the computer slows down for a bit. And that's assuming it even processes JavaScript. (The routine intentionally does not degrade gracefully.) If enough sites use it, the CPU cycles end up costing more than the return, and the spammers are forced to remove those sites from their lists.
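To make the idea concrete, here is a minimal sketch of minting and checking a Hashcash-style stamp, written in Python rather than the JavaScript a browser would run. It follows the version 1 stamp layout seen in the example above (version, bits, date, resource, extension, random string, counter) and requires the SHA-1 hash of the stamp to start with a given number of zero bits. The function names `mint`, `verify`, and `leading_zero_bits` are my own for illustration, the counter is written in hex for simplicity, and this is not the code used by WebHashcash or by the Drupal module.

```python
import hashlib
import secrets
from datetime import datetime, timezone
from itertools import count


def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the start of a digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def mint(resource: str, bits: int = 20) -> str:
    """Brute-force a counter until the stamp's SHA-1 has `bits` leading zero bits.

    Assumes `resource` contains no ':' characters.
    """
    date = datetime.now(timezone.utc).strftime("%y%m%d%H%M%S")
    rand = secrets.token_urlsafe(8)  # random salt; never contains ':'
    prefix = f"1:{bits}:{date}:{resource}::{rand}:"
    for counter in count():
        stamp = f"{prefix}{counter:x}"  # hex counter (a simplification)
        if leading_zero_bits(hashlib.sha1(stamp.encode()).digest()) >= bits:
            return stamp


def verify(stamp: str, resource: str, bits: int = 20) -> bool:
    """Check a stamp with a single hash; cheap for the server."""
    fields = stamp.split(":")
    if len(fields) != 7 or fields[0] != "1" or fields[3] != resource:
        return False
    if not fields[1].isdigit() or int(fields[1]) < bits:
        return False
    digest = hashlib.sha1(stamp.encode()).digest()
    return leading_zero_bits(digest) >= bits


if __name__ == "__main__":
    stamp = mint("example_hash", bits=16)  # fewer bits so the demo is quick
    print(stamp, verify(stamp, "example_hash", bits=16))
```

The asymmetry is the whole point: at 20 bits the sender needs about a million hash attempts on average (2^20), while the receiver needs exactly one. A real deployment would also have to reject stale dates and stamps it has already seen, which this sketch leaves out.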
I thought tonight that I would write a Drupal module for the routine. Luckily for me, I learned that Simon Rycroft (sdrycroft) had long ago beaten me to the punch.
So now I've added the Hashcash module to this site as a second line of defense against spam. I'll let you know how it goes!
Please note that this post is not meant as a criticism of Mollom! They provide an invaluable service, which I intend to continue using. Even though 25+ spam comments a day is annoying, they're still blocking a thousand or more a day. Thank you, Mollom!
Comments
Might also want to check out the Spamicide Module. I was a bit skeptical about it but it catches over 99% of spam on our sites.
- Sean Bannister
Same as Drupal.geek.nz
This blog post describes the same thing I have experienced on http://drupal.geek.nz over the last year. Please keep us posted on your progress.
Do you find that the manual spam comments that get through always either:
1. contain only links to invalid one-letter domains, such as www.m, www.t
2. contain only links to captcha websites or pages such as recaptcha.org and drupal.org captcha modules.
3. contain pornographic text and/or links
That sums it up
Yes, yes, and yes! Good to know there's a pattern to that.
Same here
I have seen the same on several sites I run. It has been getting tedious over the last 3 or so weeks.
Links to one letter domains without a top level domain.
The names are either first and last name combinations, or just one name.
The title is sometimes very different from the text, and sometimes contains tech buzzwords (e.g. MySQL, Sun, Oracle, etc.)
There are no links inside the body of the comment.
I brought it to the attention of Dries and they are working on it.
Update from David
He writes:
"You might wish to know, however, that wp-hashcash (and presumably the Drupal plug-in based on it) has very little to do with hashcash in the sense of Adam Back's invention or WebHashcash, which is based on that. wp-hashcash merely ensures that the client has a JavaScript interpreter, which is very common for spambots these days... It does not, however, guarantee that the client has expended any sizable amount of CPU effort.
"Why hashcash.org links to wp-hashcash, or why wp-hashcash's creator's surname is also "Back", I have no idea. But it's not hashcash, and I think it somewhat misleading that they call it that."
I'll have to look into that.
So far, so good...
Mollom blocked 1200 of 1200 attempts today... Possibly because of the new challenge...
D5 or D6?
Are you running this site on D5 or D6? I could never get Mollom to function with D6 sites (tested ~4 months ago) and resorted to just using captcha. It seems the math questions are getting blown out of the water by spammers, and I'm left deleting many comments.
This site has always been d6.
This site has always been d6. I had no problem getting it installed and updated over the past year. Not sure what to say; perhaps you could dig through Mollom's issues or give them a ring?
So let me get this straight...
You're basically trying to annoy the spammers into leaving you alone by making spamming YOUR site too costly in server cycles? That's crazy enough it just might work! Keep us updated.
Eclipse
Thanks for the tip off about this module. I wasn't even aware of Hashcash. 1,000 pieces of spam a day... wow... that's a lot.