Spam Filtering in MovableType 3.2
People seemed to jump off the MovableType bandwagon when comment spam started getting out of hand. Having stuck with MovableType myself, I’ve come to discover the spam filtering power that is now built in.
Bundled with a standard install of MT are three plugins under the SpamLookup umbrella: Lookups, Link, and Keyword Filter. The settings can be configured for all blogs or tweaked for each blog individually (What’s that, you actually write a blog about Viagra and Casinos?).
For the various options, you can push comments to moderation or junk them entirely. When comments are junked, you can determine how long they should remain in the system before being deleted for good. Junk settings can only be set on a per-blog basis and are available under Settings > Feedback. You can also set the threshold at which a comment is considered junk. I haven’t changed it; I just modify the weighting on other factors, which I’ll get to. By default, auto-deletion of junk is turned off. I’d recommend turning it on, since there’s no sense letting hundreds or thousands of junk comments collect in your database for nothing. Then specify the number of days you feel comfortable storing them. I’d still recommend a few days because it’s always good to take a peek from time to time to see if any legitimate comments got caught (for me, that’s about 1 in 1,000).
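To make the auto-deletion idea concrete, here is a minimal sketch of a retention window. It is not MovableType's own code; the function name `purge_expired_junk`, the `junked_on` key, and the `keep_days` parameter are all made up for illustration.

```python
from datetime import datetime, timedelta

def purge_expired_junk(junk, keep_days, now):
    """Return only the junk comments still inside the retention window.

    `junk` is a list of dicts with a hypothetical `junked_on` datetime.
    """
    cutoff = now - timedelta(days=keep_days)
    return [comment for comment in junk if comment["junked_on"] >= cutoff]
```

With a window of a few days you still get a chance to rescue a legitimate comment before it disappears.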
At the core of all three plugins is a weighting system that allows you to give a higher or lower weight to various factors. If a comment’s total score falls below the threshold, it’s junk. If all looks good, it goes live. Beside various options is a link called “Adjust scoring” that, when clicked, displays a box allowing the weight to be increased or decreased. Scoring is only adjustable on options set to junk. If you set a feature to moderate, then the comment goes to moderation right away.
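The weighting idea can be sketched in a few lines. This is only an illustration of the logic described above, not how SpamLookup is actually written; `TestResult`, `classify`, and the default `junk_threshold` of 0 are assumptions on my part.

```python
from dataclasses import dataclass

@dataclass
class TestResult:
    name: str
    score: float            # positive looks legitimate, negative looks like spam
    moderate: bool = False  # hypothetical flag: this test is set to "moderate"

def classify(results, junk_threshold=0.0):
    """Combine individual test results into a single decision."""
    # A test set to moderate sends the comment to moderation right away.
    if any(r.moderate for r in results):
        return "moderate"
    total = sum(r.score for r in results)
    # Below the threshold is junk; otherwise the comment goes live.
    return "junk" if total < junk_threshold else "publish"
```

Raising the weight on one factor simply makes that factor count for more in the total.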
Lookups
Lookups allows you to specify IP address and domain name lookup servers. Comment spam often pours in from addresses known to be used by spammers. Sometimes legitimate commenters have their IP address on the blacklists. In this case, other weightings should hopefully put them in the clear. You can also add particular IP addresses or domain names to a whitelist. The default settings for this plugin have worked dandy for me.
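For the curious, here is a rough sketch of how an IP lookup against a blacklist server typically works, assuming the server follows the common DNS blacklist convention (reverse the IPv4 octets, append the server's zone, and see if the name resolves). The zone `bl.example.org` and the function names are placeholders, not anything shipped with MovableType.

```python
import ipaddress
import socket

def dnsbl_query_name(ip, zone):
    """Build the DNS name to query: reversed IPv4 octets plus the zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone

def is_listed(ip, zone, whitelist=(), resolve=socket.gethostbyname):
    """Return True if the blacklist zone has an entry for this IP."""
    if ip in whitelist:
        return False  # whitelisted addresses skip the lookup entirely
    try:
        resolve(dnsbl_query_name(ip, zone))
        return True   # the name resolved, so the address is listed
    except OSError:
        return False  # no record (or lookup failure): treat as not listed
```

For example, `dnsbl_query_name("192.0.2.10", "bl.example.org")` builds the name `10.2.0.192.bl.example.org`. The `resolve` argument is only there so the lookup can be swapped out for testing.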
Link
Link limits the number of links that people can have in their comments. Since spammers tend to be link-happy, this manages to catch quite a few of them. Keep in mind that the URL field is also included in the count. That means that a URL plus two links in the comment text will hit the moderation default of 3 links and throw the comment into moderation.
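The counting rule is easy to get wrong in your head, so here is a small sketch of it. I am assuming a link is anything starting with `http://` or `https://` and that reaching the limit triggers moderation, as described above; the real plugin may count differently, and the function names are mine.

```python
import re

LINK_RE = re.compile(r"https?://", re.IGNORECASE)

def count_links(body, url_field=""):
    """Count links in the comment text, plus one for a filled-in URL field."""
    count = len(LINK_RE.findall(body))
    if url_field.strip():
        count += 1
    return count

def needs_moderation(body, url_field="", limit=3):
    """Moderate once the total link count reaches the limit (default 3)."""
    return count_links(body, url_field) >= limit
```

So two links in the text are fine on their own, but the same comment with the URL field filled in reaches three and gets held.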
The Link Memory and Email Memory options are a handy way of rewarding repeat commenters and save you from having to constantly moderate comments. However, the default weight of 1 is a little low. I’ve bumped it up to 2 to minimize the need for moderation on what are likely legitimate users. If you don’t publish email addresses publicly (which I don’t believe you should), then you could bump up the score on the email memory even further.
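Here is one way to picture the memory features, purely as a sketch: a commenter whose email or URL already appears on a previously approved comment earns a bonus. The function `memory_bonus` and the two sets are hypothetical, and the weight of 2 just mirrors the setting I use.

```python
def memory_bonus(email, url, approved_emails, approved_urls,
                 email_weight=2, link_weight=2):
    """Reward commenters whose email or URL was seen on an approved comment."""
    bonus = 0
    if email and email.lower() in approved_emails:
        bonus += email_weight
    if url and url in approved_urls:
        bonus += link_weight
    return bonus
```

The flip side is that approving one bad comment teaches the system to trust that address, so it pays to delete spam that slipped through.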
Keyword Filter
Keyword Filter is likely the plugin you’ll need to update from time to time as you find yourself being inundated with certain comments. For example, I had been seeing an inordinate amount of Viagra spam of late, so I added the keyword to my junk list. What’s handy about this plugin is that you can use regular expressions or even specify a score for certain words.
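As a sketch of how regular expressions and per-word scores can work together, the snippet below keeps a list of patterns, each with its own weight, and subtracts the weight of every pattern that matches. This is not the plugin's configuration format; the `RULES` list, the weights, and `keyword_penalty` are invented for the example.

```python
import re

# Hypothetical rules: (pattern, weight). A bigger weight counts more heavily as spam.
RULES = [
    (re.compile(r"\bviagra\b", re.IGNORECASE), 1.0),
    (re.compile(r"\bcasinos?\b", re.IGNORECASE), 2.0),
]

def keyword_penalty(text, rules=RULES):
    """Return a negative score: the summed weight of every matching rule."""
    return -sum(weight for pattern, weight in rules if pattern.search(text))
```

A penalty like this would then feed into the overall weighting alongside the other factors.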
Tweaking
It’s always good to check your junk comments from time to time to make sure that legitimate comments aren’t inadvertently getting caught. And if you find yourself with a large batch of spam getting through, it should be easy enough to tweak the settings to capture what you don’t want. The junk comments are accessible via the tab on the main comments screen.
Here’s to a life without spam!
Conversation
I think the biggest problem I've been facing with spam is when they use authentic email addresses (especially from people you know) with their junk content.
The biggest issue I've had lately is the useless spam comments with absolutely no links. Completely random stuff like "I've managed to save up roughly $88847 in my bank account, but I'm not sure if I should buy a house or not..."
The only reason I can see for these is they're trying to take advantage of the "email memory" feature you talked about Jonathan. Posting seemingly innocuous comments gets the email address accepted allowing for unmoderated spam later. Just a guess.
I wish there was some combination of the old MT-Blacklist and SpamLookup. Some sort of central home of a bunch of keywords and regular expressions that SpamLookup could check so that I don't have to maintain my list of keywords so carefully.
You could very well be right about the email memory. I noticed one day that this person left a comment that seemed half relevant. The second time it came in, I realized it was actually spam. I had to go back and delete both messages to make sure the email didn't have the sticky factor.
I try and keep a close eye on my comments so at least the really bad stuff doesn't show up as often.
The SpamLookup function of MovableType is a great feature that has been built in since MT 3.2. No spam comments or trackbacks show up on my blog. I just have to add a few more keywords to junk and some IP addresses to block, and that's OK.