One thing I’ve noticed is that Blogger.com seems to have a pretty good handle on which of its own blogs are in fact splogs (spam blogs). As I mentioned in a previous post, an analysis of the ping traffic that comes into Weblogs.com’s ping server indicates not only that popular ping servers process a lot of pings from splogs, but also that the lion’s share of these splogs appear to come from Google’s Blogger.com. This makes perfect sense: Blogger.com is a well-known, free service that lends itself fairly well to automated blog creation and deployment.
If you go to www.blogger.com and click the “Next Blog” button, you’re taken to a random blog in the blogger.com family. At the top of each blog is the familiar Blogger control bar, which carries the “Next Blog” button along with a “Flag” button.
Click “Next Blog” a few times and see what you get. If you click through enough blogs, you will probably land on a splog or two, but notice how few and far between they are, if you get any at all. Clearly, some filtering is going on here: Blogger appears to be analyzing its blogs and suppressing the ones it believes to be splogs. There’s also the “Flag” button, which according to Google plays a role in identifying splogs as well, but at the rate splogs are currently being spawned on blogger.com, human flagging has to be just a small part of the equation.
This isn’t a complaint; having tried it out a bit, I’m impressed with the apparent efficiency of whatever heuristics Google is using to identify the splogs in its midst. Dave Winer and I talked last week about what it would take to identify splogs effectively (and quickly). It’s not an easy problem. Link analysis used to be a pretty strong indicator, but in recent months black hat SEO types have upped the ante. State-of-the-art splogs now mix in a healthy portion of links to legitimate sites, and their posts carry “imported” content that makes them hard to recognize as search engine spam; posts aren’t just long lists of keywords anymore. Distinguishing real blogs from splogs is getting harder by the day, but Google/Blogger appear to be ahead of the curve, at least compared to everyone else.
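To make the link-analysis point concrete, here’s a toy heuristic in Python. It’s purely hypothetical and not anything Google or Blogger has described: flag a post when most of its outbound links pile onto a single domain. The function names, the 0.6 threshold, and the `.example` domains are all made up for illustration. The sketch shows how padding the same spam links with links to legitimate sites is enough to slip under a naive threshold like this one.

```python
from collections import Counter
from urllib.parse import urlparse


def top_domain_share(links):
    """Share of outbound links that point at the single most-linked domain."""
    hosts = [urlparse(url).netloc.lower() for url in links]
    if not hosts:
        return 0.0
    _, top_count = Counter(hosts).most_common(1)[0]
    return top_count / len(hosts)


def looks_like_splog(links, threshold=0.6):
    """Hypothetical naive rule: flag a post whose links concentrate on one domain."""
    return top_domain_share(links) >= threshold


# Old-style splog post: nearly every link points at the "money" site.
old_style = [
    "http://money.example/a",
    "http://money.example/b",
    "http://money.example/c",
    "http://news.example/x",
]

# Same spam links, padded with links to (made-up) legitimate sites.
padded = old_style[:3] + [
    "http://news.example/x",
    "http://wiki.example/y",
    "http://blog.example/z",
    "http://docs.example/w",
    "http://weather.example/v",
]

print(looks_like_splog(old_style))  # True  (3 of 4 links hit one domain)
print(looks_like_splog(padded))     # False (3 of 8 links hit one domain)
```

The spammer didn’t remove a single spam link; diluting them was enough. That’s why a single link-based signal has stopped being sufficient on its own.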