Blog Spam - annoying junk or a source of intelligence?

Published: 2013-07-18
Last Updated: 2013-07-18 04:30:23 UTC
by Chris Mohan (Version: 1)
5 comment(s)
Can blog spam be of any real use to security teams? Here’s my take on turning what some consider internet background noise into information ripe for becoming actionable intelligence.
I get waves of blog spam – comments posted to a blog site advertising someone else’s wares (including links to malware!) or services, or attempting to increase search engine rankings – in my small corner of the internet at infrequent intervals. To many of my fellow blog owners this is a source of constant annoyance, but I get a little, gleeful smile and promptly dump the user agent [1], body text (extracting any embedded URLs), and posting IP address into my pile of “all things to observe and search on”.
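As a rough illustration of that first step, here is a minimal Python sketch of pulling the searchable pieces out of one spam comment. It is not the script described in this diary; the function name, the record layout and the sample comment (which uses documentation-only addresses and domains) are all assumptions made for the example.

```python
import re
from urllib.parse import urlsplit

# Deliberately simple pattern: an http(s) URL runs until whitespace,
# a double quote or an angle bracket. Real spam may need something stricter.
URL_RE = re.compile(r'https?://[^\s"<>]+', re.IGNORECASE)


def extract_indicators(ip, user_agent, body):
    """Return the parts of a spam comment worth keeping for later searches."""
    # Strip trailing punctuation that belongs to the sentence, not the URL.
    urls = sorted({u.rstrip(".,;") for u in URL_RE.findall(body)})
    # urlsplit() lower-cases the hostname, which helps de-duplication.
    hosts = {urlsplit(u).hostname for u in urls}
    domains = sorted(h for h in hosts if h)
    return {"ip": ip, "user_agent": user_agent, "urls": urls, "domains": domains}


# Hypothetical sample comment, for illustration only.
record = extract_indicators(
    "203.0.113.7",
    "Mozilla/5.0 (compatible; ExampleBot/1.0)",
    'Great post! Cheap pills <a href="http://pills.example.com/buy?id=1">here</a> '
    "and http://Shop.Example.net/deal.",
)
print(record["urls"])
print(record["domains"])
```

Keeping the full URL and the bare hostname separately means either can be searched on its own later.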
Once the data is carefully added to my speed-optimized database*, I sort it, note the duplicate posts and do some passive look-ups on free resources, such as the Internet Storm Center (ISC) IP lookup tables [2], to see if the source is known or reported as malicious. I then pipe the IP addresses, domains and URLs into a local copy of the Collective Intelligence Framework (CIF), regardless of whether the passive searching yielded any information, to see if anyone else has run into them. For those unfamiliar with CIF, fellow Handler Russ McRee did a nice write-up [3] on the basics of CIF by Wes Young [4]. CIF pools data from numerous sources and can quickly help identify whether any of the collected data points to botnets, infected systems, malware hosts, etc. All of which is a huge informational leap up from an annoying automated posting with an IP address and URL.
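The sorting and duplicate-spotting part can be sketched in a few lines of Python. This is only an assumed shape for the data (a list of dicts with "ip" and "body" keys, using documentation-only addresses), not the database described above, and it leaves out the ISC and CIF look-ups entirely.

```python
from collections import Counter
import ipaddress

# Hypothetical collected posts, for illustration only.
posts = [
    {"ip": "203.0.113.7", "body": "Cheap pills http://pills.example.com"},
    {"ip": "198.51.100.23", "body": "Cheap  pills http://pills.example.com"},
    {"ip": "203.0.113.7", "body": "cheap pills http://pills.example.com"},
    {"ip": "203.0.113.7", "body": "Best casino http://casino.example.net"},
]


def ip_sort_key(ip):
    """Sort numerically, with IPv4 before IPv6, instead of as plain text."""
    addr = ipaddress.ip_address(ip)
    return (addr.version, int(addr))


def summarize(posts):
    """Count posts per IP, find repeated bodies, and list IPs in numeric order."""
    by_ip = Counter(p["ip"] for p in posts)
    # Normalise case and whitespace so trivially altered copies still match.
    by_body = Counter(" ".join(p["body"].lower().split()) for p in posts)
    duplicate_bodies = {body: n for body, n in by_body.items() if n > 1}
    ips_sorted = sorted(by_ip, key=ip_sort_key)
    return by_ip, duplicate_bodies, ips_sorted


by_ip, duplicate_bodies, ips_sorted = summarize(posts)
print(by_ip.most_common())
print(duplicate_bodies)
print(ips_sorted)
```

The same body text arriving from several addresses is often a more interesting signal than any single posting IP.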
With those searches completed, I can compare the results against historical data or logs from other sources (firewalls, proxy logs or spam filters [5]). All of this is automated via some ‘internet researched’ code, poorly shunted together by yours truly. Once any matches and final results are spat out, I can decide whether to add the IP address, net block, user agent or URL to a block or monitor list. I’m not a fan of trusting my scripts or intelligence feeds to be accurate enough to block IP ranges automatically, but I don’t worry so much about pushing alerted URLs into the Suspicious category on the web proxy system. I’ve found my human web surfing anomaly detection systems are really good at ringing up and moaning if we, that’s the Royal We (meaning me), accidentally block Google.
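One possible way to do the comparison against other logs is sketched below in Python. The log format, the function name and the watched net block are invented for the example, and the output is only a list of hits for a human to review, in keeping with the reluctance to block automatically.

```python
import ipaddress
import re

# Loose IPv4 pattern; ipaddress.ip_address() does the real validation below.
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def match_logs(spam_ips, log_lines, watch_nets=()):
    """Return (ip, match_type, line) for log lines that mention a spam source."""
    spam = {ipaddress.ip_address(ip) for ip in spam_ips}
    nets = [ipaddress.ip_network(n) for n in watch_nets]
    hits = []
    for line in log_lines:
        for text in IPV4_RE.findall(line):
            try:
                addr = ipaddress.ip_address(text)
            except ValueError:
                continue  # looked like an address but is not one, e.g. 999.1.1.1
            if addr in spam:
                hits.append((text, "exact", line))
            elif any(addr in net for net in nets):
                hits.append((text, "netblock", line))
    return hits


# Hypothetical firewall log lines, for illustration only.
log_lines = [
    "2013-07-17 10:02:11 DENY src=203.0.113.7 dst=192.0.2.10 dport=25",
    "2013-07-17 10:05:40 ALLOW src=203.0.113.99 dst=192.0.2.10 dport=80",
    "2013-07-17 10:09:02 ALLOW src=198.51.100.5 dst=192.0.2.10 dport=443",
]

hits = match_logs(["203.0.113.7"], log_lines, watch_nets=["203.0.113.0/24"])
for ip, kind, line in hits:
    print(kind, ip, "->", line)
```

Separating exact matches from net block matches keeps the weaker evidence visible without treating it as grounds for a block.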
If you want to go to town visually with the data, pop the resolved spamming IP addresses into a geo-IP lookup (the ISC has a page to help with that [6]) and show friends, family or the management where the bad IP addresses live. Who says the whole family can't enjoy an evening of PowerPoint together, listing the towns, cities and countries that spam your blog sites? Surely that beats watching re-runs of some random TV show.
All this possible intelligence from humble blog spam. So what could you do with that data?
As ever, feel free to pitch in any thoughts or comments.
*It’s not a text file – well, not any more …


Chris Mohan --- Internet Storm Center Handler on Duty



This often seems to come from (supposed) VPN services, along with a lot of other malicious activity. As the netblocks are so numerous, fragmented and not always documented by rwhois, I wonder if it's acceptable to block whole providers for taking on this sort of customer.
I often block whole netblocks when it's a hosting company or ISP outside the US. Inside the US is harder for obvious reasons, so I have to resort to only blocking the IP.
Is it OK to block an entire country? Consider these statements from my quotes file:

"I wish the Chinese would use their country-wide firewall to block all outbound traffic on port 25." -- Shawn McMahon

"Almost all spam originates from China. - I've concluded that China has declared war on the rest of the world beyond it's wall, and hasn't bothered mentioning the fact to anyone outside." -- Robert L. Ziegler, author of Linux Firewall
I recently set up DNS filtering at home and at work using the new RPZ feature in named (RPZ = response policy zones). The spamvertised URLs in your blog spam and the IPs they point to would be a great source of data to drop into an RPZ. I'm always on the lookout for more sources of known-bad domains, hostnames, IPs, etc.
Actually, I've taken a similar approach these past 14 years. SPAM of various forms provides incredibly valuable information as to how to tune block lists, content filters and spot evolving threats. Almost fifteen years of analysis has allowed me to construct some rather robust denial lists as well as content inspection rules. I'm happy to say the efficacy of these lists is somewhat impressive. Much of the same logic expressed by Chris was employed in compiling the lists and rules.
