
SANS ISC: All of your pages are belonging to us - SANS Internet Storm Center SANS ISC InfoSec Forums


All of your pages are belonging to us

We received a report of a very aggressive web spider that apparently is not obeying robots.txt.

The report claims the spider is from http://www.80legs.com/webcrawler.html

Here are a few interesting tidbits from that site.

"008 runs on a grid computing platform that consists of several thousand computers, which is why you may see our web crawler access your site from many different IP addresses."

"If you block 008 using robots.txt, you will see crawl requests die down gradually, rather than immediately. This happens because of our distributed architecture. Our computers only periodically receive robots.txt information for domains they are crawling."

And my personal favorite ...

"Blocking our web crawler by IP address will not work. Due to the distributed nature of our infrastructure, we have thousands of constantly changing IP addresses. We strongly recommend you don't try to block our web crawler by IP address, as you'll most likely spend several hours of futile effort and be in a very bad mood at the end of it."

Several thousand computers? Sounds like a recipe for a DDoS attack if I ever saw one, and I don't even want to think about what could happen if that site got 0wn3d.

Has anyone else seen this?  Let us know.

Christopher Carboni - Handler On Duty

Chris

140 Posts
I'm not sure that technical deficiency has ever been a valid excuse for not following legal requirements / standards / ethical practices...
hacks4pancakes

48 Posts
Two servers, all of my domains run RavenNuke with NukeSentinel and one feature is the ability to deny access by UA or partial UA. 80Legs is "legless" on my servers.
hacks4pancakes
20 Posts
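For anyone not running NukeSentinel, the "deny by UA or partial UA" idea can be sketched as a small WSGI middleware in Python. This is only an illustration, not how NukeSentinel does it. The fragment "80legs" is an assumption based on the crawler URL quoted in the diary, and SAMPLE_UA is a made-up string for demonstration, so check your own logs for the real user agent before relying on it.

```python
# Hypothetical list of user-agent fragments to refuse (case-insensitive).
BLOCKED_UA_FRAGMENTS = ("80legs",)

# Made-up example UA for demonstration only; not copied from real traffic.
SAMPLE_UA = "Mozilla/5.0 (compatible; 008; http://www.80legs.com/webcrawler.html)"


def is_blocked(user_agent, fragments=BLOCKED_UA_FRAGMENTS):
    """Return True if any blocked fragment appears in the user agent."""
    ua = (user_agent or "").lower()
    return any(fragment.lower() in ua for fragment in fragments)


def block_by_user_agent(app, fragments=BLOCKED_UA_FRAGMENTS):
    """Wrap a WSGI app so that matching user agents get a 403 response."""
    def middleware(environ, start_response):
        if is_blocked(environ.get("HTTP_USER_AGENT"), fragments):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return app(environ, start_response)
    return middleware
```

A partial match keeps working when the crawler changes its version number, and it does not depend on the source IP at all. It only helps as long as the crawler keeps sending an honest user agent.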
Grid computing platform? Perhaps even Tor nodes or open proxies that were simply hijacked for this purpose.

In any case, it might be a good idea to benchmark your web applications (e.g. with ApacheBench), put a reverse proxy in place (nginx, Squid, Varnish), and/or tune your HTTP servers (Apache MPM type and settings, FastCGI, the APC accelerator module for PHP) so they can cope with this sort of punishment. Then you'll be quite well-positioned if you're hit by a more deliberate DDoS, some new threat that targets HTTP, or the Slashdot/Digg effect or whatever name it goes by these days.
Steven C.

171 Posts
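If ApacheBench isn't handy, the benchmarking idea can be roughed out in a few lines of Python. This is a minimal sketch, not a replacement for a real load-testing tool. The function name benchmark and its parameters are my own invention, and fetch is any callable you supply that performs one request.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def benchmark(fetch, total_requests=100, concurrency=10):
    """Call fetch() total_requests times using `concurrency` worker threads."""
    def timed_call(_):
        start = time.perf_counter()
        try:
            fetch()
            ok = True
        except Exception:
            # Count any exception (timeout, HTTP error, ...) as a failure.
            ok = False
        return ok, time.perf_counter() - start

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(timed_call, range(total_requests)))
    wall = time.perf_counter() - wall_start

    latencies = sorted(elapsed for _, elapsed in results)
    return {
        "requests": total_requests,
        "failures": sum(1 for ok, _ in results if not ok),
        "requests_per_second": total_requests / wall if wall > 0 else float("inf"),
        "median_latency_s": latencies[len(latencies) // 2],
        "max_latency_s": latencies[-1],
    }
```

For example, fetch could be something like lambda: urllib.request.urlopen("http://localhost:8080/").read() against a test instance (the URL here is a placeholder). Only point it at servers you own.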
Looks like a service hosted by the following subscription-based distributed computing company: http://www.pluraprocessing.com/technology.html
Steven C.
2 Posts
Be real careful. If you're caught installing any security to prevent it, it may cause your system to crash. Also, check all your storage devices for disk size to see if you have any hidden sectors. If so, you're infected.
Steven C.
1 Posts
@victim: Would you elaborate, please?
Steven C.
1 Posts
We tracked down a similar scuzzbag Bot operator, based out of the United States, who was using quite a number of different hosting companies in China and a few other countries. They had quite a spectrum of IP ranges to hide behind... but ultimately the greediness of their Bot made them easy to spot and blog. The owner of the company and most of the principals in the company were from China (although living and operating their scuzzy company from offices in the USofA).

The only thing we never did figure out was whether they were actually paying for all the various IP hosting packages throughout Chinese domains or whether they had simply hijacked a huge number of web sites hosted in China and were running zombies through those compromised web sites in order to create their 'grid computing' network.

What annoys me the most is that China takes such great pride in the strength and capabilities of their 'Great Firewall' encompassing the Chinese citizens... and yet their 'Great Firewall' seems totally useless or powerless to stop abusive or malware traffic from within their country that is outgoing to other countries. It leaves you to wonder if their powerlessness at stopping such traffic is intentional or simply a reflection of how lame their 'Great Firewall' is at stopping such Internet traffic.
Steven C.
4 Posts
He won't...victim is, well, how do I say this...probably part of the syndicate behind the subject of this diary post. I bet Dr. J is already digging into victim's IP ^_^
HackDefendr

65 Posts
I meant to say that " their Bots greediness made them easy to spot and BLOCK " - not 'blog' ;)
HackDefendr
4 Posts
I meant to say that " their Bots greediness made them easy to spot and BLOCK " - not 'blog' ;)
HackDefendr
4 Posts
I meant to say that " their Bots greediness made them easy to spot and BLOCK " - not 'blog' ;)
HackDefendr
4 Posts
echo ... echo ... echo
HackDefendr

65 Posts
80legs, yes, they also have an API that allows you to create your own web spider very easily. I have seen it used by people with malicious intent and/or some RED teams. Just block them, I think legless already posted. OUT

http://webcache.googleusercontent.com/search?q=cache:M7o8P7ugzwQJ:www.80legs.com/who-uses-80legs.html+80legs+api&cd=4&hl=en&ct=clnk&gl=us&client=safari

http://www.businessinsider.com/blackboard/80legs

http://www.infoq.com/news/2009/12/80legs-web-crawler
drStrangeP0rk

11 Posts
80 legs lets you set up completely customized web crawlers. With this service you can customize which websites and how many pages to crawl, what data to extract, and even choose specific file types to analyze. Taken from: http://www.creeris.com/portfolio.html
MGuirao

13 Posts
I just checked the server logs on my domains, I haven't seen any signs of 80legs coming to visit. Doesn't mean they won't soon, though.

Updating my robots.txt files now.
No Love.

37 Posts
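For anyone else updating robots.txt, here is a rough sketch of what a rule for this crawler might look like, plus a quick way to sanity-check it with Python's standard urllib.robotparser. The token "008" comes from the 80legs text quoted in the diary. The helper name is_allowed and the example.com URLs are mine, for illustration only.

```python
from urllib.robotparser import RobotFileParser

# robots.txt that disallows the "008" crawler everywhere
# and leaves all other crawlers unrestricted.
ROBOTS_TXT = """\
User-agent: 008
Disallow: /

User-agent: *
Disallow:
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(is_allowed(ROBOTS_TXT, "008", "http://example.com/page"))
    print(is_allowed(ROBOTS_TXT, "Googlebot", "http://example.com/page"))
```

This only confirms the file says what you meant. Per the 80legs text quoted above, requests are expected to die down gradually rather than stop at once, and the original report was that the spider was not obeying robots.txt at all.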
I'm wondering if fail2ban has this listed as a bad spider. If it does, my logs are going to go crazy banning all their IPs... however, it'll only take as long as it takes them, and they'll still be wasting their time.
djsmiley2k

5 Posts
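Before letting anything auto-ban, it may be worth seeing how many distinct IPs the crawler is actually coming from. Below is a minimal Python sketch (not a fail2ban filter) that tallies client IPs by user-agent fragment. It assumes Apache/nginx "combined" log format and that the user agent contains "80legs". Both are assumptions to verify against your own logs, and the function name is hypothetical.

```python
import re
from collections import Counter

# Matches the "combined" access log format:
# ip ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "[^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def count_ips_by_user_agent(lines, ua_fragment="80legs"):
    """Count requests per client IP whose user agent contains ua_fragment."""
    hits = Counter()
    needle = ua_fragment.lower()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and needle in match.group("ua").lower():
            hits[match.group("ip")] += 1
    return hits
```

Usage would be along the lines of count_ips_by_user_agent(open("access.log")), with the path adjusted for your server. A long list of addresses with only a few hits each would suggest that banning by IP is a losing game and that matching on user agent is the better lever.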
Their "grid computing platform" apparently consists of programs that run on normal web users' computers. They embed it in certain webpages as a Java applet, in downloadable programs and Flash games. As well as crawling webpages, they also use people's CPU cycles to work out things like stock market trades...

For example, if you go to http://www.handdrawngames.com/ a Java applet loads and your computer joins their 'grid' network until you change sites and the applet terminates; just look for the "Plura" logo at the bottom of the screen - that's the iframe with the applet in.

An example of a downloadable program is http://www.superdonate.com/

So if you start blocking IP addresses, and don't unblock, you're basically blocking random dynamic IPs from normal ISPs.

So, it looks like a rather shadier version of BOINC.
Alex

19 Posts
Yeah, this beastie brought down one of our client's sites. Telling the hoster to block on user agent solved it.
Anonymous
