All of your pages are belonging to us

Published: 2010-11-18. Last Updated: 2010-11-18 21:53:16 UTC
by Chris Carboni (Version: 1)
18 comment(s)

We received a report of a very aggressive web spider that apparently is not obeying robots.txt.

The report claims the spider is from http://www.80legs.com/webcrawler.html

Here are a few interesting tidbits from that site.

"008 runs on a grid computing platform that consists of several thousand computers, which is why you may see our web crawler access your site from many different IP addresses."

"If you block 008 using robots.txt, you will see crawl requests die down gradually, rather than immediately. This happens because of our distributed architecture. Our computers only periodically receive robots.txt information for domains they are crawling."
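If you do go the robots.txt route, you can at least confirm that your rules exclude the crawler before you wait for them to propagate. Python's standard-library `urllib.robotparser` evaluates robots.txt rules locally. This is a minimal sketch that assumes the crawler matches on the user-agent token `008`, the name used in the quotes above. The sample robots.txt content, the helper name `is_allowed`, and the URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: deny the "008" crawler everywhere,
# while leaving the site open to every other crawler.
ROBOTS_TXT = """\
User-agent: 008
Disallow: /

User-agent: *
Disallow:
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines; no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("008", "SomeOtherBot"):
        print(agent, is_allowed(ROBOTS_TXT, agent, "http://www.example.com/index.html"))
```

A check like this only tells you what a compliant crawler should do. According to the quote above, each node picks up robots.txt changes only periodically, so requests may keep arriving for a while after the rule goes live.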

And my personal favorite ...

"Blocking our web crawler by IP address will not work. Due to the distributed nature of our infrastructure, we have thousands of constantly changing IP addresses. We strongly recommend you don't try to block our web crawler by IP address, as you'll most likely spend several hours of futile effort and be in a very bad mood at the end of it."

Several thousand computers? That sounds like a recipe for a DDoS attack if I ever saw one, and I don't even want to think about what could happen if that site got 0wn3d.

Has anyone else seen this?  Let us know.

Christopher Carboni - Handler On Duty

Comments

I'm not sure that technical deficiency has ever been a valid excuse for not following legal requirements / standards / ethical practices...
I have two servers, and all of my domains run RavenNuke with NukeSentinel. One of its features is the ability to deny access by user agent (UA) or partial UA string. 80Legs is "legless" on my servers.
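The NukeSentinel configuration isn't shown here. As an illustration of the same idea outside NukeSentinel (deny by partial UA), here is a minimal Python WSGI middleware sketch. The substring `80legs`, the class name `BlockUserAgent`, and the demo app `hello_app` are hypothetical. The sketch assumes the crawler identifies itself honestly in its User-Agent header; a spoofed UA walks right past it.

```python
# Hypothetical deny list of case-insensitive User-Agent substrings.
# "80legs" assumes the crawler's UA string mentions its home page URL.
BLOCKED_UA_SUBSTRINGS = ("80legs",)


class BlockUserAgent:
    """WSGI middleware: answer 403 to any client whose User-Agent
    contains one of the configured substrings; pass everything else on."""

    def __init__(self, app, substrings=BLOCKED_UA_SUBSTRINGS):
        self.app = app
        self.substrings = tuple(s.lower() for s in substrings)

    def __call__(self, environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(s in user_agent for s in self.substrings):
            body = b"Forbidden\n"
            start_response("403 Forbidden", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        return self.app(environ, start_response)


def hello_app(environ, start_response):
    """Stand-in for the real site."""
    body = b"Hello\n"
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


application = BlockUserAgent(hello_app)

if __name__ == "__main__":
    from wsgiref.simple_server import make_server
    # Serve on localhost:8000 for a quick manual test.
    make_server("127.0.0.1", 8000, application).serve_forever()
```

Matching on a substring rather than the full UA string means the rule keeps working if the crawler bumps its version number.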
Grid computing platform? Perhaps it is really Tor nodes or open proxies that were simply hijacked for this purpose.

In any case, it might be a good idea to benchmark your web applications (e.g., with ApacheBench), put a reverse proxy in place (nginx, Squid, Varnish), and/or tune your HTTP servers (Apache MPM type and settings, FastCGI, the APC accelerator module for PHP) so they can cope with this sort of punishment. Then you'll be well positioned if you're hit by a more deliberate DDoS, some new threat that targets HTTP, or the Slashdot/Digg effect, or whatever name it goes by these days.
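The list above doesn't include application-level throttling. Because the 80legs page quoted in the diary says blocking by IP won't work, one related technique is to rate-limit per User-Agent instead of per IP. Below is a minimal token-bucket sketch. It is a swapped-in technique, not something the commenter or 80legs describes, and the class names, the rate and capacity values, and the choice of key are all hypothetical. The injectable `clock` makes the limiter easy to test.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Bucket:
    tokens: float   # tokens currently available
    updated: float  # clock reading at the last refill


class RateLimiter:
    """Hypothetical per-key token-bucket limiter.

    Each key (assumed here to be the User-Agent string, because the
    crawler's IP addresses keep changing) gets its own bucket holding up
    to `capacity` tokens, refilled at `rate` tokens per second. A request
    is allowed only if a whole token is available.
    """

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.buckets: Dict[str, Bucket] = {}

    def allow(self, key: str) -> bool:
        now = self.clock()
        bucket = self.buckets.get(key)
        if bucket is None:
            # A client seen for the first time starts with a full bucket.
            bucket = self.buckets[key] = Bucket(self.capacity, now)
        # Refill in proportion to elapsed time, capped at capacity.
        bucket.tokens = min(self.capacity,
                            bucket.tokens + (now - bucket.updated) * self.rate)
        bucket.updated = now
        if bucket.tokens >= 1.0:
            bucket.tokens -= 1.0
            return True
        return False


if __name__ == "__main__":
    fake_now = [0.0]
    limiter = RateLimiter(rate=1.0, capacity=3.0, clock=lambda: fake_now[0])
    # A burst of 5 requests at t=0: the first 3 drain the bucket.
    print([limiter.allow("008") for _ in range(5)])
    fake_now[0] = 2.0  # two seconds later, two tokens have refilled
    print(limiter.allow("008"))
```

In practice you would more likely enforce a limit like this at the reverse proxy the commenter mentions. Keying on User-Agent only helps while the crawler keeps a stable UA string.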
It looks like a service hosted by the following subscription-based distributed computing company: http://www.pluraprocessing.com/technology.html
Be really careful. If you're caught installing any security to prevent it, it may cause your system to crash. Also, check the disk size of all your storage devices to see if you have any hidden sectors. If so, you're infected.
@victim: Would you elaborate, please?
We tracked down a similar scuzzbag bot operator, based in the United States, who was using quite a number of different hosting companies in China and a few other countries. They had quite a spectrum of IP ranges to hide behind... but ultimately the greediness of their bot made them easy to spot and blog. The owner of the company and most of its principals were from China (although they were living in and operating their scuzzy company from offices in the USofA).

The only thing we never did figure out was whether they were actually paying for all the various IP hosting packages throughout Chinese domains, or whether they had simply hijacked a huge number of web sites hosted in China and were running zombies through those compromised sites to create their 'grid computing' network.

What annoys me the most is that China takes such great pride in the strength and capabilities of its 'Great Firewall' surrounding Chinese citizens... and yet that 'Great Firewall' seems totally useless or powerless to stop abusive or malware traffic leaving the country for other countries. It leaves you wondering whether that powerlessness is intentional or simply a reflection of how lame the 'Great Firewall' is at stopping such Internet traffic.
He won't... victim is, well, how do I say this... probably part of the syndicate behind the subject of this diary post. I bet Dr. J is already digging into victim's IP ^_^
I meant to say that "their bot's greediness made them easy to spot and BLOCK", not 'blog' ;)
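The commenter doesn't say how they spotted the bot. One common approach (an assumption here, not their described method) is to tally requests per User-Agent in the web server's access log. A crawler that hops across many IP ranges still tends to stand out by total volume under a single UA string, with an unusually high count of distinct client IPs. This is a minimal sketch assuming the Apache/nginx "combined" log format. The regex, the function name `heaviest_user_agents`, and the sample data are hypothetical.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Assumed "combined" log format:
# ip ident user [time] "request" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "[^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def heaviest_user_agents(lines: Iterable[str],
                         top: int = 5) -> List[Tuple[str, int, int]]:
    """Return (user_agent, request_count, distinct_ip_count) for the
    `top` busiest user agents, most requests first."""
    counts: Counter = Counter()
    ips: Dict[str, Set[str]] = defaultdict(set)
    for line in lines:
        m = COMBINED.match(line)
        if m is None:
            continue  # skip lines that aren't in combined format
        ua = m.group("ua")
        counts[ua] += 1
        ips[ua].add(m.group("ip"))
    return [(ua, n, len(ips[ua])) for ua, n in counts.most_common(top)]


if __name__ == "__main__":
    import sys
    # Usage: python heavy_uas.py < access.log
    for ua, n, n_ips in heaviest_user_agents(sys.stdin):
        print(f"{n:8d} requests from {n_ips:6d} IPs  {ua}")
```

A UA with a high request count spread across many distinct IPs is exactly the pattern described in the comments: hard to block by address, easy to spot by volume.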