All of your pages are belonging to us
We received a report of a very aggressive web spider that apparently is not obeying robots.txt.
The report claims the spider is from http://www.80legs.com/webcrawler.html
Here are a few interesting tidbits from that site.
"008 runs on a grid computing platform that consists of several thousand computers, which is why you may see our web crawler access your site from many different IP addresses."
"If you block 008 using robots.txt, you will see crawl requests die down gradually, rather than immediately. This happens because of our distributed architecture. Our computers only periodically receive robots.txt information for domains they are crawling."
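The gradual die-down described in that quote is what you would expect from nodes that cache robots.txt and re-read it only on a timer. The Python sketch below models one such node under that assumption. The `CrawlerNode` class, the one-hour `refresh_interval`, and the `fetch_robots` callable (which stands in for an HTTP fetch) are all hypothetical. They are not 80legs' actual implementation. The rule matching uses the standard library's `urllib.robotparser`.

```python
from urllib.robotparser import RobotFileParser


class CrawlerNode:
    """A hypothetical node in a distributed crawler that caches robots.txt.

    The node re-reads a domain's robots.txt only when its cached copy is at
    least refresh_interval seconds old, so a newly added Disallow rule is
    ignored until the cache expires.
    """

    def __init__(self, user_agent, refresh_interval):
        self.user_agent = user_agent
        self.refresh_interval = refresh_interval
        self._cache = {}  # domain -> (time fetched, parsed rules)

    def allowed(self, domain, url, fetch_robots, now):
        """Return True if the cached rules let this node crawl url.

        fetch_robots is a callable returning the site's current robots.txt
        text (a stand-in for an HTTP request). now is the current time in
        seconds, passed in explicitly so the example is deterministic.
        """
        entry = self._cache.get(domain)
        if entry is None or now - entry[0] >= self.refresh_interval:
            parser = RobotFileParser()
            parser.parse(fetch_robots().splitlines())
            parser.modified()  # mark rules as loaded so can_fetch() trusts them
            entry = (now, parser)
            self._cache[domain] = entry
        return entry[1].can_fetch(self.user_agent, url)


if __name__ == "__main__":
    site = {"robots": ""}  # the site starts with an empty robots.txt
    node = CrawlerNode("008", refresh_interval=3600)
    url = "http://example.com/page.html"
    fetch = lambda: site["robots"]

    print(node.allowed("example.com", url, fetch, now=0))     # True: rules cached at t=0
    site["robots"] = "User-agent: 008\nDisallow: /\n"         # admin blocks 008
    print(node.allowed("example.com", url, fetch, now=600))   # True: stale cache still allows
    print(node.allowed("example.com", url, fetch, now=3600))  # False: refreshed rules block
```

With thousands of nodes each refreshing on its own schedule, a new Disallow rule would take effect piecemeal across the grid, which fits the gradual drop-off the site describes.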
And my personal favorite ...
"Blocking our web crawler by IP address will not work. Due to the distributed nature of our infrastructure, we have thousands of constantly changing IP addresses. We strongly recommend you don't try to block our web crawler by IP address, as you'll most likely spend several hours of futile effort and be in a very bad mood at the end of it."
Several thousand computers? That sounds like a recipe for a DDoS attack if I ever saw one, and I don't even want to think about what could happen if that site got 0wn3d.
Has anyone else seen this? Let us know.
Christopher Carboni - Handler On Duty
Comments
Tisiphone
Nov 18th 2010
OldDad
Nov 18th 2010
In any case, it might be a good idea to benchmark your web applications (e.g. ApacheBench), put a reverse proxy in place (nginx, squid, varnish), and/or tune your HTTP servers (Apache MPM type and settings, FastCGI, APC accelerator module for PHP) so they can cope with this sort of punishment. Then you'll be well positioned if you're hit by a more deliberate DDoS, some new threat that targets HTTP, or the Slashdot/Digg effect or whatever name it goes by these days.
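As a rough stand-in for ApacheBench, here is a minimal Python sketch that sends a fixed number of GET requests at a given concurrency and reports throughput. The `HelloHandler` server and `start_test_server()` helper are hypothetical, included only so the example runs on its own. In practice you would point `benchmark()` at a test copy of your own application and use `ab` or a similar dedicated tool for serious measurements. The sketch deliberately bypasses any configured HTTP proxy so it measures the server directly.

```python
import threading
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def benchmark(url, total_requests, concurrency):
    """Send total_requests GETs to url from `concurrency` worker threads.

    Returns (successful requests, elapsed seconds, requests per second).
    """
    # An opener with an empty ProxyHandler ignores http_proxy settings.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))

    def fetch(_):
        try:
            with opener.open(url, timeout=10) as resp:
                resp.read()
                return resp.status == 200
        except OSError:  # URLError and HTTPError are OSError subclasses
            return False

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        ok = sum(pool.map(fetch, range(total_requests)))
    elapsed = time.perf_counter() - start
    return ok, elapsed, total_requests / elapsed


class HelloHandler(BaseHTTPRequestHandler):
    """Trivial handler standing in for the web application under test."""

    def do_GET(self):
        body = b"hello\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # suppress per-request logging during the benchmark


def start_test_server():
    """Start HelloHandler on a free localhost port in a background thread."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), HelloHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, f"http://127.0.0.1:{server.server_address[1]}/"


if __name__ == "__main__":
    server, url = start_test_server()
    try:
        ok, elapsed, rps = benchmark(url, total_requests=200, concurrency=4)
        print(f"{ok}/200 OK in {elapsed:.2f}s ({rps:.0f} req/s)")
    finally:
        server.shutdown()
        server.server_close()
```

Raising `concurrency` against your own server is a crude way to see where response times start to degrade, which is the point at which a reverse proxy or server tuning starts to matter.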
Steven Chamberlain
Nov 18th 2010
flyingkiwiguy
Nov 18th 2010
victim
Nov 18th 2010
RonCo
Nov 19th 2010
The only thing we never did figure out was whether they were actually paying for all the various IP hosting packages throughout Chinese domains, or whether they had simply hijacked a huge number of web sites hosted in China and were running zombies through those compromised sites to create their 'grid computing' network.
What annoys me the most is that China takes such great pride in the strength and capabilities of its 'Great Firewall' encompassing Chinese citizens... and yet that 'Great Firewall' seems totally useless or powerless to stop abusive or malware traffic going out from within the country to other countries. It leaves you wondering whether this powerlessness is intentional or simply a reflection of how lame the 'Great Firewall' is at stopping such Internet traffic.
Carbonie
Nov 19th 2010
JeffS
Nov 19th 2010
Carbonie
Nov 19th 2010
Carbonie
Nov 19th 2010