
SANS ISC: Search engines that are no search engines - SANS Internet Storm Center SANS ISC InfoSec Forums


Search engines that are no search engines
The DShield database was running a bit "hot" earlier today, so I took a closer look at the web log and found that one particular "search engine" was indexing the site rather aggressively:

a.b.c.d - - [09/Nov/2007:15:24:35 +0000] "GET /portreportascii.html?date=2007-11-09 HTTP/1.0" 200 500572 "-" "gsa-crawler (Enterprise; S5-FTNF3BWZPUJAS; nobody@google.com)" "-"
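A quick way to spot this kind of client is to tally requests and bytes per user agent. The following is only a sketch: it assumes the log uses the combined format shown in the line above (with one extra quoted field at the end, which is ignored), and the names `LOG_RE` and `traffic_by_agent` are made up for illustration.

```python
import re
from collections import Counter

# Combined log format as in the sample line; anything after the
# user agent field is ignored.
LOG_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def traffic_by_agent(lines):
    """Return (hits, volume): request count and bytes sent per user agent."""
    hits = Counter()
    volume = Counter()
    for line in lines:
        match = LOG_RE.match(line)
        if match is None:
            continue  # skip lines that do not follow the expected format
        agent = match.group("agent")
        size = match.group("size")
        hits[agent] += 1
        # The size field is "-" when no body was sent.
        volume[agent] += int(size) if size.isdigit() else 0
    return hits, volume
```

Calling `hits.most_common(10)` or `volume.most_common(10)` on the result lists the busiest clients first.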

At first, I thought "oh well, it's Google". But a closer look at the user agent string reveals some subtle differences. This is a Google Search Appliance, not the uber-Googlebot we all love. The regular Googlebot looks like this:

66.249.65.233 - - [09/Nov/2007:15:24:37 +0000] "GET /date.html?port=47109&date=2007-10-25 HTTP/1.1" 200 7538 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "-"
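The difference between the two strings can be checked mechanically. This is a minimal sketch based only on the two samples above: the function name and the returned labels are hypothetical, and the user agent is merely what the client claims to be, not proof of who it is.

```python
def crawler_family(user_agent):
    """Label a user agent string by the crawler it claims to be."""
    ua = user_agent.strip().lower()
    if ua.startswith("gsa-crawler"):
        # Search appliance run by whoever bought it, not by Google itself.
        return "google-search-appliance"
    if "googlebot/" in ua:
        # Only a claim: anyone can send this string.
        return "claims-googlebot"
    return "other"
```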

I have seen similar cases a few times now. While this one was not malicious, in some cases attackers have used Google's (or another search engine's) user agent string. I can only assume that this is an attempt to blend in better, and maybe to retrieve the version of the page served to search engines. If anybody knows a good reference for the IP address ranges used by particular search engines, let us know.
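Short of a list of IP ranges, one check that Google itself documents for Googlebot is a reverse DNS lookup on the client address followed by a forward lookup on the resulting name. The sketch below assumes IPv4 and the `googlebot.com` / `google.com` suffixes; the function and parameter names are illustrative, and the resolvers are passed in so the logic can be exercised without network access.

```python
import socket

# Assumed host name suffixes for genuine Googlebot addresses.
TRUSTED_SUFFIXES = (".googlebot.com", ".google.com")

def _reverse(ip):
    # PTR lookup: returns the primary host name for the address.
    return socket.gethostbyaddr(ip)[0]

def _forward(host):
    # A lookup: returns the list of IPv4 addresses for the name.
    return socket.gethostbyname_ex(host)[2]

def is_verified_googlebot(ip, reverse=_reverse, forward=_forward):
    """True if ip reverse-resolves to a trusted name that resolves back to ip."""
    try:
        host = reverse(ip).rstrip(".").lower()
    except OSError:
        return False  # no PTR record or lookup failure
    if not host.endswith(TRUSTED_SUFFIXES):
        return False
    try:
        # The forward lookup defeats forged PTR records.
        return ip in forward(host)
    except OSError:
        return False
```

The forward lookup matters because whoever controls the reverse zone for an address can make it point at any name they like.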

(And by the way: if you need bulk access to DShield data, please ask. Spidering the site is just not very efficient, and you will run into some anti-harvesting traps that send you in circles.)

-----
Johannes B. Ullrich
Chief Research Officer, SANS Technology Institute
