Announcing: The "404 Project"

Published: 2011-07-28
Last Updated: 2011-07-28 13:53:07 UTC
by Johannes Ullrich (Version: 1)
18 comment(s)

We all know that web applications are the new firewall. However, so far we have had a hard time collecting web application logs. The hard part is balancing the ease of installing a sensor (without disrupting the web application), the fidelity of the log information, and privacy.

With firewall logs, it is pretty simple. A rejected packet in a firewall has very little information and privacy isn't a big issue. Web applications are different, as the actual "meat" of the log event is in the request content, which may contain personal information. Parsing web logs isn't so easy either. Administrators frequently customize log formats for special purposes.

To balance these different issues we decided to focus on errors, but instead of parsing logs, we set up a little php script that you can add to your error page. In its current form, the script will work with PHP web servers (tested with Apache) that support the curl extension. Curl is installed by default in current versions of PHP.

Now all you need is an "error page". In Apache, just use the ErrorDocument configuration directive. For example:

ErrorDocument 404 /error.html

This will redirect users to "/error.html" in case of a 404 error [1]. You may already have a page like that configured. All you need to do is add the php snippet to the end, sending us the intended URL, the user agent and the IP address of the client accessing the missing page.
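
As a rough illustration of what such a snippet gathers, here is a minimal sketch in Python (not the project's actual PHP code). It assumes the error page runs as a CGI-style handler under Apache, where REDIRECT_URL, HTTP_USER_AGENT and REMOTE_ADDR are the usual environment variable names. The function name build_report and the record layout are hypothetical, and the submission step is left out because its format is not described here.

```python
import os


def build_report(environ=None):
    """Collect the three fields the error page reports (hypothetical layout)."""
    env = os.environ if environ is None else environ
    return {
        # Apache normally sets REDIRECT_URL to the originally requested path
        # when it hands a request to an ErrorDocument; REQUEST_URI is used
        # here as a fallback (an assumption of this sketch).
        "url": env.get("REDIRECT_URL", env.get("REQUEST_URI", "")),
        "user_agent": env.get("HTTP_USER_AGENT", ""),
        "client_ip": env.get("REMOTE_ADDR", ""),
    }
```

Calling build_report() with no argument reads the live process environment; passing a dictionary makes it easy to try out with sample values.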

The hope is to collect data from automated probes, similar to how DShield's firewall logs reflect portscan activity.

In particular if you are running a personal / home web server: Please consider adding the collector script.

Once we get a few submitters, we will start adding continuously updated reports to the site, just like we do for the DShield data. However, we can't do this until we have at least a dozen submitters (better 100 or more). We cannot publish "one off" errors as they will likely be specific to your site and again could cause privacy issues.

Why do we only support PHP? Well, that's the language I know. Feel free to submit a .Net/Java/Ruby/Perl or whatever version of the script.

Simple steps to sign up:

  1. Log in to retrieve your authentication key here
  2. Download the php snippet here
  3. Paste it into your Error Document
  4. Test...

Please contact us if you have any questions.


Johannes B. Ullrich, Ph.D.
SANS Technology Institute



I have a better idea! If you are using a home server to run PHP based web access, TURN OFF PHP!

PHP is very dangerous if you do not configure it correctly, and if you do not understand EXACTLY what your applications are capable of doing!

A home user??? PHP??? What were you thinking!

Sorry but that is just crazy. Please change your article to warn would-be home server people of the dangers. For professionals, yes fine, but I for one will not run PHP due to its hack history.

PHP does not even exist on my servers. I like it that way! I take the time to code in alternate languages. It is worth the time to learn, and the extra time to code.

PHP is also the language choice of hackers. Probably for two reasons..

1 - It is easy to learn.

2 - It is easy for the server administrator or the web application developer to make critical mistakes that can open the door to your data or your entire network.

Write that RUBY, PERL or APACHE Module in C now :-)
C ... a language that brought us fun features like buffer overflows and format string errors ;-)

If you don't need php, turn it off like any feature you don't need. But having a home web server to experiment is perfectly fine. Manage it well, monitor it, and make your mistakes with it before you start coding a real site with real customer data. started out as an experiment like that. Just having it hosted in a "real datacenter" didn't make it any more secure.
As far as C goes, Microsoft was guilty of sloppy coding. When you compile a program you get warnings. You can set the severity of warnings to extreme and produce clean compiled code with some effort, or you can ignore the errors and release the project anyway as Microsoft and many others did. There is no excuse for releasing "features" like they did :-) Today we still are cleaning up those "features".

In the average home environment (there are many exceptions of course) you do not usually have all the technology you have in a data center to control flow, monitor traffic patterns, limit applications by their signature, etc. It is a risk, but true, the data the "visitor" gets will most likely be worthless to them.

The problem is when a machine gets owned. It can become a menace, like those IP addresses in China that attack regularly. Chances are the owners don't even know it is happening and are just pawns in many cases. Not all cases of course, but many.

So, yes I agree that it can be beneficial to test in a non-critical environment, but you also miss out on all of the great tools data center engineers have in place to control events.
On the positive side, well over 90% of the alerts from our web app firewall are for scans and attacks against PHP code. So you should get a lot of data. :-)

If we were running PHP I would already have recommended we re-write everything based just on the web app firewall data.

Although we whitelist everything to minimize our attack surface, we did make an exception and blacklisted requests looking for .php files and set an automatic IP Block on the source regardless of what the request contained. It's that prevalent.
Well, can anybody offer up versions of the script in any other languages?

I know how to write perl script but not perl script to run on a webserver and not perl that I'd want to be probed by everybody and their uncle from the Internet.

I looked at the PHP and have about half of an idea on how to convert it to Perl. It would require libcurl and at least two modules WWW::Curl and MIME::Base64, but that's as far as I got.

I don't know how to pull the variables out of Apache to get User Agent, Redirect URL, or Remote URL.

Also, I'm not sure if perl has built-ins that would be better suited than calling out to libcurl.
I run php only because there is an application that is open sourced that we need. I do not write in php. All my web stuff is in perl.

Nonetheless, my apache web server writes a log file to /var/log/apache2/errors that can be post-processed to generate a report, similar to the dshield firewall logs stuff. Why do you need a script at all to run when the web server runs? Just batch process the logs in a cron job and ship them off to isc.
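
The batch approach suggested above could be sketched roughly as follows, in Python. This is only an illustration under stated assumptions: it expects Apache 2.2-style error log lines of the form "[date] [error] [client IP] File does not exist: path", and the names missing_files and summarize are hypothetical. Shipping the result anywhere is left out.

```python
import re
from collections import Counter

# Assumed Apache 2.2-style error log line, for example:
# [Thu Jul 28 13:53:07 2011] [error] [client 192.0.2.10] File does not exist: /var/www/admin.php
# The path stops at the first comma or whitespace, so an optional
# ", referer: ..." suffix is dropped (paths containing spaces get truncated).
MISSING = re.compile(
    r"^\[(?P<when>[^\]]+)\] \[error\] \[client (?P<ip>[^\]]+)\] "
    r"File does not exist: (?P<path>[^,\s]+)"
)


def missing_files(lines):
    """Yield (timestamp, client_ip, path) for each 'File does not exist' line."""
    for line in lines:
        match = MISSING.match(line)
        if match:
            yield match.group("when"), match.group("ip"), match.group("path")


def summarize(lines):
    """Count how often each missing path shows up in the log."""
    return Counter(path for _, _, path in missing_files(lines))
```

A cron job could open the log file and pass the file object to summarize(). One trade-off to keep in mind: these error log lines carry the filesystem path rather than the requested URL, and they do not include the user agent.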
PERL-101 ...

Create a list of valid environment variables. These are the ones you will want.


..then filter upon them to make sure nothing else gets by.

Example of how to use the variables..

$host = $ENV{'REMOTE_ADDR'};
BTW the variables are from APACHE, not PERL. PERL is just the interface to them. You can get them all at They would work the same in PHP by the way.
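
The "list of valid variables, then filter" idea above can be sketched like this, in Python rather than Perl. The allow-list contents, the function name filter_env, and the length cap are all hypothetical choices for this example, not part of any published script.

```python
import os

# Hypothetical allow-list: only the variables a 404 reporter would need.
ALLOWED = ("REMOTE_ADDR", "HTTP_USER_AGENT", "REDIRECT_URL")


def filter_env(environ=None, allowed=ALLOWED, max_len=1024):
    """Return only allow-listed variables, with non-printable characters removed."""
    env = os.environ if environ is None else environ
    clean = {}
    for name in allowed:
        value = env.get(name)
        if value is None:
            continue  # variable not set for this request
        # Strip control characters such as CR/LF and cap the length;
        # both limits are arbitrary choices for this sketch.
        value = "".join(ch for ch in value if ch.isprintable())
        clean[name] = value[:max_len]
    return clean
```

Anything not named in ALLOWED (cookies, authorization headers and so on) never makes it into the result, which keeps the privacy exposure small.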
"Al of Your Data Center" you need to chill. Who makes you the authority on what Home users do with their Internet?

And while you are a book on Netiquette. All the unnecessary capitalization and punctuation marks are how 9 year olds type.
Too funny! Glad you enjoyed reading my posts. I always find it interesting when someone takes a post I make to heart. I don't know why you would be defensive here, but no matter. Flames are fun too. They expose human vulnerabilities, which are just like those on the Internet... open to exploitation :-)
