Assessing websites for malicious content

Published: 2007-03-19
Last Updated: 2007-03-19 19:56:13 UTC
by Maarten Van Horenbeeck (Version: 4)

If you are responsible for information security within your organization, part of your job probably consists of reviewing sites that have been submitted by your users: perhaps in response to a request to allow access to a site through a URL filtering application, to determine whether a site may have contributed to a virus outbreak, or to review your own corporate website. Last week, Michael wrote in wondering what the best approach is to these types of reviews.

The fact is that you can make this process as heavyweight as you wish, depending on the importance of the site to your organization and the risk and impact of allowing access to it. Each review, however, should start with an off-line component, that is, one that does not access the site itself, followed by an on-line component, which entails an actual connection. The first is important to assess what the site could logically be expected to do; the latter, to see how it makes your end users' systems behave.

As part of an off-line review:

  • Verify the whois information. The corporate website of a bank was rarely registered yesterday, and a US bank is rarely registered to a contact in Nigeria. Some of the information, such as zip codes and telephone numbers, can be checked for validity (see the first sketch after this list).
  • There are on-line tools that can help you assess the site. McAfee's SiteAdvisor, for example, allows you to submit sites for review. If the site has already been submitted before, it gives you information on downloadable executables, a history of spam received after registering on the site, as well as information on outbound links and usage of cookies. Ville wrote in with the Exploit Prevention Labs LinkScanner, which does automated analysis of a URL.
  • Web blocklist services such as malware.com.br provide realtime HTTP blocklists. You can download the existing blocklist and match domains against it, or submit a new URL and have it tested (see the second sketch after this list).
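
To script the whois check, a small Python routine can query a registry's whois server directly over the standard whois protocol (TCP port 43). This is only a minimal sketch: the server whois.verisign-grs.com (which answers for .com/.net) and the domain example.com are illustrative choices, and the keywords it searches for depend on the registry's output format.

    # Minimal whois lookup over TCP port 43 (the standard whois protocol).
    # Server and domain below are illustrative; pick the registry server
    # that is authoritative for the TLD you are checking.
    import socket

    def whois_query(domain, server="whois.verisign-grs.com", port=43):
        with socket.create_connection((server, port), timeout=10) as sock:
            sock.sendall((domain + "\r\n").encode("ascii"))
            response = b""
            while True:
                chunk = sock.recv(4096)
                if not chunk:
                    break
                response += chunk
        return response.decode("utf-8", errors="replace")

    if __name__ == "__main__":
        record = whois_query("example.com")
        # Eyeball the registration date, registrar and contact details for
        # anything that does not match the site's claimed identity.
        for line in record.splitlines():
            if any(key in line for key in ("Creation Date", "Registrar", "Registrant")):
                print(line.strip())

For TLDs you are less familiar with, whois.iana.org can be queried first to find the whois server that is authoritative for that TLD.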
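
Matching a domain against a downloaded blocklist is just as easy to script. The sketch below assumes the list is plain text with one domain per line; blocklist.txt is a placeholder, since the actual download URL and file format depend on the provider.

    # Check whether a domain (or one of its parent domains) appears in a
    # locally downloaded blocklist. Assumes one domain per line in
    # blocklist.txt; adapt the parsing to the provider's real format.

    def load_blocklist(path="blocklist.txt"):
        with open(path) as fh:
            return {line.strip().lower() for line in fh
                    if line.strip() and not line.startswith("#")}

    def is_listed(domain, blocklist):
        # evil.example.com should also match an entry for example.com,
        # so test the domain itself and each parent (but not the bare TLD).
        labels = domain.lower().strip(".").split(".")
        return any(".".join(labels[i:]) in blocklist
                   for i in range(len(labels) - 1))

    if __name__ == "__main__":
        bl = load_blocklist()
        print(is_listed("suspicious.example.com", bl))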

If these do not turn up anything unusual, it's time to make a connection to the site:

  • In order to have the most objective view of what the site is doing, I download the site using the common wget tool. The -p option will download all files that are necessary for a browser to interpret the page (such as inline images and stylesheets), which can then be scanned manually for malicious script tags as well as with an AV solution. Smart use of -r and -l will make the tool download the site to the depth you require for analysis. Keep in mind that this may place undue stress on the web server, especially if pages are dynamically generated, so be gentle. A good alternative is curl. As some sites base their response on the type of browser connecting, you can imitate specific browsers by using the --header option (see the first sketch after this list).
    When you're dealing with new malware, an AV solution with good heuristic detection can prove valuable. Back in the '90s, one specific solution, no longer on the market, reported in great detail on what an executable was up to: did it become memory resident (the good old DOS TSR code), did it scan the disk for other executables, did the extension not match the code. Something to that degree is difficult to find today, but excellent for this purpose.
  • Connecting through a proxy can pre-empt execution of the more obvious threats and help in identifying malicious or potentially dangerous links. Two weeks ago we covered SpyBye, a proxy tool specifically written for this purpose.
  • If this does not turn up anything malicious, I generally use a virtual machine such as VMware, with a browser installation very similar to that on the corporate desktops. In addition, the box runs at least Regshot, Filemon and TCPView to spot any strange activity taking place upon connecting to the site. I also run a sniffer to see whether any strange traffic originates from or is directed towards the virtual machine (see the second sketch after this list). Additional toolbars such as the Firefox Web Developer toolbar allow you to see much more information than you usually would (CSS, ...).
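
For a scripted variant of the wget/curl step in the first bullet above, the following Python sketch fetches a single page while presenting a browser-like User-Agent and saves it for offline inspection. The URL, the User-Agent string and the output filename are placeholders chosen for illustration.

    # Fetch one page while imitating a specific browser, then save it to
    # disk so it can be grepped and AV-scanned without touching the live
    # site again. URL, User-Agent and filename are illustrative only.
    import urllib.request

    URL = "http://www.example.com/"
    USER_AGENT = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"

    request = urllib.request.Request(URL, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request, timeout=15) as response:
        body = response.read()

    with open("page.html", "wb") as fh:
        fh.write(body)

    print("Saved %d bytes from %s" % (len(body), URL))

Needless to say, run this from an analysis box rather than your own workstation.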
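
If you prefer to script the sniffer side rather than watch a GUI capture, a packet library such as Scapy (a third-party Python library, assumed here to be installed and run with sniffing privileges) can summarize traffic to and from the analysis VM. The VM address below is a placeholder.

    # Print one-line summaries of traffic involving the analysis VM while
    # the browser visits the suspect site. Requires the third-party Scapy
    # library and privileges to sniff; the host address is a placeholder.
    from scapy.all import sniff

    VM_ADDRESS = "192.0.2.10"  # address of the analysis virtual machine

    def show(pkt):
        print(pkt.summary())

    # Capture 200 packets and summarize them; anything unrelated to the
    # site under review deserves a closer look.
    sniff(filter="host %s" % VM_ADDRESS, prn=show, count=200)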

Especially in the case of targeted malware, using a sacrificial lamb machine on dialup, even using different DNS servers, might be worthwhile. However, if your organization is the only one targeted, this might also reveal that you are investigating the matter; that is an obvious tradeoff. Also note that even the simple tools you use during the investigation, such as wget, can have vulnerabilities.

This overview merely covers how to assess a site. It does not go into detail on analyzing the actual malicious code, should you find any; other diary entries have covered that in more depth here and here.

No doubt you have many other great ideas on how to approach this issue. Do you know of good browser plugins, proxies, websites and other tools fit for this purpose? Let us know.

v2: Thanks to Swa for his feedback and suggestions.
v3: Thanks to Ville for suggesting the LinkScanner and Dr Web Link checker.
v4: Chris wrote in suggesting the use of the Microsoft Fiddler tool, which allows you to set breakpoints in HTTP requests and responses.
