How Do I Report Malicious Websites? Take 2
A Diary Entry that “Writes Itself”
On my last shift, a reader asked: “How do I report Malicious Websites?” (http://isc.sans.org/diary.html?storyid=8719) I provided three ways one could report malicious URLs, IP addresses or hosts, and requested your comments. There were a lot of suggestions, so I wanted to do a quick round-up on this shift.
Unfortunately it Became Complex.
There was a long list of sites where you could submit a URL to a particular product, some that focused on particular service providers, others that focused on certain types of malware (e.g. Zeus) or crime (e.g. phishing).
There was no simple one-stop shop for the end customer to use. Some browsers and add-ons give something resembling that functionality, but it is still limited to protecting the users of that particular tool.
Upon reflection, I realize why a one-stop shop doesn't exist. A single collection and repository of information is not the correct model. It wouldn't scale, it wouldn't be resilient, and it would be expensive. What I suggest instead is a framework for exchanging this information.
A Diversity of Clients
The ultimate client is the end-user. We all know how uniquely diverse this population is, especially with respect to their technical skills and security awareness. This requires a diversity of solutions to serve this population: browser add-ins, client software, proxy servers, specialized DNS clients, etc.
A Diversity of Sources
The intelligence comes from a similarly diverse collection of sources: end-users, help-desk technicians, incident handlers, malware researchers, etc. The accuracy and reliability of this information are similarly diverse; I'm stealing from the old saying: Timely, Accurate, Cheap-- pick two.
Consumers Define the Requirements
I consume a lot of malware-related IP addresses, domains and URLs each day. This information comes in from a lot of sources: mailing lists, blogs, sandbox analysis reports, online repositories, etc. My focus is on protecting my users, so I look at this information in a certain light. For most users, a simple bad vs. good determination is good enough. I use the following classifications:
- Suspicious-- this is the state that all reports start off with; it looks a little better than “Unknown.”
- Exploit Site-- this is for links to exploit kits or sites that launch attacks
- Download-- for URLs where downloaders or exploit sites pull secondary payloads
- Phone-Home/Command-and-Control-- this is for tracking the requests made by malware after it's installed
- Redirect/Compromised Site-- some systems get owned and end up included in the long lists of intelligence that circulate
These classifications are important when an analyst is looking through alerts generated from this watchlist. For example, if a user hits what is classified as a Redirect/Compromised site, but the Exploit Site is blocked by the proxies, you don't have an incident. On the other hand, if you have a system that is consistently probing out to a Phone-Home site that is blocked by your proxies, then you do have an incident.
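To make that triage rule concrete, here's a minimal sketch in Python, assuming a simple enumeration of the classifications above; the names (Classification, proxy_blocked, repeated) are my own illustration, not part of any existing tool.

```python
from enum import Enum

class Classification(Enum):
    SUSPICIOUS = "suspicious"                      # starting state for every new report
    EXPLOIT_SITE = "exploit site"                  # links to exploit kits / attack launchers
    DOWNLOAD = "download"                          # where secondary payloads get pulled from
    PHONE_HOME = "phone-home / C2"                 # post-infection command-and-control traffic
    REDIRECT_COMPROMISED = "redirect/compromised"  # owned sites that merely redirect

def is_incident(classification, proxy_blocked, repeated):
    """Rough triage rule from the text: a blocked hit on a redirect/compromised
    site is not an incident, but repeated blocked attempts to reach a
    phone-home site point at an already-infected host."""
    if classification is Classification.REDIRECT_COMPROMISED and proxy_blocked:
        return False
    if classification is Classification.PHONE_HOME and proxy_blocked and repeated:
        return True
    return None  # everything else needs an analyst's judgement
```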
For my purposes, the redirect/compromised site list is low priority. Now, if I were a hosting provider, that list would be of greater importance, but only if the entries were in my network. It is precisely for this reason that I avoid attaching a “risk” or “severity” rating to these entries.
What should the framework record? How should the records be organized? In my database I track by individual IP or domain, because it's easy to search proxy and firewall logs by hostname or IP address. I link the more verbose URL to the domain. In the framework that I propose, URLs would be classified as Suspicious, Exploit, Downloader, etc., while IP addresses, hostnames, and domain names would be their own records that link to these URLs.
For example, consider this fictitious exploit URL: hxxp://abcd.efghijkl.ab/invoice.pdf. In our data-set we could classify this URL as an Exploit URL. If we had better analysis we could tack on a sub-classification of the particular CVE that this exploit leverages. The URL would then link to the hostname abcd.efghijkl.ab, the domain efghijkl.ab, and the three IP addresses that abcd.efghijkl.ab resolved to at the time of the report (1.2.3.4, 1.2.3.7, and 8.5.6.4); these may further link to a particular ASN.
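As a rough illustration of that record layout, here's a minimal sketch of the fictitious example above as a flat record; the field names are mine, not a real schema, and the CVE and ASN links are left unknown.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class UrlRecord:
    url: str
    classification: str                        # Suspicious, Exploit, Downloader, ...
    cve: Optional[str] = None                  # optional sub-classification
    hostname: str = ""
    domain: str = ""
    ip_addresses: List[str] = field(default_factory=list)
    asn: Optional[int] = None                  # onward link to the announcing ASN

report = UrlRecord(
    url="hxxp://abcd.efghijkl.ab/invoice.pdf",
    classification="Exploit",
    hostname="abcd.efghijkl.ab",
    domain="efghijkl.ab",
    ip_addresses=["1.2.3.4", "1.2.3.7", "8.5.6.4"],  # resolution at report time
)
```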
Belief and Feedback
Just like in the IDS and AV worlds, this information has its fair share of false positives. These come mostly from automated sources-- simply because they don't know better. For example, a bot-client might reach out to myip.ru, while another may make a Google search using a direct IP address. Another pain-point is how advertisers redirect requests; examining the network trace of a web exploit can sometimes lead an analyst down the rabbit-hole of researching the complexities of one of Doubleclick's competitors.
For this reason the framework would have to support multiple reports per URL, and cluster URLs to account for unique elements within them. Additionally, reports would have to identify their sources so consumers could rate sources or filter out unwanted ones.
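A minimal sketch of what that clustering and source-filtering might look like, assuming the “unique element” is the query string; both the rule and the feed names are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlsplit

reports = defaultdict(list)   # cluster key -> every (source, raw URL) report received

def cluster_key(url):
    """Collapse URLs that differ only in their query string (often a
    per-victim unique element) into a single cluster key."""
    parts = urlsplit(url)     # the defanged hxxp:// scheme parses fine
    return parts.netloc + parts.path

def add_report(source, url):
    reports[cluster_key(url)].append((source, url))

def clusters_from(trusted_sources):
    """Let a consumer keep only clusters reported by at least one trusted source."""
    return [key for key, entries in reports.items()
            if any(src in trusted_sources for src, _ in entries)]

add_report("sandbox-feed", "hxxp://abcd.efghijkl.ab/invoice.pdf?id=1111")
add_report("mailing-list", "hxxp://abcd.efghijkl.ab/invoice.pdf?id=2222")
print(clusters_from({"mailing-list"}))   # ['abcd.efghijkl.ab/invoice.pdf']
```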
Why a Framework and Not a Centralized Repository?
Although the aim is interoperability, I understand that not everyone wants to share everything with everyone, so I imagine this resembling a number of diverse feeds that are consumed and transformed by vendors and end-users. Some services may evolve that correlate and fact-check a large number of feeds to provide a stable and reliable source of good-versus-bad decisions for end-users, while other vendors may pick and choose their sources to craft a unique solution for their market. Enclaves of researchers would form their own webs of trust via the feeds that they subscribe to and self-produce.
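As a toy example of that correlate-and-fact-check idea, a consumer might weight each feed it subscribes to and make its own good-versus-bad call; the feed names and weights below are purely hypothetical.

```python
# Hypothetical trust weights a single consumer might assign to its chosen feeds.
FEEDS = {
    "researcher-enclave": 0.9,   # self-produced, high trust
    "vendor-aggregate":   0.6,
    "raw-sandbox-output": 0.3,   # timely but noisy
}

def verdict(reporting_feeds, threshold=0.8):
    """Sum the trust of every feed that flagged an entry and compare it
    against the consumer's own threshold."""
    return sum(FEEDS[name] for name in reporting_feeds) >= threshold

print(verdict(["raw-sandbox-output"]))                          # False: one noisy feed
print(verdict(["raw-sandbox-output", "researcher-enclave"]))    # True: corroborated
```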
I'm going to noodle on this a bit more; I welcome your feedback.
Comments
Montey
May 28th 2010
KL
May 28th 2010
Jayakumar
May 28th 2010
KL
May 28th 2010
This would additionally trigger each vendor to have a look at the URL and (re) categorize it.
I don't see false positives on "this is a bad website" as anywhere near as threatening to the community as in an antivirus tool. You won't bluescreen your computer with a false positive for a malicious link.
On the flip side, there does need to be accountability. The spam blacklists, run by volunteers and often completely unaccountable, contain examples of all-volunteer solutions that are not helpful, as well as examples that are.
If you wind up getting a feed that is run by folks who're feeling their oats, you may find that you've dropped all email from Yahoo.com because your feed maintainers are upset with a config change in the Yahoo email lists. (Yep, that really happened.)
You may find that it's difficult/impossible to get yourself removed, or that some of the anti-spam folks hate your upstream provider and have decided to flag not just your mailserver (as happens regularly now) but also to flag port 80.
I would like to see some real accountability behind some of these projects, and some real integration. It's frustrating to me that search poisoning is picked up so differently by different folks, for instance.
A virustotal-like interface, perhaps maintained by SANS or a CERT, which permits users to auto-query a large set of databases and which can also be configured to feed data back to a Firefox plugin would be great. End users could choose which feed(s) to accept, but would not need to know where to look for them.
peter
May 28th 2010
Maxim
Jun 1st 2010