Last year, over the Thanksgiving break, Justin Henderson and I worked on a tool to provide a web API interface for another tool I released last year called freq.py. freq.py is used to identify randomized DNS names used by malware. Providing a web API would allow a Security Information and Event Management (SEIM) system to automatically score character frequency data from a variety of log sources. So freq_server.py was born. Since then it has been great to see different organizations using it and finding malware in their domain. This Thanksgiving Justin contacted me again with a new project and I was immediately intrigued.
This time Justin wanted to automatically score phishing domains based upon the "born on" date of the domain registration. Justin told me about some really great techniques that organizations can use to identify potential threats if you could query the data "at the speed of SEIM". The difficulty is in processing the huge number of records that a SEIM collects without rudely overwhelming the whois servers. The system has to be quick and we need to cache the data for frequent domains. Python makes building these types of interfaces quick and easy. Justin told me what he wanted and a few hours later I sent him working prototype. The tool is available for download and integration into your SEIM now. Download a copy here. Check out SANS SEC573 to learn how to quickly develop programs like this on your own. There are two opportunities to take the Python course from me in the near future. Come see me in London or come see me in Orlando Florida at SANS 2017
So what can you do this new tool? Well, it was Justin's idea, so I'll let him tell you. Take it away Justin!
Thanks Mark. We live in a day were data is everywhere. The security community is constantly stepping up in new ways to defend and protect this data. They are constantly inventing new techniques, tools, and processes. Yet some good techniques have been lost due to lack of publication or an inability to easily apply the technique at scale.
One such technique is using WHOIS information and specifically creation dates. Many who came before me have made mention that companies typically do not perform business with new domains. I dub these “baby domains” as they are new born and not mature. Normally businesses access websites that are well over 90 days in age. For example, the website sans.org was created over twenty years ago. Where defenders will regularly see baby domains is with phishing domains.
My name is Justin Henderson and I am the author of an up and coming SANS course namely, SEC555: SIEM with Tactical Analytics. One of the main concepts behind the course is using free or commercial SIEM solutions to operationalize techniques previously considered difficult or impossible. One such technique is gathering the creation date of domains for analysis at scale and without tanking performance. While attempting to get this working at scale I reached out to Mark Baggett. Thanks to his help and ingenuity, the security community now has domain_stats.py.
The script domain_stats.py solves two key problems: how to gather WHOIS information for massive amounts of requests without tanking performance and provides a method for whitelisting domains. The main reason for the scripts creation is to perform a WHOIS call against a domain name. This is easy to do.
The old school method was to invoke WHOIS using things like whois or pwhois.
Running a simple whois kicks back lots of information included a creation date. The problem: performance. Each run of whois varies greatly. In fact, in my testing it would vary between about 0.50 seconds and 10 seconds which if ran against millions or billions of domains would not be able to keep up. Then there is Mark Baggett’s domain_stats.py script.
First we start it by running:
/usr/bin/python /opt/domain_stats/domain_stats.py -a /opt/domain_stats/top-1m.csv 20001
Then a whois call can be made to look for a creation date by using this command:
In this case since /domain/creation_date specifies to only return the creation date of the given domain. The major value Mark has added is that the result is cached. For example, the first run per domain will take the same amount of time a standard whois lookup takes. However, subsequent calls for sans.org take approximately 0.004 to 0.006 seconds. Speed is the enabler here.
However, the second component Mark built into domain_stats.py further speeds up the process as well as implements an additional technique that can be applied. This functionality was originally built to be used with the Alexa top 1 million sites even though free support for the top 1 million has been discontinued. As a result, it still references /alexa. What it does is allows a domain to be queried to see if it matches a list. If it matches it references the number associated with the domain.
Assuming the script is running from the previous command this functionality can be called using this command:
The number returned if using an older version of the Alexa top 1 million is the sites rank. Just remember that instead of using an old Alexa file you can also generate a custom list of most frequently accessed sites or known good sites or even known bad sites. This can then be used to tag domains in order to skip certain checks that are more time consuming such as WHOIS lookups or perform other tasks/logic.
So how would you put this all together? Where I regularly use domain_stats.py is by invoking it from SIEM products. For example, I take incoming DNS logs and use Logstash, a log aggregator and parser, to query domain_stats.py. If a DNS log matches certain query types such as an A record or MX record, then Logstash applies the following logic:
Step 1 - Does the query match an internal domain name? If yes, no additional processing required. If no, move to step # 2.
Step 2 – Pass the domain to domain_stats.py using /alexa. If a result is returned the domain matches a whitelist and no additional processing is required. If 0 is returned it is not a well-known domain. Move to step # 3.
Step 3 – Pass the domain to domain_stats.py using /domain/creation_date. Store the creation date for manual analysis. (I then use a dashboard/report to display baby domains that are less than 90 days old)
The ultimate deliverable is a list of baby domains being accessed and knowing which systems are possibly engaging them. This could provide early detection of emails coming in from phishing domains or end users accessing phishing websites. When combined with other techniques such as fuzzy phishing (https://www.hasecuritysolutions.com/blog/fuzzy-phishing) catching phishing domains becomes much more likely and quite frankly a lot more fun.
A big shot out to Mark Baggett for his awesome Python scripting skills that enabled this. While written in Python it has proven through repetitive testing to outperform other solutions. I encourage everyone to consider using domain_stats.py for WHOIS queries (such as for creation dates) or its whitelisting capabilities.
Follow Justin Henderseon @securitymapper
Follow Mark Baggett @MarkBaggett
Jan 17th 2017
9 months ago