PCRE for malware audits

Published: 2014-11-13
Last Updated: 2014-11-13 00:12:26 UTC
by Daniel Wesemann (Version: 1)

When auditing a company for their malware defense savvy, you are likely used to being presented with colorful pie charts of all the malware that their Anti-Virus (AV) product of choice "successfully" intercepted. Odds are that your auditee can show statistics for the past five years, and related "trends" of doom and gloom.

The problem is, we aren't really interested in that. Counting what the AV caught is like counting the number of hits on the final "drop" rule of the Internet firewall: it shows scary numbers, but who cares, given that this is stuff that was STOPPED. Way more interesting is the stuff that managed to sneak by and was missed ... but how do we find it?

One approach that I like to use involves "Perl Compatible Regular Expressions" (PCRE). You have likely encountered PCREs before: Perl has one of the most versatile regular expression languages around, able to match just about any text pattern imaginable, and PCRE brings that syntax to other tools. Snort, for example, supports PCREs in its rule language. Amazing Perl, of course, speaks this syntax natively. And, lo and behold, the lowly Unix "grep" command, on many Unix flavors, supports "grep -P", which gives it alien PCRE powers.

What to do with them powers, you ask? Well, in an audit, obtain the last 10 days or so of proxy server logs. Most companies have them, but be prepared: they are HUGE. Plunk them onto a Unix system of your choice that supports "grep -P". Then, if you are reading malware blogs like Kafeine's http://malware.dontneedcoffee.com and Brad's http://malware-traffic-analysis.net, you have an ample reservoir of URLs for currently active threats. If you "speak" PCRE, turning these URLs into "patterns" is no big deal, and provides good fresh intel. If you don't speak PCRE (well, you should learn!!), you can make use of the "current events" ruleset of Emerging Threats, for example http://rules.emergingthreats.net/open/snort-2.9.0/rules/emerging-current_events.rules. Look for recently added rules that cover trojan activity.

Then, for the analysis, piece the various PCREs together into one big bad*ss PCRE, and run it in a "grep -P" command, like so:

daniel@debian$ grep -P "(http:\/\/[^\x2f]+\/[a-z0-9]{6,}_[0-9]+_[a-f0-9]{32}\.html|\/[a-f0-9]{60,66}(?:\x3b\d+){1,4}|\/\??[a-f0-9]{60,}\x3b1\d{5}\x3b\d{1,3}|\/[0-9a-z]{32}\.php\?[a-z]{1,3}=[0-9a-z]{32})" bigproxylogfile.txt

Nov 10 11:43:18 local7.info squid[20791]: time='2014-11-10 11:43:18'; rc='TCP_MISS/200'; ip='';  head_type='application/x-shockwave-flash'; size='10751'; req='GET'; url='http://tblwynx.ddns.net/cp9ne2q/65add93b06c4d0042d2fae8cc3585a400ccb629729ae981d3849b1bb7c26a1ec;130000;214';  referrer='http://tblwynx.ddns.net/nrll3fpihn5lzyvnrkk8klq88cnfyapoeivvkbieeeff';

Yes, it will take a while, but if you get any hits, like the "Fiesta" exploit kit hit shown above, I guarantee that it will be highly entertaining to ask the auditee whether they noticed anything amiss on that PC on November 10. The "fresher" your PCRE, the better the results that you will pull out of the log. With a decently up-to-date PCRE, I have yet to see an auditee who doesn't have several "hits" in ten days' worth of proxy logs.
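
For readers who prefer to script the search, the same idea can be sketched in Python. This is only an illustration, not the method used in the audit above: Python's re module is not PCRE, although it accepts every construct that appears in this particular pattern. The function name scan and the shortened sample lines are made up for the example, and the last alternative escapes the dot before "php" on the assumption that a literal dot was intended.

```python
import re

# The four alternatives of the compound pattern, one per entry.
# Assumption: the dot before "php" is meant literally, so it is escaped here.
ALTERNATIVES = [
    r"http:\/\/[^\x2f]+\/[a-z0-9]{6,}_[0-9]+_[a-f0-9]{32}\.html",
    r"\/[a-f0-9]{60,66}(?:\x3b\d+){1,4}",
    r"\/\??[a-f0-9]{60,}\x3b1\d{5}\x3b\d{1,3}",
    r"\/[0-9a-z]{32}\.php\?[a-z]{1,3}=[0-9a-z]{32}",
]
COMBINED = re.compile("(" + "|".join(ALTERNATIVES) + ")")


def scan(lines):
    """Yield (line_number, matched_text) for every line that hits the pattern."""
    for number, line in enumerate(lines, start=1):
        match = COMBINED.search(line)
        if match:
            yield number, match.group(1)


# A harmless made-up line, plus a shortened copy of the squid line shown above.
sample = [
    "url='http://example.com/index.html'; referrer='-';",
    "url='http://tblwynx.ddns.net/cp9ne2q/"
    "65add93b06c4d0042d2fae8cc3585a400ccb629729ae981d3849b1bb7c26a1ec"
    ";130000;214';",
]
hits = list(scan(sample))
for number, text in hits:
    print(number, text)
```

Printing the matched text rather than the whole line keeps the output short enough to eyeball, which helps when the logs are as large as proxy logs tend to be.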

If you have any cool PCRE malware detection tricks up your sleeve, please share via the comments below!



4 comment(s)


I use PCREs a lot for auditing proxy logs. I have several custom tools, some of which will be released soon.

If you want to use several PCREs at the same time, I recommend putting them inside a text file, one per line, and using grep's -f option.
This is easier to maintain than one large compound PCRE.
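
A caveat and a possible workaround: at least some GNU grep versions refuse a multi-line pattern file when -P is in effect ("the -P option only supports a single pattern"), so it is worth testing on your system. The one-pattern-per-line idea can also be sketched in Python, with the bonus of seeing which pattern fired. This is a rough sketch and not one of the tools mentioned above; load_patterns, first_hit, the '#' comment convention and the inline stand-in for patterns.txt are all assumptions of the example.

```python
import re


def load_patterns(text):
    """Compile one pattern per non-empty line.

    Lines starting with '#' are skipped as comments; that is a convention
    of this sketch, not something grep does.
    """
    patterns = []
    for raw in text.splitlines():
        raw = raw.strip()
        if raw and not raw.startswith("#"):
            patterns.append((raw, re.compile(raw)))
    return patterns


def first_hit(patterns, line):
    """Return the source text of the first pattern that matches, or None."""
    for source, compiled in patterns:
        if compiled.search(line):
            return source
    return None


# Contents of a hypothetical patterns.txt, inlined so the sketch runs as-is.
PATTERN_FILE = r"""
# two alternatives taken from the diary's compound PCRE
\/[a-f0-9]{60,66}(?:\x3b\d+){1,4}
\/[0-9a-z]{32}\.php\?[a-z]{1,3}=[0-9a-z]{32}
"""

patterns = load_patterns(PATTERN_FILE)
line = "GET http://host.example/" + "ab" * 32 + ";130000;214"
print(first_hit(patterns, line))
```

Keeping the source text next to each compiled pattern means a hit can be traced back to the line of the file that produced it.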

Many examples of PCREs for exploit kits that you'll find online are anchored: they start with ^ to match the beginning of the line and end with $ to match the end of the line. When you grep proxy logs, the URL sits somewhere in the middle of each log line, so you must remove these anchors, otherwise your PCREs will not match.
And while you remove these anchors, you could add your own anchors for word boundaries: \b
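
The anchor removal can be automated. Below is a minimal Python sketch (Python's re rather than true PCRE); the helper name unanchor, the anchored sample pattern and the log line are hypothetical. One thing to keep in mind with \b: it only matches next to a word character, so a pattern that starts with a slash needs a letter, digit or underscore right before that slash.

```python
import re


def unanchor(pattern, word_boundaries=False):
    """Remove a leading '^' and an unescaped trailing '$' from a pattern."""
    if pattern.startswith("^"):
        pattern = pattern[1:]
    if pattern.endswith("$"):
        body = pattern[:-1]
        # An odd number of backslashes before '$' means it is a literal '\$'.
        backslashes = len(body) - len(body.rstrip("\\"))
        if backslashes % 2 == 0:
            pattern = body
    if word_boundaries:
        pattern = r"\b(?:" + pattern + r")\b"
    return pattern


# Hypothetical anchored pattern, written for a bare URI rather than a log line.
anchored = r"^\/[a-f0-9]{32}\.php$"
line = "url='http://host.example/" + "0123456789abcdef" * 2 + ".php'; size='10751';"

print(bool(re.search(anchored, line)))                  # anchored form
print(bool(re.search(unanchor(anchored), line)))        # anchors removed
print(bool(re.search(unanchor(anchored, True), line)))  # anchors replaced by \b
```

Only the anchored form fails on this log line; the other two find the URI in the middle of it.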

I have tools to distribute grep over all the cores of my machine, to speed up grepping.
For example, if you have an 8-core machine, you can start 7 instances of grep in the background, one for each day of the week.
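
The same one-worker-per-file idea can be sketched with Python's standard library. This is not the tooling mentioned above, just an illustration under assumptions: the function names, the script name scan_logs.py and the single stand-in pattern are made up, and the default of 7 workers simply mirrors the example of leaving one core free.

```python
import re
import sys
from concurrent.futures import ProcessPoolExecutor

# One alternative from the diary's compound PCRE, used as a stand-in.
PATTERN = re.compile(r"\/[a-f0-9]{60,66}(?:\x3b\d+){1,4}")


def count_matching(lines):
    """Count the lines that match PATTERN."""
    return sum(1 for line in lines if PATTERN.search(line))


def count_hits(path):
    """Worker: return (path, number of matching lines) for one log file."""
    with open(path, errors="replace") as handle:
        return path, count_matching(handle)


def scan_parallel(paths, workers=7):
    """Scan the files in separate processes, at most `workers` at a time."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(count_hits, paths))


if __name__ == "__main__":
    # Usage: python scan_logs.py proxy-mon.log proxy-tue.log ...
    if len(sys.argv) > 1:
        for path, hits in scan_parallel(sys.argv[1:]).items():
            print(path, hits)
```

Processes rather than threads are used because regex matching is CPU-bound, and splitting by file (one per day) avoids having to cut a single huge log into pieces.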

Awesome post!!

I've been toying with a similar idea of pulling ET rules and making Sagan signatures out of them. For example, converting your "Fiesta" malware PCRE into a Sagan rule like this:

alert tcp $HOME_NET any -> $EXTERNAL_NET $HTTP_PORTS (msg: "[PROXY-MALWARE] Fiesta malware request"; pcre: "/(http:\/\/[^\x2f]+\/[a-z0-9]{6,}_[0-9]+_[a-f0-9]{32}\.html|\/[a-f0-9]{60,66}(?:\x3b\d+){1,4}|\/\??[a-f0-9]{60,}\x3b1\d{5}\x3b\d{1,3}|\/[0-9a-z]{32}\.php\?[a-z]{1,3}=[0-9a-z]{32})/"; parse_src_ip: 1; parse_dst_ip: 2; reference:url,wiki.quadrantsec.com/bin/view/Main/5002214; classtype:trojan-activity; sid: 5002214; rev:1;)

Now I can detect the event in my proxy logs in real time. I can then view the alert in a console like Snorby, Sguil, etc.

See http://sagan.io for more info.

Thx Daniel,

Another similar project is http://etplc.org.

It's the Emerging Threats Proxy Logs Checker, in other words an IDS for your proxy or web server logs.

Each log entry is checked against roughly 9000 signatures (about 4000 of them rewritten specifically for the ETPLC project)!

Two "engines" exist: Perl and Python v2/v3.

Most recently, I wrote an Elasticsearch "connector" for the ETPLC project.

Thx to the community and to the @EmergingTreats open signatures.

Happy Detecting!

Thanks Daniel for the interesting post. While I'm not nearly the PCRE expert that you are, your idea about gathering patterns from emergingthreats.net got me thinking about the criteria for doing that.

Here's a quick way to grab the latest pcre strings attached to CURRENT_EVENTS and trojan-activity (grep version 2.0+ is required)

wget -q -O - http://rules.emergingthreats.net/open/snort-2.9.0/rules/emerging-current_events.rules | grep 'CURRENT_EVENTS.*pcre:.*trojan-activity' | grep -oP 'pcre\:"\K.*?(?=\"\; |$)' | sed 's/\(.*\)\/.*/\1/' | sed 's/\$$//' | grep -v "\!"

It goes further by listing only the pcre strings: it strips any Snort-specific options (everything including and after the last slash), strips any trailing dollar sign (since it will likely not be at the end of the line in our logs), and then removes any rule with an exclamation point (since the bash shell wants to interpret it).
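
For comparison, here is a rough Python sketch of the same extraction that works on rule text already in memory. It is not a drop-in replacement for the pipeline: extract_pcres and the two sample rules (including their sids) are hypothetical, it skips only negated pcre options rather than every string containing an exclamation point, and it also drops the opening "/" delimiter, which the pipeline above appears to keep.

```python
import re

# Matches pcre:"/<regex>/<flags>"; negated options (pcre:!"...") do not match.
PCRE_OPTION = re.compile(r'pcre:"/(.*?)/([A-Za-z]*)";')
RULE_FILTER = re.compile(r"CURRENT_EVENTS.*pcre:.*trojan-activity")


def extract_pcres(rules_text):
    """Return the regex bodies of CURRENT_EVENTS trojan-activity rules."""
    found = []
    for rule in rules_text.splitlines():
        if not RULE_FILTER.search(rule):
            continue
        for body, _flags in PCRE_OPTION.findall(rule):
            # Drop a trailing '$' anchor unless it is preceded by a backslash.
            if body.endswith("$") and not body.endswith("\\$"):
                body = body[:-1]
            found.append(body)
    return found


# Two made-up rules in Snort syntax; only the first passes the filter.
SAMPLE_RULES = r'''
alert tcp $HOME_NET any -> $EXTERNAL_NET $HTTP_PORTS (msg:"ET CURRENT_EVENTS Hypothetical EK Landing"; content:".php?"; http_uri; pcre:"/\/[0-9a-z]{32}\.php\?[a-z]{1,3}=[0-9a-z]{32}$/U"; classtype:trojan-activity; sid:1; rev:1;)
alert tcp $HOME_NET any -> $EXTERNAL_NET $HTTP_PORTS (msg:"ET POLICY Hypothetical unrelated rule"; pcre:"/foo/i"; classtype:policy-violation; sid:2; rev:1;)
'''

for pattern in extract_pcres(SAMPLE_RULES):
    print(pattern)
```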

I tried processing this output through tr '\n' '|' to create one large PCRE, but wouldn't recommend it. I found that ET rules repeat PCRE variables, which doesn't play nicely in one giant PCRE statement.
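
If "PCRE variables" here means named capture groups (my reading, not something spelled out above), the clash is easy to reproduce. PCRE rejects duplicate group names by default, and Python's re module behaves the same way, so the sketch below uses it as a stand-in. Both patterns are hypothetical and merely share the group name "var".

```python
import re

# Two hypothetical extracted patterns that both define a group called "var".
EXTRACTED = [
    r"\/(?P<var>[a-z]{5})\.php\?id=(?P=var)",
    r"\/(?P<var>[0-9]{4})\/index\.html",
]


def compile_combined(patterns):
    """Try to build one big alternation; return None if the engine rejects it."""
    try:
        return re.compile("|".join("(?:" + p + ")" for p in patterns))
    except re.error:
        return None


combined = compile_combined(EXTRACTED)
separate = [re.compile(p) for p in EXTRACTED]

line = "GET http://host.example/2014/index.html"
print("combined pattern compiled:", combined is not None)
print("separate patterns hit:", any(p.search(line) for p in separate))
```

Compiling each pattern on its own sidesteps the name clash, which is one more argument for keeping the patterns separate instead of gluing them together.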

Thanks again!
