Last Updated: 2015-09-07 13:50:36 UTC
by Xavier Mertens (Version: 1)
Threat intelligence has been a hot topic for a while now. It feeds on IOC's (Indicators of Compromise), which contain technical information like:
- Files, paths
- IP addresses
Mixed with other sources of information or tools, they help in detecting malicious behavior of programs or networks. There are plenty of sources from which to collect IOC's. Some are publicly available while others are compiled and maintained by organizations for their customers or restricted users. DShield is of course a good source of IP addresses, and Lenny (another ISC handler) maintains a nice list of resources on his website(1). Usually, free services offer lists of IOC's in common formats that can be reused in your own environment. But sometimes you will find interesting information published online in a less structured form. Many security researchers analyze pieces of malware and publish the results on their blogs. Big organizations like to publish nice PDF reports containing juicy information. In both cases, IOC's can be present, but how do you extract them automatically?
ioc-parser(2) is a nice Python script which might be very helpful in this case. It parses an input file and generates a list of IOC's in another format. It supports the following input formats: text files, PDF files and HTML (URLs). Results can be generated in CSV, JSON, YARA or NetFlow format. The idea is simple: it searches for patterns based on regular expressions. Everything is configurable and you can add your own regular expressions.
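To illustrate the idea of searching for patterns with regular expressions, here is a minimal sketch in Python. It is not the code of ioc-parser: the patterns, the hash types and the function names (`extract_iocs`, `to_csv`) are assumptions made for this example, and the sample text is made up.

```python
import csv
import io
import re

# Hypothetical patterns for illustration only; these are NOT the
# regular expressions shipped with ioc-parser.
OCTET = r"(?:25[0-5]|2[0-4]\d|1?\d?\d)"
PATTERNS = {
    "IP": re.compile(r"\b(?:" + OCTET + r"\.){3}" + OCTET + r"\b"),
    "MD5": re.compile(r"\b[a-fA-F0-9]{32}\b"),
    "SHA1": re.compile(r"\b[a-fA-F0-9]{40}\b"),
}


def extract_iocs(text):
    """Return a sorted list of unique (type, value) pairs found in text."""
    found = set()
    for ioc_type, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            found.add((ioc_type, match.group(0)))
    return sorted(found)


def to_csv(iocs):
    """Render (type, value) pairs as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["type", "value"])
    writer.writerows(iocs)
    return buf.getvalue()


# Made-up sample text: a documentation IP address and the MD5 of an empty file.
sample = "C2 server at 203.0.113.10, dropper MD5 d41d8cd98f00b204e9800998ecf8427e."
print(to_csv(extract_iocs(sample)))
```

Adding a new kind of indicator in this sketch only means adding one more entry to `PATTERNS`, which is roughly what "your own regexp can be added" looks like in practice.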
Here is the list of IOC's extracted from an old PDF report about Duqu 2.0 written by Kaspersky Lab:
You can also access URLs directly and extract the IOC's present in the HTML code of the latest MalwareMustDie blog article:
And the same results generated in YARA format:
This is a nice script to keep in your personal toolbox. Of course, be careful not to reuse the generated data "as is": there could be false positives or bad regular expression matches.
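As an example of the kind of sanity check you could apply before reusing extracted data, here is a minimal Python sketch that filters a list of candidate IP addresses. It is not part of ioc-parser. The function names, the allowlist and its content are hypothetical, and the rules (drop invalid, loopback, multicast, link-local and RFC 1918 addresses) are just one possible choice.

```python
import ipaddress

# RFC 1918 private ranges: usually internal hosts, rarely useful as IOC's.
RFC1918 = [
    ipaddress.ip_network(net)
    for net in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]

# Hypothetical allowlist of known-good addresses; adapt it to your environment.
ALLOWLIST = {"8.8.8.8"}


def is_plausible_ip(value):
    """Return True if value looks like an IP address worth keeping as an IOC."""
    try:
        addr = ipaddress.ip_address(value)
    except ValueError:
        return False  # not a valid address, e.g. "999.1.1.1"
    if addr.is_loopback or addr.is_multicast or addr.is_unspecified or addr.is_link_local:
        return False
    if any(addr in net for net in RFC1918):
        return False
    return value not in ALLOWLIST


def filter_ips(values):
    """Keep only the plausible addresses, preserving the input order."""
    return [value for value in values if is_plausible_ip(value)]


candidates = ["203.0.113.10", "192.168.1.1", "127.0.0.1", "8.8.8.8", "999.1.1.1"]
print(filter_ips(candidates))
```

Such a filter will not catch everything: a four-part software version number found in a report is still a syntactically valid IP address, so a manual review remains a good idea.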
Happy IOC's hunting!