Reducing False Positives with Open Data Sources

Published: 2016-02-22. Last Updated: 2016-02-22 09:08:42 UTC
by Xavier Mertens (Version: 1)

Today, the volume of daily attacks is so high that we can’t rely on a single solution to protect us. In a previous diary, I wrote about how “Unity Makes Strength”. The idea behind this concept is to collect useful information on one side and re-inject it into another. This increases the chances of detecting or blocking interesting activity. On paper, this solution looks nice, but it can also introduce false positives with a disastrous impact. These false positives can be introduced by mistake or by an attacker who knows about the collection process in place.

The project described in my previous diary has been completed: the integration of FireEye appliances with a Palo Alto Networks firewall. URLs flagged by the FireEye appliances are smoothly injected into the firewall configuration, great! But we also detected that some pieces of malware contact well-known URLs. The best example we faced was a ping to www.oracle.com. You can imagine the impact on developers or DBAs who could not access Oracle’s website because it was detected as malicious and blocked by the firewall. It would be easy for an attacker to write some code which “pings” websites like google.com, microsoft.com or ...

To decrease the risk of such false positives, why not use other types of open data and add extra checks? Alexa is a company providing analytics tools for websites. Among its different types of subscriptions, it provides for free a list of top-ranked websites, updated daily. To prevent sites like oracle.com from being blocked, an extra check against this list has been added to the information flow.

For performance reasons, we limited the list to the top-5000 websites. A new lookup file was created in Splunk:

[alexa_5000]
filename = top-5000.csv
case_sensitive_match = false
match_type = WILDCARD(domain)
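
The stanza above expects a top-5000.csv file with a domain column. As a minimal sketch of how such a file could be prepared, the Python below trims a ranked list to its first 5000 entries. It assumes the downloaded Alexa file has one `rank,domain` record per line and no header. The function names and the choice to write both the bare domain and a `*.domain` wildcard row are illustrative assumptions, not necessarily the layout used in the setup described here.

```python
import csv


def build_lookup_rows(alexa_lines, limit=5000):
    """Build rows for a Splunk lookup CSV from 'rank,domain' records.

    Assumption: each input line looks like '1,google.com' with no header.
    """
    rows = [["domain"]]  # header must match the field named in WILDCARD(domain)
    kept = 0
    for record in csv.reader(alexa_lines):
        if kept >= limit:
            break
        if len(record) < 2 or not record[1].strip():
            continue  # skip malformed records
        domain = record[1].strip().lower()
        rows.append([domain])         # matches the bare domain
        rows.append(["*." + domain])  # matches any subdomain, e.g. www.
        kept += 1
    return rows


def write_lookup(rows, out_path="top-5000.csv"):
    """Write the rows to the CSV file referenced by the lookup stanza."""
    with open(out_path, "w", newline="") as handle:
        csv.writer(handle).writerows(rows)
```

Writing `*.oracle.com` rather than `*oracle.com` avoids whitelisting unrelated names that merely end with the same string.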

To test it, you can search for the presence of any top-5000 website in your Squid logs:

sourcetype=squid | top uri_host | lookup alexa_5000 domain as uri_host

And now you can use the lookup to prevent URLs from the top-5000 from being automatically processed. Here is an example of a query extracting malicious URLs from FireEye CEF events:

index=malwares eventtype="fe" (category="infection-match" OR category="malware-object") cs6=* 
| rex field=cs6 "~~Host:\s(?<reURL>.*?)::~~" 
| dedup reURL 
| lookup alexa_5000 domain AS reURL OUTPUT domain AS alexa_match 
| where isnull(alexa_match) 
| table reURL

This query generates a table of URLs that are _not_ present in the top-5000 Alexa file. Now you can use this output in alerts, scripts, etc.
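
For readers who want to run the same check in a script outside Splunk, here is a rough Python sketch of the logic: extract the host from the cs6 value, deduplicate, and drop anything that belongs to a top-ranked domain. The sample cs6 layout is inferred only from the `~~Host: ...::~~` pattern in the query, and the function names and the `top_domains` set are hypothetical.

```python
import re

# Same Host pattern as the rex command; the group name reURL follows the query.
HOST_RE = re.compile(r"~~Host:\s(?P<reURL>.*?)::~~")


def extract_hosts(cs6_values):
    """Return unique hosts found in cs6 values, in first-seen order."""
    hosts = []
    seen = set()
    for value in cs6_values:
        match = HOST_RE.search(value)
        if not match:
            continue
        host = match.group("reURL").strip().lower()
        if host not in seen:
            seen.add(host)
            hosts.append(host)
    return hosts


def is_top_site(host, top_domains):
    """True if the host or any parent domain is in the top-ranked set."""
    labels = host.split(".")
    return any(".".join(labels[i:]) in top_domains for i in range(len(labels)))


def hosts_to_block(cs6_values, top_domains):
    """Keep only hosts that are not covered by the top-ranked list."""
    return [h for h in extract_hosts(cs6_values) if not is_top_site(h, top_domains)]
```

Matching on whole domain labels means www.oracle.com is treated as part of oracle.com, while a lookalike such as notoracle.com is still reported.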

Xavier Mertens
ISC Handler - Freelance Security Consultant
PGP Key

Keywords: alexa data false positive open data sources
