CSAM Month of False Positives: Appropriately Weighting False and True Positives

Published: 2014-10-31
Last Updated: 2014-11-01 14:24:52 UTC
by Russell Eubanks (Version: 1)

This is a "guest diary" submitted by Chris Sanders. We will gladly forward any responses, or you can use our comment/forum section to comment publicly.

If you work with any type of IDS, IPS, or other detection technology then you have to deal with false positives. One common mistake I see people make when managing their indicators and rules is relying solely on the rate of false positives that are observed. While false positive rate is an important data point, it doesn't encompass everything you should consider when evaluating the effectiveness of a rule or indicator. For instance, consider a scenario where you have a rule that looks for a specific byte pattern in outbound traffic like this:
alert tcp $HOME_NET any -> $EXTERNAL_NET any (msg:"Random Malware"; flow:to_server,established; content:"|AB BF 09 B7|"; sid:12345; rev:1;)
You can see that this rule isn't incredibly specific as it examines all TCP traffic for four specific outbound bytes. As a result, there might be potential for false positives here. In this case, I ran this rule on a large network over the course of a month, and it generated 58 false positive alerts. Using that data point alone, it sounds like this rule might not be too effective. As a matter of fact, I had a few people who asked me if I could disable the rule. However, I didn't because I also considered the number of true positive alerts generated from this rule. Over the same period of time this rule generated 112 true positive alerts. This means that the rule was effective at catching what it was looking for, but it still wasn't entirely precise.
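To make the specificity problem concrete, here is a minimal Python sketch, not the IDS engine itself, of what an unanchored four-byte match does compared with a match pinned to one position. The payloads and the offset of 0 are made up for illustration; the real location of the pattern in the malware traffic is not given here. In a Snort rule, this kind of constraint is usually expressed with content modifiers such as offset and depth.

```python
# The four bytes from the rule's content match: |AB BF 09 B7|
PATTERN = bytes.fromhex("ABBF09B7")


def matches_anywhere(payload: bytes) -> bool:
    """Mimic an unanchored content match: fire if the bytes occur anywhere."""
    return PATTERN in payload


def matches_at(payload: bytes, offset: int) -> bool:
    """Fire only if the bytes sit at one fixed position (offset is hypothetical)."""
    return payload[offset:offset + len(PATTERN)] == PATTERN


# Made-up payloads, for illustration only.
malware_like = PATTERN + b"\x00\x01rest-of-message"
unrelated = b"\x10\x20" * 8 + PATTERN + b"\x30" * 8  # same bytes, buried mid-stream

print(matches_anywhere(malware_like), matches_anywhere(unrelated))  # True True
print(matches_at(malware_like, 0), matches_at(unrelated, 0))        # True False
```

The unanchored check fires on both payloads, while the position-constrained check fires only on the first. That is the trade-off to weigh: a tighter match reduces false positives, but only helps if the pattern really does appear at a predictable location.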
I mention the word precise because the false positive and true positive data points can be combined to form a precision statistic using the formula P = TP / (TP + FP). This value, expressed as a percentage, describes how precise a rule is, with higher values being more desirable. In the case of our example rule, P = 112 / (112 + 58) = 65.9%, meaning that 65.9% of the alerts it generated were true positives. That doesn’t sound like a rule that should be disabled to me. Instead, I was able to conduct more research and further tune the rule by looking for the byte pattern in a specific location in the packet.
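The arithmetic is simple enough to script when reviewing many rules at once. The following is a small sketch of the formula P = TP / (TP + FP) using the counts from the example rule; the function name and the error handling for a rule with no alerts are my own choices, not part of any IDS tooling.

```python
def precision(true_positives: int, false_positives: int) -> float:
    """Return P = TP / (TP + FP) as a fraction between 0 and 1."""
    total_alerts = true_positives + false_positives
    if total_alerts == 0:
        # A rule that never fired has no meaningful precision.
        raise ValueError("rule generated no alerts; precision is undefined")
    return true_positives / total_alerts


# Counts from the example rule (sid:12345) over one month.
p = precision(true_positives=112, false_positives=58)
print(f"{p:.1%}")  # 65.9%
```

Note that this number only describes the alerts the rule did generate. It says nothing about malicious traffic the rule missed, which would have to be measured separately.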
When examining rules and indicators for their effectiveness, be sure to consider both true and false positives. You might miss out on favorable detection if you don't.
Chris Sanders
Twitter: @chrissanders88
Blogs: http://www.appliednsm.com & http://www.chrissanders.org
5 comment(s)


P = TP + (TP + FP) should be P = TP / (TP + FP)
Thanks! The typo has been corrected.
No it hasn't. The error is still in the article.
Updated the article to show the proper formula of P = TP / (TP + FP).

Russell Eubanks
We want to find attack patterns within the overall traffic data, and rules are generally written from the attack pattern alone. But depending on the traffic context, a match on an attack pattern does not always mean an attack. So we must understand the context of our own network traffic, and write rules that match the attack context or avoid the false positive context. To do that, we analyze every IDS log entry by rule. I encourage analyzing the logs in one-week units. Refer to the following link.
