Health or Performance monitoring to detect security events.

Published: 2011-07-19
Last Updated: 2011-07-20 13:10:34 UTC
by donald smith (Version: 1)

Brent wrote in, in response to ChrisM's diary about helping us help you:

"One of the things I stress to other admins is the importance of performance monitoring. Not only is it useful for
diagnosing performance bottlenecks, but it's useful from a security perspective too, provided someone is willing to
skim performance graphs on a regular basis to get a feel for what "normal" is.

For instance, we track the query stats on our DNS servers and back in March I saw an odd jump in query failures on
one of our external DNS servers. 

A look at a second graph showed that these queries were for A records. When I see an anomaly like this (things that make me say "hmmm"), I go investigate. In this case, it was a flood of queries for hostnames/domains our DNS servers weren't authoritative for (and, of course, they're set up to refuse recursive queries).

What was interesting was that these queries initially came from a wide variety of IPs (many of which were listed in RBLs
as compromised systems), and soon thereafter they were coming from our IP space, but mostly from blocks not currently
in use.

Checking performance stats has exposed all sorts of things - misbehaving software doing dozens of queries per second
for the same hostname, a compromised system looking up millions of MX records to try to send spam, someone running a
portscanner (and causing a big spike in rejected packets from our egress filters), etc. Ya never know what you'll find, if you just go look regularly. :-)"
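The DNS oddities Brent describes (failures concentrated on one record type, sources inside unused address blocks, a host hammering MX lookups, software re-querying one name) can all be pulled out of a parsed query log with a few counters. This is a minimal Python sketch, not Brent's tooling: it assumes log lines have already been parsed into a hypothetical `DnsQuery` record, and the unused blocks and thresholds you pass in are placeholders for your own network.

```python
import ipaddress
from collections import Counter
from dataclasses import dataclass


@dataclass
class DnsQuery:
    """One parsed query-log entry (hypothetical record layout)."""
    client: str  # source IP address of the query
    qname: str   # name being looked up
    qtype: str   # record type, e.g. "A" or "MX"
    rcode: str   # response code, e.g. "NOERROR" or "REFUSED"


def refused_by_qtype(queries):
    """Count REFUSED answers per record type, like Brent's second graph."""
    return Counter(q.qtype for q in queries if q.rcode == "REFUSED")


def from_unused_blocks(queries, unused_blocks):
    """Return client IPs inside our own but currently unassigned prefixes.

    Traffic claiming to come from space we have not handed out is a strong
    hint that the source addresses are spoofed.
    """
    nets = [ipaddress.ip_network(b) for b in unused_blocks]
    return {
        q.client
        for q in queries
        if any(ipaddress.ip_address(q.client) in net for net in nets)
    }


def heavy_hitters(queries, qtype, threshold):
    """Clients issuing more than `threshold` queries of one type (e.g. MX)."""
    counts = Counter(q.client for q in queries if q.qtype == qtype)
    return {client: n for client, n in counts.items() if n > threshold}


def repeated_names(queries, threshold):
    """(client, qname) pairs looked up more than `threshold` times."""
    counts = Counter((q.client, q.qname) for q in queries)
    return {pair: n for pair, n in counts.items() if n > threshold}
```

For example, `heavy_hitters(queries, "MX", 1000)` would surface a spam-sending host like the one Brent mentions, while `from_unused_blocks` with your unassigned prefixes flags the spoofed-looking sources. Counting per interval and graphing the results gives you the same kind of "hmmm" view he used.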

I couldn't agree with Brent more. Health and performance monitoring tools can and should be used to detect security-related events. "Peacetime learning", that is, monitoring while not under attack or unusual load to learn what normal looks like, is used in DDoS detection. NetFlow, which is commonly used to detect DDoS attacks today, was originally designed for BILLING on "burstable" pipes :)
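One simple way to turn "peacetime learning" into an alert is to record a metric during normal operation, summarize it as a mean and standard deviation, and flag later values far above that band. The Python sketch below uses a mean-plus-k-standard-deviations rule; that rule, the default `k=3.0` and the `min_spread` floor are illustrative assumptions, not a description of how any particular NetFlow or DDoS product works.

```python
import statistics


def learn_baseline(peacetime_samples):
    """Summarize samples gathered while NOT under attack as (mean, stdev)."""
    mean = statistics.fmean(peacetime_samples)
    stdev = statistics.pstdev(peacetime_samples)
    return mean, stdev


def is_anomalous(value, baseline, k=3.0, min_spread=1.0):
    """True if `value` sits more than k standard deviations above the mean.

    `min_spread` keeps a perfectly flat baseline (stdev == 0) from flagging
    every tiny wobble; its unit is whatever the metric is measured in.
    """
    mean, stdev = baseline
    return value > mean + k * max(stdev, min_spread)
```

In practice you would probably keep separate baselines per hour of day or day of week, since "normal" load at 3 a.m. differs from normal load at noon.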
SNMP monitoring is frequently used to detect attacks against a system. If memory or other resource usage suddenly goes
WAY UP, you can bet something is wrong, and in many cases that will be a security-related event. So if your performance and health monitoring team isn't tied tightly to your security team, you may want to introduce them.
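SNMP interface counters such as IF-MIB's `ifInOctets` are Counter32 values that wrap back to zero after 2^32 - 1, so a poller has to turn successive readings into rates before it can spot a sudden jump. The Python sketch below assumes the raw readings were already collected at a fixed interval (the SNMP polling itself is out of scope), and the 5x jump `factor` is an arbitrary placeholder. Gauges such as memory in use don't wrap and can be compared directly.

```python
COUNTER32_MODULUS = 2 ** 32  # Counter32 wraps to 0 after 2**32 - 1


def counter_rate(prev, curr, interval_s):
    """Per-second rate between two Counter32 polls, allowing for one wrap."""
    delta = curr - prev
    if delta < 0:  # the counter wrapped (or the agent restarted)
        delta += COUNTER32_MODULUS
    return delta / interval_s


def sudden_jumps(samples, interval_s, factor=5.0):
    """Convert raw counter polls to rates and find abrupt increases.

    Returns (rates, jumps), where `jumps` lists indices into `rates` whose
    value is more than `factor` times the rate of the previous interval.
    """
    rates = [counter_rate(a, b, interval_s) for a, b in zip(samples, samples[1:])]
    jumps = [
        i for i in range(1, len(rates))
        if rates[i - 1] > 0 and rates[i] > factor * rates[i - 1]
    ]
    return rates, jumps
```

An agent restart also resets counters, which this sketch would misread as a wrap; a real poller would also check `sysUpTime` between polls to tell the two apart.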

Lastly, the "triad" of security is frequently referred to by the TLA (three-letter acronym) CIA:
Confidentiality, Integrity, and Availability. (Two "new" ones were added a while back: Authenticity and Non-Repudiation.)
Availability is either one third or one fifth of a security practitioner's job, depending on which version of the "triad" you're following.

1 comment(s)


In some recent research I have been doing on monitoring cloud solutions, especially IaaS, the performance monitoring APIs were probably one of the best ways to do some basic security monitoring natively and in a scalable fashion. And I know that in my day-to-day job I use our performance monitoring systems all the time to check for gremlins running around the network.
