
Network Reliability - the Good, the Bad, the Ugly, and the Not-so-Bright

Published: 2010-09-22
Last Updated: 2010-09-22 15:03:38 UTC
by Rob VandenBrink (Version: 1)
4 comment(s)

As a self-confessed and self-described "Network Person", I design and build redundant systems every day: the kind of systems where you can lose entire racks and still be up, where you can do upgrades and reboots without downtime. Two things have struck me recently:

A big part of this job lately seems to be education - going over redundancy and recovery mechanisms and options with clients in advance of doing a design, and certainly well before a build. While many of these mechanisms have been with us for years, decision makers in many large companies don't seem to be aware of what is available to them, native on the box and for free.

The other thing that struck me is that even when I'm "preaching to the choir" - when the folks I'm working with know what their gear can do, and in many cases have already built their infrastructure the right way - people often aren't aware of the security implications of the tools we use to make our infrastructure reliable and/or redundant. Each thing we do to make things better gives an attacker another way to attack or compromise things.

I recently heard this described as "Thinking Backwards", and can't think of a better way to describe it. (From SANS SEC542 audio, Kevin Johnson and an unnamed student.)

While I seem to be drawing the same "Enterprise 101" diagram a few times per week, lately it's been a coin toss whether it's for an "Enterprise 101" discussion or an "Enterprise Attack / Defense 101" whiteboard talk. This entire can of worms seems like a good discussion for an ISC Diary. I'll start the conversation with an "Enterprise 101" review, outlining each of the mechanisms we'll discuss. In upcoming diaries, we'll tackle each of the reliability methods and discuss how they're sometimes not as reliable as you think they are, what security pieces they are missing, why defaults are BAD, and how to secure them (if possible).

I'm hoping that our readers will help out. If I've missed a topic you'd like to see, or if I haven't explained things completely (or just plain errored out or otherwise missed the boat), please use our comment form and fill us all in.

Enterprise 101 - "The Good"

I'll start off this week with the textbook descriptions: what the more common reliability methods are, what they do, and why you might implement them. This discussion will be a tad non-technical, but don't worry, when we start breaking stuff in later diaries we'll get to see some configuration examples, "real" tools and in some cases packets. I think even this high level conversation has a lot of value - lots of folks aren't aware of basic mechanisms for ensuring network availability. With information security often defined in the context of the CIA triad (Confidentiality, Integrity and Availability), I find that we often neglect the "Availability" aspect - we tend to consider it more of an operational thing than a security thing. We'll base our discussion on the diagram below. Again, if you'd like to see something added, please use our comment form - I'll update this diary based on comments.

At the heart of many of these protocols and methods is either a primary/backup concept or an active/active pairing. You'll start to see this as a pretty consistent pattern as we go through them.



Layer 3: HSRP / VRRP

What these protocols give you is layer 3 redundancy. If the default gateway on a subnet should go offline, then no one on that subnet can access resources off of that network. If things like DNS servers are affected in such an outage, it's likely that even resources on that same subnet won't be accessible. HSRP and VRRP are two protocols that allow you to set up another router (or layer 3 switch) as a backup to the primary. If the primary fails, the backup takes over the gateway IP, and the clients on that subnet are none the wiser. On most days what this means to the network maintainer is that hardware or software upgrades can be done with minimal interruption, often during business hours (we all get enough late nights in this biz).

HSRP (Hot Standby Router Protocol) has been around forever; it's the Cisco answer to this problem. VRRP (Virtual Router Redundancy Protocol) is the open standards answer - the current version is defined in RFC 5798 (previously in RFC 3768, and before that in RFC 2338).
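To make the primary/backup idea concrete, here is a minimal Python sketch of how a gateway owner might be chosen within a redundancy group. It is a simplification, not either protocol's state machine: it assumes the highest configured priority wins, with ties going to the higher interface IP address, and it ignores hello timers, preemption settings and authentication. The router names, addresses and priority values are hypothetical.

```python
from dataclasses import dataclass
from ipaddress import IPv4Address


@dataclass
class Router:
    name: str            # hypothetical device name
    ip: IPv4Address      # the router's own interface address
    priority: int        # higher wins; values here are illustrative only
    alive: bool = True


def elect_gateway_owner(routers):
    """Return the live router that should answer for the shared gateway IP.

    Highest priority wins; ties go to the higher interface IP address.
    Returns None if every router in the group is down.
    """
    candidates = [r for r in routers if r.alive]
    if not candidates:
        return None
    return max(candidates, key=lambda r: (r.priority, r.ip))


group = [
    Router("core-a", IPv4Address("192.0.2.2"), priority=110),
    Router("core-b", IPv4Address("192.0.2.3"), priority=100),
]

before = elect_gateway_owner(group).name   # primary owns the gateway IP
group[0].alive = False                     # simulate the primary failing
after = elect_gateway_owner(group).name    # backup takes over
```

The clients never change their configured gateway address; only the device answering for it changes.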

Layer 2: Spanning Tree (and TRILL)

What spanning tree does is prevent loops in the network. If a layer 2 frame arrives at a switch that does not have the destination MAC address in its local table, the switch floods the frame out all of its other ports in hopes that somebody will claim the frame and reply to it. In a single switch environment, that's the end of it. In a multiple switch environment, this flood is potentially repeated on every switch in the environment.

The problem is that if you form a loop - in the simplest case, having two wires connecting a pair of switches - this process can easily repeat infinitely. The frame will come in one port from switch A to B, then get forwarded back to switch A on the other link, then back to B and so on. In short order the network "melts" and becomes unusable, as these frames never go away. In a complex network loops may not be so simple or obvious, but the bigger the network the bigger the impact.

Spanning tree simplifies this - it defines a "root bridge", and creates a single, least cost path with no loops between all the switches. Path "costs" are determined by an algorithm based mostly on hop count and port speed, but can be overridden by configuration on the boxes. With a single path (the designated path) to every bridge on the network, frames to unknown destinations transit each switch once, then eventually die if the destination host is not on the network.

In the case of a link failure on a designated path, the switches detect that the failure occurred, and one of the backup links takes over (this gets a lot more complicated, stay tuned).
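The effect described above can be sketched in a few lines of Python. This is only an illustration of the outcome (a loop-free set of least cost paths to the root, recomputed after a failure), not of how switches actually negotiate it; real spanning tree exchanges BPDUs and applies further tie-breakers. The switch names and link costs are hypothetical, and the root bridge is simply passed in.

```python
import heapq


def spanning_tree(links, root):
    """Return the forwarding links: each switch's least cost path to root.

    links maps frozenset({a, b}) to a cost. Any link not in the result
    would be blocked to keep the topology loop-free.
    """
    neighbours = {}
    for link, cost in links.items():
        a, b = sorted(link)
        neighbours.setdefault(a, []).append((b, cost))
        neighbours.setdefault(b, []).append((a, cost))

    best = {root: 0}
    forwarding = set()
    done = set()
    queue = [(0, root, None)]      # (cost to root, switch, upstream switch)
    while queue:
        cost, switch, upstream = heapq.heappop(queue)
        if switch in done:
            continue
        done.add(switch)
        if upstream is not None:
            forwarding.add(frozenset({switch, upstream}))
        for peer, link_cost in neighbours.get(switch, []):
            new_cost = cost + link_cost
            if peer not in done and new_cost < best.get(peer, float("inf")):
                best[peer] = new_cost
                heapq.heappush(queue, (new_cost, peer, switch))
    return forwarding


def link(a, b):
    return frozenset({a, b})


# Three switches wired in a triangle: a loop unless one link is blocked.
links = {link("sw-a", "sw-b"): 4, link("sw-a", "sw-c"): 4, link("sw-b", "sw-c"): 4}

forwarding = spanning_tree(links, root="sw-a")
blocked = set(links) - forwarding            # the sw-b/sw-c link sits idle

# Fail a designated link and recompute: the idle link takes over.
del links[link("sw-a", "sw-b")]
after_failure = spanning_tree(links, root="sw-a")
```

Note that the blocked link carries no traffic until something fails, which is the "idle bandwidth" cost of this approach.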

The obvious downside to this is that we tend to connect switches together using our fastest, most expensive links. A pair of 10 Gbps links can really add up, cost wise, and even 1 Gbps links can be expensive if they run over single mode fiber or long reach ethernet. Plus it seems a real shame to leave all that bandwidth idle "just in case". The answer to this is a new standard called TRILL (Transparent Interconnection of Lots of Links). In a TRILL config, all of these links are live, and in its simplest explanation the switches discover and maintain an SPF (Shortest Path First) table of MAC addresses, which defines the best path in a multihop environment from any source to any destination MAC address. TRILL is bright-shiny-new, and is not yet widely deployed. Some vendors have TRILL compliance in their highest-end products; look for TRILL to show up in smaller switches over the next few years.

Layer 2: Etherchannel / LACP (802.3ad) / PAgP

Etherchannel is a common method of taking a few network links and "ganging them up" to make a faster link. For instance, four 100 Mbps links can be combined to make a faster channel. The common misconception is that combining these links simply adds them up - in our example, you'd think that 4 x 100 = 400 Mbps, but that's not the case. What happens is that for every source and destination, a path is chosen. So all the traffic from host A to host B will take one link, and all the traffic from host A to C might take another. This means that, in our example of 4 links, each conversation has 100 Mbps available to it, but 4 conversations can happen at once.

The source and destination can be defined in several ways, commonly by IP address or MAC address. So if you have one large file copy or backup job, it'll likely only use one link. Using source/destination MAC address for balancing Etherchannel links is generally easier on the hardware, but keep in mind that the default gateway on any subnet has a single MAC, so if you're communicating off your subnet, source/destination IP address might make better use of your multiple paths.
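The per-conversation link choice can be illustrated with a small Python sketch. The hash here (CRC32 modulo the number of links) is a stand-in chosen for the example; actual switches use their own, vendor-specific hashing, and the MAC and IP addresses below are made up. The point is only that a given source/destination pair always lands on the same link, and that hashing on MAC addresses collapses all off-subnet traffic from one host onto a single link because the destination MAC is always the gateway's.

```python
import zlib

LINKS = 4  # hypothetical four-member channel


def pick_link(src, dst, links=LINKS):
    """Map a source/destination pair onto one member link.

    The same pair always hashes to the same link, which is why a single
    conversation never gets more than one link's worth of bandwidth.
    """
    return zlib.crc32(f"{src}>{dst}".encode()) % links


host_mac = "00:00:5e:00:53:10"      # hypothetical host
gateway_mac = "00:00:5e:00:53:01"   # hypothetical default gateway
host_ip = "192.0.2.10"
servers = [f"198.51.100.{n}" for n in range(1, 9)]  # off-subnet servers

# Balancing on MAC: every off-subnet frame is host -> gateway, so one link.
mac_links = {pick_link(host_mac, gateway_mac) for _ in servers}

# Balancing on IP: each server is a different pair, so links can vary.
ip_links = {pick_link(host_ip, server) for server in servers}

print("links used with MAC hashing:", sorted(mac_links))
print("links used with IP hashing: ", sorted(ip_links))
```

With MAC hashing the first set always contains exactly one link; with IP hashing the traffic will usually be spread across several, though the exact spread depends on the hash and the addresses involved.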

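Spanning tree treats an Etherchannel as a single logical link, so the member links are not blocked as a loop. PAgP (Port Aggregation Protocol) is Cisco's proprietary protocol for negotiating an Etherchannel. LACP (Link Aggregation Control Protocol) is the standards based protocol (802.3ad).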

Layer 1/2 Redundancy on Servers - NIC Teaming

NIC teaming is redundancy at the hardware level for servers. This is usually implemented by installing additional drivers, creating a virtual NIC, then adding the physical NICs to that to create a "team". In most cases the team operates as an active/passive pair, where the passive NIC only kicks in if the link fails (i.e. the cable is pulled) on the active. However, there are usually more advanced options, up to and including support for 802.3ad (see Etherchannel above).

Layer 3 Redundancy - Routing Protocols

Routing protocols offer LOTS of avenues for redundant paths and path selection based on various metrics.  They give us lots of ways of defining best paths between networks, combining links for performance and detecting and routing around failed links.

They are also, almost without exception, based on the "I trust you" model. If you speak their language, you can reroute or hijack any traffic you want. From an attacker's perspective, the trick is to then send the traffic back on its way, so that you can capture a useful datastream - simply being a black hole for packets doesn't accomplish anything, unless you are trying to DoS someone.

There have been some noteworthy illustrations of intentional and accidental Denial of Service based on routing protocols - last month's BGP experiment-gone-wrong launched by RIPE NCC and Duke, for instance, or any number of BGP mistakes made by one ISP or another over the years (Pakistan Telecom's DoS of YouTube in 2008 comes to mind).
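One reason a single bad announcement can pull traffic away from its rightful owner is that routers prefer the most specific prefix that matches a destination, and in the "I trust you" model they accept what a peer tells them. The Python sketch below illustrates just that lookup rule. It is not a routing protocol implementation; the prefixes are documentation addresses and the next-hop labels are hypothetical.

```python
from ipaddress import ip_address, ip_network


def best_route(table, destination):
    """Return the next hop of the most specific prefix covering destination."""
    dest = ip_address(destination)
    matches = [(net, hop) for net, hop in table if dest in net]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)[1]


# The legitimate owner announces its /24.
table = [(ip_network("198.51.100.0/24"), "legitimate-origin")]
before = best_route(table, "198.51.100.10")

# A trusted peer announces a more specific /25 inside it, and it is believed.
table.append((ip_network("198.51.100.0/25"), "bogus-origin"))
after = best_route(table, "198.51.100.10")       # now follows the /25
untouched = best_route(table, "198.51.100.200")  # outside the /25, unchanged
```

Whether the result is a black hole or an interception depends on what the bogus origin then does with the traffic.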

What's next?

Look for ISC diaries coming up that discuss each of these topics in a lot more depth, from the perspective of defense against an attacker.  We'll try to break each one of them, and discuss how best to protect them from compromise.

