Ping is Bad (Sometimes)

Published: 2011-08-08. Last Updated: 2011-08-08 03:29:51 UTC
by Rob VandenBrink (Version: 1)
8 comment(s)

Ok, maybe ping isn't in and of itself "bad".  But 2 or 3 times per week, I hear someone say something like "it must be down, I can't ping it".  I have to say, this is a bit of a pet peeve for me.  In our modern world of firewalls, it's very likely that ping traffic, meaning ICMP echo requests and ICMP echo replies (ICMP Types 8 and 0), is blocked by one firewall or another in the path between the requester and the server or service being tested.

Even more dangerously, I'll hear "the link / server / network / internet is slow - look at my ping times!".  Using Ping as a measure of RTT (Round Trip Time) performance is no longer a good way to go.  Many ISPs now depress the priority of ICMP packets, so that they'll transport them, but they'll give priority to "real" traffic like HTTP, HTTPS or SMTP.  Network administrators will often also use PING to measure performance of corporate WANs.  This can be *very* misleading, as on most such networks, the protocols deemed important are prioritized at various levels, and protocols such as ICMP that have no QOS class assigned will be transported at the default priority, on a best-effort basis.  So using ICMP to measure networks that are used for VOIP (Voice over IP), Video over IP, or any traffic governed by QOS (Quality of Service) tells you little about how that traffic is actually treated.

So for a lot of reasons, PING is simply a bad test in many situations.  Either it shows things are down when they're up, or if you are using it as a measure of performance, it's not measuring what you think it's measuring.

What should people do?  Well, first, test hosts for up/down status on transports that they will receive and reply with.  So a webserver should probably be tested using tcp/80, not icmp echo and echo reply.  Similarly, RTT (Round Trip Time) performance of networks should be measured using the protocols that we actually wish to measure, such as tcp/80 (http), tcp/443 (https), or tcp/445 (Server Message Block (SMB) over IP (Microsoft-DS)).

How do we do this?  Well, there are several tools to test exactly this way.  Let's cover a few of them:

HPING3:

HPING3 is a nifty little packet crafter, available for source or sometimes binary install on most linux/*nix distros.

Let's test a common internet destination:

robv@robv-desktop:~$  hping3 -p 80 -c 2 -S www.google.ca
HPING www.google.ca (eth0 74.125.115.104): S set, 40 headers + 0 data bytes
len=46 ip=74.125.115.104 ttl=128 id=45928 sport=80 flags=SA seq=0 win=64240 rtt=19.6 ms
len=46 ip=74.125.115.104 ttl=128 id=45929 sport=80 flags=SA seq=1 win=64240 rtt=19.0 ms

--- www.google.ca hping statistic ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max = 19.0/19.3/19.6 ms


Let's use HPING3 to test a DNS server:

robv@robv-desktop:~# hping3 8.8.8.8  --udp -V -p 53
using eth0, addr: 192.168.169.145, MTU: 1500
HPING 8.8.8.8 (eth0 8.8.8.8): udp mode set, 28 headers + 0 data bytes
^C
--- 8.8.8.8 hping statistic ---
6 packets transmitted, 0 packets received, 100% packet loss
round-trip min/avg/max = 0.0/0.0/0.0 ms
 

Wait, what happened there?  We sent udp/53 packets to a DNS server, and didn't get a reply - is it down?  Nope, it's not down, it just won't reply unless it gets a properly formatted DNS request.  This is common in many UDP services - there isn't a 3 way handshake that we can take advantage of to test up/down status of a service.  A packet trace shows us exactly what's going on here (Wireshark properly sees these as packets with a DNS destination that are not properly formed DNS packets).
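To make the "properly formatted DNS request" point concrete, here is a minimal sketch using only the Python standard library. It hand-builds a single A-record query and times the reply. The function names `build_dns_query` and `dns_probe`, the fixed transaction ID and the 2 second timeout are my own choices for illustration, not part of any tool mentioned in this diary.

```python
import socket
import struct
import time


def build_dns_query(name, txid=0x1234):
    """Build a minimal DNS query for an A record (QTYPE=1, QCLASS=IN)."""
    # Header: ID, flags (0x0100 = recursion desired), QDCOUNT=1, AN/NS/AR=0
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    # QNAME: each label is prefixed by its length, terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)


def dns_probe(server, name="www.google.ca", timeout=2.0, txid=0x1234):
    """Return the RTT in ms for one DNS query, or None on timeout/socket error."""
    query = build_dns_query(name, txid)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        start = time.perf_counter()
        try:
            sock.sendto(query, (server, 53))
            reply, _ = sock.recvfrom(512)
        except OSError:  # includes socket.timeout
            return None
        elapsed_ms = (time.perf_counter() - start) * 1000.0
    # Only count the reply if it echoes our transaction ID
    if len(reply) >= 2 and struct.unpack("!H", reply[:2])[0] == txid:
        return elapsed_ms
    return None
```

Calling `dns_probe("8.8.8.8")` should return a round trip time in milliseconds when the resolver is reachable, and `None` when nothing comes back.  As with the Scapy scripts later in this diary, treat the timing as approximate.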

NPING

But what if you're on Windows?  NMAP now comes with NPING, a much more flexible echo tool than the traditional PING we've all used for years.  Let's test response time to that web server again.  I'm using the  "-q" option to reduce the output of the command.

C:\>nping --tcp -p 80 -q www.google.ca

Starting Nping 0.5.51 ( http://nmap.org/nping ) at 2011-07-20 21:04 Eastern Daylight Time
Raw packets sent: 5 (270B) | Rcvd: 8 (368B) | Lost: 0 (0.00%)
Tx time: 4.39000s | Tx bytes/s: 61.50 | Tx pkts/s: 1.14
Rx time: 5.39000s | Rx bytes/s: 68.27 | Rx pkts/s: 1.48
Nping done: 1 IP address pinged in 6.02 seconds

If you need to explicitly set the QOS values in the test, nping will also do that (a handy reference for TOS - DSCP - binary/decimal/hex flag values can be found here ==> http://www.cisco.com/en/US/docs/voice_ip_comm/bts/4.1/command/reference/93PktCbl.pdf )

This example will send UDP packets on port 17000 (a port in the RTP range normally used for VOIP calls).  The QOS shown here is DSCP EF, which corresponds to IP precedence 5, CRITICAL (the 2 QOS values normally assigned to VOIP).


C:\>nping --udp -g 17000 -p 17000 -q --tos 184 172.17.1.209

Starting Nping 0.5.51 ( http://nmap.org/nping ) at 2011-08-07 21:54 Eastern Daylight Time
Raw packets sent: 5 (210B) | Rcvd: 0 (0B) | Lost: 5 (100.00%)
Tx time: 4.00100s | Tx bytes/s: 52.49 | Tx pkts/s: 1.25
Rx time: 5.00100s | Rx bytes/s: 0.00 | Rx pkts/s: 0.00
Nping done: 1 IP address pinged in 6.27 seconds

 

But UDP testing problems strike again - note that nping tells us that we have 100% loss on this test.  If you do a packet trace, you'll see that the UDP packets are sent, but nothing comes back at all.  RTP is the right protocol, but the session needs to be negotiated properly before you'll see return traffic.
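On the "--tos 184" value in the nping command above: the TOS byte carries the 6-bit DSCP value in its upper six bits, so DSCP EF (decimal 46) shifted left by two gives 184.  The short sketch below works through that arithmetic; the helper names are mine and purely illustrative.

```python
def dscp_to_tos(dscp):
    """Convert a 6-bit DSCP value to the 8-bit TOS byte (low 2 bits left at 0)."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be in the range 0-63")
    return dscp << 2


def tos_to_dscp(tos):
    """Extract the DSCP value (upper six bits) from a TOS byte."""
    return (tos & 0xFF) >> 2


def tos_to_precedence(tos):
    """Extract the legacy IP precedence (upper three bits) from a TOS byte."""
    return (tos & 0xFF) >> 5


EF = 46  # DSCP Expedited Forwarding
tos = dscp_to_tos(EF)
print(tos, hex(tos), format(tos, "08b"))  # 184 0xb8 10111000
print(tos_to_precedence(tos))             # 5
```

The same helpers can be used to work out the value to pass to "--tos" for any other DSCP class you want to test.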

Cisco Routers IP SLA

As a side note, Cisco routers have an "IP SLA" feature, which will send test TCP or UDP packets on any port from the requesting router, and have the router at the other end reply back on the same protocol.  This neatly solves the "how can I measure my WAN QOS?"  problem, but what it doesn't do is measure from a real client to a real destination, so this method won't tell you if the webserver in your datacenter is up or not.  I won't show an example here, as the product documentation does a good job of that.
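Setting the router configuration aside, the general idea of a cooperating responder at the far end can be sketched in a few lines of Python.  To be clear, this is not Cisco's implementation or wire protocol, just an illustration of the concept: one side echoes UDP datagrams back, the other side times the round trip.  The function names, the loopback default and the payload are all assumptions made for the example.

```python
import socket
import threading
import time


def start_udp_responder(host="127.0.0.1", port=0):
    """Start a background thread that echoes every UDP datagram to its sender.

    Returns the (host, port) actually bound; port=0 lets the OS pick a port.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))

    def serve():
        while True:
            try:
                data, peer = sock.recvfrom(2048)
            except OSError:
                break  # socket was closed
            sock.sendto(data, peer)

    threading.Thread(target=serve, daemon=True).start()
    return sock.getsockname()


def udp_probe(host, port, timeout=1.0, payload=b"probe"):
    """Send one datagram; return the RTT in ms, or None if no echo comes back."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        start = time.perf_counter()
        try:
            sock.sendto(payload, (host, port))
            data, _ = sock.recvfrom(2048)
        except OSError:  # includes socket.timeout
            return None
        if data != payload:
            return None
        return (time.perf_counter() - start) * 1000.0
```

With the responder running on a host at the far end of the link, `udp_probe` gets an answer on the same UDP port it sent to, which is exactly what the earlier UDP tests were missing.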

PYTHON / SCAPY

Finally, what if you don't have these tools, or can't install tools, or can't change the config on your routers?  You can do all of this in a short python script using the scapy library (it's python month for me).  Note that the overhead of an interpreted language like python will throw off any RTT times, plus, while Scapy is just about the coolest thing ever, it isn't a speed demon (I think the majority of the delay is in Scapy actually).  This method is not a good way to test performance, but it will accurately give you up/down status through ACLs.

The nice thing about using python for this is that it is so portable - if you can't install a tool but have python, you can generally throw your own tool together in short order (I put TCPING and UDPING together during a lunch break at SANSFIRE), especially if you can google for similar examples or documentation.

Here's an example TCPING run (note the high echo time due to the overhead of this method - over 1 second!):

robv@robv-desktop:~# python tcping.py www.google.ca 80
WARNING: No route found for IPv6 destination :: (no default route?)
RECV 1: IP / TCP 74.125.226.49:www > 192.168.169.145:55154 SA / Padding
RECV 1: IP / TCP 74.125.226.49:www > 192.168.169.145:46152 SA / Padding
RECV 1: IP / TCP 74.125.226.49:www > 192.168.169.145:31151 SA / Padding
RECV 1: IP / TCP 74.125.226.49:www > 192.168.169.145:18754 SA / Padding
RECV 1: IP / TCP 74.125.226.49:www > 192.168.169.145:39331 SA / Padding

Sent 5 packets, received 5 packets. 100.0% hits.
Host is up  , approximate RTT is  1002.07920074 ms

I've attached the tcping.py script, as well as a companion udping.py (with the same syntax).  They're certainly not the finest python coding you'll ever see, but feel free to review them, and mod them to fit your own requirements if you find them useful.  Again, take care when testing UDP services.
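The attached scripts aren't reproduced here.  If Scapy isn't available either, a rough stand-in for the up/down check can be built with nothing but the standard library.  Note that this is a different technique from the Scapy approach: it completes a full TCP connect instead of sending a bare SYN, so it needs no raw-socket privileges.  The names `tcp_ping` and `tcp_ping_summary` are mine, not those of the attached scripts.

```python
import socket
import time


def tcp_ping(host, port, timeout=2.0):
    """Time one full TCP connect to host:port; return ms, or None on failure.

    If host is a name, the measurement includes the DNS lookup, so pass an
    IP address when only the network round trip is of interest.
    """
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return (time.perf_counter() - start) * 1000.0
    except OSError:  # refused, timed out, unreachable, name lookup failed
        return None


def tcp_ping_summary(host, port, count=5, timeout=2.0):
    """Run several connects and summarize hits and average connect time."""
    rtts = [tcp_ping(host, port, timeout) for _ in range(count)]
    hits = [r for r in rtts if r is not None]
    return {
        "sent": count,
        "received": len(hits),
        "avg_ms": sum(hits) / len(hits) if hits else None,
    }
```

Something like `tcp_ping_summary("www.google.ca", 80)` gives a sent/received count and an average connect time, which is usually enough to answer "is it up through the ACLs?".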

I hope this is useful.  If you use PING frequently, I hope this sheds some light on why PING might be a good test in some cases, but not in others, and what tools you might use to deal with reachability and QOS issues.

As always, your comments are welcome !

 

===============
Rob VandenBrink
Metafore


Comments

With respect to DNS, I frequently use dig to check liveliness; I sometimes use this to check my connection itself by firing a request off to Google's 8.8.8.8 service.
I find iPerf the most accurate measure of throughput - it allows you to specify the port, protocol and packet size and reports on bandwidth, jitter and datagram loss. It's also available on both Windows and *nix platforms....
Rob - It is truly sad to see even senior system and network engineers dependent on ping for performance monitoring. I have IPSs that QoS traffic - ICMP is the lowest priority. They cross an IPS and immediately jump to the conclusion that is the performance problem. I have always recommended telnet for checking listening ports remotely but you have listed a number of great tools. Thanks!
For your UDP packet test, you can use Nping's echo mode. It acts as a server that listens on any port and returns an encapsulated packet to show exactly what it received. This is also useful when you suspect your traffic is being modified by some hop along the way.
'Couldn't agree more with the notion of measuring the latency of the services you care about, not the latency of icmp packets. It goes along with having monitoring servers actually test services, not just ask some remote agent if service XYZ is running - after all the service may be running but not actually functioning (or too slow to be useful).

Along the lines of QoS interfering with things, I also made a nagios plugin for checking the size of QoS queues to let us know if we're ever moving enough traffic that we're beginning to drop lower-priority traffic for more than X minutes at a time. Newish Ciscos make this fairly easy once you find the QoS stuff in their MIB tree.
There are also the dedicated tools httping and tcping. I know they are available for *nix systems but I'm not sure about Windows sufferers. Httping is available for Cygwin, at least.
I use a compiled win32 binary called tcping frequently to test through to the service level:

http://www.elifulkerson.com/projects/
SmokePing is a good monitoring/graphing tool which can use ICMP or agents such as echopinghttp/s or curl to do layer 7 testing.
