Create a Summary of IP Addresses from PCAP Files using Unix Tools

Published: 2010-03-27. Last Updated: 2010-03-27 21:22:43 UTC
by Guy Bruneau (Version: 1)
8 comment(s)

Every once in a while we collect large PCAP files for analysis. Sometimes what we need from them is a summary list of the source or destination addresses seen over a period of time. The two examples shown here cover two suspicious ports that I noticed being targeted this week; I wanted to know the source IPs of this traffic.

First, if needed, we remove the IP or IPs we don't want to include in our summary. If we are going to reuse a PCAP filter several times, it is better to save the libpcap filter in a file and load it with tcpdump's -F option (tcpdump -nr file.pcap -F parsing_filter).
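
As a minimal sketch of the filter-file approach: the file name parsing_filter comes from the command above, while the excluded address 192.168.21.1 and the capture name file.pcap are placeholders assumed for illustration only.

```shell
# Write the libpcap filter expression to a file so it can be reused.
# 192.168.21.1 is a placeholder for a host we want to leave out of the summary.
cat > parsing_filter <<'EOF'
dst host 192.168.21.32 and not src host 192.168.21.1
EOF

# Apply the saved filter when reading a capture (file.pcap is a placeholder).
# The guard keeps the sketch harmless when tcpdump or the capture is missing.
if command -v tcpdump >/dev/null 2>&1 && [ -r file.pcap ]; then
    tcpdump -nr file.pcap -F parsing_filter
fi
```

Keeping the expression in one file means every run against a new capture uses exactly the same exclusions.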


Breaking down the filter

In order to manipulate the data to our advantage, we need to determine what we are looking for. In our two examples, we want to find which source IP addresses sent a TCP SYN packet to our gateway IP 192.168.21.32 on ports 465 and 2522, along with the number of occurrences in each of the PCAP files.

My complete traffic parsing looks like this:

guy@seeker$ tcpdump -ntr 2010032501 'dst host 192.168.21.32 and tcp[13] = 0x02 and dst port 2522' | awk '{print $2}' | tr . ' ' | awk '{print $1"."$2"."$3"."$4}' | sort | uniq -c | awk ' {print $2 "\t" $1 }'

reading from file 2010032501, link-type LINUX_SLL (Linux cooked)
XX.169.170.84 10

Breaking Down Each Section

- Part 1 is the tcpdump switches: we are using -n (don't resolve names), -t (don't print date/time) and -r 2010032501 (the capture file to read).

- Part 2 is the libpcap filter ('dst host 192.168.21.32 and tcp[13] = 0x02 and dst port 2522'), which selects all inbound TCP SYN packets (tcp[13] = 0x02) sent to our gateway (dst host 192.168.21.32) on TCP port 2522.

IP xx.169.170.84.50316 > 192.168.21.32.2522: S 2853915482:2853915482(0) win 65535 <mss 1412,nop,wscale 3,nop,nop,timestamp 895725079 0,sackOK,eol>
IP xx.169.170.84.50316 > 192.168.21.32.2522: S 2853915482:2853915482(0) win 65535 <mss 1412,nop,wscale 3,nop,nop,timestamp 895725088 0,sackOK,eol>
IP xx.169.170.84.50316 > 192.168.21.32.2522: S 2853915482:2853915482(0) win 65535 <mss 1412,nop,wscale 3,nop,nop,timestamp 895725098 0,sackOK,eol>
IP xx.169.170.84.50316 > 192.168.21.32.2522: S 2853915482:2853915482(0) win 65535 <mss 1412,sackOK,eol>
IP xx.169.170.84.50316 > 192.168.21.32.2522: S 2853915482:2853915482(0) win 65535 <mss 1412,sackOK,eol>
IP xx.169.170.84.50316 > 192.168.21.32.2522: S 2853915482:2853915482(0) win 65535 <mss 1412,sackOK,eol>
IP xx.169.170.84.50316 > 192.168.21.32.2522: S 2853915482:2853915482(0) win 65535 <mss 1412,sackOK,eol>
IP xx.169.170.84.50316 > 192.168.21.32.2522: S 2853915482:2853915482(0) win 65535 <mss 1412,sackOK,eol>
IP xx.169.170.84.50316 > 192.168.21.32.2522: S 2853915482:2853915482(0) win 65535 <mss 1412,sackOK,eol>
IP xx.169.170.84.50316 > 192.168.21.32.2522: S 2853915482:2853915482(0) win 65535 <mss 1412,sackOK,eol>
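
It may help to see why tcp[13] = 0x02 picks out only the initial SYN. Byte 13 of the TCP header holds the flag bits, and the comparison is exact, so a SYN-ACK (0x12) is not matched. The sketch below reproduces that logic with shell arithmetic; the function names is_syn_only and has_syn are hypothetical and exist only for this illustration.

```shell
# Byte 13 of the TCP header carries the flag bits; SYN is 0x02 and ACK is 0x10.
# is_syn_only mirrors the filter "tcp[13] = 0x02" (exact match on the byte).
is_syn_only() {
    [ $(( $1 )) -eq $(( 0x02 )) ]
}

# has_syn mirrors "tcp[13] & 0x02 != 0" (SYN bit set, other flags ignored).
has_syn() {
    [ $(( $1 & 0x02 )) -ne 0 ]
}

# Compare both tests for SYN (0x02), SYN-ACK (0x12) and ACK (0x10).
for flags in 0x02 0x12 0x10; do
    if is_syn_only "$flags"; then exact=yes; else exact=no; fi
    if has_syn "$flags"; then masked=yes; else masked=no; fi
    echo "flags=$flags syn_only=$exact syn_bit=$masked"
done
```

The exact comparison is what we want here, since we are counting connection attempts and not replies.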


- Part 3 adds a pipe with awk (| awk '{print $2}') to print only the source IP and port from our tcpdump result. Field $2 (source) could be changed to $4 to use the destination address instead.

reading from file 2010032501, link-type LINUX_SLL (Linux cooked)
xx.169.170.84.50316
xx.169.170.84.50316
xx.169.170.84.50316
xx.169.170.84.50316
xx.169.170.84.50316
xx.169.170.84.50316
xx.169.170.84.50316
xx.169.170.84.50316
xx.169.170.84.50316
xx.169.170.84.50316
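
To make the field numbering concrete, here is a small sketch that runs awk over one line copied from the tcpdump listing shown earlier. The variable names are made up for the example. Note that field $4 carries a trailing colon, which the sketch strips before printing.

```shell
# One line of "tcpdump -nt" output, taken from the listing above.
line='IP xx.169.170.84.50316 > 192.168.21.32.2522: S 2853915482:2853915482(0) win 65535 <mss 1412,sackOK,eol>'

# Field 2 is the source address with its port appended.
src=$(printf '%s\n' "$line" | awk '{print $2}')

# Field 4 is the destination, which ends with a colon that must be stripped.
dst=$(printf '%s\n' "$line" | awk '{sub(/:$/, "", $4); print $4}')

echo "source:      $src"
echo "destination: $dst"
```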


- Part 4 adds a pipe with tr (| tr . ' ') to change each period to a space so we can remove the source port (50316) in the next step.

reading from file 2010032501, link-type LINUX_SLL (Linux cooked)
xx 169 170 84 50316
xx 169 170 84 50316
xx 169 170 84 50316
xx 169 170 84 50316
xx 169 170 84 50316
xx 169 170 84 50316
xx 169 170 84 50316
xx 169 170 84 50316
xx 169 170 84 50316
xx 169 170 84 50316

- Part 5 adds a pipe with awk (| awk '{print $1"."$2"."$3"."$4}') to reconstruct the source IP address(es) from the first four fields, which drops the port.

reading from file 2010032501, link-type LINUX_SLL (Linux cooked)
xx.169.170.84
xx.169.170.84
xx.169.170.84
xx.169.170.84
xx.169.170.84
xx.169.170.84
xx.169.170.84
xx.169.170.84
xx.169.170.84
xx.169.170.84
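
The split-and-rejoin step can be tried on a single value. This is only a sketch: it assumes dotted IPv4 addresses, and an IPv6 source would need different handling.

```shell
# Source field as printed by the earlier awk stage (address plus port).
field='xx.169.170.84.50316'

# Turn the dots into spaces, then rejoin only the first four pieces,
# which drops the trailing port number.
ip=$(printf '%s\n' "$field" | tr . ' ' | awk '{print $1"."$2"."$3"."$4}')

echo "$ip"
```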

- Part 6 adds a pipe with sort (| sort) to sort our traffic by IP, so that identical addresses end up on adjacent lines. In this case we only have one source.

reading from file 2010032501, link-type LINUX_SLL (Linux cooked)
xx.169.170.84
xx.169.170.84
xx.169.170.84
xx.169.170.84
xx.169.170.84
xx.169.170.84
xx.169.170.84
xx.169.170.84
xx.169.170.84
xx.169.170.84

- Part 7 adds a pipe with uniq -c (| uniq -c) to count the number of times each source IP was seen in the PCAP file.

reading from file 2010032501, link-type LINUX_SLL (Linux cooked)

10 xx.169.170.84
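
With a single source the sort step looks redundant, but uniq only merges adjacent duplicate lines. The sketch below shows the difference on a hypothetical list of addresses from the documentation range 192.0.2.0/24; none of them come from the capture.

```shell
# Hypothetical, unsorted list of source addresses (documentation range).
addresses='192.0.2.10
192.0.2.20
192.0.2.10
192.0.2.20
192.0.2.10'

# uniq only merges adjacent duplicate lines, so sort first.
counted=$(printf '%s\n' "$addresses" | sort | uniq -c)

# Without sort, each run of adjacent lines is counted separately.
unsorted=$(printf '%s\n' "$addresses" | uniq -c)

echo "sorted first:"; echo "$counted"
echo "not sorted:";   echo "$unsorted"
```

Sorting first yields one line per address with its total; skipping it yields one line per run, so the same address can show up several times.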

- The last part is just for formatting: we reverse the order of the last output and insert a tab (| awk ' {print $2 "\t" $1 }') to show the IPs in the first column and the number of times seen in the second.

reading from file 2010032501, link-type LINUX_SLL (Linux cooked)

xx.169.170.84 10

 

Another example, with its results, for destination TCP port 465:

guy@seeker$ tcpdump -ntr 2010032508 'dst host 192.168.21.32 and tcp[13] = 0x02 and dst port 465' | awk '{print $2}' | tr . ' ' | awk '{print $1"."$2"."$3"."$4}' | sort | uniq -c | awk ' {print $2 "\t" $1 }'

reading from file 2010032508, link-type LINUX_SLL (Linux cooked)

XX.237.148.241 3
XXX.197.208.107 3
XXX.199.183.68 3
XXX.22.87.36 3
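
Since the same text-processing stages are reused for each port, they could be wrapped in a small shell function. The sketch below is one possible way to do that. The name summarize_sources is hypothetical, and the canned input uses placeholder sources from the documentation ranges so it runs without a capture file.

```shell
# summarize_sources is a hypothetical helper: it reads "tcpdump -nt" text on
# stdin and prints each IPv4 source address, a tab, and its packet count.
summarize_sources() {
    awk '{print $2}' | tr . ' ' | awk '{print $1"."$2"."$3"."$4}' |
        sort | uniq -c | awk '{print $2 "\t" $1}'
}

# Canned input; 198.51.100.7 and 203.0.113.9 are placeholder sources.
sample='IP 198.51.100.7.40000 > 192.168.21.32.465: S 1:1(0) win 65535
IP 203.0.113.9.40001 > 192.168.21.32.465: S 2:2(0) win 65535
IP 198.51.100.7.40000 > 192.168.21.32.465: S 1:1(0) win 65535'
summary=$(printf '%s\n' "$sample" | summarize_sources)
printf '%s\n' "$summary"

# With a real capture the helper would be fed by tcpdump, for example:
#   tcpdump -ntr 2010032508 'dst host 192.168.21.32 and tcp[13] = 0x02 and dst port 465' | summarize_sources
```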

-----------

Guy Bruneau IPSS Inc. gbruneau at isc dot sans dot org


Comments

Instead of tr and awk you could use cut to get rid of the ports:
cut -d. -f1-4

Or you could tell awk to use . as the delimiter, getting rid of the tr:
awk -F. '{print $1"."$2"."$3"."$4}'

I'm always a fan of the Unix toolbox, but know that recent versions of Wireshark have a lot of summarization functionality built in.

On version 1.0.x (and earlier, but I think some of the submenus changed), there's a Statistics menu item. Statistics->Endpoints will yield a pop-up window with tabs for Ethernet, IPv4, and any other layer 2, 3, and 4 protocols found in the capture. Each tab shows a sorted unique list of endpoints, plus packets and bytes related to each, then further broken down by transmit and receive. There's a copy button to enable pasting into other applications for further processing. Statistics->Conversations shows similar details, but broken down pairwise by "conversations" between any two hosts.

Lots of other interesting application-specific stuff on the Statistics menu, too.

That is true, there are other options to get rid of the source port, and your example is simpler.

As for Wireshark, there are now some great filters in the Statistics menu, but if you work with files larger than 2 or 3 GB (I often do), Wireshark becomes impractical.

When dealing with datasets that large (or much, much larger) I prefer to use a tool that is designed for flow analysis instead of packet analysis. Argus (http://www.qosient.com/argus/) is my preferred tool; it will typically reduce multiple gigabytes of raw packets to only a few megabytes of flow records, and then with its secondary tools you can run searches and generate statistics.

I think the best command-line option that is likely to be readily available would be: tshark -r <pcapfile> -q -z conv,ip

Have you tried xtractr, which does both flow and packet analysis with full-text search? http://code.google.com/p/pcapr/wiki/Xtractr

I wasn't aware of this new tool, Xtractr. It looks like a great tool.

I prefer tshark for command-line jobs, from network capturing to specific filtering and decoding work.

To get IP addresses I use a simple bash loop, opening all pcap files, get all source (and if needed destination) addresses, and store the output into a file:

for i in `cat filelist`; do /usr/bin/tshark -n -t a -Tfields -e ip.src -r $i > a ; /usr/bin/tshark -n -t a -Tfields -e ip.dst -r $i > b ; cat a b | /usr/bin/sort -u > IP/$i ; rm -f a b ; done

filelist is created by "ls -1 DIR/*pcap > filelist".

The result list is stored in a different sub-dir with the same name as the source pcap file. When importing these files (see 2nd step) into MySQL I also store a reference to the source pcap file.

In a second step I store all IP addresses in a MySQL database, combined with Whois and geolocation data. This is done by a small and simple PHP script, which also checks whether an IP address is already stored, so I don't need uniq to remove duplicate IPs.

This IP list and Whois database table is only a small part of a bigger solution for forensic network analysis. Besides this, I use several different UNIX tools to decode and analyse pcap files, and all important results are stored in MySQL, too. A simple PHP frontend combines all the different results, allows full-text search and more.
More tools: tcpflow, dsniff tools (sshow, dsniff, urlsnarf, mailsnarf, msgsnarf).

I'm always looking for new/more Linux tools for forensic network analysis.
Currently I'd like to see a newer release of the dsniff tools which is able to handle newer protocols - or other tools doing the same job!

Furthermore I'd need a decoding tool to get binary file transfers from pcap or tcpflow result files, containing the HTTP header "application/msdownload". Unfortunately uudeview won't decode such files.

