Converting PCAP Web Traffic to Apache Log

Published: 2018-06-06
Last Updated: 2018-06-06 06:26:38 UTC
by Xavier Mertens (Version: 1)
6 comment(s)

PCAP data can be really useful when you must investigate an incident, but when the PCAP files to analyze add up to gigabytes, they can quickly become tricky to handle. Often, the first protocol to be analyzed is HTTP because it remains a classic infection and communication vector for malware. What if you could analyze HTTP connections like an Apache access log? This kind of log can easily be indexed and processed by many tools.

Haka[1] isn’t a new tool (the first version was released in 2013), but it remains under the radar for many people. Haka is defined as "an open source security-oriented language which allows to describe protocols and apply security policies on (live) captured traffic". Based on the Lua[2] programming language, it is extremely powerful for extracting information from network flows, but also for altering them on the fly (playing a man-in-the-middle role).

I had to analyze a lot of HTTP requests from big PCAP files, so I decided to automate this boring task. On the Haka blog, I found an article[3] that explained how to generate an Apache access log from a PCAP file. Unfortunately, it no longer worked, probably because the language has evolved. So, I jumped into the code to fix it (with some Google support, of course).

Let’s start a docker container based on Ubuntu and install the latest Haka package:

$ docker run -it --name haka --hostname haka ubuntu
root@haka:~# apt-get update && apt-get upgrade
root@haka:~# apt-get install curl libpcap0.8 # libpcap is required by Haka!
root@haka:~# curl -LO https://github.com/haka-security/haka/releases/download/v0.3.0/haka_0.3.0_amd64.deb
root@haka:~# dpkg -i haka_0.3.0_amd64.deb
root@haka:~# hakapcap -h
Usage: hakapcap [options] <config> <pcapfile>
Options:
    -h,--help:              Display this information
    --version:              Display version information
    -d,--debug:             Display debug output
    -l,--loglevel <level>:  Set the log level
                              (debug, info, warning, error or fatal)
    -a,--alert-to <file>:   Redirect alerts to given file
    --debug-lua:            Activate lua debugging
    --dump-dissector-graph: Dump dissector internals (grammar and state machine) in file <name>.dot
    --no-pass-through, --pass-through:
                            Select pass-through mode (default: true)
    -o <output>:            Save result in a pcap file

Ready!

Haka works with hooks that are called when a condition is matched. In our example, we first enable the HTTP dissector on the interesting TCP ports:

local http = require('protocol/http')

-- Run the HTTP dissector on these TCP ports
http.install_tcp_rule(80)
http.install_tcp_rule(3128)
http.install_tcp_rule(8080)

Then we create a rule whose hook fires on every HTTP response detected in the PCAP file:

haka.rule {
    hook = http.events.response,
    eval = function (http, response)
        -- ... your code here ...
    end
}
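To make the hook idea concrete outside Haka, here is a toy Python sketch of the same pattern: a registry that only passes traffic seen on the "installed" ports to the callbacks registered for an event. The class `HookRegistry`, its `rule()` and `emit()` methods and the event name "http.response" are hypothetical illustrations, not Haka's API or how Haka is implemented internally.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List


class HookRegistry:
    """Toy illustration of the hook idea (hypothetical, not Haka's API)."""

    def __init__(self, ports: Iterable[int]):
        # Ports to dissect, analogous to http.install_tcp_rule(port)
        self.ports = set(ports)
        self.hooks: Dict[str, List[Callable]] = defaultdict(list)

    def rule(self, hook: str, callback: Callable) -> None:
        # Register a callback for a named event such as "http.response"
        self.hooks[hook].append(callback)

    def emit(self, hook: str, port: int, *args) -> list:
        # Traffic on ports that were not installed never reaches the callbacks
        if port not in self.ports:
            return []
        return [callback(*args) for callback in self.hooks[hook]]
```

With `HookRegistry([80, 3128, 8080])`, a callback registered for "http.response" runs for a response seen on port 8080 but is never called for one on port 443.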

The hook extracts information from the HTTP transaction (request and response) to build an Apache log entry in the combined log format:

<clientip> - - [<date>] "<request> HTTP/<version>" <response> <size> "<referer>" "<useragent>"
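If you want to reproduce this format with other tools, the following Python sketch builds one line from fields that have already been extracted. The helper names `apache_log_line` and `_esc` and their parameters are assumptions for illustration, not part of the Haka script. Like Apache, it escapes backslashes and double quotes inside quoted fields (Apache also escapes control characters, which this sketch does not), and it expects a timezone-aware timestamp so that `%z` renders an offset such as +0000.

```python
from datetime import datetime, timezone


def _esc(value: str) -> str:
    # Escape backslashes and double quotes so quoted fields stay parseable
    return value.replace("\\", "\\\\").replace('"', '\\"')


def apache_log_line(client_ip: str, when: datetime, method: str, uri: str,
                    version: str, status: int, size, referer=None,
                    user_agent=None) -> str:
    """Format one combined-log-style line (hypothetical helper).

    Identity and user fields are always "- -", as in the Haka script.
    A missing size, referer or user agent is written as "-".
    """
    stamp = when.strftime("%d/%b/%Y:%H:%M:%S %z")
    request = _esc(f"{method} {uri} HTTP/{version}")
    return (
        f'{client_ip} - - [{stamp}] "{request}" {status} '
        f'{"-" if size is None else size} '
        f'"{_esc(referer or "-")}" "{_esc(user_agent or "-")}"'
    )


if __name__ == "__main__":
    print(apache_log_line("192.168.254.66",
                          datetime(2018, 6, 5, 18, 34, 21, tzinfo=timezone.utc),
                          "GET", "/", "1.1", 200, 0,
                          user_agent="check_http/v1.4.16 (nagios-plugins 1.4.16)"))
```

The month abbreviation comes from `strftime("%b")`, which follows the process locale; Python does not change the default C locale on its own, so it prints "Jun" as in the sample output.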

Let’s try it with a PCAP file captured on a network:

$ docker cp test.pcap haka:/tmp
$ docker exec -it haka bash
root@haka:~# hakapcap http-dissector.lua /tmp/test.pcap | grep "GET /"
192.168.254.222 - - [05/Jun/2018:18:34:13 +0000] "GET /connecttest.txt HTTP/1.1" 200 10 "-" "Microsoft NCSI"
192.168.254.215 - - [05/Jun/2018:18:34:14 +0000] "GET /session/...HTTP/1.1" 200 10 "-" "AppleCoreMedia/1.0.0.15E216 (iPad; U; CPU OS 11_3 like Mac OS X; en_us)"
192.168.254.215 - - [05/Jun/2018:18:34:19 +0000] "GET /session/...m3u8 HTTP/1.1" 200 10 "-" "AppleCoreMedia/1.0.0.15E216 (iPad; U; CPU OS 11_3 like Mac OS X; en_us)"
192.168.254.66 - - [05/Jun/2018:18:34:21 +0000] "GET / HTTP/1.1" 200 0 "-" "check_http/v1.4.16 (nagios-plugins 1.4.16)"
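Because the output follows the Apache format, standard log tooling can consume it. As a rough illustration, this Python sketch parses such lines with a regular expression and counts user agents. The pattern and the field names (`client`, `request`, `agent` and so on) are my own choices, not a standard API.

```python
import re
from collections import Counter

# A quoted field: any run of characters without an unescaped double quote
QUOTED = r'(?:[^"\\]|\\.)*'
LOG_RE = re.compile(
    r'(?P<client>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    rf'"(?P<request>{QUOTED})" (?P<status>\d{{3}}) (?P<size>\S+) '
    rf'"(?P<referer>{QUOTED})" "(?P<agent>{QUOTED})"'
)


def parse(lines):
    """Yield one dict per matching log line; silently skip the rest."""
    for line in lines:
        match = LOG_RE.match(line.strip())
        if match:
            yield match.groupdict()


def top_user_agents(lines, n=5):
    """Return the n most common user agents as (agent, count) pairs."""
    return Counter(entry["agent"] for entry in parse(lines)).most_common(n)
```

Feeding the `hakapcap` output to `top_user_agents()` gives a quick overview of which clients (browsers, updaters, monitoring probes) generated the HTTP traffic.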

For now, the size field is hardcoded to "10", as are the user fields (which default to "- -"). I’m still looking for a way to get the number of bytes per HTTP transaction. Also, you only get the client IP address, not the destination one. If you have ideas for improvements, let me know!
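One possible workaround for the size field, offered as a hedged sketch rather than part of the Haka script, is to read the Content-Length response header when it is present. The helper `response_size` and its dict-of-headers input are assumptions. Responses sent with chunked transfer encoding carry no Content-Length, so a complete solution would still need to count the body bytes.

```python
def response_size(headers: dict) -> str:
    """Best-effort value for the log's size field (hypothetical helper).

    HTTP header names are case-insensitive, so they are compared lowercased.
    Returns "-" when Content-Length is absent or not a plain decimal integer.
    """
    for name, value in headers.items():
        if name.lower() == "content-length":
            value = value.strip()
            if value.isascii() and value.isdigit():
                return str(int(value))
            return "-"
    return "-"
```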

My script, compatible with Haka 0.3.0, is available on GitHub[4].

[1] http://www.haka-security.org/
[2] https://www.lua.org/
[3] http://www.haka-security.org/blog/2014/03/18/transform-a-pcap-to-an-apache-log-file.html
[4] https://github.com/xme/toolbox/blob/master/haka_http_log.lua

Xavier Mertens (@xme)
ISC Handler - Freelance Security Consultant


Comments

Have you had a look at "justniffer"?
http://justniffer.sourceforge.net/
Cheers, Gebhard
Thank you for sharing. On Twitter, somebody mentioned also urlsnarf (hxxps://linux.die[.]net/man/8/urlsnarf).
Many ways to achieve the same result, that's why I like open source software!
Why not just push the pcap file back through tcpdump and apply an advanced BPF filter, so you get specifically the packets you are (or are not) after, and then, before or after that, use whatever private keys you have to decrypt the TLS traffic if it is yours... Idk, maybe I am thinking about this wrong...
The goal here isn't just to isolate the traffic, but instead, to extract the payload, reassemble TCP sessions and then create output that resembles apache logs (so you can then use tools to process that further).

tcpdump may be able to get you the packets isolated if you filter by port and maybe IP address. But it will not reassemble sessions. Newer versions of tcpdump will actually recognize some of the HTTP payloads, but I don't think you can write a filter for them.
See, I actually love the logic puzzles of stdin/stdout, and you could easily get the conversations with:
tcpdump -v -n -r test.pcap | grep -oE '([0-9]{1,3}\.){3}[0-9]{1,3}\.[0-9]+ > ([0-9]{1,3}\.){3}[0-9]{1,3}\.[0-9]+' | sort -u
Use uniq instead of sort -u (uniq only drops consecutive repeats, so packet order is kept), depending on how much and how you want to look at the conversations.

You could even sed away the ports to keep just the IPs for a cursory glance.

This would give you all the conversations in a scriptable way. From there, you could write a script with gawk or plain bash to get tcpdump to output the data payloads with the proper BPF filter, and concatenate them. I would first decrypt with the private key before doing this...

I do everything in bash; it helps me as a solo admin with my many projects and all my many, many verbosely made logs xD

I'm by far not perfect or all-knowing, but I have never found more useful CLI tools than tcpdump and ngrep.
I use the very same concepts (not with tcpdump) to take the last six months to a year of my router's logs and drops and build a massive ipset of all the subnets that are randomly tip-tapping away at my firewall all day long...

In other words, respect for all that you do; it's just more fun and enlightening if I sift through the muck myself, as if all I had was an OpenWrt or a Pi shell...

XD
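The tcpdump pipeline suggested in the comments can also be sketched in Python for anyone who prefers a script. It extracts "source.port > destination.port" pairs from the text printed by `tcpdump -n` and can optionally drop the ports, as the sed step would. The function name `conversations` is hypothetical, and the sketch assumes tcpdump's usual IPv4 `address.port > address.port` layout; IPv6 traffic is not handled.

```python
import re

# "a.b.c.d.port > e.f.g.h.port" as printed by `tcpdump -n` for IPv4 TCP/UDP
CONV_RE = re.compile(
    r'((?:\d{1,3}\.){3}\d{1,3})\.(\d+) > ((?:\d{1,3}\.){3}\d{1,3})\.(\d+)'
)


def conversations(lines, keep_ports=True):
    """Return the sorted, de-duplicated conversations found in tcpdump output."""
    seen = set()
    for line in lines:
        match = CONV_RE.search(line)
        if not match:
            continue  # ARP, ICMP and other lines without ports
        src, sport, dst, dport = match.groups()
        if keep_ports:
            seen.add(f"{src}.{sport} > {dst}.{dport}")
        else:
            seen.add(f"{src} > {dst}")  # same effect as sed-ing away the ports
    return sorted(seen)
```

The lines can come from `subprocess` or a saved text file, e.g. `conversations(open("tcpdump.txt"))`, where the file name is only an example.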
