Extracting Information From "logfmt" Files With CyberChef
Last Updated: 2022-11-12 13:15:59 UTC
by Didier Stevens (Version: 1)
I recorded a video for this diary entry.
I regularly have to look into log files in a format that seems to be informally called "logfmt" (I'm not sure of the name; if you know a better definition, on Wikipedia for example, please post a comment).
Every log line is a sequence of name1=value1 name2=value2 ... Thus each line contains the name of the field and the value of the field.
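To make the format concrete, here is a minimal Python sketch that splits such a line into its fields. It assumes that values contain no spaces and no quoting, as in the examples of this diary entry. The function name and the sample line are hypothetical; srcport and dstport are made-up field names.

```python
def parse_logfmt_line(line):
    """Split a simple logfmt line into a dict of field name -> value.

    Assumption: fields are separated by whitespace and values are unquoted.
    """
    fields = {}
    for token in line.split():
        name, separator, value = token.partition("=")
        if separator:  # ignore tokens that have no equal sign
            fields[name] = value
    return fields


# Hypothetical sample line, not taken from the real log file.
sample = "srcip=192.168.202.106 srcport=49200 dstip=8.8.8.8 dstport=53"
print(parse_logfmt_line(sample))
```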
For this diary entry, and other examples, I created this "conn.ips.logfmt" file (it's based on conn.log.gz from this repository).
So, the problem: you have a log file with network events, and you want to know which public IPv4 addresses a particular client connected to.
Here is a solution with CyberChef.
Log files can be very large, sometimes gigabytes of data. A browser running CyberChef cannot process such files efficiently: my browser often crashes when I try.
In that case, it's best to grep the file for the IPv4 address of the client you are interested in (192.168.202.106 in our example).
Like this: grep -F 192.168.202.106 logfile.log > logfile.grep.192_168_202_106.log
You don't have to bother with boundaries at this stage; we will deal with them in CyberChef. (If you grep for 192.168.202.10, for example, you will also select 192.168.202.100, 192.168.202.101, 192.168.202.102, ...)
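If grep is not available, the same fixed-string prefilter can be sketched in Python. This is only an illustration: the function name and the sample lines are hypothetical. Because it streams line by line, it should cope with large files. Like grep -F, it over-selects when the address is a prefix of another one.

```python
def prefilter(lines, needle):
    """Yield the lines that contain needle as a fixed string (like grep -F)."""
    for line in lines:
        if needle in line:
            yield line


# Hypothetical sample lines to show the over-selection.
sample_lines = [
    "srcip=192.168.202.10 dstip=8.8.8.8\n",
    "srcip=192.168.202.100 dstip=1.1.1.1\n",
    "srcip=10.0.0.1 dstip=9.9.9.9\n",
]
selected = list(prefilter(sample_lines, "192.168.202.10"))
print(len(selected))  # 2: the .10 line and the .100 line

# With real files (file names as in the grep example above):
# with open("logfile.log", errors="replace") as source, \
#         open("logfile.grep.192_168_202_106.log", "w") as target:
#     target.writelines(prefilter(source, "192.168.202.106"))
```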
I start with a GZip compressed version of the file, conn.ips.logfmt.gz, and I load it into CyberChef:
Next, I apply the Gunzip operation to obtain the decompressed log (if your log is not compressed, you can ignore this step):
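Outside CyberChef, the equivalent of the Gunzip step could look like the following Python sketch, which uses the standard gzip module. The function name is hypothetical, and the compressed data is generated in memory so that the example is self-contained. For a real file, you would pass an opened binary file such as open("conn.ips.logfmt.gz", "rb").

```python
import gzip
import io


def iter_gzip_lines(fileobj):
    """Yield decoded text lines from a gzip-compressed binary file object."""
    with gzip.open(fileobj, mode="rt", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            yield line


# In-memory stand-in for a compressed log file (hypothetical content).
blob = io.BytesIO(gzip.compress(
    b"srcip=192.168.202.106 dstip=8.8.8.8\n"
    b"srcip=192.168.202.107 dstip=1.1.1.1\n"
))
lines = list(iter_gzip_lines(blob))
print(len(lines))  # 2
```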
A filter operation with a regular expression allows me to select all log lines where 192.168.202.106 is the source of a connection. I use this regular expression (without double quotes): "srcip=192\.168\.202\.106 ".
Since the dot (.) is a special character in regular expression syntax (it represents any character), I have to escape it: \.
Notice the space character at the end of the regular expression: this is how I handle boundaries in this example. If the last byte of my source IPv4 address were smaller than 100, for example 192.168.202.10, then a regular expression like "srcip=192\.168\.202\.10" would also select IPv4 source addresses like 192.168.202.100, 192.168.202.101, ...
The space character is a field separator in this log, so I add a space character at the end of the regular expression.
Notice that this would not work if the srcip field were the last one on a log line, because then there would be no space character after that field.
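The effect of the trailing space can be checked with a small Python sketch. Python's re module is used here as a stand-in for the regular expression engine of CyberChef; for these simple patterns the behavior should be the same. The sample lines and the function name are hypothetical. The last pattern, with "(?: |$)", is one possible way to also cover the case where srcip is the last field on a line; it is an assumption on my part, not part of the recipe.

```python
import re

# Hypothetical sample log.
sample_log = (
    "srcip=192.168.202.10 dstip=8.8.8.8\n"
    "srcip=192.168.202.100 dstip=1.1.1.1\n"
    "srcip=192.168.202.101 dstip=9.9.9.9\n"
)


def filter_lines(text, pattern):
    """Return the lines of text in which pattern matches somewhere."""
    regex = re.compile(pattern)
    return [line for line in text.splitlines() if regex.search(line)]


loose = filter_lines(sample_log, r"srcip=192\.168\.202\.10")
strict = filter_lines(sample_log, r"srcip=192\.168\.202\.10 ")
print(len(loose), len(strict))  # 3 1

# Variant that accepts a space or the end of the line after the address.
last_field = "dstip=8.8.8.8 srcip=192.168.202.10"
at_end = filter_lines(last_field, r"srcip=192\.168\.202\.10(?: |$)")
print(len(at_end))  # 1
```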
Another solution would be to use the meta character for word boundaries: \b. Like this: "srcip=192\.168\.202\.106\b".
But some time ago, I ran into an issue with this word boundary character. I had assumed that anything that is not a letter or digit is not part of a word, and is thus a boundary. That's incomplete: a word character is a letter, a digit or an underscore, so an underscore does not create a boundary.
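A short Python sketch illustrates the underscore behavior of \b. The field name user and its values are hypothetical and only serve as an illustration. Python's re module treats \b like the engine in CyberChef for these ASCII examples.

```python
import re

pattern = re.compile(r"user=admin\b")

# Hypothetical field values: only the character after "admin" differs.
candidates = ["user=admin ", "user=admin-test ", "user=admin_test ", "user=admin2 "]
results = {text: bool(pattern.search(text)) for text in candidates}

for text, matched in results.items():
    print(repr(text), matched)
# A space and a hyphen create a boundary (match);
# an underscore and a digit are word characters (no match).
```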
Next operation of the CyberChef recipe: apply a regular expression to select all the destination addresses:
Regular expression "dstip=[^ ]+" matches every dstip field together with its value. By defining a capture group ("dstip=([^ ]+)") and changing the output format to "list capture groups", I can select the individual IPv4 addresses.
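The difference between listing full matches and listing capture groups can be sketched in Python as follows. The sample log is hypothetical, with dstport as a made-up field name. Note that "[^ ]+" also consumes line breaks, so this sketch assumes that dstip is never the last field on a line.

```python
import re

# Hypothetical sample log; dstip is always followed by a space here.
sample_log = (
    "srcip=192.168.202.106 dstip=8.8.8.8 dstport=53\n"
    "srcip=192.168.202.106 dstip=192.168.202.1 dstport=80\n"
)

# Full matches: the field name and its value.
full_matches = [match.group(0) for match in re.finditer(r"dstip=[^ ]+", sample_log)]
print(full_matches)  # ['dstip=8.8.8.8', 'dstip=192.168.202.1']

# Capture groups: only the value between the parentheses.
captured = re.findall(r"dstip=([^ ]+)", sample_log)
print(captured)  # ['8.8.8.8', '192.168.202.1']
```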
Next, I want only public IPv4 addresses. This can be done with operation "Extract IP addresses" and selecting the option "Remove local IPv4 addresses":
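A comparable filter can be approximated in Python with the standard ipaddress module. This is a sketch under assumptions: is_private covers more ranges than the classic 10.0.0.0/8, 172.16.0.0/12 and 192.168.0.0/16 networks, so the result may differ from what CyberChef considers local. The function name and the sample values are hypothetical.

```python
import ipaddress


def is_public_ipv4(text):
    """Return True if text is a valid IPv4 address outside the private ranges."""
    try:
        address = ipaddress.IPv4Address(text)
    except ValueError:  # not a valid IPv4 address
        return False
    return not address.is_private


# Hypothetical extracted values.
candidates = ["8.8.8.8", "192.168.202.1", "10.0.0.5", "172.16.4.2", "1.1.1.1", "not-an-ip"]
public = [value for value in candidates if is_public_ipv4(value)]
print(public)  # ['8.8.8.8', '1.1.1.1']
```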
And finally, a unique operation with counters:
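In Python, the same idea of unique values with counters can be sketched with collections.Counter. The list of addresses below is hypothetical.

```python
from collections import Counter

# Hypothetical list of destination addresses, with duplicates.
addresses = ["8.8.8.8", "1.1.1.1", "8.8.8.8", "8.8.8.8"]

counts = Counter(addresses)
for address, count in counts.most_common():
    print(count, address)
# 3 8.8.8.8
# 1 1.1.1.1
```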
This CyberChef recipe can be found here.