Filter JSON Data by Value with Linux jq
JSON has become prevalent as a data exchange format, but unfortunately it isn't at all Bash friendly: manipulating JSON data at the command line with regular expressions (e.g. sed, grep) is cumbersome, and it is difficult to get the output I want.
The Linux tool I use for this is jq, which was written specifically to manipulate and filter JSON. It lets me script the extraction of the data I need from a large JSON file, in an output format I can easily read and manipulate.
The most common form of logs I work with are JSON arrays (delimited by [ and ]). Here is a basic example that demonstrates how to iterate over an array:
echo '["a","b","c"]' | jq '.[]'
The array/object value iterator operator .[] prints each item in the array on a separate line:
"a"
"b"
"c"
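The quotes in that output are JSON string quotes. When the values are going to feed other shell tools, jq's -r (raw output) option prints strings without them. A minimal sketch reusing the same array (the raw_items variable name is only for illustration):

```shell
# -r (--raw-output) prints strings without the surrounding JSON quotes
raw_items=$(echo '["a","b","c"]' | jq -r '.[]')
echo "$raw_items"
```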
In this next example, I take the data from the bot_ip.json file and list the IP addresses and which site they came from. The following command parses the file:
cat bot_ip.json | jq '.objects[] | .ip + ": " + .source' | sort | uniq
The output looks like this:
"212.39.114.139: Botscout BOT IPs"
"216.131.104.82: Botscout BOT IPs"
"2607:90:6628:470:0:4:0:801: Botscout BOT IPs"
Since the data in this file sits in an array under the objects key (objects appears before the opening [), I use it as an anchor to start parsing the data I want to see. I added a colon (:) separator between the IP and the data source.
This second example uses mal_url.json, which contains known malware URL locations. The following command parses the file and produces the output shown after it:
cat mal_url.json | jq '.objects[] | .value + ": " + .source + ": " + .threat_type' | sort | uniq
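Here is a small sketch of pairing each IP with its own source on inline sample data. The two records and their values are made up for illustration (the addresses come from documentation ranges). Only the objects, ip, and source field names are taken from the file above. Selecting each object once with .objects[] and piping it into the concatenation keeps fields from the same record together, whereas iterating twice inside one expression combines every ip with every source.

```shell
# Hypothetical sample shaped like bot_ip.json: a top-level "objects" array
sample='{"objects":[{"ip":"192.0.2.10","source":"List A"},{"ip":"198.51.100.7","source":"List B"}]}'

# Iterate once over .objects[], then build the string from fields of the same object
paired=$(echo "$sample" | jq -r '.objects[] | .ip + ": " + .source')
echo "$paired"

# Iterating twice in one expression yields every ip/source combination
crossed=$(echo "$sample" | jq -r '.objects[].ip + ": " + .objects[].source')
echo "$crossed" | wc -l
```

With two records, the first form prints two lines and the second prints four, which is why the piped form is the safer choice when the sources differ between records.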
"http://103.82.81.37:42595/Mozi.a: URLHaus: malware"
"http://103.84.240.226:45940/Mozi.m: URLHaus: malware"
"http://110.178.73.97:34004/Mozi.m: URLHaus: malware"
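To filter by value rather than list everything, jq's select() function keeps only the objects whose field matches a condition. A minimal sketch on made-up sample data that borrows the value, source, and threat_type field names from the file above (the records themselves, including the phishing entry and the "Example Feed" source, are hypothetical):

```shell
# Hypothetical sample shaped like mal_url.json
sample='{"objects":[
  {"value":"http://192.0.2.10/a.bin","source":"URLHaus","threat_type":"malware"},
  {"value":"http://198.51.100.7/login","source":"Example Feed","threat_type":"phishing"}
]}'

# select() passes an object through only when the condition is true
malware_urls=$(echo "$sample" | jq -r '.objects[] | select(.threat_type == "malware") | .value')
echo "$malware_urls"
```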
This test file [2] contains several records that can be used to practice manipulating JSON data. Using wget, download the file to a Linux workstation and ensure that jq is already installed (e.g. CentOS: yum -y install jq). Next, take a quick look at the raw file using a Linux command of your choice (less, more, cat, etc.) before parsing some of the data with jq. To view the data properly formatted and readable, use this command:
cat large-file.json | jq '.' | more
To get a list of actors with their information, run this command:
cat large-file.json | jq '.[].actor' | more
To get just the list of actor logins, add .login to .actor:
cat large-file.json | jq '.[].actor.login' | more
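The same path can feed a quick summary. This sketch counts how often each login appears, using a tiny hypothetical stand-in for large-file.json (the logins are invented. Only the .[].actor.login path comes from the commands above):

```shell
# Hypothetical stand-in for large-file.json: an array of records with an actor.login field
sample='[{"actor":{"login":"alice"}},{"actor":{"login":"bob"}},{"actor":{"login":"alice"}}]'

# -r drops the quotes so sort and uniq see plain text;
# uniq -c prefixes each login with its count, sort -rn puts the most frequent first
login_counts=$(echo "$sample" | jq -r '.[].actor.login' | sort | uniq -c | sort -rn)
echo "$login_counts"
```

Against the real file, replacing the echo "$sample" stage with cat large-file.json should give the same kind of listing.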
What are some of your favorite tools to manipulate JSON data?
[1] https://stedolan.github.io/jq/manual/
[2] https://github.com/json-iterator/test-data/raw/master/large-file.json
[3] https://gchq.github.io/CyberChef/
[4] http://iplists.firehol.org/?ipset=botscout
-----------
Guy Bruneau IPSS Inc.
My Handler Page
Twitter: GuyBruneau
gbruneau at isc dot sans dot edu