Binary Breadcrumbs: Correlating Malware Samples with Honeypot Logs Using PowerShell [Guest Diary]
[This is a Guest Diary by David Hammond, an ISC intern as part of the SANS.edu BACS program [1]]
My last college credit on my way to earning a bachelor's degree was an internship opportunity at the Internet Storm Center. A great opportunity, but one that required the care and feeding of a honeypot. The day it arrived I plugged the freshly imaged honeypot into my home router and happily went about my day. I didn’t think too much about it until the first attack observation was due. You see, I travel often, but my honeypot does not. Furthermore, the administrative side of the honeypot was only accessible through the internal network. I wasn’t about to implement a whole remote solution just to get access while on the road. Instead, I followed some very good advice: I started downloading regular backups of the honeypot logs to a Windows laptop I frequently had with me.
The internship program encouraged us to at least initially review our honeypot logs with command line utilities, such as jq and all its flexibility with filtering. Combined with other standard Unix-like operating system tools, such as wc (word count), less, head, and cut, it was possible to extract exactly what I was looking for. I initially tried using more graphical tools but found I enjoyed "living" in the command line more. When I first start looking at logs, I'm not always sure what I'm looking for. Command line tools allow me to quickly look for outliers in the data. I can see what sticks out by negating everything that looks the same.
So, what’s the trouble? None of these tools were available on my Windows laptop. Admittedly, most of what I mention above is available for Windows, but my ability to install software was restricted on this machine, and I knew that native alternatives existed. At the time I had several directories of JSON logs, and a long list of malware hash values corresponding to an attack I was interested in understanding better. Here’s how a few lines of PowerShell can transform scattered honeypot logs into a clear picture of what really happened.
First, let’s start with the script in two parts. Here’s the PowerShell array containing malware hash values:
$hashes = @(
"00deea7003eef2f30f2c84d1497a42c1f375d802ddd17bde455d5fde2a63631f",
"0131d2f87f9bc151fb2701a570585ed4db636440392a357a54b8b29f2ba842da",
"01ba4719c80b6fe911b091a7c05124b64eeece964e09c058ef8f9805daca546b",
"0291de841b47fe19557c2c999ae131cd571eb61782a109b9ef5b4a4944b6e76d",
"02a95dae81a8dd3545ce370f8c0ee300beff428b959bd7cec2b35e8d1cd7024e",
"062ba629c7b2b914b289c8da0573c179fe86f2cb1f70a31f9a1400d563c3042a",
"0be1c3511c67ecb8421d0be951e858bb3169ac598d068bae3bc8451e883946cc",
"0cbd5117413a0cab8b22f567b5ec5ec63c81b2f0f58e8e87146ecf7aace2ec71",
"0d2d316bc4937a2608e8d9a3030a545973e739805c3449b466984b27598fcdec",
"0d58ee0cd46d5908f31ba415f2e006c1bb0215b0ecdc27dd2b3afa74799e17bd"
)
The $hashes = @( ) construct, wrapped around the quoted, comma-separated values, establishes a PowerShell array of strings representing the hashes we want to search for.
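Before using the array, you can quickly confirm it loaded as expected. These two checks are purely illustrative and not part of the script itself:
$hashes.Count    # Number of hash strings in the array (10 in this example)
$hashes[0]       # The first element, the hash beginning with 00deea70...
Now let’s look at how we put this array to use.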
Get-ChildItem -Path "C:\Users\Dave\Logs" -Filter 'cowrie.json.*' -Recurse |
ForEach-Object {
    # Read the current log file and announce which file is being searched
    $jsonContent = Get-Content $_.FullName
    Write-Output $_.FullName
    foreach ($hash in $hashes) {
        # Reset the result variable, then search this file for the current hash
        $searchResults = $null
        $searchResults = $jsonContent | Select-String $hash
        if (![string]::IsNullOrEmpty($searchResults)) {
            Write-Output $searchResults
        }
    }
}
Let's walk through the execution of the script. The first statement, Get-ChildItem, recurses every folder in the specified path (C:\Users\Dave\Logs) and passes along all filenames that match the filter argument. Each filename is passed through the "pipe" (|) directly into the first ForEach-Object statement. You can see what’s passed by observing the output of the Write-Output $_.FullName line. The $_ is an automatic variable that represents whatever object is currently coming through the pipe. In this case, we know what kind of data to expect (a file object), so we can access its FullName property. This tells us the specific JSON log file currently being searched.
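If the $_ automatic variable is new to you, a minimal standalone sketch of its behavior in any pipeline (independent of this script) may help:
# $_ holds whatever object is currently passing through the pipeline
1..3 | ForEach-Object { $_ * 2 }    # Outputs 2, 4, 6

# With Get-ChildItem, $_ is a FileInfo object, so properties like FullName work
Get-ChildItem -Path "C:\Users\Dave\Logs" -Recurse | ForEach-Object { $_.FullName }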
Now let’s get into the meat of the script. The main body contains two nested loops. The outer loop begins with the "ForEach-Object" block of code; the inner loop is the lowercase "foreach" block. We already know the name of the JSON log we’ll be searching next, so the line $jsonContent = Get-Content $_.FullName sets that up. It takes the filename passed to $_ through the pipe, reads the contents of that file, and stores the text in a variable named $jsonContent. Now that we’ve got our first log to search, all we have to do is run through the list of hash values! This takes us to the inner loop. The foreach inner loop is similar to the outer loop except in how it processes data: the statement foreach ($hash in $hashes) takes each hash value found in the $hashes array and puts a copy of it into $hash before executing the code block it contains.
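The distinction between the two loop styles is worth a quick illustration. This standalone sketch shows that ForEach-Object consumes objects arriving through the pipeline, while the foreach statement walks a collection already held in memory:
"a", "b" | ForEach-Object { "pipeline saw: $_" }     # Processes each piped object
foreach ($item in "a", "b") { "foreach saw: $item" } # Iterates an in-memory array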
When the inner loop runs, it does three things. First, $searchResults = $null empties the value of the $searchResults variable. This is also called "initializing" the variable, and it’s good practice whenever you're working with loops that re-use the same variable names. Second, with the variable clear and ready to accept new values, the next line accomplishes a few things.
$searchResults = $jsonContent | Select-String $hash
Starting to the right of the equals sign, we’re piping the JSON log text in $jsonContent into the command Select-String, while also passing Select-String a single argument, $hash. Remember, when the lowercase foreach loop starts, it takes each value found in the $hashes array and (one at a time) places it into $hash before executing the block of code below. So we’re passing the text in $jsonContent through another pipe to Select-String, which searches for the value of $hash within that text. The results of Select-String are then stored in the variable named $searchResults.
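If you're curious what Select-String actually hands back, each result is a MatchInfo object. A quick, illustrative way to inspect a few of its properties:
# Each Select-String result is a MatchInfo object with useful properties
$searchResults | Select-Object -Property LineNumber, Pattern, Line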
if (![string]::IsNullOrEmpty($searchResults)) {
    Write-Output $searchResults
}
Third and finally, we have an if statement to determine whether the prior Select-String produced any results. If it found the $hash value it was looking for, the $searchResults variable will contain data; if not, it will remain empty ($null). The if statement makes that determination and prints any $searchResults it found. Note the ! at the beginning of the condition, which negates it so it reads as "if not empty."
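As an aside, PowerShell offers equivalent ways to write this emptiness test; for this script's data, these variants behave the same:
# Equivalent checks: $null (no matches) is treated as false in PowerShell
if ($searchResults) { Write-Output $searchResults }
if ($null -ne $searchResults) { Write-Output $searchResults }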

While compact, this script introduces the PowerShell newcomer to a variety of useful techniques: traversing files and folders, reading text, searching text, and nesting loops. If you save this script, you can adapt it in many ways whenever a quick solution is needed. Understanding the tools available to us in any environment, and having practice adapting those tools to our circumstances, makes us all better cybersecurity professionals.
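One adaptation worth knowing about: Select-String's -Pattern parameter accepts an array of strings, so if you only need the matching lines (and not the per-file progress output), the whole search can likely be collapsed into a single pipeline:
# A compact variant: pass the whole hash array to -Pattern at once.
# -SimpleMatch treats each hash as a literal string rather than a regex.
Get-ChildItem -Path "C:\Users\Dave\Logs" -Filter 'cowrie.json.*' -Recurse |
    Select-String -Pattern $hashes -SimpleMatch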
[1] https://www.sans.edu/cyber-security-programs/bachelors-degree/
-----------
Guy Bruneau IPSS Inc.
Twitter: GuyBruneau
gbruneau at isc dot sans dot edu
Updates to Domainname API
For several years, we have offered a "new domain" list of recently registered (or, more accurately, recently discovered) domains. This list is offered via our API (https://isc.sans.edu/api). However, the size of the list has been causing issues, resulting in a "cut-off" list being returned. To resolve this issue, I updated the API call. It is sort of backward compatible, but it will not allow you to retrieve the full list. Additionally, we offer a simple "static file" containing the complete list. This file should be used whenever possible instead of the API.
To retrieve the full list, updated hourly, use:
We also offer past versions of this list for the last few days. For example:
I have not yet decided how long to keep these historic lists; likely, I will keep the last week as "precompiled" lists. The same data can be retrieved via the API request below.
For the API, you may now retrieve partial copies of the list. The full URL for the API is:
https://isc.sans.edu/api/recentdomains/[date]/[searchstring]/[start]/[count]
For example:
https://isc.sans.edu/api/recentdomains/2025-11-05/sans/0/1000?json
This will return all domains found today (November 5th) that contain the string "sans". The first 1,000 matches are returned.
date: The date in "YYYY-MM-DD" format. The word "today" can be used instead of the current date if you only want the most recent data. The default is "today".
searchstring: Only domains containing this string will be returned. Use "+" as a wildcard to get all domains. This defaults to returning any domain.
start: The number of the record to start with (defaults to 0).
count: How many records to return (defaults to all records).
In return, you will receive XML by default, but you may easily switch to other formats by adding, for example, "?json" to the end of the URL, which will return JSON.
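To keep with the PowerShell theme of the diary above, here is a minimal sketch of pulling the JSON variant into objects; Invoke-RestMethod parses the JSON response automatically (the URL mirrors the example above, and the field names match the sample record shown next):
# A minimal sketch: fetch today's new domains containing "sans" as parsed objects
$url = "https://isc.sans.edu/api/recentdomains/today/sans/0/1000?json"
$domains = Invoke-RestMethod -Uri $url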
The data returned remains the same:
{
  "domainname": "applewood-artisans.com",
  "ip": null,
  "type": null,
  "firstseen": "2025-11-04",
  "score": 0,
  "scorereason": "High entropy: 3.57 (+0.36)"
},
domainname: The domain name
ip: IPv4 address (if available)
type: Currently not used
firstseen: Date the domain name was first seen
score: The "anomaly score"
scorereason: Reason behind the score
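Continuing the hedged sketch above, the parsed records can then be filtered on these fields, for example to surface only domains with a non-zero anomaly score (this assumes the $domains variable from the earlier snippet):
# Show only records the anomaly scoring flagged (uses $domains from above)
$domains | Where-Object { $_.score -gt 0 } |
    Select-Object domainname, firstseen, score, scorereason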
One of the sources of this data is the Certificate Transparency logs. It is possible that we will see new certificates for older domains that have not yet made it into our list of "existing" domains. As a result, you will see some older domains listed as "new" because they were not previously included in our feeds.
Regarding all our data: Use it at your own risk. The data is provided on a best-effort basis at no cost. Commercial use is permitted as long as the data is attributed to us and not resold. We do not recommend using the data as a block list. Instead, use it to "add color to your logs". The data may provide some useful context for other data you collect.
Why do we have a somewhat unusual API, rather than a more standard-compliant REST, GraphQL, or even SOAP API? Well, the API predates these standards (except for SOAP... and do you really want me to use SOAP?). At one point, we may offer something closer to whatever the REST standard will look like at the time, but don't hold your breath; there are a few other projects I want to complete first.
Feedback and bug reports are always welcome.
--
Johannes B. Ullrich, Ph.D., Dean of Research, SANS.edu
