Enrichment Data: Keeping it Fresh
I like to enrich my honeypot data from a variety of sources to help understand a bit more about the context of the attack. This includes the types of networks the attacks are coming from or whether malware submitted to a honeypot is new. I use a variety of sources to enrich my cowrie data using cowrieprocessor [1]:
- Internet Storm Center / DShield API [2]
- URLhaus [3]
- SPUR [4]
- VirusTotal [5]
I was curious how often the data changed and how "fresh" the data needs to be in order to be accurate. In a previous diary I went into details about VirusTotal data and vendor comparisons [6].
Data Collection
Data was pulled from the above sources using my cowrieprocessor script [1]. The script keeps a local copy of most enrichment data, which means I can always go through the JSON files at a later date to extract different information. The data I have goes back as far as May 2022. My honeypots schedule this data enrichment to happen once a day for attacks that happened the previous day. This means a minimum gap of time of 24 hours between enrichment data queries. This process was scheduled to run more frequently in 2022 and 2023 and may have a smaller gap between enrichment queries of 6-12 hours.
VirusTotal Data
I extracted the following fields for comparison:
- ID (file hash)
- Malicious (number of vendors/engines that label the file as malicious)
- Suspicious (number of vendors that label the file as suspicious)
- Undetected (number of vendors that did not have any detection)
- Harmless (number of vendors that label the file as harmless)
- Timeout (number of vendors that had a timeout)
- Confirmed-timeout (number of vendors that confirmed the timeout)
- Failure (number of vendors where a failure was reported)
- Type-unsupported (number of vendors that did not support the indicator type)
- Type_tag (type of file)
- Type_description (type description)
More details about the VirusTotal data fields can be found in their documentation [7]. The data was reviewed to look for hashes that showed a wide range of total "malicious" indicators as determined by different products.
Date | Time | Hash | Malicious | Suspicious | Type Description |
---|---|---|---|---|---|
12/29/2023 | 120001 | 062ba629c7b2b914b289c8da0573c179fe86f2cb1f70a31f9a1400d563c3042a | 1 | 0 | ELF |
12/29/2023 | 180002 | 062ba629c7b2b914b289c8da0573c179fe86f2cb1f70a31f9a1400d563c3042a | 1 | 0 | ELF |
12/30/2023 | 003001 | 062ba629c7b2b914b289c8da0573c179fe86f2cb1f70a31f9a1400d563c3042a | 1 | 0 | ELF |
3/3/2024 | 003001 | 062ba629c7b2b914b289c8da0573c179fe86f2cb1f70a31f9a1400d563c3042a | 2 | 0 | ELF |
4/21/2024 | 003002 | 062ba629c7b2b914b289c8da0573c179fe86f2cb1f70a31f9a1400d563c3042a | 2 | 0 | ELF |
7/18/2024 | 003001 | 062ba629c7b2b914b289c8da0573c179fe86f2cb1f70a31f9a1400d563c3042a | 22 | 0 | ELF |
8/10/2024 | 003002 | 062ba629c7b2b914b289c8da0573c179fe86f2cb1f70a31f9a1400d563c3042a | 26 | 0 | ELF |
8/13/2024 | 003002 | 062ba629c7b2b914b289c8da0573c179fe86f2cb1f70a31f9a1400d563c3042a | 25 | 0 | ELF |
8/15/2024 | 003003 | 062ba629c7b2b914b289c8da0573c179fe86f2cb1f70a31f9a1400d563c3042a | 25 | 0 | ELF |
Figure 1: VirusTotal results over time for hash 062ba629c7b2b914b289c8da0573c179fe86f2cb1f70a31f9a1400d563c3042a [8].
Date | Time | Hash | Malicious | Suspicious | Type Description |
---|---|---|---|---|---|
12/21/2023 | 180002 | 47b268c21591069bfe4099833ad66b8138a53ab2dcb866e040d466aee1f8624c | 1 | 0 | ELF |
12/22/2023 | 003002 | 47b268c21591069bfe4099833ad66b8138a53ab2dcb866e040d466aee1f8624c | 1 | 0 | ELF |
4/7/2024 | 003001 | 47b268c21591069bfe4099833ad66b8138a53ab2dcb866e040d466aee1f8624c | 2 | 0 | ELF |
7/31/2024 | 003002 | 47b268c21591069bfe4099833ad66b8138a53ab2dcb866e040d466aee1f8624c | 29 | 0 | ELF |
Figure 2: VirusTotal results over time for hash 47b268c21591069bfe4099833ad66b8138a53ab2dcb866e040d466aee1f8624c [9].
Date | Time | Hash | Malicious | Suspicious | Type Description |
---|---|---|---|---|---|
5/7/2023 | 060002 | 306f0c79ad9ee76e996556f909306fda5704b456d670aa9daeb54760b4b5e4f6 | 2 | 0 | ELF |
5/7/2023 | 120001 | 306f0c79ad9ee76e996556f909306fda5704b456d670aa9daeb54760b4b5e4f6 | 3 | 0 | ELF |
5/7/2023 | 180002 | 306f0c79ad9ee76e996556f909306fda5704b456d670aa9daeb54760b4b5e4f6 | 3 | 0 | ELF |
5/8/2023 | 003002 | 306f0c79ad9ee76e996556f909306fda5704b456d670aa9daeb54760b4b5e4f6 | 3 | 0 | ELF |
5/10/202 | 003001 | 306f0c79ad9ee76e996556f909306fda5704b456d670aa9daeb54760b4b5e4f6 | 24 | 0 | ELF |
Figure 3: VirusTotal results over time for hash 306f0c79ad9ee76e996556f909306fda5704b456d670aa9daeb54760b4b5e4f6 [10].
This demonstrates that VirusTotal data can take months to have a large increase in malicious hits. The hash from Figure 3 was first submitted on March 10, 2023, so even though it looks like a very quick change in my sample of data, this was approximatey two months from the initial submission. Even if the data looks stable, it may have a dramatic change.
URLhaus Data
URLHaus can be a good location of malicious URLs that may be used in phishing or other attacks, such as those seen in Cowrie honeypots. The data was reviewed to look for IP addresses that had a reported URL count change over time. In figure 4, the URL count increased by approximately 1 URL a day until it increased more dramatically between 11/8/2022 and 11/11/2022.
Figure 4: URLhaus reported URL changes over time for 179.43.175.5.
In figure 5, the IP address URL count almost doubled in a couple of days.
Figure 5: URLhaus reported URL changes over time for 193.42.33.81.
SPUR Data
The data compared was retrieved from SPUR, but this kind of WHOIS data is available from a variety of sources. First, I wanted to take a look at how many differences were seen in the registration data by IP address. I limited the information compared to the IP address, organization and location information.
Figure 6: Breakdown of IP addresses and how many unique sets of data were seen per IP address.
Over 3/4 of the IP addresses didn't have any change in the information reported. For the most part, the data doesn't change often. However, there were several IP addresses that had multiple changes. In figure 7, there were changes on average about once a month for the location.
Figure 7: IP Address 201.186.40.250 showing changes in geographic regions over time.
In figure 8, the organization changed every couple of months between March and July of 2024. It may have changed more frequently, but was not recorded by my honeypot.
Figure 8: IP Address 101.32.114.105 showing changes in organization name over time.
The raw WHOIS information for 101.32.114.105 contains information that refers to both organizations listed from the SPUR data.
% Information related to '101.32.112.0 - 101.32.175.255'
% Abuse contact for '101.32.112.0 - 101.32.175.255' is 'qcloud_net_duty@tencent.com'
inetnum: 101.32.112.0 - 101.32.175.255
netname: ACEVILLEPTELTD-SG
descr: 16 COLLYER QUAY
country: SG
admin-c: APA7-AP
tech-c: APA7-AP
abuse-c: AA1875-AP
status: ALLOCATED NON-PORTABLE
mnt-by: MAINT-ACEVILLEPTELTD-SG
mnt-irt: IRT-ACEVILLEPTELTD-SG
last-modified: 2022-02-16T17:35:17Z
source: APNIC
irt: IRT-ACEVILLEPTELTD-SG
address: 16 COLLYER QUAY, # 18-29, INCOME AT RAFFLES, SINGAPORE
e-mail: qcloud_net_duty@tencent.com
abuse-mailbox: qcloud_net_duty@tencent.com
admin-c: APA7-AP
tech-c: APA7-AP
auth: # Filtered
remarks: qcloud_net_duty@tencent.com is invalid
mnt-by: MAINT-ACEVILLEPTELTD-SG
last-modified: 2024-05-22T13:07:48Z
source: APNIC
role: ABUSE ACEVILLEPTELTDSG
address: 16 COLLYER QUAY, # 18-29, INCOME AT RAFFLES, SINGAPORE
country: ZZ
phone: +000000000
e-mail: qcloud_net_duty@tencent.com
admin-c: APA7-AP
tech-c: APA7-AP
nic-hdl: AA1875-AP
remarks: Generated from irt object IRT-ACEVILLEPTELTD-SG
remarks: qcloud_net_duty@tencent.com is invalid
abuse-mailbox: qcloud_net_duty@tencent.com
mnt-by: APNIC-ABUSE
last-modified: 2024-05-22T13:08:48Z
source: APNIC
role: ACEVILLE PTELTD administrator
address: 16 COLLYER QUAY, #18-29, INCOME AT RAFFLES, SINGAPORE
country: SG
phone: +8613923479936
fax-no: +8613923479936
e-mail: qcloud_net_duty@tencent.com
admin-c: APA7-AP
tech-c: APA7-AP
nic-hdl: APA7-AP
mnt-by: MAINT-ACEVILLEPTELTD-SG
last-modified: 2023-03-17T12:36:41Z
source: APNIC
Regardless of where you get your enrichment data, it will change over time. Get updated information when you can and use multiple sources of enrichment data for comparison.
[1] https://github.com/jslagrew/cowrieprocessor
[2] https://isc.sans.edu/api/
[3] https://urlhaus.abuse.ch/
[4] https://spur.us/
[5] https://www.virustotal.com/
[6] https://isc.sans.edu/diary/VirusTotal+Result+Comparisons+for+Honeypot+Malware/29040
[7] https://github.com/demisto/content/blob/master/Packs/VirusTotal/Integrations/VirusTotalV3/README.md
[8] https://www.virustotal.com/gui/file/062ba629c7b2b914b289c8da0573c179fe86f2cb1f70a31f9a1400d563c3042a
[9] https://www.virustotal.com/gui/file/47b268c21591069bfe4099833ad66b8138a53ab2dcb866e040d466aee1f8624c
[10] https://www.virustotal.com/gui/file/306f0c79ad9ee76e996556f909306fda5704b456d670aa9daeb54760b4b5e4f6
[11] https://bgpview.io/prefix/101.32.114.0/23#whois
--
Jesse La Grew
Handler
Comments