32-bit or 64-bit Malware?
Last week, I was teaching FOR610 in Amsterdam. When we review assembly, we have a module about the differences between 32-bit and 64-bit code (how parameters are passed to functions and API calls, calling conventions, etc.). It's important to understand this because most computers are built around a 64-bit CPU today. But attackers still deploy a lot of 32-bit malware for compatibility reasons, and also because this code runs without problems on 64-bit systems (if you respect Microsoft's guidelines and APIs). A student asked me if there was a lot of native 64-bit malware in the wild. Is there a real trend? I decided to look at a bunch of samples and check in practice whether this trend was real.
The problem is getting enough samples. I have my own "malware zoo" but it's pretty small. You can try to get samples from major players like VirusTotal, but your API quota probably won't allow you to download many of them. I decided to look at free (but still trusted) resources and chose MalwareBazaar[1]. I like this service provided by abuse.ch: it lets you download samples for free and also reports some interesting statistics based on YARA rules[2].
I downloaded all daily archives from Feb 27 2020 until last week (217GB of zip archives). To tell whether a PE file contains 32-bit or 64-bit code, you only need to check a few bytes near the beginning of the file:
00000000: 4d5a 9000 0300 0000 0400 0000 ffff 0000 MZ..............
00000010: b800 0000 0000 0000 4000 0000 0000 0000 ........@.......
00000020: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000030: 0000 0000 0000 0000 0000 0000 8000 0000 ................
00000040: 0e1f ba0e 00b4 09cd 21b8 014c cd21 5468 ........!..L.!Th
00000050: 6973 2070 726f 6772 616d 2063 616e 6e6f is program canno
00000060: 7420 6265 2072 756e 2069 6e20 444f 5320 t be run in DOS
00000070: 6d6f 6465 2e0d 0d0a 2400 0000 0000 0000 mode....$.......
00000080: 5045 0000 4c01 0900 8406 f862 0000 0000 PE..L......b....
00000090: 0000 0000 e000 2e03 0b01 0223 0004 ac00 ...........#....
000000a0: 005a e900 0008 0000 b014 0000 0010 0000 .Z..............
000000b0: 0020 ac00 0000 4000 0010 0000 0002 0000 . ....@.........
000000c0: 0400 0000 0100 0000 0400 0000 0000 0000 ................
000000d0: 00c0 e900 0004 0000 179d e900 0200 4001 ..............@.
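If you read "PE..L", it's a 32-bit sample; if it's "PE..d", it's a 64-bit one. These bytes are the PE signature followed by the first byte of the Machine field (0x014c for x86, 0x8664 for x64). I wrote a quick and dirty YARA rule to match these sequences of bytes: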
rule pe32bits
{
    meta:
        description = "Match a 32-bits PE"
    strings:
        $a = {50 45 00 00 4c}
    condition:
        $a in (0..500)
}

rule pe64bits
{
    meta:
        description = "Match a 64-bits PE"
    strings:
        $a = {50 45 00 00 64}
    condition:
        $a in (0..500)
}
Because I had a lot of ZIP archives to process and did not want to use too much storage, I used Python to read the files directly from the ZIP archives and run the YARA rules against them. I focused only on ".exe" and ".dll" files:
#!/usr/bin/python3
import datetime
import glob
import re
import yara
from zipfile import ZipFile

rules = yara.compile(filepath='3264.yar')
print("date,file,arch")
# Process each daily archive in place, without extracting it to disk
for zip_name in glob.glob('*.zip'):
    # The archive name (YYYY-MM-DD.zip) gives the submission day
    day = datetime.datetime.strptime(zip_name.split(".")[0], '%Y-%m-%d').strftime("%d/%m/%Y %H:%M:%S")
    with ZipFile(zip_name, 'r') as zipObj:
        zipObj.setpassword(b"infected")
        for f in zipObj.infolist():
            # Only consider .exe and .dll samples
            if re.match(r'[0-9]+.*\.(exe|dll)', f.filename):
                with zipObj.open(f.filename, mode='r') as fdata:
                    matches = rules.match(data=fdata.read())
                    if len(matches) > 0:
                        print("%s,%s,%s" % (day, f.filename, matches[0]))
Let's have a look at the results. I loaded the CSV file into my Splunk instance.
- 175,962 samples were inspected (EXE and DLL files only)
- 10,952 were detected as 64-bit code (6.224%)
- Only 1 DLL was detected as 64-bit code (hash: 86150c570e2d253d54fd5f70c9fe62ff37897dc3a7b21658fa891263a843790d)
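For readers without Splunk, the same kind of counts can be derived from the CSV with the Python standard library. This is a minimal sketch, assuming the three-column layout printed by the script above and assuming the last column holds the matching rule name (`pe32bits` or `pe64bits`). The function name `tally` and the sample rows are made up for illustration; they are not real data.

```python
import csv
import io
from collections import Counter, defaultdict


def tally(csv_text: str):
    """Count samples per architecture, overall and per month (YYYY-MM)."""
    overall = Counter()
    monthly = defaultdict(Counter)
    reader = csv.reader(io.StringIO(csv_text))
    next(reader, None)  # skip the header row, whatever its labels are
    for day, _filename, arch in reader:
        # The day column looks like "27/02/2020 00:00:00" (DD/MM/YYYY).
        dd, mm, yyyy = day.split(" ")[0].split("/")
        overall[arch] += 1
        monthly[f"{yyyy}-{mm}"][arch] += 1
    return overall, monthly


# Hypothetical sample rows, only to exercise the function.
sample = """date,file,arch
27/02/2020 00:00:00,1111.exe,pe32bits
27/02/2020 00:00:00,2222.exe,pe32bits
03/03/2020 00:00:00,3333.exe,pe64bits
15/03/2020 00:00:00,4444.dll,pe32bits
"""

overall, monthly = tally(sample)
share = 100 * overall["pe64bits"] / sum(overall.values())
print(f"64-bit share: {share:.1f}%")
for month in sorted(monthly):
    print(month, dict(monthly[month]))
```

Grouping by month rather than by day smooths out the daily variation in submissions, which makes a slow trend easier to see.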
If we plot the results on a timeline, a small trend appears:
I have no explanation for the peak of samples submitted in November 2021, but we can see that, especially in the last few months, there are more and more 64-bit samples in the wild. Can we rely on these statistics? Samples downloaded from MalwareBazaar are only the tip of the iceberg but, as the service has become popular, many security researchers use it. If you have other statistics, please share them with us!
[1] https://bazaar.abuse.ch
[2] https://bazaar.abuse.ch/export/json/yara-stats/
Xavier Mertens (@xme)
Xameco
Senior ISC Handler - Freelance Cyber Security Consultant