YARA Rule for OOXML Maldocs: Less False Positives
In this diary entry, I introduce an updated version of the YARA rule for OOXML files with VBA code that I presented in the diary entry "Simple YARA Rules for Office Maldocs". Here is the OOXML YARA rule I presented yesterday:
rule pkvba {
    strings:
        $vbaprojectbin = "vbaProject.bin"
    condition:
        uint32be(0) == 0x504B0304 and $vbaprojectbin
}
This rule will generate false positives if it finds instances of the string "vbaProject.bin" that are not a filename.
To improve this rule (generate fewer false positives), I will add clauses to check that the instances of the string "vbaProject.bin" are found inside a PKZIP file record and correspond to the filename field.
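To make the false-positive case concrete, here is a minimal Python sketch (not part of the YARA rule). It builds a small ZIP archive in memory whose only member is a text file that merely mentions the string, and then applies the same two checks as rule pkvba. The helper names build_zip and pkvba_like and the member name notes.txt are made up for this illustration.

```python
import io
import zipfile


def build_zip(members):
    """Build an in-memory ZIP archive from a {name: bytes} mapping (stored, not compressed)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
        for name, data in members.items():
            zf.writestr(name, data)
    return buf.getvalue()


def pkvba_like(data):
    """Emulate the two checks of rule pkvba: ZIP magic at offset 0, string anywhere."""
    return data[:4] == b"PK\x03\x04" and b"vbaProject.bin" in data


# Hypothetical benign archive: the string occurs in file content, not as a filename.
benign = build_zip({"notes.txt": b"This text mentions vbaProject.bin but holds no macros."})

names = zipfile.ZipFile(io.BytesIO(benign)).namelist()
print(names)               # no member is named vbaProject.bin
print(pkvba_like(benign))  # yet the naive check still fires
```

Because the member is stored without compression, the string appears verbatim in the archive, which is all the original rule looks for.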
Here is an updated version of the rule:
rule pkvbare {
    strings:
        $vbaprojectbin = /[a-zA-Z\/]*\/?vbaProject\.bin/
    condition:
        uint32be(0) == 0x504B0304 and
        $vbaprojectbin and
        for any i in (1..#vbaprojectbin): (
            (uint32be(@vbaprojectbin[i] - 30) == 0x504B0304) and
            (!vbaprojectbin[i] == uint16(@vbaprojectbin[i] - 4))
        )
}
In this updated rule, I use a regular expression (/[a-zA-Z\/]*\/?vbaProject\.bin/) to find the filename vbaProject.bin. That's because the full filename is preceded by a path, and that path differs per type of Office document. For example, inside Word documents, that filename is "word/vbaProject.bin".
The header of the PKZIP file record (the ZIP local file header) starts 30 bytes before the string "word/vbaProject.bin".
The header of a PKZIP file record starts with the magic sequence "50 4B 03 04".
I check this with the following clause in my YARA rule:
(uint32be(@vbaprojectbin[i] - 30) == 0x504B0304)
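The 30-byte offset follows from the fixed layout of the ZIP local file header. As a small sketch, the Python below builds a one-member archive with the standard zipfile module and looks 30 bytes in front of the filename. The member content is a placeholder, not a real VBA project, and the struct format string is my own spelling of the header fields.

```python
import io
import struct
import zipfile

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
    zf.writestr("word/vbaProject.bin", b"placeholder, not a real VBA project")
data = buf.getvalue()

# Fixed part of the local file header (little-endian):
# signature, version, flags, compression, mod time, mod date,
# CRC-32, compressed size, uncompressed size, name length, extra length.
LOCAL_HEADER = struct.Struct("<4sHHHHHIIIHH")
print(LOCAL_HEADER.size)  # size of the fixed part, in bytes

# First occurrence of the name is the filename field of the local file header.
name_pos = data.find(b"word/vbaProject.bin")

# Same read as uint32be(@vbaprojectbin[i] - 30) in the rule.
magic_be = int.from_bytes(data[name_pos - 30:name_pos - 26], "big")
print(hex(magic_be))
```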
Since more than one instance of $vbaprojectbin can be found, I need to test all instances to find one that fulfills all the conditions. I do this with a for expression:
for any i in (1..#vbaprojectbin): (...)
#vbaprojectbin is the number of instances (#) found.
i is an index (integer) that varies between 1 and the number of found instances.
@vbaprojectbin[i] represents the position of the found instance with index number i. Subtracting 30 from that position brings me to the start of the PKZIP file record header. I check that this is indeed the case by comparing with the magic sequence:
(uint32be(@vbaprojectbin[i] - 30) == 0x504B0304)
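For readers less familiar with YARA's #, @ and ! operators, here is a rough Python analogue on a handcrafted byte string (an assumption for illustration, not a valid ZIP file). Python's re.finditer only returns non-overlapping matches, so it approximates rather than reproduces YARA's matching; the second path, xl/vbaProject.bin, is simply an example of another prefix.

```python
import re

PATTERN = re.compile(rb"[a-zA-Z/]*/?vbaProject\.bin")
MAGIC = b"PK\x03\x04"

# Handcrafted sample: a 30-byte header-like prefix, one name right after it,
# and a second occurrence that is NOT preceded by a header.
data = MAGIC + b"\x00" * 26 + b"word/vbaProject.bin" + b" junk " + b"xl/vbaProject.bin"

matches = list(PATTERN.finditer(data))
count = len(matches)                           # YARA: #vbaprojectbin
offsets = [m.start() for m in matches]         # YARA: @vbaprojectbin[i]
lengths = [len(m.group()) for m in matches]    # YARA: !vbaprojectbin[i]

# YARA: for any i in (1..#vbaprojectbin): (uint32be(@vbaprojectbin[i] - 30) == 0x504B0304)
header_checks = [
    pos >= 30 and data[pos - 30:pos - 26] == MAGIC
    for pos in offsets
]
print(count, offsets, lengths, header_checks, any(header_checks))
```

Only the first instance sits 30 bytes after the magic sequence, and "for any" is satisfied as soon as one instance passes.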
Another test I perform in this rule: I check if the length of the found instance of $vbaprojectbin corresponds to the integer that is stored inside the filename length field of a PKZIP file record. That field is 4 bytes in front of the filename.
!vbaprojectbin[i] represents the length of the found instance with index number i.
This length is compared with the 16-bit little-endian integer found inside the filename length field of the PKZIP file record:
!vbaprojectbin[i] == uint16(@vbaprojectbin[i] - 4)
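The same comparison can be sketched in Python. This assumes an archive written by the standard zipfile module, with placeholder content in place of a real VBA project.

```python
import io
import re
import struct
import zipfile

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
    zf.writestr("word/vbaProject.bin", b"placeholder")
data = buf.getvalue()

# First match is the filename field of the local file header.
m = re.search(rb"[a-zA-Z/]*/?vbaProject\.bin", data)

match_len = len(m.group())                                   # YARA: !vbaprojectbin[i]
(name_len,) = struct.unpack_from("<H", data, m.start() - 4)  # YARA: uint16(@vbaprojectbin[i] - 4)

print(match_len, name_len, match_len == name_len)
```

If the regular expression had matched only part of a longer filename, or a string inside file content, the two numbers would normally differ and the clause would fail.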
When all these clauses are true for at least one instance of string $vbaprojectbin, then it's very likely that a PKZIP file record was found with a filename like */vbaProject.bin. I try to decrease the number of false positives by performing more tests.
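Putting the clauses together, the following Python sketch approximates the logic of rule pkvbare and runs it on two in-memory archives: one with a member named word/vbaProject.bin and one that only mentions the string inside a text file. The function and helper names and both sample archives are hypothetical, and Python's regex engine does not behave exactly like YARA's, so treat this as an emulation of the idea rather than of YARA itself.

```python
import io
import re
import struct
import zipfile

PATTERN = re.compile(rb"[a-zA-Z/]*/?vbaProject\.bin")
MAGIC = b"PK\x03\x04"


def build_zip(members):
    """Build an in-memory ZIP archive from a {name: bytes} mapping (stored, not compressed)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
        for name, data in members.items():
            zf.writestr(name, data)
    return buf.getvalue()


def pkvbare_like(data):
    """Approximate rule pkvbare: magic at offset 0, plus one instance passing both header checks."""
    if data[:4] != MAGIC:
        return False
    for m in PATTERN.finditer(data):
        start = m.start()
        if start < 30:
            continue  # no room for a 30-byte header in front of this instance
        has_header = data[start - 30:start - 26] == MAGIC
        (name_len,) = struct.unpack_from("<H", data, start - 4)
        if has_header and name_len == len(m.group()):
            return True
    return False


with_vba_name = build_zip({
    "[Content_Types].xml": b"<Types/>",
    "word/vbaProject.bin": b"placeholder",
})
mention_only = build_zip({
    "notes.txt": b"This text mentions vbaProject.bin but holds no macros.",
})

print(pkvbare_like(with_vba_name))
print(pkvbare_like(mention_only))
```

The archive that only mentions the string no longer triggers, because that instance is neither preceded by a file record header nor matched by a filename length field.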
Didier Stevens
Senior handler
Microsoft MVP
blog.DidierStevens.com