YARA Rule for OOXML Maldocs: Less False Positives

In this diary entry, I introduce an updated version of the YARA rule I presented in diary entry "Simple YARA Rules for Office Maldocs" for OOXML files with VBA code. Here is the OOXML YARA rule I presented yesterday:

rule pkvba {
    strings:
        $vbaprojectbin = "vbaProject.bin"
    condition:
        uint32be(0) == 0x504B0304 and $vbaprojectbin
}

This rule generates false positives when it finds instances of the string "vbaProject.bin" that are not a filename.

To improve this rule (generate fewer false positives), I will add clauses to check that the instances of string "vbaProject.bin" are found inside a PKZIP file record and correspond to the filename field.

Here is an updated version of the rule:

rule pkvbare {
    strings:
        $vbaprojectbin = /[a-zA-Z\/]*\/?vbaProject\.bin/
    condition:
        uint32be(0) == 0x504B0304 and
        $vbaprojectbin and
        for any i in (1..#vbaprojectbin): ((uint32be(@vbaprojectbin[i] - 30) == 0x504B0304) and
                                           (!vbaprojectbin[i] == uint16(@vbaprojectbin[i] - 4)))
}

In this updated rule, I use a regular expression (/[a-zA-Z\/]*\/?vbaProject\.bin/) to find filename vbaProject.bin. That's because the full filename is preceded by a path, and that path differs per type of Office document. For example, inside Word documents, that filename is "word/vbaProject.bin".

30 bytes before the string "word/vbaProject.bin", one will find the start of the header of the PKZIP file record.

The header of a PKZIP file record starts with magic sequence "50 4B 03 04".
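To make the 30-byte offset concrete, here is a minimal Python sketch (not part of the YARA rule). It assumes a tiny ZIP built in memory with the standard zipfile module; the member content is a dummy placeholder, not a real VBA project.

```python
import io
import zipfile

# Build a tiny ZIP in memory; the member name mimics a Word document's VBA project.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
    zf.writestr("word/vbaProject.bin", b"dummy content")
data = buf.getvalue()

# The first occurrence of the name sits inside the PKZIP file record header.
name_pos = data.find(b"word/vbaProject.bin")
header_pos = name_pos - 30  # the fixed-size part of the header is 30 bytes
magic = data[header_pos:header_pos + 4]

print(header_pos, magic.hex())  # 0 504b0304
```

The name also appears a second time near the end of the archive, in the central directory, but that record starts with a different signature, so the check at offset -30 does not succeed there.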

I check this with the following clause in my YARA rule:

(uint32be(@vbaprojectbin[i] - 30) == 0x504B0304)

Since more than one instance of $vbaprojectbin can be found, I need to test all instances to find one that fulfills all the conditions. I do this with a for expression:

for any i in (1..#vbaprojectbin): (...)

#vbaprojectbin is the number of instances (#) found.

i is an index (integer) that varies between 1 and the number of found instances.

@vbaprojectbin[i] represents the position of the found instance with index number i. Subtracting 30 from that position brings me to the start of the PKZIP file record header. I check that this is indeed the case by comparing with the magic sequence:

(uint32be(@vbaprojectbin[i] - 30) == 0x504B0304)

Another test I perform in this rule: I check whether the length of the found instance of $vbaprojectbin corresponds to the integer stored in the filename length field of the PKZIP file record. That field is located 4 bytes in front of the filename.

!vbaprojectbin[i] represents the length of the found instance with index number i.

This length is compared with the 16-bit little-endian integer found inside the filename length field of the PKZIP file record, 4 bytes in front of the filename:

!vbaprojectbin[i] == uint16(@vbaprojectbin[i] - 4)
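The position of that field follows from the header layout: the 30-byte fixed part ends with two 16-bit little-endian fields, the filename length and the extra field length, and the filename comes right after them. The Python sketch below reads both fields; it assumes an in-memory ZIP with an Excel-style member name ("xl/vbaProject.bin") and dummy content, chosen only for illustration.

```python
import io
import struct
import zipfile

name = b"xl/vbaProject.bin"  # Excel-style path, 17 bytes long

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
    zf.writestr(name.decode("ascii"), b"dummy content")
data = buf.getvalue()

name_pos = data.find(name)

# "<HH" = two unsigned 16-bit little-endian integers, the same byte order
# that YARA's uint16() uses.
name_len, extra_len = struct.unpack_from("<HH", data, name_pos - 4)

print(name_len, len(name))  # 17 17
```

A plain string match on "vbaProject.bin" would be only 14 bytes long and would not equal the stored length here, which is one reason the rule matches the leading path with a regular expression.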

When all these clauses are true for at least one instance of string $vbaprojectbin, then it's very likely that a PKZIP file record was found with a filename like */vbaProject.bin. I try to decrease the number of false positives by performing more tests.
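To see the effect of the extra clauses, the whole condition can be approximated in Python. This is only a sketch under stated assumptions: looks_like_pkvbare and make_zip are hypothetical helper names, the test archives are built in memory with dummy content, and re.finditer returns non-overlapping matches, which may differ from how YARA enumerates regex matches.

```python
import io
import re
import struct
import zipfile

PK_MAGIC = b"PK\x03\x04"  # magic sequence 50 4B 03 04
# Python bytes regex standing in for the YARA regex of rule pkvbare.
NAME_RE = re.compile(rb"[a-zA-Z/]*/?vbaProject\.bin")


def looks_like_pkvbare(data: bytes) -> bool:
    """Approximate the pkvbare condition on a byte string (hypothetical helper)."""
    if data[:4] != PK_MAGIC:              # uint32be(0) == 0x504B0304
        return False
    for match in NAME_RE.finditer(data):  # for any i in (1..#vbaprojectbin)
        pos = match.start()               # @vbaprojectbin[i]
        length = len(match.group())       # !vbaprojectbin[i]
        if pos < 30:
            continue                      # no room for a header before the match
        if data[pos - 30:pos - 26] != PK_MAGIC:
            continue
        (name_len,) = struct.unpack_from("<H", data, pos - 4)
        if name_len == length:            # !vbaprojectbin[i] == uint16(... - 4)
            return True
    return False


def make_zip(members: dict) -> bytes:
    """Build an uncompressed in-memory ZIP from a name -> content mapping."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
        for member_name, content in members.items():
            zf.writestr(member_name, content)
    return buf.getvalue()


macro_doc = make_zip({"[Content_Types].xml": b"<Types/>",
                      "word/vbaProject.bin": b"dummy"})
# A ZIP whose only link to VBA is a string inside a stored text file.
decoy = make_zip({"notes.txt": b"The macro lives in vbaProject.bin"})

print(looks_like_pkvbare(macro_doc))  # True
print(looks_like_pkvbare(decoy))      # False
```

The decoy archive starts with the PKZIP magic and contains the string "vbaProject.bin", so the simpler pkvba rule would match it; the header and length checks reject it.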


Didier Stevens
Senior handler
Microsoft MVP


ISC Handler
Nov 23rd 2021
