YARA Rule for OOXML Maldocs: Less False Positives

Published: 2021-11-23
Last Updated: 2021-11-23 16:59:32 UTC
by Didier Stevens (Version: 1)

In this diary entry, I introduce an updated version of the YARA rule I presented in diary entry "Simple YARA Rules for Office Maldocs" for OOXML files with VBA code. Here is the OOXML YARA rule I presented yesterday:

rule pkvba {
    strings:
        $vbaprojectbin = "vbaProject.bin"
    condition:
        uint32be(0) == 0x504B0304 and $vbaprojectbin
}

This rule will generate false positives if it finds instances of the string "vbaProject.bin" that are not a filename.
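To make this concrete, here is a minimal Python sketch (not YARA itself) that approximates the two conditions of rule pkvba and applies them to a decoy ZIP file. The helper names make_zip and simple_rule_matches, and the member name notes.txt, are made up for this illustration. The ZIP is written uncompressed so that the string stays visible in the raw bytes.

```python
import io
import zipfile

def make_zip(members):
    """Build an uncompressed (stored) ZIP in memory from {name: bytes}."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
        for name, content in members.items():
            zf.writestr(name, content)
    return buf.getvalue()

def simple_rule_matches(data):
    """Approximation of rule pkvba: ZIP magic at offset 0 plus the bare string."""
    return data[:4] == b"PK\x03\x04" and b"vbaProject.bin" in data

# Hypothetical decoy: the only member is a text file that merely mentions the string.
decoy = make_zip({"notes.txt": b"Remember to inspect vbaProject.bin later."})

print(simple_rule_matches(decoy))                      # True: the simple rule fires
print(zipfile.ZipFile(io.BytesIO(decoy)).namelist())   # ['notes.txt']: no such member
```

The decoy contains no file named vbaProject.bin, yet both conditions of the simple rule hold.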

To improve this rule (generate fewer false positives), I will add clauses to check that the instances of the string "vbaProject.bin" are found inside a PKZIP file record and correspond to the filename field.

Here is an updated version of the rule:

rule pkvbare {
    strings:
        $vbaprojectbin = /[a-zA-Z\/]*\/?vbaProject\.bin/
    condition:
        uint32be(0) == 0x504B0304 and
        $vbaprojectbin and
        for any i in (1..#vbaprojectbin): ((uint32be(@vbaprojectbin[i] - 30) == 0x504B0304) and
                                           (!vbaprojectbin[i] == uint16(@vbaprojectbin[i] - 4)))
}

In this updated rule, I use a regular expression (/[a-zA-Z\/]*\/?vbaProject\.bin/) to find the filename vbaProject.bin. That's because the full filename is preceded by a path, and that path differs per type of Office document. For example, inside Word documents, that filename is "word/vbaProject.bin".

The header of the PKZIP file record starts 30 bytes before the string "word/vbaProject.bin".

The header of a PKZIP file record starts with magic sequence "50 4B 03 04".
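As a rough illustration of that layout, the following Python sketch builds a small ZIP in memory and unpacks the fixed 30-byte part of the first file record header with struct. The member content is a placeholder, not a real VBA project, and the field names in the comment are my own shorthand for the fields of the ZIP format.

```python
import io
import struct
import zipfile

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
    zf.writestr("word/vbaProject.bin", b"dummy content")  # placeholder content
data = buf.getvalue()

# Fixed part of the file record header, little-endian:
# signature, version, flags, method, time, date, crc32,
# compressed size, uncompressed size, filename length, extra field length
HEADER = "<4sHHHHHIIIHH"
fields = struct.unpack_from(HEADER, data, 0)
signature, name_len = fields[0], fields[9]

# The filename follows immediately after the fixed part.
header_size = struct.calcsize(HEADER)
name = data[header_size:header_size + name_len]

print(header_size)        # 30
print(signature.hex())    # 504b0304
print(name)               # b'word/vbaProject.bin'
```

This is why the rule subtracts 30 from the position of the filename to reach the magic sequence.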

I check this with the following clause in my YARA rule:

(uint32be(@vbaprojectbin[i] - 30) == 0x504B0304)

Since more than one instance of $vbaprojectbin can be found, I need to test all instances to find one that fulfills all the conditions. I do this with a for expression:

for any i in (1..#vbaprojectbin): (...)

#vbaprojectbin is the number of instances (#) found.

i is an index (integer) that varies between 1 and the number of found instances.

@vbaprojectbin[i] represents the position of the found instance with index number i. Subtracting 30 from that position brings me to the start of the PKZIP file record header. I check that this is indeed the case by comparing with the magic sequence:

(uint32be(@vbaprojectbin[i] - 30) == 0x504B0304)
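The need to test every instance shows up even in a clean sample, because a ZIP file stores each filename twice: once in the file record and once in the central directory at the end of the file. The Python sketch below approximates the for expression with re.finditer. Python's regex engine is only a stand-in for YARA's here, and the sample content is a placeholder.

```python
import io
import re
import zipfile

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
    zf.writestr("word/vbaProject.bin", b"dummy content")  # placeholder content
data = buf.getvalue()

pattern = re.compile(rb"[a-zA-Z/]*/?vbaProject\.bin")

# Roughly @vbaprojectbin[1] .. @vbaprojectbin[#vbaprojectbin]
offsets = [m.start() for m in pattern.finditer(data)]

# Roughly uint32be(@vbaprojectbin[i] - 30) == 0x504B0304, per instance
magic_ok = [off >= 30 and data[off - 30:off - 26] == b"PK\x03\x04"
            for off in offsets]

print(len(offsets))    # 2: file record and central directory
print(magic_ok)        # [True, False]
print(any(magic_ok))   # True, which is what "for any" asks for
```

Only the instance inside the file record passes the magic check. The copy in the central directory does not, which is why the rule uses "for any" and not "for all".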

Another test I perform in this rule: I check if the length of the found instance of string $vbaprojectbin corresponds to the integer that is stored inside the filenamelength field of a PKZIP file record. That 2-byte field is located 4 bytes in front of the filename.

!vbaprojectbin[i] represents the length of the found instance with index number i.

This length is compared with the 16-bit little-endian integer stored in the filenamelength field of the PKZIP file record, 4 bytes in front of the filename:

!vbaprojectbin[i] == uint16(@vbaprojectbin[i] - 4)
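To see what this length test adds on top of the magic check, consider a member whose name merely starts with the expected filename. The Python sketch below compares both lengths for two member names. The helper first_instance and the name "word/vbaProject.bin.bak" are hypothetical, chosen only for this illustration.

```python
import io
import re
import struct
import zipfile

def first_instance(member_name):
    """Return (length of regex match, stored filename length) for the first hit."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
        zf.writestr(member_name, b"dummy content")  # placeholder content
    data = buf.getvalue()
    m = re.search(rb"[a-zA-Z/]*/?vbaProject\.bin", data)
    # Roughly uint16(@vbaprojectbin[i] - 4): 16-bit little-endian integer
    (stored_len,) = struct.unpack_from("<H", data, m.start() - 4)
    return len(m.group()), stored_len

print(first_instance("word/vbaProject.bin"))      # (19, 19): lengths agree
print(first_instance("word/vbaProject.bin.bak"))  # (19, 23): lengths differ
```

In the second case the magic sequence is still found 30 bytes before the match, but the stored filename length is larger than the match, so the length clause rejects it.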

When all these clauses are true for at least one instance of string $vbaprojectbin, then it's very likely that a PKZIP file record was found with a filename like */vbaProject.bin. By performing these additional tests, I decrease the number of false positives.
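Putting the clauses together, here is a minimal Python sketch that approximates the complete condition of rule pkvbare and runs it against two in-memory samples. It is an emulation under stated assumptions, not a replacement for YARA: the function pkvbare_like, the helper make_zip and both samples are invented for this illustration, and Python's regex engine stands in for YARA's.

```python
import io
import re
import struct
import zipfile

VBA_NAME = re.compile(rb"[a-zA-Z/]*/?vbaProject\.bin")
ZIP_MAGIC = b"PK\x03\x04"

def make_zip(members):
    """Build an uncompressed (stored) ZIP in memory from {name: bytes}."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
        for name, content in members.items():
            zf.writestr(name, content)
    return buf.getvalue()

def pkvbare_like(data):
    """Approximate the condition of rule pkvbare on a bytes buffer."""
    if data[:4] != ZIP_MAGIC:                 # uint32be(0) == 0x504B0304
        return False
    for m in VBA_NAME.finditer(data):         # for any i in (1..#vbaprojectbin)
        pos = m.start()
        if pos < 30:
            continue                          # no room for a record header
        header_ok = data[pos - 30:pos - 26] == ZIP_MAGIC
        (name_len,) = struct.unpack_from("<H", data, pos - 4)
        if header_ok and name_len == len(m.group()):
            return True
    return False

# Placeholder content, not a real VBA project.
macro_like = make_zip({"word/vbaProject.bin": b"dummy content"})
# Decoy: the string appears only inside a text file.
decoy = make_zip({"notes.txt": b"Remember to inspect vbaProject.bin later."})

print(pkvbare_like(macro_like))   # True
print(pkvbare_like(decoy))        # False
```

The decoy has the string inside file content but not as a filename, so the stricter checks reject it while the sample with a real */vbaProject.bin member still matches.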


Didier Stevens
Senior handler
Microsoft MVP

Keywords: maldoc office yara