Dealing with obfuscated RTF files

Published: 2017-12-25. Last Updated: 2017-12-25 23:20:00 UTC
by Didier Stevens (Version: 1)

I see a lot of malicious RTF files that are heavily obfuscated. Last, I received a sample that rtfobj or rtfdump could not handle properly to correctly identify OLE objects ("Not a well-formed OLE object"). But my rtfdump tool has an option that can help decode objects that are not well-formed. Let's take a closer look.

rtfdump does not identify OLE objects in this sample, however, the h= indicator tells us that there are a lot of hexadecimal characters.

Let's take a closer look at the first sequence with hexadecimal strings (sequence 4 is the first, inner most nested sequence with 6989 hexadecimal characters and an hexadecimal string of 252 characters):

It looks like an embedded object, however the control word is \.\objdata while we expect \*\objdata. And it contains an obfuscated hexadecimal string that indicates the presence of an OLE file: d0cf11e0a1b11ae1.

So let's convert this hexadecimal data to binary with option -H:

First we see the string "Equation.3", so this could be an exploit for CVE-2017-11882 (the equation editor vulnerability).

Next we see that rtfdump.py was not able to deobfuscate the hexadecimal string correctly: da 0c f1 1e 0a 1b 11 ae 10. There is one extra nibble (a), which shifts all subsequent bytes by 4 bits.

Using option -S, we can try to fix this, by shifting the byte sequence by 4 extra bits, making the total shift by 8 bits, e.g. one byte: