Malicious RTF Files

Published: 2016-07-29
Last Updated: 2016-07-29 13:35:02 UTC
by Didier Stevens (Version: 1)
5 comment(s)

About a year ago I received RTF samples that I could not analyze with RTFScan or rtfobj (FYI: Philippe Lagadec has improved rtfobj.py significantly since then). So I started to write my own RTF analysis tool (rtfdump), but I was not satisfied enough with the way I presented the analysis result to warrant a release of my tool. Last week, I started analyzing new samples and updating my tool. I released it, and show how I analyze sample 07884483f95ae891845caf0d50ce507f in this diary entry.


This sample is an heavily obfuscated RTF file. RTF files are essentially sets of nested strings that start with { and end with }. Like this (strongly simplified):

{\rtf {data {more data}}}.

Malicious RTF files contain a payload. Objects in RTF files are embedded in hexadecimal, like this (strongly simplified):
{\rtf {data
{\*\objdata
01050000
02000000
08000000
46696C656E616D6500000000000000...
}}}

Malicious RTF files obfuscate the hexadecimal data in many ways, one of them is to put extra control strings inside the hexadecimal data, like this:
{\rtf {data
{\*\objdata
01050000
02000000
08000000
46696C656E61{\obj}6D6500000000000000...
}}}

The sample I analyzed takes this to the extreme. After each hexadecimal digit, extra control strings and whitespace are inserted:


(I removed a lot of whitespace to be able to put several hexadecimal digits on the screen).
The hexadecimal digits (highlighted in red) are 01050…

My tool outputs a line of analysis data for each nested string. In this sample, because of the obfuscation, there are a lot of them (22956, which is gigantic for an RTF file).

But you can reduce the output by filtering for entries that (potentially) contain an embedded object using option -f O:

Entry 165 is the one we will take a closer look at first. The information presented for entry 165 is the following: the nesting level is 4, it has 1 child (c=), starts at position 2ae5 in the file (p=), is 1194952 bytes long (l=), has 11429 hexadecimal digits (h=), has no \bin entries (b=), contains an embedded object (O), has 1 unknown character (u=) and is named \*\objdata133765.

We can select entry 165 for closer analysis:

I highlighted the hexademical digits in red.

To decode the hexadecimal data, we use option -H:

You can see the hex data clearly now: 01 05 00 ...

Since this is an embedded object, we use option -i to get more info on the object:

From the magic header, we see that the embedded object is an OLE file (FYI: if we analyze it with oledump, we get parsing errors).

Looking further into the data (-H), we see stream entries in the output:

And a bit further, we even find a URL:

Taking a closer look, I don't only see a URL, but hex data that looks like shellcode.

We can select this shellcode by cutting if out of the stream (option -c):

And of course also dump it to a file (option -d), so that we scan analyze it with the shellcode analyzer from libemu:

So this RTF file is a downloader.

The presence of shellcode in an RTF file is often an indication of an exploit. rtfdump supports YARA (like many of my *dump tools):

The first YARA search doesn't find anything. But the second search with option -H (to decode the hexadecimal content to binary) has hits for my RTF_ListView2_CLSID YARA rule. This indicates that entry 165 contains a byte sequence for the ListView2 classid, so this is very likely an exploit for vulnerability CVE-2012-0158 in this ListView.

The set of samples I looked at last week are characterized by the following properties:

they start with {\rtfMETAX

they end with this:

If you have interesting tools or techniques to analyze RTF files, please post a comment.

Didier Stevens
Microsoft MVP Consumer Security
blog.DidierStevens.com DidierStevensLabs.com

Keywords: maldoc rtf
5 comment(s)

Comments

Hi Didier, nice article!

I also recently rewrote most of rtfobj to be able to parse the exact same samples. For this, I had to implement a full-blown RTF parser, that handles edge cases the same way as MS Word does (because the RTF specifications do not cover all the special situations exploited by these samples).

The new development version of rtfobj is able to extract the embedded object (Word document) completely: https://github.com/decalage2/oletools/archive/master.zip

Philippe.
Thanks a lot for your update and hard work Philippe.
Great work , and great article.

Only feedback I would add is

It looks like you need Python installed to run these.

Python is used pretty commonly however I haven't needed it yet,
but I use Perl quite frequently, if anyone encounters a version
without python
Do you program in Perl or do you just use Perl?

If you don't want to install Python, you can convert Python programs to executables with PyInstaller: https://isc.sans.edu/forums/diary/Python+Malware+Part+1/21057/

Or you can use portable Python.
Thank you!

Diary Archives