Malicious RTF Files
About a year ago I received RTF samples that I could not analyze with RTFScan or rtfobj (FYI: Philippe Lagadec has improved rtfobj.py significantly since then). So I started to write my own RTF analysis tool (rtfdump), but I was not satisfied enough with the way I presented the analysis result to warrant a release of my tool. Last week, I started analyzing new samples and updating my tool. I released it, and show how I analyze sample 07884483f95ae891845caf0d50ce507f in this diary entry.
This sample is an heavily obfuscated RTF file. RTF files are essentially sets of nested strings that start with { and end with }. Like this (strongly simplified):
{\rtf {data {more data}}}.
Malicious RTF files contain a payload. Objects in RTF files are embedded in hexadecimal, like this (strongly simplified):
{\rtf {data
{\*\objdata
01050000
02000000
08000000
46696C656E616D6500000000000000...
}}}
Malicious RTF files obfuscate the hexadecimal data in many ways, one of them is to put extra control strings inside the hexadecimal data, like this:
{\rtf {data
{\*\objdata
01050000
02000000
08000000
46696C656E61{\obj}6D6500000000000000...
}}}
The sample I analyzed takes this to the extreme. After each hexadecimal digit, extra control strings and whitespace are inserted:
(I removed a lot of whitespace to be able to put several hexadecimal digits on the screen).
The hexadecimal digits (highlighted in red) are 01050…
My tool outputs a line of analysis data for each nested string. In this sample, because of the obfuscation, there are a lot of them (22956, which is gigantic for an RTF file).
But you can reduce the output by filtering for entries that (potentially) contain an embedded object using option -f O:
Entry 165 is the one we will take a closer look at first. The information presented for entry 165 is the following: the nesting level is 4, it has 1 child (c=), starts at position 2ae5 in the file (p=), is 1194952 bytes long (l=), has 11429 hexadecimal digits (h=), has no \bin entries (b=), contains an embedded object (O), has 1 unknown character (u=) and is named \*\objdata133765.
We can select entry 165 for closer analysis:
I highlighted the hexademical digits in red.
To decode the hexadecimal data, we use option -H:
You can see the hex data clearly now: 01 05 00 ...
Since this is an embedded object, we use option -i to get more info on the object:
From the magic header, we see that the embedded object is an OLE file (FYI: if we analyze it with oledump, we get parsing errors).
Looking further into the data (-H), we see stream entries in the output:
And a bit further, we even find a URL:
Taking a closer look, I don't only see a URL, but hex data that looks like shellcode.
We can select this shellcode by cutting if out of the stream (option -c):
And of course also dump it to a file (option -d), so that we scan analyze it with the shellcode analyzer from libemu:
So this RTF file is a downloader.
The presence of shellcode in an RTF file is often an indication of an exploit. rtfdump supports YARA (like many of my *dump tools):
The first YARA search doesn't find anything. But the second search with option -H (to decode the hexadecimal content to binary) has hits for my RTF_ListView2_CLSID YARA rule. This indicates that entry 165 contains a byte sequence for the ListView2 classid, so this is very likely an exploit for vulnerability CVE-2012-0158 in this ListView.
The set of samples I looked at last week are characterized by the following properties:
they start with {\rtfMETAX
they end with this:
If you have interesting tools or techniques to analyze RTF files, please post a comment.
Didier Stevens
Microsoft MVP Consumer Security
blog.DidierStevens.com DidierStevensLabs.com
Comments
www
Nov 17th 2022
6 months ago
EEW
Nov 17th 2022
6 months ago
qwq
Nov 17th 2022
6 months ago
mashood
Nov 17th 2022
6 months ago
isc.sans.edu
Nov 23rd 2022
6 months ago
isc.sans.edu
Nov 23rd 2022
6 months ago
isc.sans.edu
Dec 3rd 2022
6 months ago
isc.sans.edu
Dec 3rd 2022
6 months ago
<a hreaf="https://technolytical.com/">the social network</a> is described as follows because they respect your privacy and keep your data secure. The social networks are not interested in collecting data about you. They don't care about what you're doing, or what you like. They don't want to know who you talk to, or where you go.
<a hreaf="https://technolytical.com/">the social network</a> is not interested in collecting data about you. They don't care about what you're doing, or what you like. They don't want to know who you talk to, or where you go. The social networks only collect the minimum amount of information required for the service that they provide. Your personal information is kept private, and is never shared with other companies without your permission
isc.sans.edu
Dec 26th 2022
5 months ago
isc.sans.edu
Dec 26th 2022
5 months ago