Recently, when the targeted attack with malicious RTF attachments was making the rounds, I wondered how best to extract the embedded EXE from the RTF for further analysis. On a Windows system, you would most likely just copy and paste the embedded object from the RTF document into an Explorer window and end up with the original file. Since I do my malware analysis on Unix, this wasn't an option.

Looking at the file, it appeared as if RTF was using some sort of hexadecimal encoding. Now, as a command line Perl addict, hex is something I know how to deal with :-).

$ cat detail.rtf | sed -e '1,3d' | perl -ne 's/(..)/print chr(hex($1))/ge' > detail.bin

00000000 02 00 00 00 08 00 00 00 50 61 63 6b 61 67 65 00 |........Package.|

Sweet, we get something printable! The “sed” command deletes the first three lines, because they don't contain hex and would confuse the Perl statement that follows. The Perl code eats up two hex digits at a time and converts them to the corresponding character, iterating through the entire file. I'm using “perl -ne” combined with “print” instead of “perl -pe” because the former makes it easier to ignore the pesky CR/LF line end markers that make Windows text so annoying on Unix. The decoded output is viewed with “hexdump -C”, because we expect this content to be an embedded EXE file, and thus it likely contains a lot of non-printable characters that would not be fun to look at in “vi” or “more”.

A bit further down in the output, there was indeed the telltale “MZ” marker that begins the MS-DOS header of a PE executable.

00000170 6c 20 63 6f 6e 74 65 6e 74 2e 73 63 72 00 00 e0 |l content.scr..à|

Easy, I thought. Let's carve out the file beginning with the MZ and we should have the EXE:

$ dd if=detail.bin of=detail.exe bs=1 skip=386

“if” and “of” are the input and output files of the “dd” command. “bs=1” sets the block size to one byte, and “skip”, well, skips the given number of blocks (here, bytes) at the beginning of the input file.
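For readers who prefer Python over a Perl one-liner, the decoding step and the search for the “MZ” marker described above can be sketched roughly as follows. This is only an illustration, not the commands I actually ran: the function name decode_hex_lines and the sample input are made up, and unlike the Perl version it joins all lines before pairing up the hex digits.

```python
import re


def decode_hex_lines(text: str, skip_lines: int = 3) -> bytes:
    """Decode hex-encoded object data, skipping the leading non-hex lines.

    skip_lines=3 mirrors the "sed -e '1,3d'" step; other files may need
    a different value.
    """
    body_lines = text.splitlines()[skip_lines:]
    # Keep only hex digits, which drops CR/LF. Letters a-f inside trailing
    # RTF keywords are kept as well, so crud at the end decodes to junk.
    hex_digits = re.sub(r"[^0-9A-Fa-f]", "", "".join(body_lines))
    # Drop a dangling half-byte, if any, so fromhex() sees whole bytes.
    if len(hex_digits) % 2:
        hex_digits = hex_digits[:-1]
    return bytes.fromhex(hex_digits)


if __name__ == "__main__":
    # Made-up sample: three header lines, then hex data with CR/LF endings.
    sample = (
        "{\\rtf1\r\n{\\object\r\n{\\*\\objdata\r\n"
        "0200000008000000\r\n5061636b61676500\r\n4d5a9000\r\n"
    )
    data = decode_hex_lines(sample)
    offset = data.find(b"MZ")  # -1 if the marker is not present
    # The decimal offset is what "dd ... bs=1 skip=N" expects.
    print(f"MZ at offset {offset:#x} ({offset} decimal)")
```

On a real file, the first “MZ” found is not guaranteed to be the start of the executable (the two letters can also occur in a file name), so the offset should still be checked against the hexdump.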
The skip value of 386 is the decimal equivalent of 0x182, the offset of MZ visible in the hexdump above. While the “file” command confirmed that I had indeed carved out an executable, something was wrong: the file didn't want to run in the emulator, and when I uploaded it to threatexpert.com, their service called it “invalid”. I quickly figured out that the RTF has a lot of crud at the end as well, which also needs to be cut off, but I still couldn't reliably determine the correct length, and hence didn't know where the last byte of the embedded executable was.

Well, time for the malware reverse engineering equivalent of the “known plaintext attack”. I used a Windows PC to embed a copy of notepad.exe into an otherwise empty RTF document of my own, and then went about analyzing this RTF until I was able to carve out the original notepad.exe. The main “AHA!” moment was when I realized that the bytes between the filename and the “MZ” header actually are the length of the embedded file. If we use our hexdump from before

00000170 6c 20 63 6f 6e 74 65 6e 74 2e 73 63 72 00 00 e0 |l content.scr..à|

the length of the file in this case is 0x00E000, which is 57344 in decimal. Back to the sample:

$ dd if=detail.exe of=detail-fixed.exe bs=1 count=57344

This time, the emulator, ThreatExpert and VirusTotal were all happy with the file, and while anti-virus coverage at the time was poor, the EXE/SCR embedded within the RTF attachment was quickly confirmed as unfriendly.
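The length-based carving can also be sketched in Python. Treat this as a rough illustration under assumptions rather than a tested tool: it assumes the length is stored as a 4-byte little-endian integer directly in front of the “MZ” marker, which matches the bytes seen in this sample and in my notepad.exe test but is not something I have checked against a specification. The function name carve_embedded_exe and the demo data are made up.

```python
import struct


def carve_embedded_exe(data: bytes) -> bytes:
    """Cut the embedded executable out of decoded object data.

    Assumption: the 4 bytes right before "MZ" hold the payload length
    as a little-endian unsigned integer.
    """
    mz = data.find(b"MZ")  # first hit only; may be a false positive
    if mz < 4:
        raise ValueError("no MZ marker with room for a length field before it")
    (length,) = struct.unpack_from("<I", data, mz - 4)
    end = mz + length
    if end > len(data):
        raise ValueError("length field points past the end of the data")
    # Everything after 'end' is the trailing crud that must be cut off.
    return data[mz:end]


if __name__ == "__main__":
    # Made-up demo: file name, length field (6), payload, trailing crud.
    payload = b"MZ\x90\x00\x03\x00"
    blob = b"content.scr\x00" + struct.pack("<I", len(payload)) + payload + b"crud"
    print(carve_embedded_exe(blob) == payload)

    # The arithmetic from the sample: bytes 00 e0 00 00 as little-endian.
    print(struct.unpack("<I", bytes.fromhex("00e00000"))[0])
```

Reading the length from the data instead of typing it into “dd” by hand avoids the off-by-a-few-bytes mistakes that made the first carved file invalid.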
Daniel, ISC Handler, Jul 2nd 2009