Hiding in White Text: Word Documents with Embedded Payloads

Published: 2016-07-06
Last Updated: 2016-07-11 17:31:25 UTC
by Johannes Ullrich (Version: 1)
1 comment(s)

This is a guest diary by Yaser Mansour. Due to the extensive use of images, please note that all the images are clickable to view them at full size. A PDF version of this diary is available here

Malicious macros in Office documents are not new, and several samples have been analyzed here at the ISC Diary website. Usually, the macro script is used to drop the second stage malware either by reaching to the internet or by extracting a binary embedded in the Office document itself. In this post, we will examine two similar malicious documents that were observed separately with each dropping a different malware sample, namely, NetWiredRC and iSpy.

There are several interesting facts about the samples we are going to analyze today:

  1. The macro embedded in the Office document does not reach to the internet. Instead, it extracts a binary embedded in the Word document itself in ASCII hex format and writes it to disk.
  2. Both malicious Word documents were observed separately. However, both use the same technique to extract the embedded binary as well as the same decoy message enticing the end user to enabled macros.
  3. During network forensics of the NetWiredRC malware, a new C&C command was observed which was not reported by [1][2]. This also resulted in a total of 9 custom Snort signatures being submitted and published in Snort Community Ruleset [3][4].
  4. The iSpy sample generated new network traffic patterns than what were observed previously. More about this in the following sections.

Brief History of NetWiredRC and iSpy Malware Samples

NetWiredRC RAT family has been extensively discussed by security researchers [1][2], and recently TALOS released Snort signatures to detect NetWiredRC over the network [3], and a new signature for the new NetWiredRC command[4].

iSpy was first observed by the author during January 2016 with the sample b33c5ba388f8a32006133cb8888a9370. This sample performed its C&C over HTTP as seen in the below screenshot and Snort signatures were released [4].

Picture 1

Other samples were observed during March and April 2016 (65ee535f0efcb30626ce5c8e7763e782 and cd3a43d3504925a396183b467b0980cb, respectively). Both of these samples also used HTTP for their C&C communication. One of the latest samples observed was extracted from the embedded payload in the Word document discussed in the remaining of this article. This recent sample performs its C&C over SMTP for initial and exfiltration communication as shown in the screenshots below.

Malicious Word Documents Analysis

Both malicious documents implement the same algorithm used to extract the embedded binary. While the focus will be on the malicious document embedding the NetWiredRC sample, we will attempt to provide side-by-side analysis of both malicious documents. Throughout the remaining of this post, comparative screenshots will have the first screenshot reflecting the analysis of the document dropping NetWiredRC while the second reflecting the iSpy sample.

Both samples documents have exactly the same decoy message enticing the unsuspicious user of disabling the protected view and enabling macros as seen in the below screenshot.

Interestingly each document consisted of a large number of “empty” pages with the only text visible as shown in the above screenshot. More accurately, the document consisted of 232. The second malicious Word document consisted of 528 pages.

The document was initially inspected for the presence of macros using oledump.py [5]. From the screenshots below, we can see that document indeed embeds a macro. The VBScript code was extracted using the oledump.py or olevba.py [6] can also be used. Both tools exist in the Remnux [7] image.


Closely inspecting the dumped and obfuscated VBScript, there were no indications that the script reaches to the internet. Going further through the code, there was an interesting function as shown in the below screenshot. Please note the highlighted variable and its data type as it plays a major role on how the embedded second stage binary is extracted from the Office document.

Continuing to inspect the script, it becomes apparent how the second stage binary is dropped to the local disk. The script leverages the Word Object Model [8] to access the Paragraph Object Members [9]. More about this later. On order to understand why the script would access paragraphs from the document itself, the document was opened while macros are disabled prevent the script from executing.

Scrolling through the document to inspect what these 232 pages contain showed only empty pages containing nothing, or did they? To verify, the document was zip extracted since it is an OOXML document. Extracting the document will also help in getting access to the internal structures of the document.  Once extracted, we end up with a set of XML files. We are interested in one file, in particular, the document.xml. This file contains the actual content of the document in XML representation.

An OOXML file contains element blocks representing the various content aspects of a document. For examples, how paragraphs in a word document are structured as XML [10]. In a nutshell, a paragraph is expressed with the XML element block . Text within a paragraph block is expressed by the XML element . Within the paragraph block, other XML elements may exist to represent things like formatting (). For more information about these elements, refer to the EMCA-376 Standard of the Office Open XML File Formats [11] and [12].

Back to our document.xml file, when viewed in a browser we notice that we have 24 paragraphs denoted by the element as we explained earlier. Inspecting and mapping the elements to the actual paragraphs in the document leads up to the fact that the last paragraph spans across the 232 pages. This paragraph not only contains text but also formats the text within the paragraph to be in white color (#FFFFFF) as a hiding technique from the viewer of the document.

So we have a VBScript that defines a variable as a paragraph and hidden paragraph that spans 232 pages. This indicates that the VBScript does in fact access the Word Object Model in order to reach to the paragraph. Inspecting the VBScript, we find evidence that the script indeed attempts to access the paragraphs objects and the text within which are included in the document. Specifically, the script has interest in paragraph number 24.

A bird’s-eye view of this segment of script suggests that the script loops through the paragraphs available within the document and the embedded text within until it reaches to paragraph 24. From its text, the script grabs 2 letters (or 1 sting hex byte, see below 2 screen shots), “un-hexifies” it o get the decimal/numerical representation using the Type Character “&H” (hS variable value) Hexadecimal Literal [13], and then hex xor it with hexadecimal key &HEE (0xEE) to produce hexadecimal bytes that serve a specific purpose. 

Let’s take the first two bytes from the below screenshot to test this logic. The first 2 string hex bytes are “A3 B4”. The below table breaks down the conversions performed by the script snippet above. Do you see anything familiar in the table? The two bytes “4D 5A” or “MZ” are the magic number of the DOS MZ executable.

Raw String Hex Byte Decimal Representation (&H) Decimal Representation (Xor 0xEE) Hex Representation
A3 163 77 0x4D
B4 180 90 0x5A

To automate this the above algorithm, the following python script was created. 

Once each byte is extracted and converted, it is written to disk byte-by-byte until there is no more text left in the paragraph. Afterward, the script calls a function passing the name of the just generated binary to execute it. The function uses the built-in function Shell() [14] to execute the second stage binary. The function is captured in the below screenshot.

To put everything together, the below screenshot represents the beautified and commented version of both functions discussed earlier.

The below screenshot represents the same “ParagraphRemove()” (beautified and commented ) from the second malicious Word, which dropped iSpy malware sample. An interesting note from both malicious Word documents is the “Startincex” misspell. 


[1] https://www.circl.lu/pub/tr-23/
[2] http://researchcenter.paloaltonetworks.com/2014/08/new-release-decrypting-netwire-c2-traffic/
[3] http://blog.snort.org/2016/03/snort-subscriber-rule-set-update-for_29.html
[4] http://blog.snort.org/2016/05/snort-subscriber-rule-set-update-for_31.html
[5] https://blog.didierstevens.com/programs/oledump-py/
[6] http://www.decalage.info/vba_tools
[7] https://remnux.org/
[8] https://msdn.microsoft.com/en-us/library/kw65a0we.aspx
[9] https://msdn.microsoft.com/en-us/library/office/ff839491.aspx
[10] http://officeopenxml.com/WPparagraph.php
[11] http://www.ecma-international.org/publications/standards/Ecma-376.htm
[12] https://msdn.microsoft.com/en-us/library/office/gg607163(v=office.14).aspx
[13] https://msdn.microsoft.com/en-us/library/s9cz43ek.aspx
[14] https://msdn.microsoft.com/en-us/library/xe736fyk(v=vs.90).aspx


1 comment(s)


Thanks for the write-up!

Apparently such documents were massively spammed in the Netherlands last week up to yesterday (the links seem dead now).

Variants were discussed (in Dutch = the language in Holland/the Netherlands) in
and in

The file "BT-32084.doc" [1] I analyzed contains the following string twice:
<w:color w:val="FFFFFF" w:themeColor="background1"/>

The encoded bytes begin with:
7E 69 A3 33 30 33 33 33
So I guess the XOR code must be 33, which results in:
4D 5A 90 00 03 00 00 00

I did not examine the VBA code.

[1] https://www.virustotal.com/en/file/3135956e280e61d44f42928899dcb3b0105fb35b53b33f85eaa41bc0ad879d83/analysis/1468214620/

Diary Archives