XML: A New Vector For An Old Trick

Published: 2015-03-05
Last Updated: 2015-03-05 18:53:53 UTC
by Didier Stevens (Version: 1)
3 comment(s)

October 2014 saw the beginning of an e-mail campaign spamming malicious Microsoft Office documents. Mostly Word documents using the “old” binary format, but sometimes Excel documents and sometimes the “new” ZIP/XML format. All with VBA macros that auto-execute when opened.

Yesterday, we started to see XML attachments. You might expect that these attachments open with Internet Explorer when you double-click them, but if you have Microsoft Office installed, they will open with Microsoft Word.

The XML file contains an element (w:binData) with a base64 string as content. Its attribute (w:name="editdata.mso") reveals it is a MSO file. Decoding the base64 string, we end up with a binary stream with header ActiveMime. 50 bytes into the stream (position 0x32), a ZLIB compression object starts. Inflating this object reveals an OLE file (header 0xD0CF1LE0) containing the VBA macros.

This OLE file can be analyzed with my oledump tool I mentioned in a previous diary entry [1], but I just released a new version [2] that handles XML files directly. And Philippe Lagadec already released a new version of his olevba tool yesterday [3].

Both tools are Python programs, giving you the means to analyze malicious Microsoft Office documents in any environment supporting Python, without Microsoft Office applications.

If you filter e-mail attachments that are Microsoft Office documents, you should check what your e-mail filter does with XML files. XML declaration identifies the XML file as a Word document, and attribute w:macrosPresent="yes" (of element w:wordDocument) indicates the presence of VBA macros. Remark that these strings are different for XML Excel documents.

Until now, we have only seen XML Word documents. Please post a comment if you received another format, like XML Excel documents, if possible with a link to the VirusTotal entry.

The MD5 of the sample discussed in this diary entry is 77739ab6c20e9dfbeffa3e2e6960e156.

1: https://isc.sans.edu/diary/oledump+analysis+of+Rocket+Kitten+-+Guest+Diary+by+Didier+Stevens/19137
2: http://blog.didierstevens.com/programs/oledump-py
3: http://www.decalage.info/python/oletools


3 comment(s)


Nice article.

Did a blog entry when they first hit here:


Sanesecurity.Malware.24759.XmlHeurGen signature also created for ClamAV 3rd Party signatures (sanesecurty.com) to detect them.


Yes, it's nice to have the "blue team" point of view.
If you want the "red team" one check:


Good article. Had a few clients receive suspicious Word docs recently but they weren’t triggering any AV software. Fortunately we have them fairly well trained to check the sender names and call if they are unsure of an attachment. I knew the files had to be bad but I am not well versed in decoding base64. Damn Microsoft and their word2003xml!!!

Diary Archives