Office: About OLE and ZIP Files

Published: 2020-09-07
Last Updated: 2020-09-07 16:41:49 UTC
by Didier Stevens (Version: 1)

A reader asked if a particular Emotet sample was a malformed ZIP file. It is not, and in this diary entry I will explain why you might think it is.

I created an example Word document and saved it as a .doc file (an OLE file).
When I look at it with my tool, I get this output:

Why do I get output for a ZIP file, when the .doc file is an OLE file?

What the reader noticed is that when they used my tool with option -f L to find and list all PKZIP records, the output showed that there was data before the first PKZIP record (p = prefix, 10566 bytes) and after the last PKZIP record (s = suffix, 12898 bytes).
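To make the prefix and suffix idea concrete, here is a minimal Python sketch. It is not how my tool is implemented; it only assumes that the first local file header signature (PK\x03\x04) marks the start of the ZIP data and that the last end-of-central-directory record (PK\x05\x06) marks its end. The function names are hypothetical, and the sample is built in memory rather than taken from a real document.

```python
import io
import struct
import zipfile

LOCAL_FILE_HEADER = b"PK\x03\x04"   # signature of a ZIP local file header
END_OF_CENTRAL_DIR = b"PK\x05\x06"  # signature of the end-of-central-directory record
EOCD_FIXED_SIZE = 22                # size of the EOCD record without its comment


def build_sample_zip():
    """Build a tiny ZIP in memory (hypothetical stand-in for embedded theme data)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("theme/theme1.xml", "<a:theme/>")
    return buf.getvalue()


def measure_prefix_suffix(data):
    """Return (prefix_len, suffix_len) around the PKZIP records, or None if none found."""
    first = data.find(LOCAL_FILE_HEADER)
    eocd = data.rfind(END_OF_CENTRAL_DIR)
    if first == -1 or eocd == -1 or eocd + EOCD_FIXED_SIZE > len(data):
        return None
    # The last two bytes of the fixed EOCD part hold the ZIP comment length.
    (comment_len,) = struct.unpack_from("<H", data, eocd + 20)
    end = min(eocd + EOCD_FIXED_SIZE + comment_len, len(data))
    return first, len(data) - end


if __name__ == "__main__":
    wrapped = b"A" * 100 + build_sample_zip() + b"B" * 50
    print(measure_prefix_suffix(wrapped))
```

A naive scan like this can be fooled by signature bytes that happen to occur inside other data, so treat the numbers as a hint and not as proof.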

We have indeed seen ZIP files with data prepended or appended, to try to fool anti-virus products. But this is not the case here.
What is going on is that each .doc file created with Office contains an embedded ZIP file with theme data.
When I use the YARA option to do an ad hoc search for the filename theme1.xml, I see that this string is in the 1Table stream. This is where the ZIP file is embedded.
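A plain string search works here because ZIP stores file names uncompressed, once in the local file header and once in the central directory. The sketch below illustrates that with an in-memory ZIP; it is a simple byte search, not YARA, and it searches a flat buffer instead of individual OLE streams. The helper names are hypothetical.

```python
import io
import zipfile


def build_theme_zip():
    """Build a small in-memory ZIP with one theme entry (illustrative sample only)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("theme/theme1.xml", "<a:theme/>")
    return buf.getvalue()


def find_all(data, needle):
    """Return every offset at which needle occurs in data."""
    offsets = []
    pos = data.find(needle)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(needle, pos + 1)
    return offsets


if __name__ == "__main__":
    # Even though the entry content is compressed, the name is stored in the clear.
    for offset in find_all(build_theme_zip(), b"theme1.xml"):
        print(hex(offset))
```

For a single entry you should expect two hits: one in the local file header and one in the central directory.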

The same theme1.xml file, found in the ZIP file embedded in an OLE file (.doc), is also present in OOXML files (.docx).
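Since a .docx file is itself a ZIP container, you can check this with Python's standard zipfile module. The following is a small sketch: the function name is hypothetical, and the sample container is a stripped-down stand-in built in memory, not a valid Word document.

```python
import io
import zipfile


def theme_entries(zip_bytes):
    """List entries whose base name looks like a theme part (theme*.xml)."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        return [
            name
            for name in zf.namelist()
            if name.rsplit("/", 1)[-1].startswith("theme") and name.endswith(".xml")
        ]


def build_docx_like_sample():
    """Build a minimal ZIP that mimics the layout of a .docx (illustrative only)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("[Content_Types].xml", "<Types/>")
        zf.writestr("word/document.xml", "<w:document/>")
        zf.writestr("word/theme/theme1.xml", "<a:theme/>")
    return buf.getvalue()


if __name__ == "__main__":
    print(theme_entries(build_docx_like_sample()))
```

To try it on a real document, read the .docx file in binary mode and pass its bytes to theme_entries.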

.doc files (and also .xls files) created with Microsoft Office contain an embedded ZIP file with theme data, and this ZIP file can be found by listing the PKZIP records, as shown above.


Didier Stevens
Senior handler
Microsoft MVP

1 comment(s)


Thank you for replying to my question.
This might be a better example for malformed zip: e7a8dd258aefb376f23ef3d68e233e5e5f6c5f277303652d614252f7e1ef00ac

For me this is an unusual malware.

For Emotet: when the URLs were in an 'o' stream, it was easy to analyse with oledump. For the latest versions this helped:
