Some Strings to Remember
When you handle unknown files, be it for malware analysis or other reasons, it helps to know some strings and their hexadecimal byte sequences, so that you can quickly recognize file types and file content.
If you want to memorize some strings to improve your analysis skills, I recommend that the first string you memorize is MZ, or 4D 5A in hexadecimal (see an ASCII table).
All Windows executables (PE file format) start with these 2 bytes: 4D 5A.
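As a quick illustration, here is a minimal Python sketch that checks the first two bytes of a buffer (the function name starts_with_mz is hypothetical, not from any library). It only tests the magic bytes: a file starting with MZ is not necessarily a valid executable.

```python
def starts_with_mz(data: bytes) -> bool:
    """Return True if the buffer begins with 'MZ' (hexadecimal 4D 5A)."""
    return data[:2] == bytes.fromhex("4D 5A")

# b"MZ" and the hex sequence 4D 5A are the same two bytes.
print(b"MZ" == bytes.fromhex("4D 5A"))  # True
print(starts_with_mz(b"MZ\x90\x00"))     # True
print(starts_with_mz(b"PK\x03\x04"))     # False
```

To check a file on disk, read its first bytes in binary mode, for example open(path, "rb").read(2), where path is the file you want to inspect.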
And that is not the only "skill" that you acquire by memorizing 4D 5A: since Z is the last letter of the alphabet, you have also learned that all uppercase letters are smaller than or equal to 5A. You might already know that letter A is 41 (for example from PoC buffer overflows: AAAAAA -> 414141414141). So you have learned that all uppercase letters lie between hexadecimal values 41 and 5A.
Lowercase letters have their 6th least-significant bit (bit 5, counting from 0) set, while uppercase letters have that bit cleared. A byte with only that bit set has hexadecimal value 20. Add 20 to 41 and you get 61: letter a. Hence all lowercase letters lie between hexadecimal values 61 and 7A.
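The case bit can be demonstrated with a short Python sketch (the helper names are hypothetical). It only touches bytes that are actually letters: setting bit 20 on @ (40), for example, would turn it into a backtick (60), which is not a lowercase letter.

```python
CASE_BIT = 0x20  # 0b00100000: the only bit that differs between 'A' (0x41) and 'a' (0x61)

def is_upper_ascii(b: int) -> bool:
    return 0x41 <= b <= 0x5A  # 'A'..'Z'

def is_lower_ascii(b: int) -> bool:
    return 0x61 <= b <= 0x7A  # 'a'..'z'

def to_lower_ascii(b: int) -> int:
    """Set the case bit on an uppercase ASCII letter; leave any other byte alone."""
    return b | CASE_BIT if is_upper_ascii(b) else b

def to_upper_ascii(b: int) -> int:
    """Clear the case bit on a lowercase ASCII letter; leave any other byte alone."""
    return b & ~CASE_BIT if is_lower_ascii(b) else b

print(hex(ord("A")), hex(ord("a")))                # 0x41 0x61
print(bytes(to_lower_ascii(b) for b in b"MZ PE"))  # b'mz pe'
```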
The next string I recommend memorizing is PK: 50 4B. All records of a ZIP file start with PK (50 4B), and typical ZIP files start with a ZIP record (although this is not mandatory); hence typical ZIP files start with PK. The ZIP format is not only used for ZIP archives, but also as the container for many other file formats, like Office documents (.docx, .docm, .xlsx, .xlsm, ...).
And once you have memorized that PK is 50 4B, it's not that difficult to memorize that PE is 50 45 (E is the fifth letter -> 45).
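To see PK at the start of a typical ZIP file, this Python sketch builds a small archive in memory with the standard zipfile module (the member name word/document.xml only mimics a .docx and is an assumption for illustration). It also counts the central directory records (PK 01 02) and the end of central directory record (PK 05 06), which, like the local file header (PK 03 04), start with PK.

```python
import io
import zipfile

def looks_like_zip(data: bytes) -> bool:
    """Quick heuristic: typical ZIP files start with a record beginning with 'PK'."""
    return data[:2] == b"PK"

# Build a tiny ZIP in memory, standing in for e.g. a .docx container.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("word/document.xml", "<w:document/>")
raw = buf.getvalue()

print(raw[:4].hex(" "))    # 50 4b 03 04  (local file header)
print(looks_like_zip(raw))  # True
# Other record types start with PK too: central directory and end of central directory.
print(raw.count(b"PK\x01\x02"), raw.count(b"PK\x05\x06"))  # 1 1
```

A two-byte check like this is only a heuristic: ZIP does not require the file to start with a record (self-extracting archives, for instance, begin with an executable stub).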
The letters PE are the first 2 bytes of the PE header of PE files (Windows executables), and they can be found somewhere after the MZ header (which is actually the DOS header).
If a mnemonic helps you remember MZ and PK: they are the initials of their developers, Mark Zbikowski and Phil Katz.
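The DOS header tells you where to look for PE: its last field, e_lfanew, is a 32-bit little-endian value at offset 3C that holds the file offset of the PE header, whose signature is PE followed by two null bytes (50 45 00 00). A minimal Python sketch, exercised on a synthetic buffer rather than a real executable (find_pe_signature is a hypothetical name; real parsers such as pefile validate much more):

```python
import struct
from typing import Optional

PE_SIGNATURE = b"PE\x00\x00"  # 50 45 00 00

def find_pe_signature(data: bytes) -> Optional[int]:
    """Return the offset of the PE signature, or None if it is not where e_lfanew points."""
    # The DOS header is 64 (0x40) bytes long; e_lfanew is its last field, at offset 0x3C.
    if len(data) < 0x40 or data[:2] != b"MZ":
        return None
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] != PE_SIGNATURE:
        return None
    return e_lfanew

# Synthetic example: a zeroed 128-byte area (DOS header plus room for a stub), e_lfanew = 0x80.
dos = bytearray(0x80)
dos[0:2] = b"MZ"
struct.pack_into("<I", dos, 0x3C, 0x80)
sample = bytes(dos) + PE_SIGNATURE + bytes(20)
print(hex(find_pe_signature(sample)))  # 0x80
```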
To summarize:
- MZ -> 4D 5A
- PK -> 50 4B
- PE -> 50 45
- A-Z -> 41 - 5A
- a-z -> 61 - 7A
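The cheat sheet can be double-checked with Python's own ASCII encoding; this small sketch prints each string as hexadecimal bytes and asserts that the letter ranges are contiguous.

```python
import string

# The cheat sheet above, checked against Python's ASCII encoding.
cheat_sheet = {"MZ": "4D 5A", "PK": "50 4B", "PE": "50 45"}
for text, expected in cheat_sheet.items():
    actual = text.encode("ascii").hex(" ").upper()
    print(text, "->", actual)  # e.g. MZ -> 4D 5A
    assert actual == expected

# A-Z occupy 41..5A and a-z occupy 61..7A, with no gaps.
assert "".join(chr(c) for c in range(0x41, 0x5B)) == string.ascii_uppercase
assert "".join(chr(c) for c in range(0x61, 0x7B)) == string.ascii_lowercase
```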
Please post a comment if you have more "memorable" strings. We might end up with a small cheat sheet.
Didier Stevens
Senior handler
Microsoft MVP
blog.DidierStevens.com DidierStevensLabs.com
Comments
My favourite magic string is definitely "docfile o albi lael" = "D0CF11E0 A1B11AE1" in hexspeak, marking the beginning of OLE objects (like Word documents, Excel files, etc.). It is long and unique enough to produce very few false positives.
Would you know the origin/meaning of that string? I have not found it explained anywhere. Did someone working on the OLE2 specification have some relation to Arabic/Hebrew?
Other magic strings:
D0CF11E0 A1B11AE1 - OLE files - Object Linking and Embedding
CAFEBABE - Java class files / universal (old) Mach-O
FEEDFACE - Mach-O
FEEDFACF - Mach-O (64-bit)
Obvious magic strings:
GIF87a or GIF89a - Graphics Interchange Format
%PDF - Portable Document Format
\x7FELF - ELF executable
Rar! - beginning of a RAR archive - useful for dissecting self-extracting RAR archives
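The magic strings above can be turned into a naive identifier. This Python sketch (identify and find_embedded are hypothetical helpers) checks signatures at offset 0 and, for cases like self-extracting RAR archives, searches for a signature anywhere in the file. Two assumptions worth stating: the Mach-O magic values are written above in big-endian order, so the byte-reversed little-endian forms are listed too; and the short MZ and PK prefixes will produce many false positives when used for scanning. The file utility's magic database is far more complete.

```python
from typing import List, Optional, Tuple

# (magic bytes, description) pairs based on the list above.
# Little-endian Mach-O files store the magic byte-reversed (CE FA ED FE, CF FA ED FE).
SIGNATURES: List[Tuple[bytes, str]] = [
    (bytes.fromhex("D0CF11E0A1B11AE1"), "OLE compound file"),
    (bytes.fromhex("CAFEBABE"), "Java class file or universal Mach-O"),
    (bytes.fromhex("FEEDFACE"), "Mach-O (32-bit, big-endian)"),
    (bytes.fromhex("CEFAEDFE"), "Mach-O (32-bit, little-endian)"),
    (bytes.fromhex("FEEDFACF"), "Mach-O (64-bit, big-endian)"),
    (bytes.fromhex("CFFAEDFE"), "Mach-O (64-bit, little-endian)"),
    (b"GIF87a", "GIF image"),
    (b"GIF89a", "GIF image"),
    (b"%PDF", "PDF document"),
    (b"\x7fELF", "ELF executable"),
    (b"Rar!", "RAR archive"),
    (b"MZ", "Windows executable (MZ/PE)"),
    (b"PK", "ZIP container"),
]

def identify(data: bytes) -> Optional[str]:
    """Return the description of the first signature found at offset 0, else None."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    return None

def find_embedded(data: bytes, magic: bytes) -> List[int]:
    """Return every offset where `magic` occurs, e.g. a RAR archive after an SFX stub."""
    offsets, start = [], 0
    while (pos := data.find(magic, start)) != -1:
        offsets.append(pos)
        start = pos + 1
    return offsets

# Synthetic self-extracting archive: an MZ "stub" followed by RAR data.
sfx = b"MZ" + bytes(100) + b"Rar!\x1a\x07\x00" + bytes(10)
print(identify(sfx))                # Windows executable (MZ/PE)
print(find_embedded(sfx, b"Rar!"))  # [102]
```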
Some more resources on the topic:
https://en.wikipedia.org/wiki/Hexspeak
https://en.wikipedia.org/wiki/List_of_file_signatures
https://github.com/corkami/
https://github.com/libyal/libolecf/blob/master/documentation/OLE%20Compound%20File%20format.asciidoc
And obviously the mother of all magic strings - the "file" utility source code:
https://github.com/file/file/tree/master/magic/Magdir
Best regards
Michal Ambroz
Anonymous
May 22nd 2020