Decoding Common XOR Obfuscation in Malicious Code

Published: 2012-06-04
Last Updated: 2013-07-13 01:52:49 UTC
by Lenny Zeltser (Version: 1)
3 comment(s)

There are numerous ways of concealing sensitive data and code within malicious files and programs. Attackers use one obfuscation technique especially often, though, because it is simple to implement and offers protection that's usually sufficient. The approach works like this:

  1. The attacker picks a 1-byte value to act as the key. The possible key values range from 0 to 255 (in decimal).
  2. The attacker's code iterates through every byte of the data that needs to be encoded, XOR'ing each byte with the selected key.

To deobfuscate the protected string, the attacker's code repeats step #2, this time XOR'ing each byte in the encoded string with the key value.
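The two steps above can be sketched in a few lines of Python. This is only an illustration, not code taken from any sample; the function name `xor_bytes`, the example URL and the key 0x70 used here are assumptions picked for the demonstration.

```python
def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte of data with a single-byte key (0-255)."""
    if not 0 <= key <= 255:
        raise ValueError("key must fit in one byte")
    return bytes(b ^ key for b in data)

# Hypothetical string an attacker might want to hide
plain = b"http://example.com/"
key = 0x70

# Step 2: encode each byte with the key
encoded = xor_bytes(plain, key)

# Running the same operation again with the same key restores the original
decoded = xor_bytes(encoded, key)
```

Because XOR is its own inverse, a single routine serves as both the encoder and the decoder, which is part of why the technique is so cheap to implement.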

For example, consider the malicious Microsoft Word document "World Uyghur Congress Invitation.doc", which was sent to victims as an email attachment in a targeted attack. (To understand how this exploit works, see my earlier posts How Malicious Code Can Run in Microsoft Office Documents and How to Extract Flash Objects From Malicious MS Office Documents.)

In this case, the attacker embedded an ActiveX control inside the Word document to execute JavaScript, which downloaded and executed a malicious Flash program, which targeted a vulnerability in the victim's Flash Player. The payload of the exploit extracted and executed a malicious Windows executable, which was hidden inside the Word document.

To locate the executable file within the Word document, you can use Frank Boldewin's OfficeMalScanner tool. The "scan" option directs the tool to look for the embedded malicious Office and Windows executable files. The "brute" option tells the tool to look for these artifacts even if they were obfuscated using several common methods, including the XOR technique described above.


In this example, OfficeMalScanner automatically locates and extracts the embedded Windows executable, saving it as the "WUC Invitation Letter Guests__PEFILE__OFFSET=0xfc10__XOR-KEY=0x70.bin". (The tool automatically determined that the attacker used XOR key 0x70 to conceal this file.) According to PEiD (see screenshot below), the extracted file is a Win32 program that is not packed and that was probably compiled using Microsoft Visual C++.


The deobfuscated and extracted Windows executable file can be analyzed using any means, including your favorite disassembler and debugger, as well as using behavioral analysis techniques.

It's quite possible that the extracted malicious executable also contains obfuscated data. Given that everyone, including malware authors, takes shortcuts once in a while, it's possible that this data is protected using the simple XOR algorithm we discussed earlier. Didier Stevens' XORSearch tool can scan any file, looking for strings encoded using simple techniques, including this XOR method.

You need to know the clear-text version of the string you'd like XORSearch to locate. One good value to look for is "http", because attackers often wish to conceal URLs within malicious code. Another good string, as suggested by Marfi, might be "This program", because that might identify an embedded and XOR-encoded Windows executable, which typically has the string "This program cannot be run in DOS mode" in the DOS portion of the PE header. 
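The idea of searching for a known clear-text string under every possible 1-byte key can be sketched as follows. This is not XORSearch's implementation, and it covers only the plain XOR case; the function name `xor_search` and the sample buffer are assumptions made for the example.

```python
def xor_search(data: bytes, needle: bytes):
    """Yield (key, offset) for each place needle appears XOR-encoded in data."""
    for key in range(1, 256):  # key 0 would leave the data unchanged
        encoded_needle = bytes(b ^ key for b in needle)
        offset = data.find(encoded_needle)
        while offset != -1:
            yield key, offset
            offset = data.find(encoded_needle, offset + 1)

# Hypothetical buffer: filler bytes followed by a URL encoded with key 0x1B
sample = b"\x00" * 16 + bytes(b ^ 0x1B for b in b"http://example.com/")
hits = list(xor_search(sample, b"http"))
```

Encoding the short search string 255 times and looking for each variant is cheaper than decoding the whole file 255 times, and it reports the offset of every hit so that you can inspect the surrounding bytes.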

As you can see below, XORSearch locates the string "HTTP/1.1" within the extracted malicious executable; apparently it was encoded using the key 1B. (Sometimes you get a false positive, as seems to be the case with the key 3B.)


When invoking XORSearch with the "-s" parameter, you direct the tool to attempt decoding all strings within the file using the discovered key. In our example, this results in the creation of the "WUC Invitation Letter Guests__PEFILE__OFFSET=0xfc10__XOR-KEY=0x70.bin.XOR.1B" file. If you look at this file using a hex editor, you can locate several decoded strings that you might use as the basis for custom signatures and further code-level analysis.
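If you'd rather look at the decoded strings directly than open the output in a hex editor, a rough equivalent is to XOR the whole file with the discovered key and then pull out runs of printable characters. The sketch below assumes a hypothetical helper name `xor_strings` and a minimum string length of 4; it is an approximation of the idea, not a description of what XORSearch does internally.

```python
import re

def xor_strings(data: bytes, key: int, min_len: int = 4):
    """XOR data with key, then return runs of printable ASCII of at least min_len bytes."""
    decoded = bytes(b ^ key for b in data)
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return re.findall(pattern, decoded)

# Hypothetical blob: a string encoded with key 0x1B, surrounded by other bytes
blob = bytes([0x01, 0x02]) + bytes(b ^ 0x1B for b in b"HTTP/1.1") + bytes([0xFF])
found = xor_strings(blob, 0x1B)
```

Keep in mind that bytes which were never encoded in the first place will turn into noise after this step, so only some of the extracted strings will be meaningful.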


XOR and related methods are often used by attackers to obfuscate code and data. The tools above help you locate, decode and extract these concealed artifacts. If you have recommendations for other tools that can help with such tasks, please let us know by email or leave a comment below.

-- Lenny Zeltser

Lenny Zeltser focuses on safeguarding customers' IT operations at NCR Corp. He also teaches how to analyze malware at SANS Institute. Lenny is active on Twitter. He also writes a security blog.

3 comment(s)


My open-source packet capture framework has an auto-un-XOR routine which will automatically attempt to unxor executables with several different algorithms. It's quite handy for submitting samples to VirusTotal, etc. Patches for more routines are welcome!
I can explain why XORSearch finds string http with key 3B at position 68D0:

Notice that the string http://... also contains HTTP/1.1 a bit further inside the string.
XOR key 3B is equivalent to applying XOR keys 1B and 20 one after the other (1B XOR 20 = 3B). Key 1B is the encoding key, while key 20 has the property of changing the case of ASCII letters (lowercase -> uppercase and vice versa).

So the http string XORSearch finds at position 68D0 is actually string HTTP/1.1 decoded with key 1B and converted to lowercase (with XOR key 20).
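The arithmetic behind this explanation can be checked with a short Python sketch. The variable names are made up for the illustration; the key values 1B, 20 and 3B come from the explanation above.

```python
encoding_key = 0x1B   # key the malware used
case_bit = 0x20       # XOR with 0x20 flips the case of an ASCII letter
combined = encoding_key ^ case_bit

# What is stored in the file: "HTTP" encoded with key 1B
stored = bytes(b ^ encoding_key for b in b"HTTP")

# Decoding with the correct key recovers the original upper-case string
seen_with_1b = bytes(b ^ encoding_key for b in stored)

# Decoding with 3B undoes the encoding and flips the case in one step
seen_with_3b = bytes(b ^ combined for b in stored)
```

Here `combined` works out to 0x3B and `seen_with_3b` is the lower-case `http`, which is why a search for "http" reports a match under key 3B at the location of the encoded "HTTP/1.1".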
I was doing code analysis with my first malware sample recently and saw it decoding strings with XOR. Here is the Python script I used to decode it. There are probably much better ways, but it works...

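#!/usr/bin/env python3
import sys
import binascii
import re

# Read the file and turn it into a string of hex digits (two per byte)
f = open(sys.argv[1], 'rb').read()
data = binascii.hexlify(f)

# XOR key given in hex on the command line, e.g. 0x1F
key = int(sys.argv[2], 16)

def xor(m):
    # Convert one pair of hex digits back to a byte and XOR it with the key
    return bytes([int(m.group(0), 16) ^ key])

decoded = re.sub(b'[0-9a-fA-F]{2}', xor, data)

# Write raw bytes so that values above 0x7F survive the pipe to strings
sys.stdout.buffer.write(decoded)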

$ ./ malware.exe 0x1F | strings | less
