Advanced obfuscated JavaScript analysis

Published: 2008-04-06. Last Updated: 2008-04-07 18:47:31 UTC
by Daniel Wesemann (Version: 1)

When we got contacted by ISC reader Greg in Hungary, whose web server had been hacked and adorned with a couple of obfuscated JavaScript files, we expected a variant of the "nmidahena" injection and a closed case. JavaScript is an interpreted language, and while the obfuscation attempts we see are getting more creative, the scripts can usually still be coerced quite easily into divulging their secrets. ISC handler Lenny Zeltser teaches the SANS course on malware analysis, and ISC handler Bojan Zdrnja wrote the portion on JavaScript analysis for that course, so we are usually able to make short work of bad stuff.

Not so this time. This one was something new.

The file looked benign enough. The usual method for resolving one of these has been described elsewhere in detail: remove the script tags, change eval to print, and run the file through SpiderMonkey.
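As a rough sketch of that preparation step, the following helper strips script tags and swaps eval( for print( before the text is handed to a JavaScript shell. The function name prepareForShell and the sample input are made up for illustration; real samples often need manual clean-up beyond this.

```javascript
// Hypothetical helper: prepare an HTML-embedded script for a JavaScript shell.
function prepareForShell(html) {
    // Drop opening and closing script tags, keeping the code between them.
    var withoutTags = html.replace(/<\/?script[^>]*>/gi, '');
    // Turn each eval( call into print( so the decoded text is shown, not run.
    return withoutTags.replace(/\beval\s*\(/g, 'print(');
}

var prepared = prepareForShell('<script>eval("1+1")</script>');
```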

It worked. That another step of de-obfuscation was apparently needed didn't faze us. Same routine: hunt down the eval() calls, change them to print(), re-run through SpiderMonkey. Easy enough. But the resulting output did not show the expected exploit script in all its badness. Instead, it simply said

arguments.callee
la'Sbjd

Now you might remember the diary that we ran a while back on the properties of arguments.callee.toString() and how this makes analysis harder. This method allows a function to reference itself, and hence allows a function to detect modifications to its own code. Changing eval() to print() changes the function string, and with it the result. This can usually be defeated by re-defining the eval() function into a simple call to print(), but not so in this case. So let's take a look at some of the protection features in detail.
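To illustrate the self-reference idea, here is a minimal sketch that is not taken from the sample. The function name fingerprint is hypothetical, and it refers to itself by name rather than through arguments.callee, because arguments.callee is not available in strict mode; in non-strict code both give the same source text.

```javascript
// Hypothetical example: a function whose return value depends on its own source.
function fingerprint() {
    // Same text that arguments.callee.toString() would return in non-strict code.
    var source = fingerprint.toString();
    var sum = 0;
    for (var i = 0; i < source.length; i++) {
        sum += source.charCodeAt(i);
    }
    // Editing anything inside this function, even a comment, changes the sum.
    return sum;
}
```

Any value that is later derived from such a sum silently changes as soon as an analyst edits the function.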

#1 Simple obfuscation

xdxc=eval('a#rPgPu,mPe,n,t9sP.9ckaPl,lPe9e9'.replace(/[9#k,P]/g, ''))

All this does is make the string "arguments.callee" un-obvious for both human and automated analysis. If you look closely, you'll see that the replace() call removes every occurrence of the characters 9, #, k, comma and P, and hence turns the string into what it really is. While this technique is not in itself very savvy, the use of eval() in this context makes it impossible to simply re-define eval() into print() as we tried. If we do so, xdxc does not end up holding the correct value, and the moment this variable gets used, the whole thing falls apart.
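The decoding step can be checked safely without eval(). The sketch below also shows why the print() substitution breaks things, assuming a shell-style print() that returns nothing; printLike and printed are hypothetical stand-ins for that shell function and its output.

```javascript
// Decode the literal from the sample without evaluating it.
var hidden = 'a#rPgPu,mPe,n,t9sP.9ckaPl,lPe9e9'.replace(/[9#k,P]/g, '');

// Hypothetical stand-in for a shell print(): records the text, returns nothing.
var printed = [];
function printLike(text) {
    printed.push(text);
}

// With eval swapped for print, the assignment receives undefined
// instead of whatever eval(hidden) would have produced.
var xdxc = printLike(hidden);
```

Here hidden holds the string "arguments.callee", while xdxc is left undefined, which is why the later code that relies on xdxc fails.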

#2 Deriving the cipher key from the code itself

arguments.callee as used returns the entire "body" of the function called ppEwEu, which is everything from the start up to the closing curly bracket after the catch(e) clause. The function xFplcSbG() is then used to turn this entire function text into a numeric cipher key that depends on the actual text in the code block as well as on its length.

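function xFplcSbG(mrF) {
    var rmO = mrF.length;            // length of the function text
    var wxxwZl = 0, owZtrl = 0;      // loop index and running sum
    while (wxxwZl < rmO) {
        // each character code is weighted by the total length
        owZtrl += mrF.charCodeAt(wxxwZl) * rmO;
        wxxwZl++;
    }
    return ("" + owZtrl);            // key is returned as a decimal string
}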

It is obvious now why touching the code in any way leads to completely different results: a change of a single letter in the code, say, replacing eval() with evil(), already changes the resulting cipher key significantly. A bigger change, such as replacing eval() with print(), throws the result into a different ballpark altogether.
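The sensitivity is easy to reproduce with a readable copy of the same arithmetic. In this sketch, deriveKey is a renamed equivalent of xFplcSbG(), and the three input strings are invented stand-ins for the real function text, which is much longer.

```javascript
// Readable equivalent of xFplcSbG(): sum of (character code * text length).
function deriveKey(text) {
    var total = 0;
    for (var i = 0; i < text.length; i++) {
        total += text.charCodeAt(i) * text.length;
    }
    return String(total);
}

// Invented stand-ins for the protected function's source text.
var originalText = 'function f(){eval(x)}';
var oneLetterOff = originalText.replace('eval', 'evil');  // same length
var renamedCall = originalText.replace('eval', 'print');  // one character longer

var keyOriginal = deriveKey(originalText);
var keyOneLetter = deriveKey(oneLetterOff);
var keyRenamed = deriveKey(renamedCall);
```

A one-letter swap shifts the sum by the difference in character codes times the text length, while a change in length rescales every term, so all three keys differ.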

#3 Using the cipher key to decrypt the function arguments

nzoexMG=nuI.charCodeAt(sIoLeu)^xgod.charCodeAt(qcNz) is comparatively simple - this section shifts the key derived above "over" the obfuscated string and uses an XOR operation between the two to obtain the cleartext.
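A minimal sketch of this kind of XOR decoding follows. The function name xorWithKey is hypothetical, and the wrap-around of the key index with a modulo is an assumption: the line quoted above does not show how the sample advances its key index.

```javascript
// Hypothetical sketch: XOR each character of the text with a repeating key.
function xorWithKey(text, key) {
    var out = '';
    for (var i = 0; i < text.length; i++) {
        // Assumption: the key restarts from its first character when exhausted.
        var keyCode = key.charCodeAt(i % key.length);
        out += String.fromCharCode(text.charCodeAt(i) ^ keyCode);
    }
    return out;
}

// XOR is its own inverse, so the same call encodes and decodes.
var scrambled = xorWithKey('iframe', '12345');
var recovered = xorWithKey(scrambled, '12345');
var wrongKey = xorWithKey(scrambled, '12346');
```

With the right key the text round-trips; with a key that is off by a single digit, as happens when the function text has been edited, the output no longer matches the cleartext.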

There still is a way to decode such a self-defending function: Use Microsoft Script Editor (MSE). With MSE, you can set breakpoints in JavaScript code and check out variable contents at your leisure. Loaded into MSE with a breakpoint set on the second call to eval(), the script as obtained after the first decoding stage readily reveals its secret. The big downside of this method is, of course, that you are actually running the hostile code in an environment that well might be vulnerable to the exploit you are about to reveal. As they say in Script-Busters: Don't try any of this at home. Ever.

But it ain't over until the fat trojan runs...

Even after this stage, the code still had a couple of tricks up its sleeve. But we readily recognize the string "traff3.cn", and also a couple of artefacts like the text "iwf[rIa[mIeK" (iframe), which suggests that we are getting close.

#4 Using a function prototype instead of a function

We have no idea why - probably in the hope that automated script parsers do not have prototyping implemented. Or to confuse the human analyst - as you can see from the image above, the resulting pile of characters is not for the faint-hearted. With a little patience, the prototype can be readily split into its parts though.

#5 Using cookies

The install() function calls alreadyInstalled() to check if the script has already run. install(), when complete, sets a browser cookie named "dhafcbeg", and this is what the alreadyInstalled() function verifies. This is not an obfuscation mechanism per se, but more likely an attempt to keep the user's browser from turning sluggish from re-infections on heavily infected web pages. As a side effect, though, this also makes analysis in SpiderMonkey harder: SpiderMonkey has no "document" object and doesn't do cookies.
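The logic described here can be approximated as follows. Only the function names and the cookie name come from the sample; the cookie value, the doc parameter and the stubDocument object are assumptions made so that the sketch runs in a shell without a browser.

```javascript
var COOKIE_NAME = 'dhafcbeg';

// True if the marker cookie is present in the document's cookie string.
function alreadyInstalled(doc) {
    return doc.cookie.split('; ').some(function (pair) {
        return pair.indexOf(COOKIE_NAME + '=') === 0;
    });
}

// Runs once per browser: skips the work when the marker cookie exists.
function install(doc) {
    if (alreadyInstalled(doc)) {
        return false;
    }
    // The real script would inject its iframe here.
    doc.cookie = COOKIE_NAME + '=1'; // value "1" is an assumption
    return true;
}

// Stand-in for the browser's document object, as a shell has none.
var stubDocument = { cookie: '' };
var firstRun = install(stubDocument);
var secondRun = install(stubDocument);
```

Supplying a stub like this is one way to get past the missing "document" object during analysis, though a plain object only imitates a small part of real cookie handling.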

#6 Including the referer

One particularly nasty bit is the call to "document.location.host" in the getFrameURL() function. This retrieves the host name portion of the page currently displayed in the browser. For example, if "http://some.server.nul/bbs/board.php" had been infected with this obfuscated script, document.location.host would return "some.server.nul". This string is then used to build the URL from which the next-stage exploit is loaded! Again, if run in SpiderMonkey or even within Microsoft Script Editor, the origin page object - and hence the host string - would be empty. The bad guys check for this in the getFrameURL() function, and substitute the host name with a random 16-character hex string if no host name is set.

When run from within an analysis environment, the resulting URL is therefore something like
34ce19ab20045c11.a004ebb329886522.3traff-dot-cn
whereas when run as a real exploit, the first random string would reflect the "host"
some.server.nul.a004ebb329886522.3traff-dot-cn
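The behaviour can be sketched like this. The names randomHex and buildFrameHost are hypothetical, the second label is simply passed in as a fixed token because the diary does not say how it is produced, and example.invalid stands in for the attackers' domain.

```javascript
// Hypothetical helper: random lowercase hex string of the given length.
function randomHex(length) {
    var digits = '0123456789abcdef';
    var out = '';
    for (var i = 0; i < length; i++) {
        out += digits.charAt(Math.floor(Math.random() * 16));
    }
    return out;
}

// Builds the next-stage host name: page host (or a random fallback), token, domain.
function buildFrameHost(pageHost, token, baseDomain) {
    // An empty host, as seen in a shell or debugger, triggers the fallback.
    var firstLabel = pageHost ? pageHost : randomHex(16);
    return firstLabel + '.' + token + '.' + baseDomain;
}

var fromBrowser = buildFrameHost('some.server.nul', 'a004ebb329886522', 'example.invalid');
var fromShell = buildFrameHost('', 'a004ebb329886522', 'example.invalid');
```

On the server side, a first label that is 16 hex digits rather than a plausible host name is then enough to tell an analysis environment from a real victim.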

The bad guys seem to use this difference to automatically spot and ban whoever is not careful enough in tracking them - their web server stopped responding to two of the IP addresses that we used during our analysis. The site currently seems to be down, but it probably is still a very good idea not to try any of these URLs. Curiosity bricked the lap'.

When it was still active one week ago, the above URL redirected to www.google-analytics.com.urchin.js.7traff-dot-cn. Yes, someone is trying to be cute. From there, after another stage of obfuscation, it finally triggered MS06-014 to download and run a keylogger trojan. Probably the only reason why such advanced obfuscation would be paired with such an old exploit is that there are still enough unpatched systems out there for the exploit to work.

Thanks to Greg for the sample, and to ISC Handler Bojan Zdrnja for help with the analysis.

Keywords: JavaScript malware obfuscation

Internet Storm Center