Obfuscation and Repetition

Published: 2020-10-05
Last Updated: 2020-10-05 20:35:27 UTC
by Didier Stevens (Version: 1)

The obfuscated payload of a maldoc submitted by a reader can be quickly extracted with the "strings method" I explained in diary entry "Quickie: String Analysis is Still Useful".

The extracted string is very long (more than 1000 characters) and is most likely the payload we are looking for.
It looks like this is just a sequence of repeating strings, but if you take a close look, you’ll see that there are characters between the repeating string hui12t7gGG7&^6272 gasg671. I have highlighted this repeating string in red here:

You can see individual letters between the repeating string: p, o, w, e, r, …
I’m sure you can now guess where this is going: powershell …
This is an obfuscation method I’ve seen several times: obfuscate the payload by inserting a long string of characters between each character of the payload.
Here is an example.
Say that our payload is "powershell payload". We obfuscate it by inserting the character . between each pair of adjacent characters of the payload, like this:

"p.o.w.e.r.s.h.e.l.l. .p.a.y.l.o.a.d"

In this example, the payload is still easily recognizable.
But what if we use "Internet_Storm_Center" as repeating string? Then we get this:

"pInternet_Storm_CenteroInternet_Storm_CenterwInternet_Storm_CentereInternet_Storm_CenterrInternet_Storm_CentersInternet_Storm_CenterhInternet_Storm_CentereInternet_Storm_CenterlInternet_Storm_CenterlInternet_Storm_Center Internet_Storm_CenterpInternet_Storm_CenteraInternet_Storm_CenteryInternet_Storm_CenterlInternet_Storm_CenteroInternet_Storm_CenteraInternet_Storm_Centerd"

And in this example, the payload is not so easy to recognize.
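The two examples above can be reproduced with a few lines of Python. This is only a minimal sketch of the obfuscation idea; the function name obfuscate is made up for illustration and does not come from the maldoc or from any tool.

```python
def obfuscate(payload: str, filler: str) -> str:
    """Insert `filler` between every pair of adjacent payload characters."""
    # str.join treats the payload as a sequence of single characters.
    return filler.join(payload)


if __name__ == "__main__":
    print(obfuscate("powershell payload", "."))
    print(obfuscate("powershell payload", "Internet_Storm_Center"))
```

The longer the filler, the smaller the share of the obfuscated string that belongs to the real payload, which is why the second example is so much harder to read.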
The trick to decoding the obfuscated payload is to find the repeating string and remove it. As this can sometimes be tricky, I wrote a small program that automates this task: deobfuscate-repetitions.py.
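To illustrate the "find the repeating string and remove it" idea, here is a rough sketch in Python. It is not how deobfuscate-repetitions.py works internally (I make no claim about that here); it uses a simple heuristic chosen for this example: among all substrings that occur at least twice, pick the one with the highest length × count. The function names and the max_len limit are assumptions made for this sketch.

```python
from collections import Counter


def find_repeating_string(text: str, max_len: int = 64) -> str:
    """Guess the filler: the repeated substring maximizing length * count.

    Heuristic for illustration only; max_len caps the candidate length
    to keep the search cheap.
    """
    best, best_score = "", 0
    for length in range(1, min(max_len, len(text) // 2) + 1):
        # Count every (overlapping) window of this length.
        counts = Counter(
            text[i:i + length] for i in range(len(text) - length + 1)
        )
        candidate, count = counts.most_common(1)[0]
        # Ignore substrings that occur only once: they are not repetitions.
        if count >= 2 and length * count > best_score:
            best, best_score = candidate, length * count
    return best


def deobfuscate(text: str) -> str:
    """Remove the guessed filler from the obfuscated text."""
    filler = find_repeating_string(text)
    return text.replace(filler, "") if filler else text


if __name__ == "__main__":
    sample = "Internet_Storm_Center".join("powershell payload")
    print(repr(find_repeating_string(sample)))
    print(repr(deobfuscate(sample)))
```

A heuristic like this can pick the wrong candidate, for instance when the payload itself is highly repetitive, so it helps to look at several candidates and check which one yields a sensible result.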

Running deobfuscate-repetitions.py on our sample, we can see that it finds several repeating strings, but that there is one repeating string that results in a decoded payload starting with powersheLL:

We can then use option -f to search for string "power", and have the complete payload decoded:

This can then be decoded with base64dump.py:
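For readers who want to do this last step by hand, here is a small Python sketch. It assumes, purely for illustration, that the encoded part follows the convention of PowerShell's -EncodedCommand option, that is, base64 of UTF-16LE text; the text above does not state which encoding the sample actually used. The function name and the sample command are made up.

```python
import base64


def decode_powershell_base64(blob: str) -> str:
    """Decode base64 text, assuming it encodes a UTF-16LE string."""
    return base64.b64decode(blob).decode("utf-16-le")


if __name__ == "__main__":
    # Hypothetical, harmless sample built here so the example is self-contained.
    sample = base64.b64encode("Write-Host hello".encode("utf-16-le")).decode("ascii")
    print(decode_powershell_base64(sample))
```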


Didier Stevens
Senior handler
Microsoft MVP
blog.DidierStevens.com DidierStevensLabs.com

2 comment(s)


Thank You Didier
Thank you very much for posting this. I literally just had one fwded to me that a client received. Looked like a phishing attempt, had a bad HTML file attached to it. The body of the email was just the date, "06 October, 202012:06:02 PM", and letters scattered between that. It's weird though because it de-obfuscated to be something like "This voice message is for $person, please ignore if wrongly received".

Anyway, thank you. This was relevant for me this AM.
