Stop relying on file extensions

Published: 2017-10-24
Last Updated: 2017-10-24 07:05:45 UTC
by Xavier Mertens (Version: 1)
3 comment(s)

Yesterday, I found an interesting file in my spam trap. It was called '16509878451.XLAM'. To be honest, I was not aware of this extension and I found this description on the web: "A file with the XLAM file extension is an Excel Macro-Enabled Add-In file that's used to add new functions to Excel. Similar to other spreadsheet file formats, XLAM files contain cells that are divided into rows and columns that can contain text, formulas, charts, images and… macros!" Indeed, the file contained some VBA code:

$ oledump 16509878451.XLAM
A: xl/vbaProject.bin
A1:       463 'PROJECT'
A2:        80 'PROJECTwm'
A3: M   18788 'VBA/RRWIx'
A4: m     991 'VBA/Sheet1'
A5: M    1295 'VBA/ThisWorkbook'
A6:      8673 'VBA/_VBA_PROJECT'
A7:      1665 'VBA/__SRP_0'
A8:       243 'VBA/__SRP_1'
A9:       214 'VBA/__SRP_2'
A10:       230 'VBA/__SRP_3'
A11:       557 'VBA/dir'

The file is already known on VT (SHA256: c55e26fff6096362fab93dd03b6b4c5e4e62ed5a8a7fc266c77f34683b645bf6[1]) and contains a dropper macro that grabs the following payload: hXXps://a.pomfe[.]co/ezrtecm.png. Nothing special.

Then, I found another one called 'PL-BL.R01'. This extension indicates one volume of a multi-volume RAR archive. This method was very popular in the 90s, when the Internet was not as stable as today or when a large amount of data had to be split across multiple devices. Out of curiosity, I checked the files received by my spam trap in 2017. Here is an overview of the files received:

 

At the bottom of the list, I found:

ACE 5
R01 3
ARJ 2
XLAM 1
CAB 1
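
A tally like the one above is easy to reproduce. Here is a minimal sketch, assuming Python 3 and a plain list of attachment names; the `count_extensions` function and the sample names other than the two quoted in this diary are hypothetical:

```python
from collections import Counter
from pathlib import Path

def count_extensions(filenames):
    """Tally upper-cased extensions (without the dot) across file names."""
    counts = Counter()
    for name in filenames:
        suffix = Path(name).suffix  # '' when the name has no extension
        counts[suffix.lstrip(".").upper() or "(none)"] += 1
    return counts

if __name__ == "__main__":
    # The first two names come from the diary; the others are made up.
    sample = ["16509878451.XLAM", "PL-BL.R01", "a.ace", "b.ACE"]
    for ext, total in count_extensions(sample).most_common():
        print(ext, total)
```

Normalizing the case matters here, because 'XLAM' and 'xlam' would otherwise be counted as two different extensions.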

The main reason such files have exotic extensions is to try to bypass dumb filters based on a regex like:

.*\.(doc|zip|exe|cab|com|pif|bat|dll)$
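
To see how easily such a filter is sidestepped, here is a minimal sketch that runs the regex above against the two file names mentioned in this diary. The `BLOCKLIST` and `is_blocked` names and the `invoice.zip` sample are hypothetical, and the case-insensitive flag is an assumption about how a real filter would be configured:

```python
import re

# The extension blocklist quoted above. IGNORECASE is an assumption:
# a real filter may or may not be case-sensitive.
BLOCKLIST = re.compile(r".*\.(doc|zip|exe|cab|com|pif|bat|dll)$", re.IGNORECASE)

def is_blocked(filename):
    """Return True when the file name matches the extension blocklist."""
    return BLOCKLIST.match(filename) is not None

if __name__ == "__main__":
    # The first two names come from the diary; invoice.zip is made up.
    for name in ("16509878451.XLAM", "PL-BL.R01", "invoice.zip"):
        print(name, "blocked" if is_blocked(name) else "passes the filter")
```

Only the made-up ZIP file is caught. Adding more extensions to the list does not solve the problem, because the attacker chooses the file name.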

Instead of relying on file extensions, use libmagic[2] or YARA[3]. Here is a simple test in Python with the magic module:

# python
Python 2.7.12 (default, Nov 19 2016, 06:48:10)
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import magic
>>> ms = magic.open(magic.NONE)
>>> ms.load()
0
>>> ms.file("/tmp/no_ext_file")
'JPEG image data, JFIF standard 1.01, aspect ratio, density 1x1, segment length 16, progressive, precision 8, 1600x512, frames 3'
>>>
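
When libmagic is not available, the underlying idea can still be illustrated with the standard library alone. The following is a minimal sketch, not a replacement for libmagic: it compares the first bytes of a file with a handful of well-known signatures. The `SIGNATURES` table and the `sniff` and `sniff_file` names are hypothetical, the table is far from complete, and Python 3 is assumed:

```python
# Simplified stand-in for libmagic: a tiny table of well-known magic numbers.
# Each entry maps the leading bytes of a file to a human-readable label.
SIGNATURES = [
    (b"\xff\xd8\xff", "JPEG image"),
    (b"PK\x03\x04", "ZIP container (also used by OOXML files such as XLAM)"),
    (b"Rar!\x1a\x07", "RAR archive"),
    (b"MSCF", "Microsoft Cabinet archive"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 compound document"),
    (b"MZ", "DOS/Windows executable"),
]

def sniff(data):
    """Return a label for the first signature that prefixes data."""
    for magic, label in SIGNATURES:
        if data.startswith(magic):
            return label
    return "unknown"

def sniff_file(path):
    """Identify a file from its first bytes, ignoring its name entirely."""
    with open(path, "rb") as handle:
        return sniff(handle.read(16))
```

Because the file name is never consulted, renaming a ZIP container to '.XLAM' or a RAR volume to '.R01' does not change the verdict. libmagic applies the same principle with a much larger and more precise rule database.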

And the same file analyzed with a simple YARA rule:

$ cat jpeg.yar
rule IsJPGImage
{
    meta:
        author = "Xavier Mertens (https://blog.rootshell.be)"
        description = "Detect if a file is a JPG image"
    strings:
        $header1 = { FF D8 FF DB }
        $header2 = { FF D8 FF E0 }
        $header3 = { FF D8 FF E1 }
    condition:
        $header1 at 0 or $header2 at 0 or $header3 at 0

}
$ yara jpeg.yar /tmp/no_ext_file
IsJPGImage /tmp/no_ext_file

YARA is also well supported in Python! So, please stop relying on file extensions to decide whether a file must be flagged as suspicious. Microsoft Windows has tons of extensions[4] (some very old, like the .lzh or .lha archive formats) that are still supported by many tools!

[1] https://www.virustotal.com/#/file/c55e26fff6096362fab93dd03b6b4c5e4e62ed5a8a7fc266c77f34683b645bf6/detection
[2] https://github.com/threatstack/libmagic
[3] https://github.com/VirusTotal/yara
[4] https://en.wikipedia.org/wiki/List_of_filename_extensions

Xavier Mertens (@xme)
ISC Handler - Freelance Security Consultant

Comments

This nice little utility will identify the real file-type in Windows
http://mark0.net/soft-trid-e.html

TrID/32 - File Identifier v2.24 - (C) 2003-16 By M.Pontello
Definitions found: 9177
Analyzing...

Collecting data from file: c:\sysinternals\trid.exe
91.4% (.EXE) FreeBASIC 1.0x Win32 Executable (792408/92/73)
3.6% (.EXE) Win32 Executable MS Visual C++ (generic) (31206/45/13)
3.1% (.EXE) Win64 Executable (generic) (27625/18/4)
0.7% (.DLL) Win32 Dynamic Link Library (generic) (6578/25/2)
0.5% (.EXE) Win32 Executable (generic) (4508/7/1)

It's best not to rely upon extensions or the "magic" numbers of files, unless all that you care about is loosely identifying what type of file you are dealing with. Malware can easily be appended to various file types or encoded within files and go unnoticed by extension or magic number checks. A file's contents must be deeply examined - a great example:

https://isc.sans.edu/forums/diary/Base64+All+The+Things/22912/

Totally agree, also because I'm the author of this diary too ;-)
Multiple layers of controls must be implemented.
