Quickie: Parsing XLSB Documents

Published: 2022-03-30
Last Updated: 2022-03-30 07:32:47 UTC
by Didier Stevens (Version: 1)
0 comment(s)

Inspired by Xavier's diary entry "XLSB Files: Because Binary is Stealthier Than XML", I took a look at Microsoft's XLSB specification.

This confirmed my hopes: the binary format of XLSB files is a sequence of TLV records, just like BIFF. At least for sheets and shared string tables, I haven't looked at the other file formats yet.

The type and length of each TLV record is a variable length integer: from 1 to 2 bytes (type) and from 1 to 4 bytes (length). It's stored in little-endian format, and the least significant bytes have all their most significant bit set. The most significant byte has its most significant bit cleared. 7 least significant bits are used to encode the integer value. This implies that the highest value for a type integer is number 16383.

I wrote a simple parser, it is still in beta: xlsbdump.py.

Didier Stevens
Senior handler
Microsoft MVP
blog.DidierStevens.com

Keywords: maldoc xlsb
0 comment(s)

Comments


Diary Archives