PDF Analysis Intro and OpenActions Entries

Published: 2022-07-29
Last Updated: 2022-07-29 16:29:46 UTC
by Johannes Ullrich (Version: 1)

This diary was contributed by Jesse La Grew

Many of the tools used to manage email systems filter malicious content before it ever arrives in a user’s inbox. It is becoming rarer to see a malicious document delivered once its attachments have been screened by a variety of scanners and malware detonation sandboxes. There are certainly exceptions, as the creators of these documents improve their methods of evasion. A PDF may still get delivered to an inbox and need to be analyzed manually.

An excellent tool for this is pdf-parser.py [1], which is included in the REMnux VM [2]. Before getting into the details, it is useful to get a general idea of what to expect in the document.

pdf-parser.py <filename> -a


Image 1: Output of pdf-parser.py highlighting the Object ID

The summary will give some general information about the document and some indicators on where to pivot for your investigation. It is important to look at riskier document behavior or content, such as included JavaScript, actions that happen when a document is opened, or links a user may click on. In this case, there is an action when the file is opened and a URI. We’ll look at the /OpenAction object detail first.
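As a rough illustration of this kind of triage, the Python sketch below counts a few PDF names that often deserve a closer look. It is not how pdf-parser.py works internally; the name list, the function name count_risky_names, and the sample bytes are all assumptions made for this example. It also ignores compressed object streams and hex-escaped names, both of which a real parser has to handle.

```python
import re

# Illustrative, non-exhaustive list of names worth a second look (an assumption
# for this sketch, not a list taken from pdf-parser.py).
RISKY_NAMES = [b"/OpenAction", b"/AA", b"/JavaScript", b"/JS",
               b"/URI", b"/Launch", b"/EmbeddedFile"]


def count_risky_names(data: bytes) -> dict:
    """Count literal occurrences of each name in the raw PDF bytes."""
    counts = {}
    for name in RISKY_NAMES:
        # The negative lookahead keeps /JS from matching inside a longer name.
        pattern = re.escape(name) + rb"(?![A-Za-z0-9])"
        counts[name.decode()] = len(re.findall(pattern, data))
    return counts


# Made-up fragment for demonstration; a real file would be read with open(path, "rb").
sample = (b"11 0 obj\n<< /Type /Catalog /OpenAction [6 0 R /FitH null] >>\nendobj\n"
          b"12 0 obj\n<< /S /URI /URI (http://example.com) >>\nendobj\n")

for name, count in count_risky_names(sample).items():
    if count:
        print(name, count)
```

Note that /URI is counted twice for a single link action here, once as the action type and once as the key holding the address, so raw counts like these are only a pointer to where to look next.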

pdf-parser.py <filename> -o 11

Image 2: Output of pdf-parser.py highlighting OpenAction

There is a lot of information to unpack. This can be challenging if you are unfamiliar with the PDF standard. Two useful resources are the Adobe Acrobat Developer Resources [3] and the PDF 1.7 specification (PDF 32000-1:2008) hosted by Adobe [4].

/OpenAction [6 0 R /FitH null]

Excerpt from Adobe PDF Documentation
OpenAction “…A value specifying a destination that shall be displayed or an action that shall be performed when the document is opened…” [4, page 74]
[ page /FitH top ] “Display the page designated by page, with the vertical coordinate top positioned at the top edge of the window and the contents of the page magnified just enough to fit the entire width of the page within the window. A null value for top specifies that the current value of that parameter shall be retained unchanged.” [4, page 366]

Referencing the Adobe documentation, this item is simply telling the PDF viewer to open the page specified by the object reference 6 0 R. References to other objects within the same document are common. It can be helpful to map out these object references to get a better overall picture.
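A minimal Python sketch of both ideas follows: it splits the destination array into its parts and builds a simple map of which object refers to which. The function names and the sample objects are hypothetical. Only the /OpenAction value comes from the document above; the rest of object 11 and all of object 6 are invented for illustration. The regular expressions assume uncompressed, plain-text objects and will not cope with binary streams.

```python
import re

# [page_obj gen R /FitType arg ...] where each arg is a number or null.
DEST_RE = re.compile(
    r"\[\s*(\d+)\s+(\d+)\s+R\s*/(\w+)((?:\s+(?:null|[-+]?\d*\.?\d+))*)\s*\]")
# An indirect reference such as "6 0 R".
REF_RE = re.compile(r"(\d+)\s+(\d+)\s+R\b")
# An object body between "N G obj" and "endobj".
OBJ_RE = re.compile(r"(\d+)\s+(\d+)\s+obj\b(.*?)endobj", re.DOTALL)


def parse_destination(value: str):
    """Split an explicit destination array into page reference, fit type, and arguments."""
    m = DEST_RE.search(value)
    if m is None:
        return None
    # A null argument means "keep the current value", per the excerpt above.
    args = [None if tok == "null" else float(tok) for tok in m.group(4).split()]
    return {"page_ref": (int(m.group(1)), int(m.group(2))),
            "fit": m.group(3),
            "args": args}


def map_references(pdf_text: str) -> dict:
    """Map each (object, generation) pair to the references found in its body."""
    graph = {}
    for m in OBJ_RE.finditer(pdf_text):
        refs = [(int(a), int(b)) for a, b in REF_RE.findall(m.group(3))]
        graph[(int(m.group(1)), int(m.group(2)))] = refs
    return graph


# Hypothetical objects built around the /OpenAction entry shown above.
sample = ("11 0 obj\n<< /Type /Catalog /Pages 1 0 R "
          "/OpenAction [6 0 R /FitH null] >>\nendobj\n"
          "6 0 obj\n<< /Type /Page /Parent 1 0 R >>\nendobj\n")

print(parse_destination("/OpenAction [6 0 R /FitH null]"))
print(map_references(sample))
```

Following the page_ref value with another pdf-parser.py -o lookup is then the natural next step in the manual analysis.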

This example did not have anything interesting to tell us, which is what I like to see most days. An example mocked up by Didier Stevens shows what a malicious file may look like using the same /OpenAction entry [5].
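The difference between the two cases comes down to the kind of value /OpenAction holds: per the definition quoted earlier, it is either a destination (an array) or an action (a dictionary, whose /S entry names the action type). The Python sketch below tells these apart for a value already extracted as text. The function name, the labels it returns, and the sample values are assumptions for illustration and are not taken from the referenced test file.

```python
import re


def classify_open_action(value: str) -> str:
    """Give a rough label for the kind of value an /OpenAction entry holds."""
    value = value.strip()
    if value.startswith("["):
        # An array is a destination: it only changes what is displayed.
        return "destination"
    if value.startswith("<<"):
        # A dictionary is an action; its /S entry names the action type.
        m = re.search(r"/S\s*/(\w+)", value)
        return "action:" + m.group(1) if m else "action:unknown"
    if re.fullmatch(r"\d+\s+\d+\s+R", value):
        # The real value lives in another object that must be looked up.
        return "indirect reference"
    return "unrecognized"


# Hypothetical values for demonstration.
print(classify_open_action("[6 0 R /FitH null]"))
print(classify_open_action("<< /Type /Action /S /JavaScript /JS (app.alert(1);) >>"))
print(classify_open_action("7 0 R"))
```

An action type such as JavaScript or Launch is a reason to dig into the referenced objects, while a plain destination like the one in this diary is usually benign.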

There are a variety of great tools for analyzing files. In the case of PDF documents, one of the best tools is Adobe’s PDF standards documentation. Keep it handy the next time you need to really understand what a PDF document is doing and why it may be doing it.

[1] https://github.com/DidierStevens/DidierStevensSuite/blob/master/pdf-parser.py
[2] https://remnux.org/
[3] https://opensource.adobe.com/dc-acrobat-sdk-docs/
[4] https://opensource.adobe.com/dc-acrobat-sdk-docs/pdfstandards/PDF32000_2008.pdf
[5] https://blog.didierstevens.com/2015/08/28/test-file-pdf-with-embedded-doc-dropping-eicar/
