error parsing when object/stream put after %%EOF #74

Open
binjo opened this issue May 16, 2018 · 2 comments


binjo commented May 16, 2018

Acrobat appears to render PDF files properly even when an object/stream definition is placed after %%EOF. peepdf, however, discards that content because it stops parsing at %%EOF.

For example, the recent high-profile PDF exploit sample, bd23ad33accef14684d42c32769092a0:

0000023515 00000 n
0000024187 00000 n
0000024261 00000 n
trailer
<<
 /Size 67
 /Root 10 0 R
>>
startxref
24613
%%EOF

1 0 obj 
<<
 /Length 56305 
 /Filter /FlateDecode 
 >> 
 stream
....

The current peepdf fails to parse this file and throws an exception.

The following patch attempts to fix the problem:
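To check whether a given sample has this layout before running it through a parser, a rough heuristic can be sketched as below. This is not part of peepdf; the function name `objects_after_last_eof` is made up for illustration, and the regex can produce false positives if stream bytes happen to look like an object header.

```python
import re
from typing import List, Tuple

EOF_MARKER = b"%%EOF"
# Matches an indirect object header such as "1 0 obj".
OBJ_HEADER = re.compile(rb"(\d+)\s+(\d+)\s+obj\b")


def objects_after_last_eof(data: bytes) -> List[Tuple[int, int]]:
    """Return (object number, generation) pairs found after the final %%EOF.

    Hypothetical helper: a byte-level heuristic, not a real PDF parser.
    """
    last_eof = data.rfind(EOF_MARKER)
    if last_eof == -1:
        return []
    tail = data[last_eof + len(EOF_MARKER):]
    return [(int(m.group(1)), int(m.group(2))) for m in OBJ_HEADER.finditer(tail)]


if __name__ == "__main__":
    sample = (
        b"trailer\n<<\n /Size 67\n /Root 10 0 R\n>>\nstartxref\n24613\n%%EOF\n\n"
        b"1 0 obj \n<<\n /Length 5 \n>>\nstream\nhello\nendstream\nendobj\n"
    )
    print(objects_after_last_eof(sample))
```

A non-empty result means there is at least one object definition that a parser stopping at the last %%EOF would never see.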

diff --git a/PDFCore.py b/PDFCore.py
index 3b2fe00..33cf5a4 100644
--- a/PDFCore.py
+++ b/PDFCore.py
@@ -4315,7 +4315,7 @@ class PDFBody :
                                 self.setObject(compressedId, compressedObject, offset)
                             del(compressedObjectsDict)
         for id in self.referencedJSObjects:
-            if id not in self.containingJS:
+            if (len(self.containingJS) and id not in self.containingJS):
                 object = self.objects[id].getObject()
                 if object == None:
                     errorMessage = 'Object is None'
@@ -6941,6 +6941,9 @@ class PDFParser :
                     self.fileParts.append(fileContent)
                 else:
                     sys.exit(errorMessage)
+        # append anything behind %%EOF
+        if fileContent:
+            self.fileParts.append(fileContent)
         pdfFile.setUpdates(len(self.fileParts) - 1)

         # Getting the body, cross reference table and trailer of each part of the file
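The idea behind the second hunk, reduced to a standalone sketch: split the file into parts at each %%EOF, then keep whatever is left over as one more part instead of dropping it. This is a simplified illustration, not peepdf's actual parsing loop. `split_file_parts` is a hypothetical name, and unlike the patch (which tests `if fileContent:`), the sketch ignores a whitespace-only remainder so that a trailing newline does not count as an extra version.

```python
from typing import List


def split_file_parts(data: bytes, marker: bytes = b"%%EOF") -> List[bytes]:
    """Split raw PDF bytes into parts, each ending at a %%EOF marker.

    Hypothetical helper for illustration only.
    """
    parts = []
    rest = data
    while True:
        idx = rest.find(marker)
        if idx == -1:
            break
        end = idx + len(marker)
        parts.append(rest[:end])
        rest = rest[end:]
    # The step the patch adds: keep anything found behind the last %%EOF.
    # Assumption of this sketch: a whitespace-only tail is not worth keeping.
    if rest.strip():
        parts.append(rest)
    return parts


if __name__ == "__main__":
    data = b"%PDF-1.7\n...\n%%EOF\n\n1 0 obj\n<< >>\nendobj\n"
    parts = split_file_parts(data)
    # Mirrors pdfFile.setUpdates(len(self.fileParts) - 1) in the patch.
    print(len(parts) - 1)
```

With the trailing part kept, the object after %%EOF is counted as one update, which matches the extra "Version 1" in the output below.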

With the change applied, peepdf parses the file without issue:

Version 0:
        Catalog: 10
        Info: No
        Objects (50): [6, 7, 9, 10, 11, 12, 14, 15, 17, 19, 20, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 40, 41, 42, 43, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 61, 62, 63, 64, 65, 66]
                Errors (1): [33]
        Streams (14): [14, 15, 17, 25, 31, 32, 33, 34, 49, 51, 55, 56, 57, 62]
                Encoded (11): [14, 15, 17, 25, 31, 32, 33, 49, 51, 55, 56]
                Decoding errors (1): [33]
        Suspicious elements:
                /AcroForm (1): [10]
                /OpenAction (1): [10]
                /JS (1): [11]
                /JavaScript (1): [11]


Version 1:
        Catalog: No
        Info: No
        Objects (1): [1]
        Streams (1): [1]
                Encoded (1): [1]
        Objects with JS code (1): [1]
PPDF> object 1

<< /Length 56305
/Filter /FlateDecode >>
stream

var dlldata= [0x81ec8b55,0x000498ec,0xf4458900 ....
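For anyone who wants to confirm the decoded content above without peepdf: /FlateDecode is zlib/deflate compression, so the trailing object can be inflated with Python's standard `zlib` module. The sketch below assumes a direct integer /Length (as in this sample), a single /FlateDecode filter, and no encryption; `inflate_stream` is a hypothetical helper name.

```python
import re
import zlib

LENGTH_RE = re.compile(rb"/Length\s+(\d+)")


def inflate_stream(obj_bytes: bytes) -> bytes:
    """Inflate the stream of a single object with a direct /Length value.

    Hypothetical helper: assumes one /FlateDecode filter and no encryption.
    """
    match = LENGTH_RE.search(obj_bytes)
    if match is None:
        raise ValueError("no direct /Length entry found")
    length = int(match.group(1))
    start = obj_bytes.index(b"stream") + len(b"stream")
    # The stream keyword is followed by CRLF or LF before the data begins.
    if obj_bytes[start:start + 2] == b"\r\n":
        start += 2
    elif obj_bytes[start:start + 1] == b"\n":
        start += 1
    return zlib.decompress(obj_bytes[start:start + length])


if __name__ == "__main__":
    payload = zlib.compress(b"var dlldata= [0x81ec8b55];")
    obj = (
        b"1 0 obj\n<< /Length %d /Filter /FlateDecode >>\nstream\n" % len(payload)
        + payload
        + b"\nendstream\nendobj\n"
    )
    print(inflate_stream(obj))
```

Reading exactly /Length bytes avoids searching for `endstream`, which could in principle appear inside the compressed data.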

It's a quick fix; you may want to refactor the logic a bit...


Tigzy commented May 16, 2018

That was fast, I was looking for this :)
Sample here: https://malshare.io/sample.php?hash=e6b7392fb03ff9ff069a9ec5d4221641
I created a fix and PR for another parsing issue: #75
However, the "hidden" stream isn't seen because it comes after the %%EOF. Thanks for your code.

@jesparza
Owner

Thanks @binjo! I first want to merge everything from a fork that is currently more active than master. I will try to do this quickly, but I need to do some testing first. It is curious that an isolated object really works with Adobe Reader. I am quite sure I read the whole specification years ago, so either it was not documented or they changed something... :?
