-
Notifications
You must be signed in to change notification settings - Fork 943
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
exporting JBIG2 images #46
Conversation
There is no |
Please check why tests fail and fix this before we can merge... |
@side2k you can replace |
Adapted from the original at pdfminer/pdfminer.six#46
This PR needs only a little work. @side2k can you do that? |
This PR fixes #26 |
@pietermarsman I will look into it within a nearest couple of hours |
… references reading
@pietermarsman i've rebased and added the fixes. All the tests are passing now. |
it would be really great to get a code review from someone who is familiar with the current state of things in pdfminer - I didn't touch it for a couple of years by now. |
Nice, quick response! :) Do you have a pdf and script to test the changes with? That would make it a lot easier for me to review the code and understand what it does and what you have changed. I don't have a lot of experience with pdfs, nor with pdfminer. But I want to learn that and I can give your code a thorough sanity check. |
(@side2k , not sure if you've missed my previous message, this is a friendly notification if you did) Do you have a pdf and sample code to test this with? I can't understand and check this PR if I don't have a pdf that triggers this code. |
@pietermarsman I think I had once, but right now I dont have an opportunity to search. Maybe later. I am on a business trip right now, sorry. |
@side2k let us know when you find it. I think we can close this PR until we have some testing material. Once we have that we can reopen, test, review and merge. |
@ganeshtata, there is no pdf to test this PR with. Do you agree that we cannot merge this PR if there is nothing to test? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've tried testing this with the this pdf: jbig2.pdf.
I needed to make a few adjustments to make this work, they are all related to byte-strings vs. normal strings.
The output I got has the extension .jb2
but I could not determine if that was a proper jbig2 file since I have no viewer for that, and could not find one on the internet. At least, not one that shows the actual image. I've also attached the output image (Im1.jb2.zip).
|
||
# file literals | ||
|
||
FILE_HEADER_ID = '\x97\x4A\x42\x32\x0D\x0A\x1A\x0A' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be a bytestring
return segments | ||
|
||
def is_eof(self): | ||
if self.stream.read(1) == '': |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should compare to a bytestring, e.g. self.stream.read(1) == b''
return data_len | ||
|
||
def encode_segment(self, segment): | ||
data = '' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be a bytestring
'flags': {'deferred': False, 'type': SEG_TYPE_END_OF_FILE}, | ||
'number': seg_number, | ||
'page_assoc': 0, | ||
'raw_data': '', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be a bytestring
I've found a pdf with JBIG2 images pdfbox jira. |
This PR is adaptation of this one: euske/pdfminer#107