PDFObjRef Sneaking into PDFFont.widths #268

Closed
igormp opened this issue Jul 15, 2019 · 3 comments

Comments

@igormp
Contributor

igormp commented Jul 15, 2019

That's actually an issue on the original repo (as seen here) that I stumbled upon. A PR with a fix also exists here.

Should I recreate that PR in here?

EDIT - original issue message by @jeresch for the sake of better organization:

When parsing a document, I am getting an exception with the following traceback:

Traceback (most recent call last):
  File "./main.py", line 748, in parseEachProfile
    self.people.append(parser.parse(None))
  File "./main.py", line 322, in parse
    poslist = parser.parsepdf2(filename)
  File "/Users/____/Dropbox/___/Parser/pdfparsing.py", line 167, in parsepdf2
    interpreter.process_page(page)
  File "/usr/local/lib/python2.7/site-packages/pdfminer/pdfinterp.py", line 839, in process_page
    self.render_contents(page.resources, page.contents, ctm=ctm)
  File "/usr/local/lib/python2.7/site-packages/pdfminer/pdfinterp.py", line 852, in render_contents
    self.execute(list_value(streams))
  File "/usr/local/lib/python2.7/site-packages/pdfminer/pdfinterp.py", line 877, in execute
    func(*args)
  File "/usr/local/lib/python2.7/site-packages/pdfminer/pdfinterp.py", line 760, in do_TJ
    self.device.render_string(self.textstate, seq)
  File "/usr/local/lib/python2.7/site-packages/pdfminer/pdfdevice.py", line 82, in render_string
    scaling, charspace, wordspace, rise, dxscale)
  File "/usr/local/lib/python2.7/site-packages/pdfminer/pdfdevice.py", line 98, in render_string_horizontal
    font, fontsize, scaling, rise, cid)
  File "/usr/local/lib/python2.7/site-packages/pdfminer/converter.py", line 112, in render_char
    textwidth = font.char_width(cid)
  File "/usr/local/lib/python2.7/site-packages/pdfminer/pdffont.py", line 517, in char_width
    return self.widths[cid] * self.hscale
TypeError: unsupported operand type(s) for *: 'PDFObjRef' and 'float'

On closer inspection with pdb, I saw an issue:

(Pdb) p self.widths
{0: 1000, 1: 277, 2: 333, 3: 666, 4: 556, 5: 556, 6: 556, 7: <PDFObjRef:11>, 8: 333, 9: 943, 10: 556, 11: 556, 12: 777, 13: 556, 14: 722, 15: 556, 16: 277, 17: <PDFObjRef:11>, 18: 277, 19: 222, 20: 222, 21: 277, 22: 666, 23: 610, 24: <PDFObjRef:11>, 25: 556, 26: 556, 27: 556, 28: 833, 29: 666, 30: <PDFObjRef:11>, 31: <PDFObjRef:11>, 32: 666, 33: 777, 34: 556, 35: 556, 36: 556, 37: 556, 38: 556, 39: 666, 40: 333, 41: 556, 42: 333, 43: <PDFObjRef:11>, 44: 556, 45: 583, 46: 556, 47: 722, 48: 277, 49: <PDFObjRef:11>, 50: 833, 51: 556, 52: 610, 53: 556, 54: 666, 55: 722, 56: 222, 57: <PDFObjRef:11>, 58: 277, 59: 722, 60: 222, 61: 277}

When the exception occurred, cid was 7, self.hscale was 0.001, and, as shown above, self.widths[7] is not a number but a PDFObjRef. Probing the object gave:

(Pdb) p self.widths[7].__dict__
{'doc': <pdfminer.pdfdocument.PDFDocument object at 0x10d23d310>, 'objid': 11}
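The failure mode above can be reproduced without pdfminer at all. The following is only a minimal sketch: `StubRef` is a hypothetical stand-in for PDFObjRef, the value 556 stored behind object 11 is an assumed placeholder (the real value in the PDF is unknown), and `char_width` here is a free function written for illustration, not the library's method.

```python
class StubRef:
    """Hypothetical stand-in for an indirect object reference (not pdfminer's PDFObjRef)."""

    def __init__(self, objid, value):
        self.objid = objid
        self._value = value  # assumed target value; the real one lives in the PDF

    def resolve(self):
        # Return the object this reference points to.
        return self._value

    def __repr__(self):
        return "<StubRef:%d>" % self.objid


def char_width(widths, cid, hscale=0.001):
    """Look up a width, resolving a reference first if one slipped in."""
    w = widths[cid]
    if isinstance(w, StubRef):
        w = w.resolve()
    return w * hscale


# A widths table shaped like the one printed by pdb, cut down to two entries.
widths = {6: 556, 7: StubRef(11, 556)}

# Multiplying the unresolved reference by a float fails, as in the traceback.
try:
    widths[7] * 0.001
    raised = False
except TypeError:
    raised = True

safe_width = char_width(widths, 7)
```

Resolving at lookup time only masks the symptom; it seems cleaner to resolve the values once, when the widths table is built.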

I tried stepping through with pdb, but could not figure out where the PDFObjRef got inserted. In pdffont.py, two functions seem likely: get_widths(seq) and get_widths2(seq), though which one is involved depends on the input.
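For illustration, here is a sketch of a widths parser that resolves references while building the table. It assumes the input follows the CIDFont W array layout from the PDF specification (`c [w1 w2 ...]` or `c_first c_last w`). `StubRef`, `resolve`, and `parse_widths` are hypothetical names, and this is neither pdfminer's code nor the change made in the fix.

```python
class StubRef:
    """Hypothetical stand-in for an indirect object reference."""

    def __init__(self, objid, table):
        self.objid = objid
        self._table = table  # maps object ids to direct objects

    def resolve(self):
        return self._table[self.objid]


def resolve(obj):
    """Follow references until a direct object is reached."""
    while isinstance(obj, StubRef):
        obj = obj.resolve()
    return obj


def parse_widths(seq):
    """Build {cid: width} from a W-style array, resolving every element."""
    widths = {}
    pending = []  # integers seen since the last completed entry
    for item in seq:
        item = resolve(item)
        if isinstance(item, list):
            # Form "c [w1 w2 ...]": consecutive widths starting at cid c.
            if pending:
                first = pending[-1]
                for offset, w in enumerate(item):
                    # Elements inside the list may be references too.
                    widths[first + offset] = resolve(w)
                pending = []
        elif isinstance(item, (int, float)):
            pending.append(item)
            if len(pending) == 3:
                # Form "c_first c_last w": one width for a cid range.
                first, last, w = pending
                for cid in range(first, last + 1):
                    widths[cid] = w
                pending = []
    return widths


# Object 11 is assumed to hold the number 500 for this example.
objects = {11: 500}
ref = StubRef(11, objects)
widths = parse_widths([0, [1000, 277, ref], 10, 12, 556])
```

With the references resolved up front, every value in `widths` is a plain number, so a later `self.widths[cid] * self.hscale` cannot hit the TypeError from the traceback.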

I can provide the problematic pdf if anyone would find it useful. I'll have to go through and scrub confidential information though, so I'll hold off for now. -Jeremy

@pietermarsman
Member

pietermarsman commented Jul 15, 2019

Can you edit the description of this issue to include the information from the original issue? This helps to document everything in one place. Copying and pasting it from the original issue is fine by me, if there are no specific pdfminer2 or pdfminer.six details.

@pietermarsman
Member

> Should I recreate that PR in here?

And yes, please recreate the PR!

@pietermarsman
Member

Fixed by #273
