AttributeError when opening documents that contain chinese characters #7

Open
Alexander-0x80 opened this issue Oct 12, 2014 · 4 comments

@Alexander-0x80

When opening documents that contain Chinese/Thai characters, I get the following exception:

File "create_index.py", line 16, in <module>
    pdf_data = slate.PDF(f)
  File "/home/alexander/dev/pdf_indexer/env/local/lib/python2.7/site-packages/slate/slate.py", line 49, in __init__
    self._cleanup()
  File "/home/alexander/dev/pdf_indexer/env/local/lib/python2.7/site-packages/slate/slate.py", line 57, in _cleanup
    del self.device
AttributeError: device
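
For reference, the traceback is consistent with `_cleanup` running `del self.device` on an object whose `device` attribute was never assigned, since Python raises `AttributeError` when deleting a missing attribute. The sketch below only illustrates that mechanism and one possible guard. It is not slate's actual code: the class names `Extractor` and `SafeExtractor` and the `create_device` flag are hypothetical, and only the `__init__`/`_cleanup` pairing and the `del self.device` line are taken from the traceback.

```python
class Extractor(object):
    """Hypothetical stand-in for a class with an __init__/_cleanup pair."""

    def __init__(self, create_device):
        # Assumption: on some inputs the attribute is never assigned.
        if create_device:
            self.device = object()
        self._cleanup()

    def _cleanup(self):
        # Raises AttributeError if self.device was never assigned.
        del self.device


class SafeExtractor(Extractor):
    """Same flow, but cleanup tolerates a missing attribute."""

    def _cleanup(self):
        # Only delete the attribute when it actually exists.
        if hasattr(self, "device"):
            del self.device
```

With this sketch, `Extractor(False)` fails the same way as the traceback, while `SafeExtractor(False)` completes. A guard like this would hide the symptom only; whatever prevented `device` from being set would still need its own error.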
@timClicks
Owner

Could you provide an example PDF so that I could create a test case?

@bsmartt13

I'm seeing this with the PDF linked below. However, the only versions of slate/pdfminer that I've gotten to work are
slate==0.3
pdfminer==20110515
so please keep that in mind.

Traceback (most recent call last):
...
    doc = slate.PDF(read_handle)
  File "/Users/bsmartt/reputation_env/lib/python2.7/site-packages/slate/slate.py", line 49, in __init__
    self._cleanup()
  File "/Users/bsmartt/reputation_env/lib/python2.7/site-packages/slate/slate.py", line 57, in _cleanup
    del self.device
AttributeError: device

Happens with this pdf: http://www.emc.com/collateral/white-papers/h12756-wp-shell-crew.pdf

At a glance I didn't notice any Chinese/Thai, but there are definitely some funky characters in there.

edit: omfg. Preview.app is coming to the rescue. "Without the proper password, you do not have permission to copy portions of this document. Enter the password to unlock copying from the document."... fuck my life, and have a great day @timClicks 👍

@bsmartt13

I realize slate supports passing the password into the constructor. However, it would be cool to fail gracefully here (if we didn't know a password was needed, for example).

I'm in a position where it would be awesome if I could catch this as a different exception than the AttributeError shown above, so as to be certain it was due to password protection and not some other failure. Then I could notify the user: 'Hey, the PDF you gave us is password protected, please give us the password to continue.'
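
A minimal sketch of that idea as a caller-side workaround, assuming nothing about slate's internals: wrap the parser call and, when the `AttributeError` from the traceback occurs, check whether the raw bytes contain an `/Encrypt` entry (encrypted PDFs reference their encryption dictionary under that key in the trailer). The names `PasswordProtectedPDFError`, `looks_encrypted`, and `open_pdf` are hypothetical, and the byte search is a heuristic that could in principle match unrelated content.

```python
import io


class PasswordProtectedPDFError(Exception):
    """Hypothetical exception: the PDF appears to be encrypted."""


def looks_encrypted(data):
    # Heuristic: encrypted PDFs carry an /Encrypt entry in their trailer.
    return b"/Encrypt" in data


def open_pdf(file_obj, parser):
    """Call parser on the file contents, translating AttributeError into
    PasswordProtectedPDFError when the file looks encrypted."""
    data = file_obj.read()
    try:
        # Hand the parser a fresh in-memory copy of the bytes.
        return parser(io.BytesIO(data))
    except AttributeError:
        if looks_encrypted(data):
            raise PasswordProtectedPDFError(
                "PDF appears to be password protected")
        # Not obviously encrypted: let the original error propagate.
        raise


# Intended use (assumption: slate.PDF accepts any binary file-like object):
#     with open("document.pdf", "rb") as f:
#         doc = open_pdf(f, slate.PDF)
```

The caller can then catch `PasswordProtectedPDFError` to prompt for a password, while any other `AttributeError` still surfaces unchanged.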

Thanks Tim

@timClicks
Owner

@bsmartt13 hey, sorry I've taken months to get back to you. Issuing some kind of useful error when we encounter a PDF that expects a password seems very worthwhile.

@timClicks timClicks self-assigned this Feb 12, 2016