assert xrefstream["/Type"] == "/XRef" #357

phoccavalcante · 2017-07-03T19:11:14Z

I can't read one PDF file because /Type == /Page, and I can't understand why.

<class 'PyPDF2.generic.DictionaryObject'> {'/Parent': IndirectObject(1, 0), '/Contents': IndirectObject(4, 0), '/Type': '/Page', '/Resources': IndirectObject(2, 0)}

Here is the file: http://communy.com.br/static/cobranca.pdf

Please, can anybody help me?
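For context (a hedged reading of PyPDF2 1.x, not confirmed in this thread): the reader jumps to the byte offset written after `startxref` near the end of the file. If the bytes there are not the `xref` keyword of a classic cross-reference table, it parses an indirect object and asserts that its /Type is /XRef (a cross-reference stream). Here the offset apparently lands on a /Page object, which suggests the recorded offset does not match the file's actual layout. The stdlib-only sketch below reports what the last `startxref` offset points at; the function name `inspect_startxref` is hypothetical, and its regex-based parsing is a simplification, not PyPDF2's parser.

```python
import re


def inspect_startxref(data: bytes) -> str:
    """Describe what the last startxref offset in a PDF points at (diagnostic sketch)."""
    # A PDF ends with "startxref\n<byte offset>\n%%EOF"; incremental updates
    # append further sections, so the last occurrence is the one a reader uses.
    matches = list(re.finditer(rb"startxref\s+(\d+)", data))
    if not matches:
        return "no startxref found"
    offset = int(matches[-1].group(1))
    if offset >= len(data):
        return "offset past end of file"
    head = data[offset:offset + 200]
    # Classic cross-reference table: the offset should point at "xref".
    if head.startswith(b"xref"):
        return "xref table"
    # Otherwise expect "<num> <gen> obj" and report its first /Type entry.
    obj = re.match(rb"\s*\d+\s+\d+\s+obj\b(.*)", head, re.DOTALL)
    if obj is None:
        return "unrecognized data"
    type_match = re.search(rb"/Type\s*/(\w+)", obj.group(1))
    if type_match is None:
        return "object without /Type"
    return "object with /Type /" + type_match.group(1).decode("ascii")
```

Run it on the file's raw bytes (opened with "rb"); a result other than "xref table" or "object with /Type /XRef" points to a broken offset rather than to a problem with the page content itself.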


guysoft · 2019-07-14T12:55:31Z

I am getting the same issue with another PDF. What could be causing this?

I also get this with the PDF provided.

guysoft · 2019-07-14T14:03:34Z

Workaround:
Repair the file with Ghostscript:

gs \
  -o repaired.pdf \
  -sDEVICE=pdfwrite \
  -dPDFSETTINGS=/prepress \
  corrupted.pdf
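If the repair needs to run from Python (for example, as a fallback before retrying PyPDF2), the same Ghostscript invocation can be wrapped with the standard library's subprocess module. This is a minimal sketch assuming `gs` is installed and on PATH; the helper names `build_gs_repair_command` and `repair_pdf` are hypothetical. Presumably the workaround helps because pdfwrite writes a brand-new PDF, including a freshly computed cross-reference section.

```python
from __future__ import annotations

import subprocess


def build_gs_repair_command(src: str, dst: str) -> list[str]:
    """Return the argv for the Ghostscript repair command used in the workaround."""
    return [
        "gs",
        "-o", dst,                   # output file; also implies -dBATCH -dNOPAUSE
        "-sDEVICE=pdfwrite",         # re-write the input as a new PDF
        "-dPDFSETTINGS=/prepress",   # same quality preset as the shell command
        src,
    ]


def repair_pdf(src: str, dst: str) -> None:
    """Run Ghostscript; raises CalledProcessError if gs exits with a non-zero status."""
    subprocess.run(build_gs_repair_command(src, dst), check=True)
```

Passing the arguments as a list (rather than one shell string) avoids quoting problems with file names that contain spaces.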

AzizieAbuduaini · 2019-10-18T09:14:38Z

I got this error when I tried to read a PDF from S3. Later I found that the PDF content contained an unexpected Unicode apostrophe character. What I did was replace the apostrophe (') with "’" and read it again; then it worked fine.
So, before you pass the content:
content = text.replace("'", "’")

then pass content to the file reader.
I'm not sure the apostrophe causes this issue, but it works for me. Please try this approach and let me know whether it works.
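A caveat worth checking before relying on this workaround (a hedged observation, not something confirmed in this thread): a PDF's cross-reference data records byte offsets, so any edit that changes the length of the raw bytes can move objects away from where the xref says they are. Replacing the one-byte ASCII apostrophe with "’" adds two bytes per occurrence in UTF-8. The sketch below shows that length change and the usual alternative of handing the reader the raw bytes unchanged (for example, the bytes read from an S3 object's body), wrapped in io.BytesIO. The helper name `wrap_pdf_bytes` is hypothetical.

```python
import io


def wrap_pdf_bytes(raw: bytes) -> io.BytesIO:
    """Wrap raw PDF bytes in a seekable buffer without decoding them as text."""
    if not isinstance(raw, bytes):
        # A str here means the PDF was already decoded, which can alter its bytes.
        raise TypeError("expected raw bytes, got " + type(raw).__name__)
    return io.BytesIO(raw)


# The apostrophe replacement changes the encoded length of the text:
original = "it's"
replaced = original.replace("'", "\u2019")  # U+2019 is the "’" character
print(len(original.encode("utf-8")), len(replaced.encode("utf-8")))  # 4 6
```

PyPDF2's reader accepts a file-like object, so the returned buffer can be passed to it in place of a file path.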

MartinThoma · 2022-04-07T14:45:20Z

Can somebody create a minimal Python script that shows the issue with the shared PDF?

MartinThoma · 2022-04-16T11:33:44Z

The comment by @AzizieAbuduaini indicates that it might be related to #384

MartinThoma · 2022-06-06T13:13:19Z

As there is no PDF here to check, I assume that #924 has fixed this issue. I'll release PyPDF2==2.1.0 today.

Please ping me if you still encounter this issue with 2.1.0 or later.

MartinThoma added the is-bug label (From a user's perspective, this is a bug - a violation of the expected behavior with a compliant PDF) on Apr 7, 2022

MartinThoma added the workflow-text-extraction label (From a user's perspective, text extraction is the affected feature/workflow) on Apr 16, 2022

MartinThoma closed this as completed Jun 6, 2022
