Detect Orientation for OCR and Mixed Orientations #4165

ebeck-iq · 2024-12-18T15:56:02Z

ebeck-iq
Dec 18, 2024

I'm currently using PyMuPDF 1.24.13. According to discussion #3254, PyMuPDF should be able to handle orientation issues during text extraction, but OCR was not mentioned. Is this automatic rotation handling applied in the OCR process as well?
Additionally, I have some scanned PDFs with mixed page rotations (user error), and the OCR process produces no usable text except from the cover page (which normally would not be present). We're able to deal with these files using PyTesseract in testing, but would prefer to stick with PyMuPDF if possible.

JorjMcKie · 2024-12-18T16:11:31Z

JorjMcKie
Dec 18, 2024
Maintainer

Please share an example that is handled correctly by pytesseract but not by the Tesseract embedded in PyMuPDF.

2 replies

ebeck-iq Dec 18, 2024
Author

I invited you as an outside collaborator to this repo; I hope that works OK. There are client details in the document.
https://github.com/iqbo-dev/priv-share/blob/main/5star-43312291.pdf

ebeck-iq Dec 18, 2024
Author

I also added the resulting extracted text to the same repo.
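
For readers with the same mixed-rotation problem, the workaround mentioned in the question (detecting orientation with pytesseract, then extracting text with PyMuPDF) might look roughly like the sketch below. It is not taken from the thread and has not been confirmed by the maintainer. The helper names `parse_osd_rotation`, `corrected_rotation`, and `ocr_with_orientation_fix` are hypothetical. It assumes that Tesseract's OSD report contains a `Rotate: <degrees>` line, that this angle is clockwise in the same sense as PDF page rotation, and that `get_textpage_ocr(full=True)` renders the page with the rotation set beforehand; verify these assumptions against your own files.

```python
import re


def parse_osd_rotation(osd_text: str) -> int:
    """Return the 'Rotate' angle from Tesseract OSD output, or 0 if absent."""
    match = re.search(r"^Rotate:\s*(\d+)", osd_text, flags=re.MULTILINE)
    return int(match.group(1)) % 360 if match else 0


def corrected_rotation(current: int, osd_rotate: int) -> int:
    """Combine the page's existing rotation with the suggested correction."""
    return (current + osd_rotate) % 360


def ocr_with_orientation_fix(pdf_path: str, dpi: int = 300) -> list[str]:
    """Hypothetical helper: fix each page's rotation, then OCR it with PyMuPDF."""
    # Imported lazily so the pure-Python helpers above work without these packages.
    import pymupdf
    import pytesseract
    from PIL import Image

    texts = []
    with pymupdf.open(pdf_path) as doc:
        for page in doc:
            # Render the page as RGB so Tesseract can inspect its orientation.
            pix = page.get_pixmap(dpi=dpi, colorspace=pymupdf.csRGB, alpha=False)
            image = Image.frombytes("RGB", (pix.width, pix.height), pix.samples)
            try:
                osd = pytesseract.image_to_osd(image)
            except pytesseract.TesseractError:
                osd = ""  # OSD can fail on pages with too little text; leave as-is.
            # Assumption: OSD's angle and PDF page rotation are both clockwise.
            page.set_rotation(corrected_rotation(page.rotation, parse_osd_rotation(osd)))
            textpage = page.get_textpage_ocr(dpi=dpi, full=True)
            texts.append(page.get_text(textpage=textpage))
    return texts
```

Calling `ocr_with_orientation_fix("scan.pdf")` (the file name is a placeholder) would return one text string per page. The rotation changes are applied in memory only, since the document is never saved.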
