Commit
Fix OCR on Qubes: PyMuPDF required TESSDATA_PREFIX
PyMuPDF versions 1.22.5 and later accept the tesseract data path as an argument to `pixmap.pdfocr_tobytes()` [1], but lower versions require setting the TESSDATA_PREFIX environment variable instead [2]. Because on Qubes the pixels-to-PDF conversion happens on the host, and Qubes ships a lower PyMuPDF package version, we need to pass the path via the environment variable.

NOTE: The TESSDATA_PREFIX environment variable is set in dangerzone-cli rather than closer to the calling method in `doc_to_pixels.py`, because PyMuPDF reads this variable as soon as the fitz module is imported [3][4].

[1]: https://pymupdf.readthedocs.io/en/latest/pixmap.html#Pixmap.pdfocr_tobytes
[2]: https://pymupdf.readthedocs.io/en/latest/installation.html#enabling-integrated-ocr-support
[3]: pymupdf/PyMuPDF#2439
[4]: https://github.com/pymupdf/PyMuPDF/blob/5d6a7db/src/__init__.py#L159

Fixes #682
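
The ordering constraint described in the note can be illustrated with a minimal sketch. This is not the code from this commit: the helper name `ensure_tessdata_prefix` and the default path `/usr/share/tessdata` are assumptions made for illustration, and the real tessdata location depends on the platform.

```python
import os

# Hypothetical default location; adjust for the actual platform.
ASSUMED_TESSDATA_DIR = "/usr/share/tessdata"


def ensure_tessdata_prefix(tessdata_dir: str = ASSUMED_TESSDATA_DIR) -> str:
    """Set TESSDATA_PREFIX unless the environment already defines it.

    Returns the value that is in effect after the call.
    """
    return os.environ.setdefault("TESSDATA_PREFIX", tessdata_dir)


# This must run before the first `import fitz`: older PyMuPDF versions read
# the variable once, at module import time.
ensure_tessdata_prefix()

try:
    import fitz  # PyMuPDF; optional here so the sketch runs without it installed
except ImportError:
    fitz = None
```

Using `setdefault` keeps any value the user has already exported, which is a design choice of this sketch rather than something the commit message states.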