Hi,
I noticed a bug in PyMuPDF versions >= 1.23.9 when using get_text to extract text from PDF documents.
To reproduce the bug:
Take the attached PDF file: test_file.pdf
Extract its text using the code below (see "How to reproduce the bug").
For the correct behavior, install a PyMuPDF version < 1.23.9 (e.g., 1.23.8). This yields the complete text: doc_text_1238.txt
For the buggy behavior, install a PyMuPDF version >= 1.23.9 (e.g., 1.23.24). This yields the broken text: doc_text_12324.txt
ADDITIONAL NOTES
Thank you for your help
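How to reproduce the bug
Extract text using the following code:
```python
import fitz  # PyMuPDF

pdf_path = "test_file.pdf"  # the PDF attached to this issue

fitz_doc = fitz.open(pdf_path)
doc_text = []
for page in fitz_doc:
    # get_text() returns the plain text of the page
    doc_text.append(page.get_text())
doc_text = " ".join(doc_text)
```
To reproduce the buggy behavior, install a PyMuPDF version >= 1.23.9 (e.g., 1.23.24).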
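To see where the two attached outputs start to differ, a small comparison helper can be useful. The following is only a sketch: the function names `first_divergence` and `context` are hypothetical, and the UTF-8 encoding in the commented usage is an assumption about how the text files were saved.

```python
from itertools import zip_longest


def first_divergence(expected: str, actual: str) -> int:
    """Return the index of the first differing character, or -1 if equal."""
    for i, (a, b) in enumerate(zip_longest(expected, actual)):
        # zip_longest pads the shorter string with None, so a truncated
        # text diverges exactly at its own length.
        if a != b:
            return i
    return -1


def context(text: str, index: int, width: int = 20) -> str:
    """Return repr() of the text around index so control characters show up."""
    start = max(0, index - width)
    return repr(text[start:index + width])


# Hypothetical usage with the files attached to this issue
# (encoding is an assumption):
# good = open("doc_text_1238.txt", encoding="utf-8").read()
# bad = open("doc_text_12324.txt", encoding="utf-8").read()
# pos = first_divergence(good, bad)
# print(pos, context(good, pos))
```

Printing the `repr()` of the surrounding text makes invisible characters visible, which helps when the divergence is caused by a control character rather than by ordinary text.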
PyMuPDF version: 1.23.24
Operating system: Windows
Python version: 3.10
Address #3186: don't terminate extracted text at chr(0) characters.
9983ce2
25adc23
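The commit message suggests that the extracted text was being cut off at chr(0) characters. The snippet below only mimics that symptom in pure Python. It is not PyMuPDF's actual code, and the helper names `truncate_at_nul` and `count_nul` are made up for illustration.

```python
NUL = chr(0)


def truncate_at_nul(text: str) -> str:
    """Mimic C-string style handling: keep only what precedes the first chr(0)."""
    return text.split(NUL, 1)[0]


def count_nul(text: str) -> int:
    """Count the chr(0) characters in a piece of extracted text."""
    return text.count(NUL)


# Illustrative sample, not taken from the attached PDF.
sample = "first line" + NUL + "second line"
truncated = truncate_at_nul(sample)  # everything after chr(0) is lost
```

If this is the mechanism, text extracted with an unaffected version should still contain the chr(0) characters, so `count_nul` on that output would indicate whether a given document can trigger the problem.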
Fixed in 1.23.25.
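Based on the versions mentioned in this thread (broken from 1.23.9, fixed in 1.23.25), a quick check of whether a given version string falls in the affected range might look like the sketch below. The names `parse_version` and `is_affected` are hypothetical, and the parser assumes a purely numeric dotted version string.

```python
FIRST_AFFECTED = (1, 23, 9)   # first version reported as broken in this issue
FIRST_FIXED = (1, 23, 25)     # version in which the fix was released


def parse_version(version: str) -> tuple:
    """Turn '1.23.24' into (1, 23, 24); assumes numeric dotted versions only."""
    return tuple(int(part) for part in version.split("."))


def is_affected(version: str) -> bool:
    """Return True if the version lies in [1.23.9, 1.23.25)."""
    return FIRST_AFFECTED <= parse_version(version) < FIRST_FIXED
```

The installed version string can usually be obtained with `importlib.metadata.version("PyMuPDF")` from the standard library and passed to `is_affected`.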