Incorrect image generation and content extraction #4184
Comments
Please be specific:
I found a similar issue: this PDF has only 4 pictures, but what I see in WPS is different. File: '/Users/hujian/Downloads/SAMPLE+5.pdf'. Please help check this. I assume this is an image with a soft mask (SMask). I'm waiting for your response and hope you can tell me what I missed in processing this image.
Any feedback?
Everything works as it should!
@JorjMcKie Thanks, understood: it is an SMask.
@JorjMcKie This is my issue: I know that KSPX1 is the watermark for the whole page, and it is the red picture I see on the page, rotated by 30 or 45 degrees.
I don't know what WPS or Microsoft Edge are doing, and I won't investigate their behavior either. But PyMuPDF allows you to "delete" images selectively. If you execute
I will look into those myself for now; I am still trying to figure out what they did to the PDF.
The PDF page also has a number of "clips" defined. They can make areas appear empty. |
Description of the bug
Here is the raw PDF:
BOW2429730S1.pdf
What I see in WPS:
The picture I rendered with PyMuPDF:
They are inconsistent.
There are two issues with this document. First, the text I extract is missing content compared to the source PDF. Second, the layout of the image I render differs from the source file. Are there any settings or approaches that would let this document be extracted correctly?
How to reproduce the bug
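import fitz  # PyMuPDF

document = fitz.open('BOW2429730S1.pdf')
page = document.load_page(0)
texts = page.get_text()  # plain text of the first page
print(texts)
img = page.get_pixmap()  # rendered at the default 72 dpi
img.save('page-0.png')  # Pixmap.save() needs an output filename (name is arbitrary)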
PyMuPDF version
1.24.10
Operating system
Linux
Python version
3.9