Inverted colors when extracting CMYK image #2931
Labels: is-bug, workflow-images
When page.images is used to read images, the colors come out wrong. However, when the image is replaced, pypdf calls the same function to read it again, and this time the image is in the correct color space. I explain more in the Issue Analysis section below.
Environment
Which environment were you using when you encountered the problem?
Code + PDF
This is a minimal, complete example that shows the issue:
Share here the PDF file(s) that cause the issue. The smaller they are, the
better. Let us know if we may add them to our tests!
example.pdf
I am personally fine with adding it to the tests. However, it is modified from http://paper.people.com.cn/rmrb/images/2024-10/28/03/rmrb2024102803.pdf and may have copyright issues. It would be better to create a new PDF file with a CMYK image, if that reproduces the issue.
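A minimal sketch of how the two code paths can be compared, assuming pypdf and Pillow are installed and the attached file is saved locally as example.pdf. The helper name compare_read_and_replace is hypothetical, and the snippet is not the exact code used for this report. As far as I know, replace() only works on an image that belongs to a PdfWriter, so the file is cloned into a writer first.

```python
def compare_read_and_replace(pdf_path):
    """Return the first pixel (as RGB) from the read path and after replace().

    Hypothetical helper for illustration; it assumes the first page of the
    PDF contains at least one non-inline image.
    """
    # Imported lazily so this file can be loaded even where pypdf is missing.
    from pypdf import PdfWriter

    # replace() needs an image that belongs to a PdfWriter, so clone the file.
    writer = PdfWriter(clone_from=pdf_path)
    img = writer.pages[0].images[0]

    # Path 1: the image as decoded when it is read through page.images.
    read_image = img.image.copy()

    # Path 2: replace() re-reads the image and refreshes img.image.
    img.replace(read_image)
    replaced_image = img.image

    return (
        read_image.convert("RGB").getpixel((0, 0)),
        replaced_image.convert("RGB").getpixel((0, 0)),
    )


# Example call (needs the attached file next to the script):
# print(compare_read_and_replace("example.pdf"))
```

Because replace() re-encodes the image, small differences between the two pixels are expected. A large difference would point to the two paths decoding the image differently.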
Traceback
This is the complete traceback I see:
Issue Analysis
page.images calls the PageObject._get_image() function in the _page.py file. The img.replace() function also calls the same _get_image() function, twice, through reader.pages[0].images[0] in ImageFile.replace().
pypdf/pypdf/_page.py
Lines 632 to 669 in 98aa974
pypdf/pypdf/_page.py
Lines 398 to 401 in 98aa974
By editing the _get_image() function:
Here is the new output:
The decode output appears once when reading page.images and twice when replacing. Here is the cause of the issue: the image is decoded incorrectly when it is read.
pypdf/pypdf/_page.py
Line 658 in 98aa974
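The call counts can also be checked without editing the library source. Below is a minimal sketch of a generic call counter. The names count_calls and StandInPage are hypothetical, and StandInPage only imitates the call pattern described above (one call on read, two on replace). It is not pypdf code.

```python
import functools
from contextlib import contextmanager


@contextmanager
def count_calls(owner, method_name):
    """Temporarily wrap owner.method_name and record every call to it."""
    original = getattr(owner, method_name)
    calls = []

    @functools.wraps(original)
    def wrapper(*args, **kwargs):
        # Skip args[0] (self) so only the real arguments are recorded.
        calls.append((args[1:], kwargs))
        return original(*args, **kwargs)

    setattr(owner, method_name, wrapper)
    try:
        yield calls
    finally:
        # Always restore the original method, even if the body raised.
        setattr(owner, method_name, original)


class StandInPage:
    """Stand-in that mimics the call pattern from the analysis above."""

    def _get_image(self, image_id):
        return f"image:{image_id}"

    def read(self):
        return self._get_image("/Im0")

    def replace(self):
        self._get_image("/Im0")
        return self._get_image("/Im0")


page = StandInPage()
with count_calls(StandInPage, "_get_image") as read_calls:
    page.read()
with count_calls(StandInPage, "_get_image") as replace_calls:
    page.replace()

print(len(read_calls), len(replace_calls))  # 1 2
```

With pypdf, the same helper could presumably be pointed at PageObject and "_get_image" while page.images[0] or img.replace() runs. That assumes _get_image is looked up on the class at call time.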
Now I would like to draw your attention to the _xobj_to_image() function in filters.py:
pypdf/pypdf/filters.py
Line 793 in 98aa974
The incorrect decode produces an image in the wrong color space.
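To illustrate why a wrong decode step changes the colors so much, here is a small pure-Python sketch. It assumes the image carries an inverting /Decode array of [1 0 1 0 1 0 1 0], which is common for CMYK JPEGs but is my assumption about this file. The helpers apply_decode and cmyk_to_rgb are hypothetical and use a naive conversion without ICC profiles.

```python
def apply_decode(samples, decode, bits=8):
    """Map raw samples through a PDF /Decode array.

    The array holds one [Dmin, Dmax] pair per channel; a raw sample s is
    mapped linearly from [0, 2**bits - 1] to [Dmin, Dmax].
    """
    max_val = (1 << bits) - 1
    pairs = list(zip(decode[0::2], decode[1::2]))
    return [
        d_min + s * (d_max - d_min) / max_val
        for s, (d_min, d_max) in zip(samples, pairs)
    ]


def cmyk_to_rgb(c, m, y, k):
    """Naive CMYK (each 0..1) to 8-bit RGB, ignoring ICC profiles."""
    return tuple(round(255 * (1 - x) * (1 - k)) for x in (c, m, y))


raw = [255, 204, 153, 204]           # one stored CMYK sample (inverted on disk)
inverting = [1, 0, 1, 0, 1, 0, 1, 0]  # assumed /Decode array of the image
identity = [0, 1, 0, 1, 0, 1, 0, 1]   # what a decoder uses if it ignores /Decode

right = cmyk_to_rgb(*apply_decode(raw, inverting))
wrong = cmyk_to_rgb(*apply_decode(raw, identity))

print(right)  # (204, 163, 122)
print(wrong)  # (0, 10, 20)
```

The same stored sample gives a light brown when the /Decode array is applied and a near-black when it is skipped. That matches the kind of color error reported here, although the sketch does not show where pypdf takes the wrong path.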