Image.tobytes() output is platform-dependent #5190

Closed
Lucas-C opened this issue Jan 7, 2021 · 5 comments

@Lucas-C
Contributor

Lucas-C commented Jan 7, 2021

What did you do?

My minimal reproduction code:

from hashlib import md5
from PIL import Image

print(md5(Image.open('insert_images_insert_jpg.jpg').tobytes()).hexdigest())

The test image used comes from here: https://github.com/alexanderankin/pyfpdf/blob/master/test/image/image_types/insert_images_insert_jpg.jpg

What did you expect to happen?

I expected consistent output from this program. However, when I run it on my Windows 10 computer, I get:

# python --version
Python 3.7.7
# pip freeze | grep Pillow
Pillow==8.1.0
# python repro.py
0454372f0408b52e0141e316eece21b9

Whereas when I run it inside the latest python Docker image, I get:

# python --version
Python 3.7.7
# pip freeze | grep Pillow
Pillow==8.1.0
# python repro.py
7f52c3112b24fe0f5d54520a3805e2df

What actually happened?

The output of the Image.tobytes() method seems to depend on the environment.

What are your OS, Python and Pillow versions?

  • OS: varying -> Windows 10 / Linux
  • Python: Python 3.7.7 (the Python version does not change the hash from my tests)
  • Pillow: 8.1.0

Before even thinking about a fix, I'm mainly looking for an explanation of this inconsistent behaviour:
what causes this variation?

@nulano
Contributor

nulano commented Jan 7, 2021

This is very likely a duplicate of #3833 / #4686. In summary, wheels for the two systems use a different JPEG library (libjpeg-turbo on Windows and libjpeg elsewhere). Due to the lossy encoding, JPEG decoders are allowed to differ slightly between different libraries, which produces the difference you are seeing.
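
To check which JPEG library a given Pillow install was built against, something like the following sketch may help. It relies on `PIL.features`; the helper name `jpeg_backend_info` is hypothetical, and returning `None` when Pillow is missing is only an illustrative choice.

```python
def jpeg_backend_info():
    """Report which JPEG library this Pillow build uses.

    Hypothetical helper: returns None if Pillow is not installed,
    otherwise a dict with the 'turbo' and 'version' keys.
    """
    try:
        from PIL import features
    except ImportError:
        return None
    return {
        # True when Pillow was built against libjpeg-turbo
        "turbo": bool(features.check_feature("libjpeg_turbo")),
        # Version string of the JPEG library, or None if unavailable
        "version": features.version_codec("jpg"),
    }


if __name__ == "__main__":
    print(jpeg_backend_info())
```

Running this on both machines should show whether the two environments really decode JPEGs with different libraries.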

@Lucas-C
Contributor Author

Lucas-C commented Jan 7, 2021

Alright! Thanks for the explanation, and for pointing me to those issues I did not find :)

As far as I'm concerned, this can be closed then.

@Lucas-C
Contributor Author

Lucas-C commented Jan 7, 2021

If that seems useful to you, I can submit a PR to explicitly mention this in the Image.tobytes() docstring though.

@radarhere radarhere added the JPEG label Jan 8, 2021
@radarhere
Member

The explanation being provided is that the JPEG data is being loaded differently, not that it is being saved differently - nothing to do with tobytes().

> In summary, wheels for the two systems use a different JPEG library (libjpeg-turbo on Windows and libjpeg elsewhere). Due to the lossy encoding, JPEG decoders are allowed to differ slightly between different libraries, which produces the difference you are seeing.

@Lucas-C
Contributor Author

Lucas-C commented Jan 8, 2021

To me, @nulano's answer seems to the point: if JPEG decoders differ slightly, the resulting decompressed picture may have slightly different pixels, and hence the Image.tobytes() output will vary.
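
Since the decoded pixels can differ slightly, a test that hashes `tobytes()` output will be brittle across platforms. A minimal sketch of a tolerance-based comparison instead, assuming 8-bit modes such as L or RGB (one byte per channel); the helper names and the default tolerance of 2 are illustrative assumptions, not anything Pillow provides or guarantees:

```python
def max_channel_diff(a: bytes, b: bytes) -> int:
    """Largest per-byte absolute difference between two raw pixel buffers."""
    if len(a) != len(b):
        raise ValueError("buffers differ in length (different size or mode?)")
    return max((abs(x - y) for x, y in zip(a, b)), default=0)


def roughly_equal(a: bytes, b: bytes, tolerance: int = 2) -> bool:
    """True if no channel value differs by more than `tolerance` (assumed value)."""
    return max_channel_diff(a, b) <= tolerance


# Intended use with Pillow (not executed here):
#   reference = Image.open("expected.png").tobytes()
#   actual = Image.open("insert_images_insert_jpg.jpg").tobytes()
#   assert roughly_equal(actual, reference)
```

If an exact, platform-independent hash is needed, hashing the JPEG file's own bytes rather than the decoded pixels avoids the decoder entirely.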
