pnmtopng: fatal libpng error: Extra compressed data; adding imgdataopt #51

Closed
pts opened this issue Oct 4, 2017 · 10 comments
@pts
Owner

pts commented Oct 4, 2017

The image stream in bad_image_extra_data.pdf indeed contains extra bytes after the image data. The expected behavior would be to truncate those extra bytes. What happens instead: pdfsizeopt calls sam2p, which calls pnmtopng, which fails with the fatal error "pnmtopng: fatal libpng error: Extra compressed data". This makes sam2p fail, which in turn makes pdfsizeopt fail.

The image viewer qiv also reports the error "Extra compressed data" on the corresponding PNG, but at least it displays the image.
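The expected behavior (truncating the extra bytes) can be sketched with Python's standard zlib module. In a PNG, the concatenated IDAT payload is a single zlib stream, and any bytes left over after the end of that stream are what libpng reports as "Extra compressed data". The helper name split_zlib_stream is hypothetical; it is not part of pdfsizeopt or sam2p.

```python
import zlib
from typing import Tuple


def split_zlib_stream(data: bytes) -> Tuple[bytes, bytes]:
    """Hypothetical helper: return (decompressed, extra).

    `extra` holds the bytes that follow the end of the zlib stream, i.e. the
    data libpng complains about. Truncating means keeping `decompressed`
    and simply dropping `extra`.
    """
    d = zlib.decompressobj()
    out = d.decompress(data)
    if not d.eof:
        # The stream ended before its end marker and checksum: this is
        # truncated input, a different problem from extra data.
        raise ValueError("zlib stream is truncated")
    # unused_data contains every byte past the end of the compressed stream.
    return out, d.unused_data
```

For example, split_zlib_stream(zlib.compress(scanlines) + b"junk") returns the original scanlines together with b"junk" as the extra part.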

@rbrito

rbrito commented Oct 4, 2017 via email

@pts
Owner Author

pts commented Oct 4, 2017

Yes, please attach here the temporary .png files that pdfsizeopt has created, and please paste the console output of pdfsizeopt. sam2p shouldn't make a difference. pngtopnm may be different.

@pts
Owner Author

pts commented Oct 5, 2017

A radical approach to fixing these two bugs (#51 and #52) is to replace sam2p as a dependency of pdfsizeopt with a newly written tool named imgdataopt, which will provide the very small subset of sam2p's functionality that pdfsizeopt uses:

  • It can read PNG files generated by pdfsizeopt. (Please note that this is a small subset of the PNG format, e.g. it doesn't have interlacing, alpha channel or gamma correction.)
  • It can convert RGB images to grayscale, indexed (palette), etc.
  • It can convert an 8-bits-per-sample image to 4, 2 or 1 bits per sample.
  • It can write PNG files with smart predictor selection for each row (like sam2p -c:zip:15).
  • It can write PNG files with the None predictor in each row (like sam2p without the -c:zip:15 flag).
  • It can write PNG-like files without a per-row predictor specified. (This is compatible with /Filter /FlateDecode without any /Predictor.)
  • It works successfully even if the input image data is truncated, has the wrong checksum, or is too long.
  • It uses about 100 000 + 3 * width * (height + 6) bytes of memory (in addition to the code size). That is, it can keep an uncompressed RGB8 version of the image in memory. 100 kB is needed for the ZIP compression window (of 32 kB) and other buffers. (In fact, it uses even less memory: the multiplier 3 will be only 1 if the input image has the colorspace Gray or Indexed.)
  • It is a program written in C with only a few library dependencies: libc (-lc -lm) and zlib (-lz). C was chosen for better portability, and the code complexity doesn't warrant classes or other C++ features. In fact, the code is written in the common subset of C and C++, so it will compile with both gcc and g++.
  • Some of the low-level pixel data processing code (e.g. the hashing for palette generation) will be based on small parts of sam2p.
  • In addition to a source code release on GitHub, binaries for Linux i386, Win32 i386 and macOS i386 would be distributed as part of the pdfsizeopt_libexec archive package.
  • As a future improvement, it can read and write PNG-like files with the TIFF predictor (/Predictor 2).
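The "smart predictor selection for each row" can be illustrated with the heuristic suggested by the PNG specification: filter each row with all five PNG filter types (None, Sub, Up, Average, Paeth) and keep the one whose output bytes, read as signed differences, have the smallest sum of absolute values. In PDF terms, /Predictor 15 ("PNG optimum") allows the filter type to vary per row this way. Whether sam2p -c:zip:15 or imgdataopt uses exactly this heuristic is an assumption, and the function names below are hypothetical.

```python
from typing import List


def paeth(a: int, b: int, c: int) -> int:
    """PNG Paeth predictor: whichever of a, b, c is closest to a + b - c."""
    p = a + b - c
    pa, pb, pc = abs(p - a), abs(p - b), abs(p - c)
    if pa <= pb and pa <= pc:
        return a
    if pb <= pc:
        return b
    return c


def filter_row(ftype: int, row: bytes, prev: bytes, bpp: int) -> bytes:
    """Apply PNG filter type 0..4 (None, Sub, Up, Average, Paeth) to one row.

    bpp is the number of bytes per complete pixel (at least 1).
    """
    out = bytearray(len(row))
    for i, x in enumerate(row):
        a = row[i - bpp] if i >= bpp else 0   # byte to the left
        b = prev[i]                           # byte above
        c = prev[i - bpp] if i >= bpp else 0  # byte above-left
        if ftype == 0:
            pred = 0
        elif ftype == 1:
            pred = a
        elif ftype == 2:
            pred = b
        elif ftype == 3:
            pred = (a + b) // 2
        else:
            pred = paeth(a, b, c)
        out[i] = (x - pred) & 0xFF
    return bytes(out)


def row_cost(filtered: bytes) -> int:
    """Sum of absolute values, treating each byte as a signed difference."""
    return sum(v if v < 128 else 256 - v for v in filtered)


def encode_rows(rows: List[bytes], bpp: int) -> bytes:
    """Return filtered scanlines (filter-type byte + filtered row), uncompressed."""
    prev = bytes(len(rows[0]))  # the row above the first row counts as zeros
    chunks = []
    for row in rows:
        best_type, best_data = 0, filter_row(0, row, prev, bpp)
        best_cost = row_cost(best_data)
        for ftype in range(1, 5):
            data = filter_row(ftype, row, prev, bpp)
            cost = row_cost(data)
            if cost < best_cost:  # on ties, keep the lower filter type
                best_type, best_data, best_cost = ftype, data, cost
        chunks.append(bytes([best_type]) + best_data)
        prev = row  # filters always look at the unfiltered previous row
    return b"".join(chunks)
```

The result has the byte layout that PNG IDAT data (and a PDF stream with /Predictor 15) expects before compression, so passing it to zlib.compress yields the compressed image data. Always writing filter type 0 instead corresponds to the "None predictor in each row" mode.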

Non-features:

  • Doing the extra-slow, hard work of creating very small PNG files. This will still be done by pngout etc., called by pdfsizeopt.
  • Reading or writing PDF files. (pdfsizeopt would do the quick and easy conversion between PDF image objects and PNG files.)
  • Reading or writing file formats other than PNG or PDF.
  • Compression other than ZIP (/Filter /FlateDecode).
  • Using less memory than the uncompressed input image size as RGB8.
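Plugging the image from this issue (681 by 250 pixels, /DeviceRGB, 8 bits per component, as reported in the pdfsizeopt log in this thread) into the memory estimate from the feature list gives a rough figure. This is only the stated approximation worked through, not a measurement:

```latex
\begin{aligned}
M_{\text{RGB}}  &\approx 100\,000 + 3 \cdot 681 \cdot (250 + 6) = 100\,000 + 523\,008 = 623\,008 \text{ bytes} \\
M_{\text{Gray}} &\approx 100\,000 + 1 \cdot 681 \cdot (250 + 6) = 100\,000 + 174\,336 = 274\,336 \text{ bytes}
\end{aligned}
```

The second line applies the multiplier of 1 that the feature list gives for Gray or Indexed input.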

@rbrito

rbrito commented Oct 6, 2017 via email

@rbrito

rbrito commented Oct 6, 2017

I just commented out all the calls to os.remove and here is what I got:

$ ~/Downloads/pdfsizeopt/pdfsizeopt --use-pngout=no --use-multivalent=no bad_image_extra_data.pdf 
info: This is pdfsizeopt rUNKNOWN size=378713.
info: prepending to PATH: /home/rbrito/Downloads/pdfsizeopt
info: loading PDF from: bad_image_extra_data.pdf
info: loaded PDF of 24810 bytes
info: separated to 5 objs + xref + trailer
info: parsed 5 objs
info: found 0 Type1 fonts loaded
info: found 0 Type1C fonts loaded
info: will optimize image XObject 4; orig width=681 height=250 colorspace=/DeviceRGB bpc=8 inv=False filter=/FlateDecode dp=1 size=24286 gs_device=png16m
info: saving PNG to psotmp.16495.img-4.parse.png
info: written 24130 bytes to PNG
info: optimizing 1 images of 24286 bytes in total
info: executing image converter sam2p_np: sam2p -pdf:2 -c zip:1:9 -s Gray1:Indexed1:Gray2:Indexed2:Rgb1:Gray4:Indexed4:Rgb2:Gray8:Indexed8:Rgb4:Rgb8:stop -- psotmp.16495.img-4.parse.png psotmp.16495.img-4.sam2p-np.pdf
This is sam2p .
Available Loaders: PS PDF JAI PNG JPEG TIFF PNM BMP GIF LBM XPM PCX TGA.
Available Appliers: XWD Meta Empty BMP PNG TIFF6 TIFF6-JAI JPEG-JAI JPEG PNM GIF89a+LZW XPM PSL1C PSL23+PDF PSL2+PDF-JAI P-TrOpBb.
libpng warning: IDAT: Extra compressed data
libpng warning: IDAT: Extra compressed data
sam2p: Notice: PNM: loaded alpha, but no transparent pixels
sam2p: Notice: job: read InputFile: psotmp.16495.img-4.parse.png
sam2p: Notice: writeTTT: using template: p02
sam2p: Notice: applyProfile: applied OutputRule #11
sam2p: Notice: job: written OutputFile: psotmp.16495.img-4.sam2p-np.pdf
Success.
info: loading image from: psotmp.16495.img-4.sam2p-np.pdf
info: loading PDF from: psotmp.16495.img-4.sam2p-np.pdf
info: loaded PDF of 20463 bytes
info: separated to 5 objs + xref + trailer
info: parsed 5 objs
info: loaded PNG IDAT of 19753 bytes
info: executing image converter sam2p_pr: sam2p -c zip:15:9 -- psotmp.16495.img-4.parse.png psotmp.16495.img-4.sam2p-pr.png
This is sam2p .
Available Loaders: PS PDF JAI PNG JPEG TIFF PNM BMP GIF LBM XPM PCX TGA.
Available Appliers: XWD Meta Empty BMP PNG TIFF6 TIFF6-JAI JPEG-JAI JPEG PNM GIF89a+LZW XPM PSL1C PSL23+PDF PSL2+PDF-JAI P-TrOpBb.
libpng warning: IDAT: Extra compressed data
libpng warning: IDAT: Extra compressed data
sam2p: Notice: PNM: loaded alpha, but no transparent pixels
sam2p: Notice: job: read InputFile: psotmp.16495.img-4.parse.png
sam2p: Notice: applyProfile: applied OutputRule #14
sam2p: Notice: job: written OutputFile: psotmp.16495.img-4.sam2p-pr.png
Success.
info: loading image from: psotmp.16495.img-4.sam2p-pr.png
info: loaded PNG IDAT of 23714 bytes
info: optimized image XObject 4 file_name=psotmp.16495.img-4.sam2p-np.pdf size=19916 (82%) methods=sam2p_np:19916,sam2p_pr:23927,#orig:24286,parse:24286
info: saved 4370 bytes (18%) on optimizable images
info: optimized 1 streams, kept 1 #orig
info: compressed 1 streams, kept 0 of them uncompressed
info: saving PDF with 5 objs to: bad_image_extra_data.pso.pdf
info: generated object stream of 161 bytes in 3 objects (33%)
info: generated 20375 bytes (82%)
rbrito@zatz:/tmp/test$

I will attach the contents of the directory as a tarball.

@rbrito

rbrito commented Oct 6, 2017

test.tar.gz

Here they go.

@rbrito

rbrito commented Oct 6, 2017

Is sam2p calling pnmtopng here? I will test manually to see if pnmtopng dies or not...

@rbrito

rbrito commented Oct 6, 2017

The file with parse in the name is, according to optipng, indeed broken: it tells us to process it with the -fix option. advpng doesn't even care if there is any problem and goes ahead...

pts changed the title from "pnmtopng: fatal libpng error: Extra compressed data" to "pnmtopng: fatal libpng error: Extra compressed data; adding imgdataopt" on Nov 14, 2017
pts self-assigned this on Nov 14, 2017
@pts
Owner Author

pts commented Jun 19, 2018

Yes, sam2p calls png22pnm, and if that fails, sam2p calls pngtopnm. The pnmtopng string in the error message is a bug in these tools; it should say png22pnm or pngtopnm, respectively. pnmtopng is not called by sam2p or pdfsizeopt.

Thank you for the uploads!

pts closed this as completed in 3dca3da on Feb 24, 2023
@pts
Owner Author

pts commented Feb 24, 2023

Indeed, using imgdataopt instead of sam2p fixes the problem, because imgdataopt ignores extra data after the image data. (It also ignores the Adler-32 checksum.)

The change has been rolled out to the Linux, Win32 and macOS program binaries (i.e. sam2p was changed to imgdataopt, without renaming it) and to the instructions in README.md for compiling from source. Thus this issue is fixed.
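The tolerant behavior described above (ignoring bytes after the image data, ignoring the Adler-32 checksum, and coping with truncated input) can be approximated in Python by skipping the 2-byte zlib header and inflating the rest as raw deflate. This is a sketch under that assumption, not imgdataopt's actual C code, and the name tolerant_inflate is hypothetical.

```python
import zlib


def tolerant_inflate(data: bytes) -> bytes:
    """Hypothetical helper: inflate a zlib stream leniently.

    - Bytes after the end of the deflate data are ignored.
    - The Adler-32 checksum is never verified.
    - Truncated input yields whatever could be decoded (possibly short).
    Corrupt deflate data still raises zlib.error.
    """
    if len(data) < 2:
        return b""
    cmf, flg = data[0], data[1]
    # Sanity-check the 2-byte zlib header (RFC 1950): deflate method,
    # valid header check bits, and no preset dictionary.
    if (cmf & 0x0F) != 8 or ((cmf << 8) | flg) % 31 != 0 or (flg & 0x20):
        raise ValueError("not a plain zlib stream")
    # Negative wbits selects raw deflate: zlib parses neither the header
    # nor the Adler-32 trailer, so the checksum is not checked.
    d = zlib.decompressobj(-15)
    out = d.decompress(data[2:])
    # d.unused_data now holds the 4-byte checksum plus any extra bytes, and
    # d.eof is False if the input was truncated; both are ignored here.
    return out
```

A caller decoding image rows would still have to handle short output from truncated input, for example by padding the missing rows.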
