Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Removing the backup option #542

Closed
andrews05 opened this issue Jul 28, 2023 · 25 comments
Closed

RFC: Removing the backup option #542

andrews05 opened this issue Jul 28, 2023 · 25 comments

Comments

@andrews05
Copy link
Collaborator

andrews05 commented Jul 28, 2023

Hi all, I'm hoping to get some feedback from users regarding the backup option. @AlexTMjugador and I very briefly discussed the possibility of removing this, so I want to outline why the idea came up and some reasons why we might want to do this.

The issue:
For user's of the API, the backup option is currently in the wrong place. It's provided as part of the main Options struct for all uses of the API, even though it's only relevant when outputting to a file. It would be appropriate to move the backup option to be part of the OutFile instead but the downside is that this makes the API more complicated to use.

Why we might want to remove it

  • Removing it would reduce complexity.
  • There's a strong possibility that nobody actually uses it.
  • There's no particular use case for it.
  • It's not typical practice for a program like this to provide such a feature.
  • As best I understand, the feature exists in the first place because it was inherited from Optipng, which oxipng was originally based on. This is fine, but doesn't mean we need to keep it today.

Why we might want to keep it

  • It's not really hurting - the API can happily remain as is.
  • Somebody might be using it?

Alternatives for people who might be using it

  • Output to a separate file/directory so as not to overwrite the original.
  • Copy the file beforehand.
  • Use proper backup software.
  • Don't bother? Oxipng is super reliable these days and with an extensive test suite and no known corruption issues.

Any comments on this would be welcome 🙂

@shssoichiro Definitely keen to get your input also.

@TPS
Copy link

TPS commented Jul 29, 2023

I fall heavily into the "Keep It" camp & do whatever you need to make it maintainable.

My biggest reason is something I didn't see there nor above: it's an absolutely breaking change, & typically wouldn't be noticed by someone who updates to this version (eventually, ahem) until something in their workflow breaks, & they must have those presumably backed-up files.

@andrews05
Copy link
Collaborator Author

Thanks for the input!
Breaking changes needn't be a concern. That's the whole point of semantic versioning and bumping the major version number - it signals that there are breaking changes and users should check the release notes to see how it might affect them.
There are already breaking changes for the next release, such as deinterlacing by default. Removing the backup option would actually be quite safe in the sense that if you tried to use it then it would error out due to an unrecognised option.

@ace-dent
Copy link

ace-dent commented Jul 30, 2023

+1 for removing backup. Seems odd and unnecessary.
Please make sure that oxipng self-destructs if called with unknown parameters! 🔥
(i.e. User will become aware that -backup was removed if oxipng doesn't run and reports an error to the CLI).

PS - Related side topic: When would I use -check instead of running pngcheck? Is -pretend ever used? You may also want to reconsider -fix... Greg removed it from pngcheck to reduce attack surface.

@andrews05
Copy link
Collaborator Author

andrews05 commented Jul 30, 2023

@ace-dent good points!

  • Unknown parameters: It does indeed error out.
  • Check: This was added in Add check option #439, you can read the reasoning there. It does seem a little odd to me though.
  • Pretend: I use this all the time when testing the code 😄. Not sure what the use case would be otherwise though.
  • Fix: I'm inclined to agree. Currently all this does is skip the CRC check on each chunk and I can't imagine there could any security issues with this, but dealing with invalid files is probably best left to a different program. In Optipng I believe the fix option would skip all libpng warnings, which could potentially do a lot more though I'm not sure what exactly.

@AlexTMjugador do you have any thoughts on these?

@andrews05
Copy link
Collaborator Author

andrews05 commented Jul 31, 2023

A note regarding check: This can be effectively simulated with --pretend --nx --nz -i keep which will disable all optimisation. (Or at least it will after #543)

@AlexTMjugador
Copy link
Collaborator

I'm all for removing -check. I think that would simplify OxiPNG with little side effects, as pngcheck can be used as a better replacement for it.

-pretend can be useful for benchmarking performance before optimizing images, so advanced users have valid use cases for it that otherwise can't be done as well. (Most operating systems have an equivalent null sink to Unix's /dev/null, but doing file operations on that is more costly than doing nothing, so it's not entirely the same thing for benchmarks.)

The -fix option is very niche, but very useful for those niches. Given that it deals with obscure CRC corruption cases, it's difficult to find reliable tools that handle such corruption in a equivalent way. Moreover, ignoring chunk CRCs is very handy for fuzzing OxiPNG, as otherwise interesting fuzzer inputs would be discarded due to CRC mismatches.

@ace-dent
Copy link

ace-dent commented Aug 1, 2023

The -fix option is very niche, but very useful for those niches.

-- In the near 20yrs of experimenting with pngs, I have never found a broken CRC in the wild; only synthetic test images. I'm curious if anyone has ever genuinely found one?

... ignoring chunk CRCs is very handy for fuzzing OxiPNG...

-- A fair point. I just don't think fix is a good name for ignore-crc... Perhaps this (slight) misnomer can be addressed with tweaked Documentation / help?

@andrews05
Copy link
Collaborator Author

andrews05 commented Aug 2, 2023

-- In the near 20yrs of experimenting with pngs, I have never found a broken CRC in the wild; only synthetic test images. I'm curious if anyone has ever genuinely found one?

Yeah, was going to say the same. Data corruption (or bad png encoders) might have been a thing back in the nineties but it's effectively unheard of today. [edit] Bad png encoders do still happen, apparently.
Again keep in mind that oxipng was originally designed to replicate optipng, which included the fix option. Presumably the reason it specifically applies to CRCs here is simply because that's the most obvious/easy error to recover from, not because it was a commonly encountered error.

@ace-dent
Copy link

ace-dent commented Aug 2, 2023

Presumably the reason it specifically applies to CRCs here is simply because that's the most obvious/easy error to recover from ...

-- To state the obvious, in the real-world, if you cannot trust the CRC... I wouldn't want to trust what else of the bitstream is or isn't broken... 😅

@AlexTMjugador
Copy link
Collaborator

[...] Data corruption (or bad png encoders) might have been a thing back in the nineties but it's effectively unheard of today.

For what it's worth, I've found from listening to user feedback on my apps that some popular PNG encoders (namely, those found in some versions of Photoshop and other image manipulation programs) do stupid things like adding trailing bytes to the PNG files they generate, even though they violate the PNG specification. Of course, this does not mean that such encoders are broken to the point of generating chunks with bad CRCs, but this experience has shown me that it's not surprising to find blatant errors in PNG encoders in current usage.

Anyway, given that the maintenance cost of ignoring CRCs is low, that such an option is useful for fuzzing, and that safe Rust prevents memory safety problems that might arise from parsing potentially bad input, I'd just leave that option alone.

@Tristan971
Copy link
Contributor

Data corruption (or bad png encoders) might have been a thing back in the nineties but it's effectively unheard of today.

That is unfortunately not true. We originally PR'd the check option because we found ourselves regularly receiving PNG files from users that would result in optimization failures, with the main cause being Adler32 CRC mismatches. We see those errors around once a month on our production (so around ~1/350,000 PNG files we process).

We thus needed to do proper validation of those files. oxipng was already around and --pretend didn't fit the bill performance-wise, so we added --check. I elaborated a bit more in my answer to @andrews05 in #439 (comment)

That said, I don't think oxipng needs to be able to "fix" them. Merely meant to point out that it does happen in practice. We're plenty happy to just reject the file.

As for what encoders are at fault, in our experience it's usually some image manipulation programs. Wouldn't blame any in particular though, given it's hard to be sure how exactly "affected" users might have modified them.

@andrews05
Copy link
Collaborator Author

That's really interesting to hear, thanks for sharing! Are you saying that the images would be perfectly valid if the CRC was "fixed", or is that unknown?

@Tristan971
Copy link
Contributor

Tristan971 commented Aug 4, 2023

Are you saying that the images would be perfectly valid if the CRC was "fixed", or is that unknown?

We don't collect samples in general, alas [1]. So I can't say for sure for all of them or give a breakdown.

But here's a sample file from our test suite: https://github.com/shssoichiro/oxipng/assets/875533/8587151b-c4cc-4215-b080-dcdb32c4577d

It does successfully render in most image viewers I try, but it fails oxipng processing (in the past with the adler32 error, nowadays with a less explicit 'Invalid data found; unable to read PNG file').

[1]: because ensuring perfect 1:1 pixel post-processing result is our main goal, so we reject anything even a little bit broken and don't take chances on automated fixes of any kind, in case they'd corrupt things and we'd realize it only months later with thousands of images subtly broken

@andrews05
Copy link
Collaborator Author

Oh I see, we've got slightly mixed up here. Adler32 is the zlib checksum; crc32 the PNG chunk checksum. The fix option only applies to the PNG chunks, so it doesn't work for adler32 errors anyway.

@Tristan971
Copy link
Contributor

Tristan971 commented Aug 4, 2023

Sure, we're not using (or interested in) fix anyway. Just pointed out that broken encoders do still exist in the wild (if you consider the zlib pass as part of encoding anyway) 😅

@andrews05
Copy link
Collaborator Author

Yeah, for sure. I'd love to find out what they are and get them to fix it 😄

@Tristan971
Copy link
Contributor

Tristan971 commented Aug 4, 2023

This is probably getting a bit off-topic but since you asked...

On one of the very few documented cases we have (user came to us for support, then insisted to take the time to reproduce the issue instead of just reexporting differently), it was Windows (no version information, but looks like Windows 10 from the window decorations) with GIMP 2.10.10, producing files failing with CRC Mismatch in iCCP header for the following settings:

(unfortunately, we didn't collect the source project files at the time, so yeah... not super useful)

@andrews05
Copy link
Collaborator Author

andrews05 commented Aug 4, 2023

Nice, thanks.
I researched this one and found it traced back to a bug in exiv2 library. It's long since been fixed, but apparently existed for a while and resulted in the bug manifesting in programs using that library, including GIMP and digiKam, which in turn resulted in a lot of bad PNGs being created and escaping into the wild.

I haven't yet found anything related to bad Adler32 checksums, other than it's actually "normal" for PNG decoders to skip these anyway since it's redundant. Perhaps we should make fix apply to this too (except I don't think libdeflate will allow it).

@Tristan971
Copy link
Contributor

Tristan971 commented Aug 5, 2023

Oh nice find indeed! Wonder how many people will run buggy versions of those for years, unfortunately...

I haven't yet found anything related to bad Adler32 checksums, other than it's actually "normal" for PNG decoders to skip these anyway since it's redundant. Perhaps we should make fix apply to this too (except I don't think libdeflate will allow it).

Personally I quite like overzealous checks for our use-case, as we've seen many wild edge-cases technically permitted by specs but breaking things badly, but we understand if it's a pain to maintain and is considered FP-like.

@andrews05
Copy link
Collaborator Author

Don’t worry, it’s not something we currently have control over since the deflate library we use doesn’t expose any options for it. And even so we would only want to skip it if —fix was set.

@andrews05
Copy link
Collaborator Author

@Tristan971 I just had a thought: Do you get people uploading APNGs at all? The current release of oxipng rejects these as unsupported but they will be supported in the next release. Would that be undesirable for your checks, or will it not matter?

@Tristan971
Copy link
Contributor

Tristan971 commented Aug 8, 2023

@andrews05 Yes, but we are relying on imagemagick’s identify tool to reject them so far, so that won’t be a problem 👍 (also glad to hear oxipng will support them; maybe we’ll revisit our own policy in that regard)

@andrews05
Copy link
Collaborator Author

Cool.
This may not be helpful but for reference, one option for dealing with APNGs is to use either --keep or --strip to remove all animation data, turning them into a standard single-frame png.

@Tristan971
Copy link
Contributor

We’ll look at it when and if we decide to allow these. For GIFs we always pick the first frame for non-animated thumbnailing purpsoes, but I know some animated formats allow things like different canvas sizes per-frame and we always have to keep this kind of thing in mind (since we do enforce resolution limits). We’ll see, thanks for the mention anyway!

AlexTMjugador pushed a commit that referenced this issue Sep 26, 2023
Tidy up the API by removing a couple of options we don't really need.
Backup was discussed in #542
Check was discussed in  #439

@shssoichiro Just say if you prefer to keep either of these 🙂
@andrews05
Copy link
Collaborator Author

This has been done in v9.

Pr0methean pushed a commit to Pr0methean/oxipng that referenced this issue Dec 1, 2023
Tidy up the API by removing a couple of options we don't really need.
Backup was discussed in shssoichiro#542
Check was discussed in  shssoichiro#439

@shssoichiro Just say if you prefer to keep either of these 🙂
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants