Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve deflate support #132

Merged
merged 1 commit into from
May 14, 2021
Merged

Conversation

RReverser
Copy link
Contributor

@RReverser RReverser commented May 8, 2021

  • Switch to streaming decompression via flate2. Aside from performance improvements and lower RAM consumption, this fixes a bug where max_uncompressed_length was precalculated for a single tile but then used as a hard limit on the whole data, failing to decompress any tiled images.
  • Deprecate InflateError - this was used for wrapping miniz_oxide raw status, but flate2 already reports them as io::Error instead. This makes errors from both Lzw and Deflate modes more consistent - they're all now just io::Error.
  • Add support for new Deflate tag in addition to OldDeflate - format-wise it's same Zlib, and on practice decodes successfully with the same underlying code.
  • Slightly simplify partial reads from the Reader by using the std::io::copy helper instead of doing it manually.

Fixes #131.

 - Switch to streaming decompression via `flate2`. Aside from performance improvements and lower RAM consumption, this fixes a bug where `max_uncompressed_length` was precalculated for a single tile but then used as a hard limit on the whole data, failing to decompress any tiled images.
 - Deprecate `InflateError` - this was used for wrapping `miniz_oxide` raw status, but `flate2` already reports them as `io::Error` instead. This makes errors from both Lzw and Deflate modes more consistent - they're all now just `io::Error`.
 - Add support for new `Deflate` tag in addition to `OldDeflate` - format-wise it's same Zlib, and on practice decodes successfully with the same underlying code.
 - Slightly simplify partial reads from the `Reader` by using the `std::io::copy` helper instead of doing it manually.

Fixes image-rs#131.
@fintelia
Copy link
Contributor

fintelia commented May 8, 2021

Thanks for working on this! This broadly looks good to me, but I'm slightly nervous about switching to flate2. There seems to be outstanding soundness concerns about the library. image-rs tends to take these sorts of things rather seriously both because image decoding can be security critical, and because there very high quality and well audited C/C++ image decoders that would be natural to use if people were comfortable with code that is "unsafe but probably fine."

I do really like the switch to std::io::copy, I hadn't known about that function.

Finally, before merging I would like to track down some documentation on the various deflate tags are why the different ones exist. Just to make sure we're not overlooking anything.

@RReverser
Copy link
Contributor Author

There seems to be outstanding soundness concerns about the library.

IIRC, those are concerns if the code changes, right now it's working fine - to the same extent as any wrapping around unsafe code works fine as long as some guarantees uphold. I agree switching to more robust wrappers like Vec or MaybeUninit would be even better on the flate2 side though.

That is to say, I agree upstream code safety could be improved further, but my understanding is it's safe in its current state and I'm a bit worried that building similar streaming wrapper around miniz_oxide would only increase the risk of running into same bugs that have been already solved in flate2 and tested by the rest of the ecosystem that relies on it directly.

Finally, before merging I would like to track down some documentation on the various deflate tags are why the different ones exist.

Yeah, I'm not sure myself - that's what I tried to figure out first by looking up those different Deflate tags, but other than "it was specified in different documents / by different vendors" I personally couldn't find any meaningful explanation of why two exist. Then, I decided to just look at the extracted raw data and saw that it has the same Zlib magic, and decided to try and verify that it indeed works with the same Zlib code.

Note that thanks to Zlib using a magic header instead of just being raw DEFLATE stream, worst case scenario is, raw data won't match Zlib format and will still fail to parse even after this PR is merged - but on the images I've had and tested on so far it worked successfully.

Copy link
Contributor

@fintelia fintelia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just checked the libtiff source code, and they seem to handle the two deflate tags the same. We should be good on that front.

I'm also now noticing that flate2 is in the rust-lang organization which makes me feel much better about depending on it. Lets wait just a bit longer in case anyone else raises concerns, and otherwise I think this is ready to merge.

@RReverser
Copy link
Contributor Author

I just checked the libtiff source code, and they seem to handle the two deflate tags the same.

Ah, good to know, thanks for checking!

@RReverser
Copy link
Contributor Author

RReverser commented May 8, 2021

Hm, something was nagging me, so I decided to look up the implementation, and, indeed std::io::copy uses a stack-allocated buffer in-between the reader and writer: https://github.com/rust-lang/rust/blob/ff34b919075f35a1787659e9c448a34b06bab8de/library/std/src/io/copy.rs#L128-L156

It has specialization to reuse buffer for BufWriter but not one for slices, so in the code changed in this PR there will be now two memcpys instead of one.

I think in real-world scenarios it will be negligible, but thought I'd bring it up in case you prefer that part to be restored back to a custom loop.

@RReverser
Copy link
Contributor Author

Lets wait just a bit longer in case anyone else raises concerns, and otherwise I think this is ready to merge.

@fintelia Do you think this is good to merge or did you mean to wait a bit longer? Once this part is released, I'd like to make a PR to the image crate to update TIFF dependency too (looks like it's a bit behind API-wise).

@fintelia fintelia merged commit e70121e into image-rs:master May 14, 2021
@RReverser RReverser deleted the deflate-improvements branch May 14, 2021 22:06
@RReverser
Copy link
Contributor Author

@fintelia Thanks! In terms of the release - should we wait for 0.7.0 as per #130 (and, correspondingly, #129 as per #130 (comment)) or would this constitute a 0.6.2?

@fintelia
Copy link
Contributor

I don't think this warrants back-porting to 0.6.x, but we can hopefully get 0.7.0 out soon. I'll try to look over and resolve #129 this weekend.

@RReverser
Copy link
Contributor Author

Thanks, fair enough.

RReverser added a commit to RReverser/image-tiff that referenced this pull request May 15, 2021
It was left in but marked as deprecated as part of image-rs#132 to avoid breaking API changes.

However, image-rs#130 is going to mark a 0.7.0 release, so we can remove it altogether for simplicity.
@RReverser RReverser mentioned this pull request May 15, 2021
phil-opp added a commit to phil-opp/image-tiff that referenced this pull request May 16, 2021
phil-opp added a commit to phil-opp/image-tiff that referenced this pull request May 16, 2021
HeroicKatora pushed a commit to phil-opp/image-tiff that referenced this pull request May 25, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Compression method Deflate is unsupported
2 participants