Rebase onto 1.3 #42

busykai · 2023-11-19T07:58:40Z

This is fully functional and tests are passing. I still need to review it one more time to make sure everything's in order. It was tricky mostly because of refactoring in the upstream.

CC: @jtkukunas

Signed-off-by: Marcus Müller <[email protected]>

memLevel 9 would cause deflateBound() to assume the use of fixed blocks, even if the compression level was 0, which forces stored blocks. That could result in a bound less than the size of the compressed data. Now level 0 always uses the stored blocks bound.

gzsetparams() now returns a Z_STREAM_ERROR in this case.

Z_SOLO defines z_size_t as an unsigned long. However Windows and MinGW-w64 are LLP64, where a long is 32 bits, but a size_t is 64 bits. This makes z_size_t, used by adler32_z() and crc32_z(), 64 bits on those systems.

The inflate() functions never leave state->bits greater than 24, so an inflatePrime() call could not cause this. The only way this could have happened would be by using inflatePrime() to fill the bit buffer with 32 bits, and then calling inflatePrime() a *second* time asking to insert zero bits, for some reason. This commit assures that a shift by 32 bits does not occur even in that case.

This supports zlib versions earlier than 1.2.3 (July 2005), as well as incomplete clones of zlib that do not have inflatePrime().

This is a temporary workaround before excising the K&R prototypes.

C2X has removed K&R definitions from the C function syntax. Though the standard has not yet been approved, some high-profile compilers are now issuing warnings when such definitions are encountered.

The comment said that the os is set to 255, when in fact it has been set to the current os since zlib 1.2.3. Or at least our best guess at the os made at compile time.

and Reset functions.

zlib does not use time_t, so _TIME_BITS is irrelevant. However it may be defined anyway as part of a sledgehammer indiscriminately applied to all builds.

In some (very rare) scenarios, the SIMD code in the `crc_folding` module can perform out-of-bounds reads or writes, which could lead to GPF crashes. Here's the deal: when the `crc_fold_copy` function is called with a non-zero `len` argument of less then 16, `src` is read through `_mm_loadu_si128` which always reads 16 bytes. If the `src` pointer points to a location which contains `len` bytes, but any of the `16 - len` out-of-bounds bytes falls in unmapped memory, this operation will trigger a GPF. The same goes for the `dst` pointer when written to through `_mm_storeu_si128`. With this patch applied, the crash no longer occurs. We first discovered this issue though Valgrind reporting an out-of-bounds access while running a unit-test for some code derived from `crc_fold_copy`. In general, the out-of-bounds read is not an issue because reads only occur in sections which are definitely mapped (assuming page size is a multiple of 16), and garbage bytes are ignored. While giving this some more thought we realized for small `len` values and `src` or `dst` pointers at a very specific place in the address space can lead to GPFs. - Minor tweaks to merge request by Jim Kukunas <[email protected]> - removed C11-isms - use unaligned load - better integrated w/ zlib (use zalign) - removed full example code from commit msg

With deflate, the only destination for crc folding was the window, which was guaranteed to have an extra 15B of padding (because we allocated it). This padding allowed us to handle the partial store case (len < 16) with a regular SSE store. For inflate, this is no longer the case. For crc folding to be efficient, it needs to operate on large chunks of the data each call. For inflate, this means copying the decompressed data out to the user-provided output buffer (moreso with our reorganized window). Since it's user-provided, we don't have the padding guarantee and therefore need to fallback to a slower method of handling partial length stores.

The deflate_quick strategy is designed to provide maximum deflate performance. deflate_quick achieves this through: - only checking the first hash match - using a small inline SSE4.2-optimized longest_match - forcing a window size of 8K, and using a precomputed dist/len table - forcing the static Huffman tree and emitting codes immediately instead of tallying This patch changes the scope of flush_pending, bi_windup, and static_ltree to ZLIB_INTERNAL and moves END_BLOCK, send_code, put_short, and send_bits to deflate.h. Updates the configure script to enable by default for x86. On systems without SSE4.2, fallback is to deflate_fast strategy. Fixes #6 Fixes #8

From: Arjan van de Ven <[email protected]> As the name suggests, the deflate_medium deflate strategy is designed to provide an intermediate strategy between deflate_fast and deflate_slow. After finding two adjacent matches, deflate_medium scans left from the second match in order to determine whether a better match can be formed. Fixes #2

(Note emit_match() doesn't currently use the value at all.) Fixes #4

Merges branch issue3.

…pilation. Minor tweak by Jim Kukunas, dropping changes to configure script

This commit significantly improves inflate performance by reorganizing the window buffer into a contiguous window and pending output buffer. The goal of this layout is to reduce branching, improve cache locality, and enable for the use of crc folding with gzip input. The window buffer is allocated as a multiple of the user-selected window size. In this commit, a factor of 4 is utilized. The layout of the window buffer is divided into two sections. The first section, window offset [0, wsize), is reserved for history that has already been output. The second section, window offset [wsize, 4 * wsize), is reserved for buffering pending output that hasn't been flushed to the user's output buffer yet. The history section grows downwards, towards the window offset of 0. The pending output section grows upwards, towards the end of the buffer. As a result, all of the possible distance/length data that may need to be copied is contiguous. This removes the need to stitch together output from 2 separate buffers. In the case of gzip input, crc folding is used to copy the pending output to the user's buffers.

…ny residue

Since the inflate optimizations depend on an efficient memcpy routine, add an optimized version for platforms that don't have one.

window anymore

This fixes an issue where a repeat sequence longer than 258 would be encoded using longer distance values after the first match.

When we load new data into the window, we invalidate the next match, in case the match would improve. In this case, the hash has already been updated with this data, so when we look for a new match it will point it back at itself. As a result, a literal is generated even when a better match is available. This avoids that by catching this case and ensuring we're looking at the past.

During some CA work in the dotnet runtime project we discovered the VC++ code analysis has trouble determining the follow value is properly initialized prior to use. Instead of adding a suppression it seemed easier to simply initialize it. See dotnet/runtime#49194 (comment) for context.

…e current window size.

The `match` pointer can be dereferenced outside the `s->window` buffer, so add a boundary check to avoid the issue. The issue was detected with valgrind. For example, running `gzip FILE` with valgrind on Clear Linux OS for a FILE that reproduces the issue: ==1610872== Thread 3: ==1610872== Invalid read of size 1 ==1610872== at 0x49E9A98: fizzle_matches (deflate_medium.c:150) ==1610872== by 0x49E9A98: deflate_medium (deflate_medium.c:281) ==1610872== by 0x49EC1E7: deflate (deflate.c:1015) ==1610872== by 0x1195D8: UnknownInlinedFun (pigz.c:1602) ==1610872== by 0x1195D8: compress_thread (pigz.c:1752) ==1610872== by 0x11AFD1: ignition (yarn.c:253) ==1610872== by 0x49CB50E: start_thread (pthread_create.c:481) ==1610872== by 0x4B43B02: clone (in /usr/lib64/haswell/libc-2.33.so) ==1610872== Address 0x4c6369f is 1 bytes before a block of size 65,552 alloc'd ==1610872== at 0x48447DA: malloc (vg_replace_malloc.c:380) ==1610872== by 0x49EAB4C: deflateInit2_ (deflate.c:319) ==1610872== by 0x11933D: compress_thread (pigz.c:1637) ==1610872== by 0x11AFD1: ignition (yarn.c:253) ==1610872== by 0x49CB50E: start_thread (pthread_create.c:481) ==1610872== by 0x4B43B02: clone (in /usr/lib64/haswell/libc-2.33.so) The out-of-bounds check for `orig` has been added for completeness.

As well as their compilation options.

madler and others added 30 commits October 15, 2022 09:02

Change version number on develop branch to 1.2.13.1.

41fda48

Update broken article link in README.

76820e4

Signed-off-by: Marcus Müller <[email protected]>

Fix crash when gzsetparams() attempted for transparent write.

02a6049

gzsetparams() now returns a Z_STREAM_ERROR in this case.

Update zlib_how.html to match the web page, and add a license.

a9b4c1d

Make z_size_t 64 bits when compiling on Windows with Z_SOLO.

fa8cd50

Z_SOLO defines z_size_t as an unsigned long. However Windows and MinGW-w64 are LLP64, where a long is 32 bits, but a size_t is 64 bits. This makes z_size_t, used by adler32_z() and crc32_z(), 64 bits on those systems.

Rewrite of zran in examples. See version history in zran.c.

eb0e038

Add a NOPRIME #define to zran.c to not use inflatePrime().

7e6dc42

This supports zlib versions earlier than 1.2.3 (July 2005), as well as incomplete clones of zlib that do not have inflatePrime().

Turn off C2X warning about deprecated K&R function syntax.

5799c14

This is a temporary workaround before excising the K&R prototypes.

Remove K&R function definitions from zlib.

e9d5486

C2X has removed K&R definitions from the C function syntax. Though the standard has not yet been approved, some high-profile compilers are now issuing warnings when such definitions are encountered.

Remove K&R function definitions from infback9.

c4aa356

Remove K&R function definitions from contrib/minizip.

bf2578b

Remove use of OF() from contrib/untgz and render it compilable.

6658868

No include file is needed for __int64 type on Windows.

9b962a4

Fix cast in minizip's ioapi.c for Windows.

05527a1

Fix warnings on test/infcover.c.

3365464

Fix test/example.c to work with FORCE_STORED.

b8a8373

Update Java and Perl links in README.

904016e

Remove duplicate "the" in zlib.h.

48c3741

Correct comment in zlib.h on os setting in gzip header.

263a982

The comment said that the os is set to 255, when in fact it has been set to the current os since zlib 1.2.3. Or at least our best guess at the os made at compile time.

Document in zlib.h the initialization of stream fields by the Init

a88f727

and Reset functions.

Avoid compiler complaints if _TIME_BITS defined when building zlib.

a566e15

zlib does not use time_t, so _TIME_BITS is irrelevant. However it may be defined anyway as part of a sledgehammer indiscriminately applied to all builds.

minizip: Fix being unable to open empty zip file

f209ca7

Remove duplicated code #806

7b28ecc

Fix reading disk number start on zip64 files in minizip.

e0bd0ad

Fix typos found by codespell in minizip

379bbda

Fix logic error in minizip argument processing.

3061e50

Match sign of printf directive to sign of argument in minizip.

dcd0d86

Match sign of printf directive to sign of argument in testzlib.

b3f23f7

NicolasT and others added 24 commits November 19, 2023 06:52

deflate.c fix for window bits > 13

30c425d

deflate: avoid use of uninitialized variable

e272cef

(Note emit_match() doesn't currently use the value at all.) Fixes #4

check whether match or orig go too far back

1307fa5

Merges branch issue3.

Include wmmintrin.h in configure test and crc_folding.c aid clang com…

7d2cf54

…pilation. Minor tweak by Jim Kukunas, dropping changes to configure script

in the case of overlap copy mempcy dist bytes then byte by byte for a…

e187440

…ny residue

Add Intel's optimized RTE memcpy

3e6deab

Since the inflate optimizations depend on an efficient memcpy routine, add an optimized version for platforms that don't have one.

integrate Intel's RTE memcpy with zmemcpy

e2737e2

infcover: remove OoM test for SetDictionary since we don't lazy alloc

468ea13

window anymore

update gitignore

0087e7a

reorganize longest_match

6196de0

add zalways_inline for msvc and gcc

c32ad24

force inline std2_longest_match

a0b9f7c

inflate: fix MSVC compiler warnings

df5fdcd

inflate: handle windowBits == 16

9eb7d84

deflate_medium: add dist -1 to hash even for long matches

56ccda7

This fixes an issue where a repeat sequence longer than 258 would be encoded using longer distance values after the first match.

Replaced constant MAX_DIST2 with MAX_DIST which takes into account th…

a8b70db

…e current window size.

busykai force-pushed the kai/rebase-on-1.3 branch 3 times, most recently from c297573 to e23cb78 Compare November 20, 2023 00:57

busykai added 2 commits November 20, 2023 01:01

Fix CMakeLists.txt to add the source files.

e14e9fb

As well as their compilation options.

CI: remove the platforms we don't support.

e23cb78

jtkukunas force-pushed the master branch from 2c04799 to 6a0a605 Compare March 1, 2024 16:43

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Rebase onto 1.3 #42

Rebase onto 1.3 #42

busykai commented Nov 19, 2023

Rebase onto 1.3 #42

Are you sure you want to change the base?

Rebase onto 1.3 #42

Conversation

busykai commented Nov 19, 2023