Add zlib-ng and refactor gzip open functions. #124

Closed
wants to merge 18 commits into from

rhpvorderman (Collaborator) commented Feb 3, 2023

I added zlib-ng. This is great, but we now have three competing library implementations for opening deflate-compressed files:

  • zlib
  • ISA-L
  • zlib-ng

Each of these has its own external programs as well. This means the choice of the "best" application becomes more situational. It is also hard to set all the initialization options correctly.

I have therefore refactored the code. A list of preferred implementations is built, and a for loop with a try/except walks the list in order until it finds an application that works. Application-specific options are set using functools.partial.
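A minimal sketch of that pattern, not the actual code in this PR: `open_deflate` and `_open_with_module` are hypothetical names, and the sketch assumes that `isal.igzip` and `zlib_ng.gzip_ng` each expose a `gzip`-compatible `open()` when installed. It only covers the in-process libraries and binary modes, not the piped programs.

```python
import functools
import importlib


def _open_with_module(module_name, path, mode, compresslevel):
    """Open `path` using the gzip-like `open()` of the named module."""
    # Raises ImportError when the backend is not installed.
    module = importlib.import_module(module_name)
    if "r" in mode:
        return module.open(path, mode)
    # A backend may raise ValueError for a level it does not support.
    return module.open(path, mode, compresslevel=compresslevel)


def open_deflate(path, mode="rb", compresslevel=6):
    """Try the preferred implementations in order; return the first that works."""
    # Backend-specific settings are bound with functools.partial so that
    # every candidate can be called with the same three arguments.
    candidates = [
        functools.partial(_open_with_module, "isal.igzip"),
        functools.partial(_open_with_module, "zlib_ng.gzip_ng"),
        functools.partial(_open_with_module, "gzip"),  # stdlib fallback
    ]
    last_error = None
    for opener in candidates:
        try:
            return opener(path, mode, compresslevel)
        except (ImportError, ValueError) as exc:
            last_error = exc  # remember why this candidate failed, try the next
    raise OSError("no usable deflate implementation found") from last_error
```

Adding another backend in this layout means appending one more `functools.partial(...)` entry to the list; the loop itself stays unchanged.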

This leads to more verbose code, but also to much simpler logic, so this solution scales better. Since I expect crabz to be added soon, that will be much easier to do in this codebase.

This will also work wonders for coverage: all the individual PipedCompression Readers and Writers are tested separately anyway, and (almost) all the code in the opening functions is exercised because there is only a single opening call inside a for loop.
It should be much more robust this way.

rhpvorderman (Collaborator, Author) commented:

Benchmarks again.
Before:

Compressed at level 1 with 0 threads; filesize: 405313510, time: 3.0943868160247803 seconds
Compressed at level 2 with 0 threads; filesize: 404989250, time: 3.2503600120544434 seconds
Compressed at level 3 with 0 threads; filesize: 407253483, time: 8.559628248214722 seconds
Compressed at level 4 with 0 threads; filesize: 386930500, time: 22.203505277633667 seconds
Compressed at level 5 with 0 threads; filesize: 374207207, time: 39.5833375453949 seconds
Compressed at level 6 with 0 threads; filesize: 355347468, time: 98.56518363952637 seconds
Compressed at level 1 with 1 threads; filesize: 405313274, time: 3.134148597717285 seconds
Compressed at level 2 with 1 threads; filesize: 404989061, time: 3.2902400493621826 seconds
Compressed at level 3 with 1 threads; filesize: 407251397, time: 8.680283308029175 seconds
Compressed at level 4 with 1 threads; filesize: 387016236, time: 22.396305322647095 seconds
Compressed at level 5 with 1 threads; filesize: 374290891, time: 40.33376383781433 seconds
Compressed at level 6 with 1 threads; filesize: 355500797, time: 100.41857743263245 seconds
Compressed at level 1 with 4 threads; filesize: 405313274, time: 3.1683871746063232 seconds
Compressed at level 2 with 4 threads; filesize: 404989061, time: 3.349888563156128 seconds
Compressed at level 3 with 4 threads; filesize: 407251397, time: 8.81507396697998 seconds
Compressed at level 4 with 4 threads; filesize: 387016236, time: 5.959070682525635 seconds
Compressed at level 5 with 4 threads; filesize: 374290891, time: 11.105888366699219 seconds
Compressed at level 6 with 4 threads; filesize: 355500797, time: 27.333863973617554 seconds

After:

Compressed at level 1 with 0 threads; filesize: 405313510, time: 3.118347406387329 seconds
Compressed at level 2 with 0 threads; filesize: 404989250, time: 3.301008939743042 seconds
Compressed at level 3 with 0 threads; filesize: 376289859, time: 20.477195024490356 seconds
Compressed at level 4 with 0 threads; filesize: 368802018, time: 17.948566198349 seconds
Compressed at level 5 with 0 threads; filesize: 359170597, time: 24.44700598716736 seconds
Compressed at level 6 with 0 threads; filesize: 347379910, time: 41.21350431442261 seconds
Compressed at level 1 with 1 threads; filesize: 405313274, time: 3.2212672233581543 seconds
Compressed at level 2 with 1 threads; filesize: 404989061, time: 3.3104476928710938 seconds
Compressed at level 3 with 1 threads; filesize: 376289859, time: 20.57276463508606 seconds
Compressed at level 4 with 1 threads; filesize: 368802018, time: 17.815391063690186 seconds
Compressed at level 5 with 1 threads; filesize: 359170597, time: 24.02311873435974 seconds
Compressed at level 6 with 1 threads; filesize: 347379910, time: 40.91991400718689 seconds
Compressed at level 1 with 4 threads; filesize: 405313274, time: 3.207160472869873 seconds
Compressed at level 2 with 4 threads; filesize: 404989061, time: 3.3800032138824463 seconds
Compressed at level 3 with 4 threads; filesize: 376289859, time: 20.435675144195557 seconds
Compressed at level 4 with 4 threads; filesize: 368802018, time: 17.669416427612305 seconds
Compressed at level 5 with 4 threads; filesize: 359170597, time: 23.864430904388428 seconds
Compressed at level 6 with 4 threads; filesize: 347379910, time: 40.691060066223145 seconds

I am much happier with the compression gradient. Level 3 now actually compresses better than level 2 (albeit much more slowly). Zlib-ng does some really nice work. Only threads are missing, but it looks like crabz will provide these when ready.
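To put numbers on that trade-off, here is a small worked comparison of the level 6 rows. The timings and file sizes are copied from the logs above; the `compare` helper is just illustrative arithmetic, not part of the benchmark script.

```python
# (seconds, bytes) for level 6, keyed by thread count, copied from the logs.
before = {
    0: (98.56518363952637, 355347468),
    4: (27.333863973617554, 355500797),
}
after = {
    0: (41.21350431442261, 347379910),
    4: (40.691060066223145, 347379910),
}


def compare(threads):
    """Return (time ratio before/after, fraction of output size saved)."""
    t_before, size_before = before[threads]
    t_after, size_after = after[threads]
    return t_before / t_after, 1 - size_after / size_before


for threads in (0, 4):
    ratio, saved = compare(threads)
    print(f"threads={threads}: before/after time ratio {ratio:.2f}, "
          f"output {saved:.1%} smaller")
```

A ratio above 1 means the new code is faster. Without threads it is, but with 4 threads the old code still wins on time at this level, which is the gap that threading support would have to close.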

I am also quite happy about how this PR turned out technically. It should be quite easy to add crabz after this, and the logic has gotten a bit simpler.

@rhpvorderman rhpvorderman requested a review from marcelm February 3, 2023 09:36
rhpvorderman (Collaborator, Author) commented:

Superseded by #135
