Experimental support for LZMA / XZ codec #127

milesgranger · 2024-01-21T12:32:36Z

Part of #126

Adds experimental LZMA / XZ support under the experimental module. Configuration is limited for now: the only exposed setting is the compression preset.

67f6902 will close #123 (hopefully) :)

With default settings, the output is a byte-for-byte match for that of Python's built-in lzma module:

In [1]: import lzma

In [2]: import cramjam

In [3]: compressed = lzma.compress(b'bytes')

In [4]: bytes(cramjam.experimental.lzma.compress(b'bytes'))
Out[4]: b'\xfd7zXZ\x00\x00\x04\xe6\xd6\xb4F\x02\x00!\x01\x16\x00\x00\x00t/\xe5\xa3\x01\x00\x04bytes\x00\x00\x00\x006\x93\x11\xb1PA\x11\xab\x00\x01\x1d\x05\xb8-\x80\xaf\x1f\xb6\xf3}\x01\x00\x00\x00\x00\x04YZ'

In [5]: compressed
Out[5]: b'\xfd7zXZ\x00\x00\x04\xe6\xd6\xb4F\x02\x00!\x01\x16\x00\x00\x00t/\xe5\xa3\x01\x00\x04bytes\x00\x00\x00\x006\x93\x11\xb1PA\x11\xab\x00\x01\x1d\x05\xb8-\x80\xaf\x1f\xb6\xf3}\x01\x00\x00\x00\x00\x04YZ'
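The comparison above can also be scripted. The following is a minimal sketch, not part of this PR: it relies only on the standard library `lzma` module and treats cramjam as optional, assuming the `cramjam.experimental.lzma` import path shown above exists in the installed version. The helper name `looks_like_xz` is hypothetical; the magic bytes it checks are the XZ stream header `\xfd7zXZ\x00` and footer `YZ` visible in both outputs.

```python
import lzma

XZ_HEADER_MAGIC = b"\xfd7zXZ\x00"  # first six bytes of every XZ stream
XZ_FOOTER_MAGIC = b"YZ"            # last two bytes of every XZ stream


def looks_like_xz(data: bytes) -> bool:
    """Cheap sanity check on the XZ magic bytes (hypothetical helper)."""
    return data.startswith(XZ_HEADER_MAGIC) and data.endswith(XZ_FOOTER_MAGIC)


payload = b"bytes"
reference = lzma.compress(payload)  # stdlib defaults: XZ container, CRC64 check

try:
    # Assumption: the experimental import path from this PR is present.
    from cramjam.experimental import lzma as cramjam_lzma
except ImportError:
    cramjam_lzma = None

if cramjam_lzma is not None:
    candidate = bytes(cramjam_lzma.compress(payload))
    print("byte-for-byte match:", candidate == reference)
    print("stdlib decodes cramjam output:", lzma.decompress(candidate) == payload)

print("reference is an XZ stream:", looks_like_xz(reference))
```

If cramjam is not installed, or the import path differs in your version, the script still runs and only checks the standard library output.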

TODO:

At least decode support for legacy LZMA format.
Multi stream support
Expose more configuration settings?
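For reference, the first two TODO items correspond to behaviour the standard library already has. The sketch below uses only Python's `lzma` module (nothing in it is cramjam API): `FORMAT_ALONE` is the legacy `.lzma` container, and `lzma.decompress` decodes concatenated streams in a single call. That the follow-ups would mirror this behaviour is an assumption, not something stated in this PR.

```python
import lzma

payload = b"bytes"

# Legacy LZMA ("alone") container: no XZ magic, no integrity check.
legacy = lzma.compress(payload, format=lzma.FORMAT_ALONE)
assert not legacy.startswith(b"\xfd7zXZ\x00")
# FORMAT_AUTO (the default for decompress) accepts both XZ and legacy input.
assert lzma.decompress(legacy) == payload

# Multi-stream: two independent XZ streams concatenated back to back.
multi = lzma.compress(b"first ") + lzma.compress(b"second")
# The stdlib decodes every stream and joins the results.
assert lzma.decompress(multi) == b"first second"

# The preset (0-9) is the one setting this PR exposes on the cramjam side;
# in the stdlib it is passed like this.
fast = lzma.compress(payload, preset=0)
assert lzma.decompress(fast) == payload
```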

Closes #123

lgray · 2024-01-21T19:06:29Z

@milesgranger the present implementation works in our case, thanks! Strangely, hinting the output buffer size doesn't seem to bring any performance improvement over the Python standard library implementation. Perhaps the data I'm testing isn't large enough to see it. Maybe there's a more optimized LZMA implementation out there in Rust land, but that can come in time.

Still, it seems fit for the task! Thanks for the snappy response!

Small update: there appears to be a ~10% improvement for our data, when testing with a larger file. Not huge, but I'll take it.

milesgranger · 2024-01-22T06:31:43Z

Thanks for the feedback, @lgray. I'd be happy to hear what you think of the follow-ups as time permits. Good to hear there was a bit of improvement, though I wasn't expecting anything amazing: I think both use the same liblzma under the hood.

Your use case will probably benefit from cramjam's de/compress_into functions, if you can work out a way to re-use buffers.

lgray · 2024-01-22T13:26:34Z

@milesgranger sure - @ me when you post follow-ups. I can test them fairly quickly.

For (de)compress_into I'll have to tinker with the uproot library a bit more, but it's typically organized very sensibly so I should be able to use those methods.

milesgranger added 3 commits January 21, 2024 13:29

Add experimental LZMA / XZ support

9220bca

Reduce max size for test_variants_different_dtypes

67f6902

Closes #123

Switch to xz2 crate

fc5ea5e

milesgranger marked this pull request as draft January 21, 2024 13:07

milesgranger mentioned this pull request Jan 21, 2024

lzma / xz support? #126

Closed

lgray mentioned this pull request Jan 21, 2024

feat: use cramjam for lzma, lz4, and zstd, opt-in use of isal for zlib scikit-hep/uproot5#1090

Merged

milesgranger marked this pull request as ready for review January 22, 2024 06:25

milesgranger merged commit 2d710c7 into master Jan 22, 2024
65 checks passed

milesgranger deleted the support-lzma branch January 22, 2024 06:28

milesgranger mentioned this pull request Feb 4, 2024

More work on LZMA / XZ support #133

Merged
