Implement `pub fn de/compress_into_slice(input: &[u8], output: &mut [u8]) -> usize`
#11
Hi! Generally it's possible, but currently more space is allocated than needed in order to allow more aggressive optimizations. That means the slice size would not reflect the actual length of the uncompressed output. The de/compress methods which …
This may be too specific to my use case, and I agree that, from a Rust point of view, the existing API is the optimal and most ergonomic choice. It turns out, though, that making a single (over-)allocation in Python, doing the compression op, then resizing the Python buffer … I understand if it's not worth it; it's already slightly faster than the existing Python lz4 bindings, but having this ability would make it decisively faster. At least, that's my experience with snappy: because it implements the Read trait, it gives me the option to pre-allocate and then resize the buffer (if there is a way to calculate the expected size), or read it into a regular …
I'd formally like to change my request to have …
From a performance point of view, the frame API would probably be a little slower because of some additional overhead and allocations.
Right, that makes sense. I suppose I could still very much make use of the original request here, and would make use of the framed API when/if that comes about. Thanks for the clarification between framed and block format here. 👍
I'd also like to have this kind of API. I'd be totally okay with having to over-allocate the buffer and having the function return how many bytes were actually written. As long as the padding/alignment requirements are documented – or, perhaps even better, a function for calculating the required buffer size is provided – I don't think that's a problem.

My more concrete use case is placing the LZ4 outputs into a bump allocator. In that sense, generalizing to `pub fn compress_into<A: std::alloc::Allocator>(input: &[u8], compressed: &mut Vec<u8, A>)` would work as well, however with …
@athre0z thanks, good to know there would be multiple use cases. Internally, that's already how it works: it over-allocates with …
On the mut_slice branch, it now accepts …
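To make the requested calling convention concrete, here is a minimal sketch of the API shape from the issue title, using a trivial identity "codec" as a stand-in for the real LZ4 block routines; `compress_into_slice` and `max_output_size` are illustrative names, not lz4_flex's actual API:

```rust
/// Worst-case output size for this toy identity "codec"
/// (a real codec would add some headroom for headers/literals).
fn max_output_size(input_len: usize) -> usize {
    input_len
}

/// Writes the "compressed" bytes into the caller-provided slice and
/// returns how many bytes were actually written.
fn compress_into_slice(input: &[u8], output: &mut [u8]) -> usize {
    assert!(output.len() >= max_output_size(input.len()));
    output[..input.len()].copy_from_slice(input);
    input.len()
}

fn main() {
    let input = b"hello world";
    // Caller over-allocates, then truncates to the reported length.
    let mut buf = vec![0u8; max_output_size(input.len())];
    let written = compress_into_slice(input, &mut buf);
    buf.truncate(written);
    assert_eq!(&buf[..], &input[..]);
    println!("wrote {} bytes", written);
}
```

The key point is that the caller owns the allocation (e.g. a buffer pre-allocated in Python) and only needs the returned length to trim it afterwards.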
Thanks for implementing this! I can't really compare against previous performance, because having to copy the buffer was a reason for not trying to use the library previously, but I can compare against the …

Trivially compressible data: LZ4 flex safe compression: 1.32 GiB/s
Mostly incompressible data: LZ4 flex safe compression: 348 MiB/s (70% exec time)

These throughput rates include not only compression, but also processing in my application (a sort of message queue geared towards high throughput). I profiled and annotated the amount of CPU time spent within the respective LZ4 implementation. It's worth noting that this is an unfair comparison, because I had been running the …

Do you plan on also reworking the decompression API in the same manner?
Yes, I updated the master branch to also include decompression with the sink. It's even around 10% faster for compressible data with …
Nice, thanks! Yes -- since my application is networked, I already had the …

```rust
// Assumes the nightly `allocator_api` feature and the `byteorder` crate;
// `Compressor`, `CompressResult` and `CompressError` are my own types.
use std::alloc::Allocator;
use byteorder::{ReadBytesExt, WriteBytesExt, LE};

struct LZ4Compressor(i32);

impl Compressor for LZ4Compressor {
    fn compress<A: Allocator + Copy>(&self, input: Vec<u8, A>) -> CompressResult<Vec<u8, A>> {
        // Write uncompressed payload size.
        let len: u32 = input
            .len()
            .try_into()
            .map_err(|_| CompressError::PayloadTooLarge(input.len()))?;
        let cap = lz4_flex::block::compress::get_maximum_output_size(input.len()) + 4;
        let mut out = Vec::with_capacity_in(cap, *input.allocator());
        out.write_u32::<LE>(len).unwrap();

        // Write compressed data.
        unsafe { out.set_len(cap) };
        let actual = lz4_flex::compress_into(&input, &mut (&mut out[4..]).into());
        assert!(actual + 4 <= cap);
        unsafe { out.set_len(actual + 4) };
        Ok(out)
    }

    fn decompress<A: Allocator + Copy>(&self, input: Vec<u8, A>) -> CompressResult<Vec<u8, A>> {
        // Read uncompressed payload size.
        let mut read_slice = &input[..];
        let uncompressed_len = read_slice.read_u32::<LE>().unwrap();

        // Decompress buffer.
        let mut decompressed = Vec::with_capacity_in(uncompressed_len as _, *input.allocator());
        unsafe { decompressed.set_len(uncompressed_len as _) };
        let mut sink: lz4_flex::block::Sink = (&mut decompressed[..]).into();
        lz4_flex::decompress_into(read_slice, &mut sink).unwrap();
        assert_eq!(sink.pos(), uncompressed_len as _);
        Ok(decompressed)
    }
}
```

And the …
Ah yes, the …

There is currently an issue I'm looking into where over-allocation is required in the decompression case, but it shouldn't be. I think the logic needs to be changed a little in how the fast loop is exited and the last bytes are then copied. This was previously not an issue, because of the owned vec.
Just like with the compression case, I wouldn't mind having to over-allocate the decompression buffer in the presence of a function calculating the required size. On a more general note, I think it would be best to keep … I also wonder if …
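For the compression side, such a sizing function already has a well-known answer: the LZ4 reference implementation's `LZ4_compressBound` uses the worst-case expansion bound `len + len/255 + 16`, and lz4_flex's `get_maximum_output_size` is in the same spirit. A sketch of that bound (the function name here is illustrative):

```rust
// Worst-case LZ4 block output size, mirroring the reference
// implementation's LZ4_compressBound. Incompressible input can grow
// slightly: roughly one extra byte per 255 input bytes, plus a small
// constant for headers/literal runs.
fn maximum_output_size(input_len: usize) -> usize {
    input_len + input_len / 255 + 16
}

fn main() {
    // Even an empty input needs some headroom.
    assert_eq!(maximum_output_size(0), 16);
    // 1 KiB of input: 1024 + 4 + 16 bytes of worst-case output.
    assert_eq!(maximum_output_size(1024), 1044);
    println!("bound for 1 KiB input: {} bytes", maximum_output_size(1024));
}
```

No equivalent closed-form bound exists for decompression, which is why the uncompressed length is usually stored out of band (as the size prefix in the snippet above does).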
```rust
#[test]
fn lz4_test() {
    let data = vec![4u8, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0];
    let encoded = LZ4Compressor.compress(data.clone()).unwrap();
    let decoded = LZ4Compressor.decompress(encoded).unwrap();
    assert_eq!(data, decoded);
}
```

Currently fails with …
Yes, that's the issue I meant. I agree …

Over-allocation for decompression is not compatible with the frame format, because it could overwrite parts of other blocks.
@athre0z This should work now, can you retest? I replaced the …
Sorry for the late response! Thanks, everything seems to be working as expected now, and the interface is a lot less cumbersome. :) The …
Hi!
Thanks for taking the time to put this together, I think it's great! 🚀
I'm in a situation where I tell some other lib what size slice I want (pre-allocating it in Python), and then I get a mutable reference to it. However, it seems all de/compress APIs in lz4_flex take or result in `Vec<u8>`. Is it worthwhile to add something like mentioned in the title?