Implement pub fn de/compress_into_slice(input: &[u8], output: &mut [u8]) -> usize #11

Closed
milesgranger opened this issue Feb 15, 2021 · 17 comments

Comments

@milesgranger

Hi!

Thanks for taking the time to put this together, I think it's great! 🚀

I'm in a situation where I tell some other lib what size slice I want (pre-allocating it in Python), and then I get a mutable reference to it. However, it seems all de/compress APIs in lz4_flex either take or return a Vec<u8>.

Is it worthwhile to add something like what's mentioned in the title?
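
Roughly the shape I have in mind, purely as a sketch (the names come from the issue title; nothing like this exists in lz4_flex yet):

// Sketch of the requested additions. The returned usize would be the number of
// bytes actually written into the caller-provided output slice.
pub fn compress_into_slice(input: &[u8], output: &mut [u8]) -> usize {
    todo!("compress `input` into `output` and return the compressed length")
}

pub fn decompress_into_slice(input: &[u8], output: &mut [u8]) -> usize {
    todo!("decompress `input` into `output` and return the decompressed length")
}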

@PSeitz
Owner

PSeitz commented Feb 15, 2021

Hi,

Generally it's possible, but currently more space is allocated than needed to allow more aggressive optimizations. That means the slice size would not reflect the actual length of the uncompressed output.

The de/compress methods which currently push into the Vec would need to be changed to copy to the current output position instead.

@milesgranger
Author

This may be too specific to my use case.

However, while I agree that, optimally and ergonomically, the existing API is best when staying within Rust, it turns out that making a single (over-)allocation in Python, doing the compression op, and then resizing the Python bytearray to the actual size is quite a bit faster than going through two allocations: one in Rust, and another to create a Python object and copy that buffer into it.

I understand if it's not worth it; it's already slightly faster than the existing Python lz4 bindings, but having this ability would make it decisively faster. At least that's my experience with snappy, because it implements the Read trait: that gives me the option to pre-allocate and then resize the buffer (when there is a way to calculate the expected size), or to read into a regular &mut Vec<u8> when the size is unknown (read vs read_to_end). But again, I'm not sure it'd be worthwhile here for your project.

@milesgranger
Author

milesgranger commented Feb 24, 2021

I'd formally like to change my request to have de/compress_into take anything that implements Write for the output. Looking briefly at the source code, I'm not sure how feasible that would be; but in any case, having the ability to pass &mut [u8] would still be helpful.
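
To make the Write idea concrete, something along these lines is what I mean (purely an illustrative sketch, with a made-up function name and a pass-through body, not an existing or proposed lz4_flex signature):

use std::io::{self, Write};

// Hypothetical shape: compress `input` and write the result to any Write
// implementor (a File, a Vec<u8>, a Cursor<&mut [u8]>, ...), returning the
// number of bytes written.
pub fn compress_to_writer<W: Write>(input: &[u8], output: &mut W) -> io::Result<usize> {
    // Stand-in body so the sketch compiles: just copy the input through unchanged.
    output.write_all(input)?;
    Ok(input.len())
}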

@PSeitz
Owner

PSeitz commented Feb 26, 2021

Write trait support would be part of the LZ4 frame format, which is currently missing; only the block format is implemented.

From a performance point of view, the frame API would probably be a little slower, because of some additional overhead and allocations.

@milesgranger
Author

Right, that makes sense. I suppose I could still very much make use of the original request here, and I would use the frame format when/if that comes about. Thanks for the clarification on framed vs. block format here. 👍

@athre0z

athre0z commented Mar 31, 2021

I'd also like to have this kind of API. I'd be totally okay with having to over-allocate the buffer and having the function return how many bytes were actually written. As long as the padding/alignment requirements are documented – or, perhaps even better, a function for calculating the required buffer size is provided – I don't think that's a problem.

My more concrete use case is placing the LZ4 outputs into a bump allocator. In that sense, generalizing compress_into to be

pub fn compress_into<A: std::alloc::Allocator>(input: &[u8], compressed: &mut Vec<u8, A>)

would work as well, however with allocator_api currently still being nightly-only, that's not really a reasonable thing to request.
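
For illustration, this is what the nightly allocator_api looks like in isolation (using Global as a stand-in for the bump allocator; just a sketch, not lz4_flex code):

#![feature(allocator_api)]

use std::alloc::Global;

// Nightly-only illustration of the allocator_api mentioned above: a Vec tied to
// an explicit allocator. With an Allocator-generic compress_into, lz4_flex could
// write directly into memory owned by a caller-chosen allocator.
fn main() {
    let mut buf: Vec<u8, Global> = Vec::with_capacity_in(64, Global);
    buf.extend_from_slice(b"compressed bytes would land here");
    assert!(buf.capacity() >= 64);
}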

@PSeitz
Owner

PSeitz commented Apr 1, 2021

@athre0z thanks, good to know there would be multiple use cases. Internally that's already how it works: it over-allocates with get_maximum_output_size and afterwards sets the actual size of the vector. I thought about it, and this API would make sense for the block format even before the frame format, since it could be utilized by the frame format to avoid double allocations.
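
In caller terms the contract boils down to the following self-contained sketch (fake_compress stands in for the real compressor, and the worst-case bound mirrors LZ4's usual input_len + input_len/255 + 16 instead of calling get_maximum_output_size):

// Over-allocate-then-shrink, as described above. `fake_compress` is only a
// stand-in that copies the input through and reports the bytes written.
fn fake_compress(input: &[u8], output: &mut [u8]) -> usize {
    output[..input.len()].copy_from_slice(input);
    input.len()
}

fn main() {
    let input = b"example payload";
    let worst_case = input.len() + input.len() / 255 + 16; // LZ4's usual worst-case bound
    let mut out = vec![0u8; worst_case];
    let written = fake_compress(input, &mut out);
    out.truncate(written); // keep only the bytes actually produced
    assert_eq!(&out[..], &input[..]);
}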

@PSeitz
Owner

PSeitz commented Apr 4, 2021

On the mut_slice branch, compress_into now accepts a Sink as the output parameter. A Sink is a &mut [u8] slice with additional position information. I'm not sure if the API design will stay like this or if a plain &mut [u8] is fine.
You can give it a try if you like. It would also be interesting to know if this improves performance for you.
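
For anyone following along, a Sink along those lines boils down to roughly this (illustrative sketch only; the actual type on the branch may differ):

// Minimal sketch of a Sink-style wrapper: a caller-provided byte slice plus a
// write position, so the compressor can append without growing a Vec.
pub struct Sink<'a> {
    output: &'a mut [u8], // caller-provided buffer
    pos: usize,           // number of bytes written so far
}

impl<'a> Sink<'a> {
    pub fn new(output: &'a mut [u8]) -> Self {
        Sink { output, pos: 0 }
    }

    /// Append `data` at the current position and advance `pos`.
    /// Panics if the remaining space is too small.
    pub fn extend_from_slice(&mut self, data: &[u8]) {
        self.output[self.pos..self.pos + data.len()].copy_from_slice(data);
        self.pos += data.len();
    }

    /// Number of bytes written so far.
    pub fn pos(&self) -> usize {
        self.pos
    }
}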

@athre0z

athre0z commented Apr 4, 2021

Thanks for implementing this! I can't really compare against previous performance, because having to copy the buffer was a reason for not trying to use the library previously, but I can compare against the lz4 crate, which I used before!

Trivially compressible data

LZ4 flex safe compression: 1.32 GiB/s
LZ4 flex unsafe compression: 1.32 GiB/s (8.3% exec time spent doing compression)
lz4 crate compression (level = 1): 1.00 GiB/s (33.8% exec time spent doing compression)

Mostly incompressible data

LZ4 flex safe compression: 348 MiB/s (70% exec time)
LZ4 flex unsafe compression: 455 MiB/s (61% exec time)
lz4 crate compression (level = 1): 420 MiB/s (64% exec time)

These throughput rates include not only compression but also processing in my application (a sort of message queue geared towards high throughput). I profiled and annotated the amount of CPU time spent within the respective LZ4 implementation. It's worth noting that this is an unfair comparison, because I had been running the lz4 library in framed mode (without noticing it), and it spends quite a bit of execution time in XXH32_update, which I suspect comes from the frame format's checksumming. Nevertheless, the results look very promising!

Do you plan on also reworking the decompression API in the same manner?

@PSeitz
Owner

PSeitz commented Apr 6, 2021

Yes, I updated the master branch to also include decompression with the sink. It's even around 10% faster for compressible data with safe-decoding, which makes sense, since push has additional overhead to check the capacity, which is not required here.
I think you should either use checked-decode, or hash the data and verify the hash (which is part of the frame format, which is not yet implemented); otherwise it could lead to memory corruption if the data is corrupted or manipulated. Memory corruption would not be a big issue if the program is sandboxed, like in wasm, but I don't think that's the case for Python.

@athre0z

athre0z commented Apr 7, 2021

Nice, thanks! Yes -- since my application is networked, I already had checked-decode enabled. :) It seems to be working great in general, but I noticed that the pos field in the Sink object didn't get updated properly. My compressor looks as follows:

// Assumed context, not shown in the original snippet: nightly allocator_api for
// Vec::with_capacity_in/Allocator, and the byteorder crate for the length prefix.
// Compressor, CompressResult and CompressError come from my own crate.
use std::alloc::Allocator;
use std::convert::TryInto;

use byteorder::{ReadBytesExt, WriteBytesExt, LE};

struct LZ4Compressor(i32);

impl Compressor for LZ4Compressor {
    fn compress<A: Allocator + Copy>(&self, input: Vec<u8, A>) -> CompressResult<Vec<u8, A>> {
        // Write uncompressed payload size.
        let len: u32 = input
            .len()
            .try_into()
            .map_err(|_| CompressError::PayloadTooLarge(input.len()))?;

        let cap = lz4_flex::block::compress::get_maximum_output_size(input.len()) + 4;
        let mut out = Vec::with_capacity_in(cap, *input.allocator());
        out.write_u32::<LE>(len).unwrap();

        // Write compressed data.
        unsafe { out.set_len(cap) };
        let actual = lz4_flex::compress_into(&input, &mut (&mut out[4..]).into());
        assert!(actual + 4 <= cap);
        unsafe { out.set_len(actual + 4) };

        Ok(out)
    }

    fn decompress<A: Allocator + Copy>(&self, input: Vec<u8, A>) -> CompressResult<Vec<u8, A>> {
        // Read uncompressed payload size.
        let mut read_slice = &input[..];
        let uncompressed_len = read_slice.read_u32::<LE>().unwrap();

        // Decompress buffer.
        let mut decompressed = Vec::with_capacity_in(uncompressed_len as _, *input.allocator());
        unsafe { decompressed.set_len(uncompressed_len as _) };
        let mut sink: lz4_flex::block::Sink = (&mut decompressed[..]).into();
        lz4_flex::decompress_into(read_slice, &mut sink).unwrap();
        assert_eq!(sink.pos(), uncompressed_len as _);

        Ok(decompressed)
    }
}

And the assert! in decompress fails because pos() is always 0. Perhaps I'm just misunderstanding the field.

@PSeitz
Owner

PSeitz commented Apr 8, 2021

Ah yes, the pos parameter is currently not updated in the decompression case. It's probably a very niche issue, where decompression succeeds but the output is smaller than uncompressed_len, but it could be covered by checking pos.

There is currently an issue I'm looking into where over-allocation is required in the decompression case, but it should not be. I think the logic needs to change a little in how the fast loop is exited and the last bytes are copied. This was previously not an issue because of the owned Vec.

PSeitz mentioned this issue Apr 8, 2021
@athre0z

athre0z commented Apr 8, 2021

Just like with the compression case, I wouldn't mind having to over-allocate the decompression buffer in the presence of a function calculating the required size.

On a more general note, I think it would be best to keep Sink out of the public interface. I do understand the purpose, but feel that the public API would preferably abstract this implementation detail away from the user.

I also wonder if std::io::Cursor could be used internally (which is pretty much the same thing as Sink, implementing Write on Cursor<&mut [u8]>). It might make unsafe writes more complicated, so if you had previously known about this and it simply isn't a good fit, feel very much free to disregard this comment. Sometimes, reusing existing things just isn't a good solution.
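
To make the Cursor idea concrete, this is all it takes on stable std (just an illustration of the mechanism, not a claim about how lz4_flex should implement it):

use std::io::{Cursor, Write};

// Cursor<&mut [u8]> implements Write and tracks the write position, which is
// essentially what Sink provides.
fn main() -> std::io::Result<()> {
    let mut buf = [0u8; 16];
    let mut cursor = Cursor::new(&mut buf[..]);
    cursor.write_all(b"hello")?;
    assert_eq!(cursor.position(), 5); // bytes written so far
    Ok(())
}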

@athre0z

athre0z commented Apr 10, 2021

#[test]
fn lz4_test() {
    let data = vec![4u8, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0];
    let compressor = LZ4Compressor(0); // level field is unused by the impl above
    let encoded = compressor.compress(data.clone()).unwrap();
    let decoded = compressor.decompress(encoded).unwrap();
    assert_eq!(data, decoded);
}

Currently fails with OutputTooSmall { expected_size: 0, actual_size: 20 }. I suppose this is one of the cases covered by "There is currently an issue I'm looking into where over-allocation is required in the decompression case"? It looks like the expected_size is wrong here.

@PSeitz
Owner

PSeitz commented Apr 11, 2021

Yes, that's the issue I meant. I agree that Sink should be replaced; it's mainly there to have a smooth transition from the Vec methods, and Cursor would have been the better choice. I will consider it in the refactoring.

Over-allocation for decompression is not compatible with the frame format, because it could overwrite parts of other blocks.

@PSeitz
Owner

PSeitz commented May 16, 2021

@athre0z This should work now, can you retest? I replaced the Sink API with &mut [u8] + pos.

@athre0z

athre0z commented Jun 3, 2021

Sorry for the late response! Thanks, everything seems to be working as expected now, and the interface is a lot less cumbersome. :) The output_pos argument is slightly redundant (because you can also just do &mut output[output_pos..]), but I also don't mind it.

@PSeitz PSeitz closed this as completed Sep 27, 2021