Miscelanous RFC #12

dpc · 2017-08-05T06:29:14Z

I was toying a bit, trying to find a way to remove a performance hit found in #11 , and also with the API.

I really like the

fn find_chunk<'a>(&mut self, buf: &'a [u8]) -> Option<(&'a [u8], &'a [u8])>;

API. It gives a nice, idiomatic, low-level handling loop:

    let mut buf = v.as_slice();
    while let Some((last, rest)) = cdc.find_chunk(buf) {
        // handle `last` piece of the chunk
        buf = rest;
    }
    // handle leftover `buf` here

and should be easy to build iterators, Read/Write adapters, and other APIs on top of it.

I'm not sure if this should get merged as a whole, but if you can let me know what seems OK, and what not, I could prepare more targeted patches.

* Split CDC and rolling shum APIs * Change the `find_chunk` API to something more idiomatic. * Make `gear` be `u64` again * Improve tests a bit

They don't really help with anything

remram44 · 2017-08-05T17:52:49Z

I always have the same remark, which is that loading the full file in memory is unnecessary, and a deal breaker for me.

dpc · 2017-08-05T20:28:07Z

@remram44 Neither the existing nor this API require loading the whole file into memory. You can look how rdedup is using the current one: https://github.com/dpc/rdedup/blob/b016278070cdcc3b6432b00452080aa45da4eb19/lib/src/sg.rs#L163 . It looks a bit more complicated due to OwningRef. Read more about the whole zero-copy, minimum memory usage processing in rdedup: https://dpc.pw/blog/2017/04/rusts-fearless-concurrency-in-rdedup/ .

This fixes optimization fail rust-lang/rust#43653

remram44 · 2017-08-06T03:54:11Z

Yes, it is definitely possible to do this by managing the buffers at a higher level and feeding those to the chunking library. However because this is such a common operation I do believe this should be part of the API exposed by the chunking crate.

I went a bit overboard and really fleshed out this chunking library, borrowing from your design and the nested crate I had in dhstore. It ends up having a bunch of different methods/iterators for different uses. I would appreciate your feedback on it, especially the low-level stuff (it should be also possible to feed stuff manually to the rolling-hash like you do).

dpc · 2017-08-06T04:45:12Z

@remram44 I'm fine with higher-level wrapper APIs, but the low-level API like the one here is optimal since it does not involve copying anything, or allocating, or arbitrary buffer sizes. There are plenty of situations where someone already does have a data in the buffer.

Also, the higher-level APis can come up with a lot of different versions, and people can argue forever which one is best. I rather have all algos with a common, optimal, tiny API, and have everyone wrap it in whatever higher-level API that suits them best.

dpc · 2017-08-06T08:10:08Z

@remram44 I've filled dswd/zvault#10 as an example of how much of a performance drop does copying-API cost.

For rdedupI already have a zero-copy chunking, that I want to be as fast as possible, as it's the only piece of rdedup that is not easily multi-threaded.

dswd · 2017-08-06T19:31:00Z

How would you go and chunk a file of arbitrary size (e.g. larger than memory) with this API? Can you provide an example. I am asking this, because this is my use case and I guess it is quite generic.

dpc · 2017-08-06T22:13:28Z

Just like in rdedup. Using bunch of smaller buffers in OwningArc. I can paste more links later.

dpc · 2017-08-10T06:46:26Z

I adapted rdedup to use the new API. https://github.com/dpc/rdedup/blob/21de28253862e9e77099adcb79184b42729c41fb/lib/src/sg.rs#L119

dpc added 2 commits August 4, 2017 23:21

Many improvements

cd25928

* Split CDC and rolling shum APIs * Change the `find_chunk` API to something more idiomatic. * Make `gear` be `u64` again * Improve tests a bit

Remove #[inline]

902594f

They don't really help with anything

dpc changed the title ~~Reword~~ Miscelanous RFC Aug 5, 2017

dpc mentioned this pull request Aug 5, 2017

Estabilish the API. #13

Open

gear: inline digest manually

89010c1

This fixes optimization fail rust-lang/rust#43653

dpc mentioned this pull request Aug 6, 2017

Fastcdc #11

Closed

dpc mentioned this pull request Aug 6, 2017

Rework fastcdc #14

Closed

dpc closed this Jul 3, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Miscelanous RFC #12

Miscelanous RFC #12

dpc commented Aug 5, 2017 •

edited

Loading

remram44 commented Aug 5, 2017

dpc commented Aug 5, 2017

remram44 commented Aug 6, 2017

dpc commented Aug 6, 2017

dpc commented Aug 6, 2017 •

edited

Loading

dswd commented Aug 6, 2017

dpc commented Aug 6, 2017

dpc commented Aug 10, 2017

Miscelanous RFC #12

Miscelanous RFC #12

Conversation

dpc commented Aug 5, 2017 • edited Loading

remram44 commented Aug 5, 2017

dpc commented Aug 5, 2017

remram44 commented Aug 6, 2017

dpc commented Aug 6, 2017

dpc commented Aug 6, 2017 • edited Loading

dswd commented Aug 6, 2017

dpc commented Aug 6, 2017

dpc commented Aug 10, 2017

dpc commented Aug 5, 2017 •

edited

Loading

dpc commented Aug 6, 2017 •

edited

Loading