-
Notifications
You must be signed in to change notification settings - Fork 59
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use inline storage for small hashes #47
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this approach, though I still wonder if just using smallvec and letting the user wrap in an arc if needed wouldn’t make for a nicer separation of concerns,
src/storage.rs
Outdated
@@ -0,0 +1,43 @@ | |||
use std::sync::Arc; | |||
|
|||
const MAX_INLINE: usize = 39; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why 39?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the total size is 40, a multiple of 8. 1 byte is needed for the enum discriminator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
39 fits all 256 bit hashes with some room to spare. 512 byte hashes won't work though, and I think once you go to such large hashes, you are better off with an Arc<[u8]>
.
But the size of the inline buffer can of course be adjusted when even larger hashes become common. The whole mechanism is completely opaque.
70b19af
to
eecc3fa
Compare
@dignifiedquire smallvec has an overhead of one usize for storing the size https://github.com/servo/rust-smallvec/blob/master/lib.rs#L374 , which is not necessary because the hash already knows how big it is. It also has the overhead of the discriminator, unless you use the "union" feature. https://github.com/servo/rust-smallvec/blob/master/lib.rs#L292, which is disabled by default: https://docs.rs/smallvec/1.2.0/smallvec/#union-feature . Due to the alignment rules, this discriminator costs you 8 bytes on 64 bit arch. So in total an overhead of 16 bytes. But most of all, it is a dependency containing unsafe code that can cause problems for others and that is not needed for a few LOC. |
eecc3fa
to
0559bed
Compare
There is an old smallbox crate but comes with a silly inefficiency that's worse than smallvec here. I'd think fix I presume smallbox really exists to make It appears possible to do this manually for individual traits, like we really could make a no_std no_alloc compatible It's possible some Apologies for the derail, but I'm agreeing with you that smallbox optimizations become messy and custom, especially whenever you want I'm also however pointing out that alloc being optional comes with some benefits, although perhaps not for multihash. |
@burdges interesting, but I think in this case just doing the branching manually is best. It is quite simple code. |
Thought about this some more.
|
We can afford it since the average hash is 34 bytes, and we want this thing to be a multiple of 8 large.
c8cd5c0
to
02cf114
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
I've run this code against Vec and Bytes based on the use case the libp2p has: https://gist.github.com/vmx/0dc0d981b522e2834084559faa0dcb56
On my machine this code code is always faster or similar (never slower).
The next useful limit would be 62, which would make the whole thing 64 bytes large and fit any 384 bit hash, but I don't think those are very common yet.
@vmx thanks for the benchmark. Criterion is pretty nice. For small hashes that fit, the improvement is quite good. I think we should give multiaddr the same treatment once this is merged... |
@twittner Could you please have a look at this PR? You were in favour for using |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like a nice improvement to me.
- use pub(crate) for Storage - use Storage::from_slices to prevent allocations for small identity multihashes
...and also a property based test for the normal roundtrip, now that we have the dependency anyway.
no more extern crate
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@twittner Did all your concerns get addressed?
c4a5fa8
to
62692e2
Compare
This also gets rid of the Bytes dependency and replaces it with an
Arc<[u8]>
.Implementation for #46
Note that the size of this new multihash is 40, where as the size of the old Bytes based multihash is 32, plus the stuff on the heap. I think this is a pretty good tradeoff!