Compaction lossy hash #14283
This is useful for the open-source build when vtools is not available. Signed-off-by: Noah Watkins <[email protected]>
ducktape was retried in job https://buildkite.com/redpanda/redpanda/builds/39629#018b5ebb-73a0-46d9-82bd-b910bbd31174
ducktape was retried in job https://buildkite.com/redpanda/redpanda/builds/39629#018b5ebb-73a4-4727-af38-9ae04aeb92d8
i was skeptical at first w/ the regular byte comparison hash. this makes sense.
Very nice. A few questions/suggestions
/**
 * Uses successive chunks of sizeof(index_type) bytes taken from hash(key)
 * as probes into the hash table. When `next()` returns null then the caller
 * should switch to linear probing.
 */
struct probe {
    using index_type = uint32_t;
    static_assert(sizeof(index_type) <= hash_type::digest_size);

    explicit probe(const hash_type::digest_type&);

    std::optional<index_type> next();

    hash_type::digest_type::const_pointer iter;
    hash_type::digest_type::const_pointer end;
};
So the idea here is to probe at random places in the backing vector. This is neat. Does this smartness have a name?
I looked extensively to see if this had a name, but I don't think it does.
IMO this is probably just a different flavor/generalization of double hashing, where a collision at the first hash position causes a different hash function to be used for the next probe. Here the hash output is large enough that we wouldn't use all of those bits for the first position anyway.
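To make the chunked-probe idea concrete, here is a minimal sketch of how such a probe could walk a digest. It is not the code from this PR: `digest_type` is a hypothetical 32-byte stand-in for `hash_type::digest_type`, `count_probes` is a helper invented for illustration, and the body of `next()` is an assumption about one reasonable implementation.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>

// Hypothetical stand-in for hash_type::digest_type: 32 bytes, the size of a
// SHA-256 digest.
using digest_type = std::array<std::uint8_t, 32>;

struct probe {
    using index_type = std::uint32_t;
    static_assert(sizeof(index_type) <= sizeof(digest_type));

    explicit probe(const digest_type& digest)
      : iter(digest.data())
      , end(digest.data() + digest.size()) {}

    // Returns the next sizeof(index_type)-byte chunk of the digest, or
    // std::nullopt once fewer than sizeof(index_type) bytes remain. At that
    // point the caller would fall back to linear probing.
    std::optional<index_type> next() {
        if (static_cast<std::size_t>(end - iter) < sizeof(index_type)) {
            return std::nullopt;
        }
        index_type ret;
        std::memcpy(&ret, iter, sizeof(ret)); // avoids unaligned reads
        iter += sizeof(ret);
        return ret;
    }

    const std::uint8_t* iter;
    const std::uint8_t* end;
};

// Illustrative helper: counts how many probe values one digest yields.
inline std::size_t count_probes(const digest_type& digest) {
    probe p(digest);
    std::size_t n = 0;
    while (p.next().has_value()) {
        ++n;
    }
    return n;
}
```

With a 32-byte digest and a 4-byte `index_type`, a single hash computation supplies eight independent probe values before linear probing is needed. A caller would typically reduce each value modulo the table size to get a slot index.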
LGTM! Just some nits
src/v/utils/fragmented_vector.h
Outdated
 * The expected use case for this is to allocate a large vector in a fiber
 * using a series of smaller resize() invocations allowing for cooperative
 * yield calls to be inserted to avoid reactor stalls. The optimal strategy
nit: I wonder if there's a magic number over which it's not safe to call this?
Alternatively, I wonder if there are async helper methods worth adding that encapsulate this expected use case?
I don't think there are any magic numbers here. Since fragmented vector isn't futurized, it would be the same concern as when resizing a std::vector, or when avoiding a call to fragmented_vector::copy on a large fragmented vector.
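The "series of smaller resize() invocations" pattern from the doc comment could look roughly like the sketch below. This is an assumption-laden illustration, not a helper from the PR: `resize_in_steps` is a hypothetical name, `std::vector` stands in for the fragmented vector, and the `yield_point` callable stands in for whatever cooperative yield the fiber would perform between steps.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Grows `v` to `target` elements in steps of at most `step` elements and
// invokes `yield_point` after every step. Returns the number of steps taken.
// In a real fiber the callable's place would be taken by a cooperative yield;
// it is a plain callable here so the sketch stays self-contained.
template<typename Vector, typename YieldPoint>
std::size_t resize_in_steps(
  Vector& v, std::size_t target, std::size_t step, YieldPoint yield_point) {
    assert(step > 0);
    std::size_t steps = 0;
    while (v.size() < target) {
        v.resize(std::min(target, v.size() + step));
        ++steps;
        yield_point();
    }
    return steps;
}
```

The step size bounds how much work happens between yield points, which is the knob that matters for avoiding stalls. Note that a `std::vector` may reallocate and copy on each step, a cost a fragmented container would not share, so the stand-in only illustrates the control flow.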
Signed-off-by: Noah Watkins <[email protected]>
ducktape was retried in job https://buildkite.com/redpanda/redpanda/builds/39695#018b6311-32e1-459d-91ec-ba8d198872d7
ducktape was retried in job https://buildkite.com/redpanda/redpanda/builds/39719#018b63ea-6c0d-4d9e-a600-2a2d69a8ae33
Failure is #14218
src/v/storage/key_offset_map.cc
Outdated
};

// handle a non-normalized probe position
// returns true if key is inserted
nit: comment needs an update
fixed
    probe_count_ = 0;
}

seastar::future<> hash_key_offset_map::initialize() {
nit: maybe consider naming this `reset()` or something? As is, it seems like something we should always call before using the map, but I don't think that's the case (presumably we just need to use `reset(size_bytes)`)?
`reset(size)` implies `initialize()`. But in practice you really only want to call `reset(size)` once at boot-up, and then call `initialize()` each time you want to use the table in a new context, because it avoids freeing and reallocating all the memory.
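The split between the two calls can be illustrated with a small synchronous sketch. Everything here is hypothetical: `slot_table`, `put`, `storage`, and `empty_slot` are invented names, and the real `hash_key_offset_map::initialize()` returns a `seastar::future<>` rather than running synchronously.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical marker for an unused slot.
constexpr std::int64_t empty_slot = -1;

// Hypothetical, synchronous stand-in for the table discussed above.
class slot_table {
public:
    // Allocates fresh backing storage of the requested size, then clears it.
    // Intended to be called once, e.g. at boot-up.
    void reset(std::size_t num_slots) {
        slots_ = std::vector<std::int64_t>(num_slots);
        initialize();
    }

    // Clears the table in place for reuse in a new context. The existing
    // allocation is kept, so nothing is freed or reallocated.
    void initialize() {
        std::fill(slots_.begin(), slots_.end(), empty_slot);
        used_ = 0;
    }

    // Stores an offset at a slot index (no hashing in this sketch).
    void put(std::size_t index, std::int64_t offset) {
        if (slots_.at(index) == empty_slot) {
            ++used_;
        }
        slots_[index] = offset;
    }

    std::size_t used() const { return used_; }
    std::size_t capacity() const { return slots_.size(); }
    const std::int64_t* storage() const { return slots_.data(); }

private:
    std::vector<std::int64_t> slots_;
    std::size_t used_ = 0;
};
```

After `reset(8)`, repeated `initialize()` calls leave `capacity()` and the `storage()` pointer unchanged while returning `used()` to zero, which is the memory-reuse property described in the reply above.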
Signed-off-by: Noah Watkins <[email protected]>
Introduces a new offset key map implementation that maps compaction key space into sha256(key) space.
Backports Required
Release Notes