Add optimized datastructure for tracking PollStates #48
miri seems to be hanging on CI. I'm not sure what's causing that, to be honest, but we should probably resolve it before merging.
Miri is probably just timing out because the test is doing too much work. Let's see if reducing the number of iterations for Miri solves the problem.
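One common way to cut the work done under Miri is to branch on `cfg!(miri)`, which is set when code runs under the Miri interpreter. The sketch below is only an illustration: `iteration_count` and `run_stress` are hypothetical names, and the counts are placeholders rather than the values used in this PR.

```rust
/// Number of iterations for a stress test.
/// The concrete counts are placeholders, not values taken from this PR.
fn iteration_count() -> usize {
    if cfg!(miri) {
        // Miri interprets every operation, so keep the workload small.
        10
    } else {
        10_000
    }
}

/// Stand-in for the body of a stress test: loops `iteration_count()` times
/// and returns how many iterations actually ran.
fn run_stress() -> usize {
    let mut completed = 0;
    for _ in 0..iteration_count() {
        // The real test would push and poll futures here.
        completed += 1;
    }
    completed
}
```

A test would call `run_stress()` as usual; only the iteration count changes between a native run and a Miri run.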
The code looks good, and I'm happy to merge this as is. For performance though, there is a bit of a question as to what's going on. I've tested this against the suite from: #52
Benchmarks
10 items
We're seeing roughly similar performance here, but with less deviation which seems promising:
100 items
This seems like a slight performance slowdown; the 1000 item case too.
Slower? We can't have that! 😉 Is there a convenient way to compare two git branches with criterion?
This version should do a bit better, but it still seems to be slower. I'm not sure if this optimization is worth the trouble.
Yeah the benches I'm now getting are:
Diagnosis
But I think I may have found something? The main idea here is that if we have few enough entries to fit in the size of a pointer, we can inline the entries in the space of the pointer. However, if we validate the sizes, it turns out we're not quite doing that right now!
#[test]
fn check_bit_sizes() {
    const BYTE: usize = 8;
    assert_eq!(std::mem::size_of::<EntriesWord>() * BYTE, 64); // ✅ 64 bits
    assert_eq!(std::mem::size_of::<PollStateEntries>() * BYTE, 64); // ❌ 192 bits
}
It seems the
Potential Directions
/// The max number of entries `PollStates` can store without heap allocations.
const MAX_INLINE_ENTRY_COUNT: usize = ENTRIES_PER_WORD;
const ENTRIES_PER_WORD: usize = ((std::mem::size_of::<EntriesWord>() * 8) / 2);
enum PollStateEntries {
    Inline(EntriesWord),
    Boxed(Vec<EntriesWord>),
}
A few things stand out to me here:
What I'm wondering is if by changing these variables around we can get the size of
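For readers following along, here is a minimal sketch of what "inlining entries in a word" means with 2 bits per entry, matching the `ENTRIES_PER_WORD` arithmetic above. It assumes `EntriesWord` is `u64` (the size test expects 64 bits) and uses a hypothetical three-state `PollState`; the helpers `get` and `set` are illustrative and not the PR's actual API.

```rust
// Assumption: the size test in the thread expects `EntriesWord` to be 64 bits.
type EntriesWord = u64;

const BITS_PER_ENTRY: usize = 2;
const ENTRIES_PER_WORD: usize = (std::mem::size_of::<EntriesWord>() * 8) / BITS_PER_ENTRY;
const ENTRY_MASK: EntriesWord = 0b11;

/// Hypothetical poll states; three states fit in two bits.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum PollState {
    Pending = 0,
    Ready = 1,
    Consumed = 2,
}

/// Reads the entry stored at `index` inside `word`.
fn get(word: EntriesWord, index: usize) -> PollState {
    assert!(index < ENTRIES_PER_WORD);
    match (word >> (index * BITS_PER_ENTRY)) & ENTRY_MASK {
        0 => PollState::Pending,
        1 => PollState::Ready,
        2 => PollState::Consumed,
        _ => unreachable!("bit pattern 0b11 is never written by `set`"),
    }
}

/// Returns a copy of `word` with the entry at `index` replaced by `state`.
fn set(word: EntriesWord, index: usize, state: PollState) -> EntriesWord {
    assert!(index < ENTRIES_PER_WORD);
    let shift = index * BITS_PER_ENTRY;
    // Clear the two bits of the slot, then write the new state into it.
    (word & !(ENTRY_MASK << shift)) | ((state as EntriesWord) << shift)
}
```

With a 64-bit word this gives 32 entries per word, which is where `MAX_INLINE_ENTRY_COUNT` comes from.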
Oh actually: we're a bit off here - the assumption that "pointer to vec is 64 bits" is wrong. A
assert_eq!(std::mem::size_of::<Vec<()>>() * BYTE, 192); // ✅ 192 bits
The issue isn't that we're not aligning our usizes and whatnot right. There's plenty of space for that. In fact, if we do it right we should be able to store up to 3 words worth of entries in the space of a single
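The 192-bit figure follows from the standard library's documented guarantee that a `Vec` is a (pointer, capacity, length) triplet, so the value itself occupies three machine words no matter what it stores. A small sketch that checks this without hard-coding a 64-bit target (the helper names are made up for illustration):

```rust
use std::mem::size_of;

/// Size of one machine word, in bits.
fn word_bits() -> usize {
    size_of::<usize>() * 8
}

/// Size of the `Vec<T>` value itself (not its heap buffer), in bits.
fn vec_header_bits<T>() -> usize {
    size_of::<Vec<T>>() * 8
}

/// Size of a thin owning pointer, in bits, for comparison.
fn box_bits<T>() -> usize {
    size_of::<Box<T>>() * 8
}
```

On a 64-bit target `vec_header_bits::<()>()` is 192, while `box_bits::<u64>()` is 64: only a thin pointer is one word wide.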
Okay, I did find another interesting bit! I was looking at the size of
pub(crate) struct PollStates {
    len: usize,
    entries: PollStateEntries,
}
What we'd want is for
assert_eq!(std::mem::size_of::<Vec<PollState>>() * BYTE, 192); // ✅ 192 bits
assert_eq!(std::mem::size_of::<PollStates>() * BYTE, 192); // ❌ 256 bits
And if you think about it: this makes sense. In the enum we're making space for
There might be other optimizations possible too, wrt accounting for the labels and whatnot. But that seems like the right place to start perhaps?
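The reasoning can be checked without relying on exact numbers, which depend on compiler version and target. Struct fields never overlap, so a struct holding a `usize` next to the enum must be at least one word larger than the enum alone. The sketch below mirrors the types quoted above, assuming `EntriesWord` is `u64`; the helper functions are hypothetical.

```rust
use std::mem::size_of;

// Assumption: `EntriesWord` is a 64-bit integer, as the size test suggests.
type EntriesWord = u64;

#[allow(dead_code)]
enum PollStateEntries {
    Inline(EntriesWord),
    Boxed(Vec<EntriesWord>),
}

#[allow(dead_code)]
struct PollStates {
    len: usize,
    entries: PollStateEntries,
}

/// Size of the enum alone, in bits.
fn enum_bits() -> usize {
    size_of::<PollStateEntries>() * 8
}

/// Size of the struct that stores `len` next to the enum, in bits.
fn struct_bits() -> usize {
    size_of::<PollStates>() * 8
}

/// Human-readable summary; the exact values vary by toolchain and target.
fn report() -> String {
    format!("enum: {} bits, struct: {} bits", enum_bits(), struct_bits())
}
```

The enum can never be smaller than its largest variant, so it is at least as large as a `Vec`, and the separate `len` field then adds at least one more word on top.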
And confirmation that inlining the length in the
enum PollStateEntries {
    Inline(usize, EntriesWord), // now stores an extra usize
    Boxed(Vec<EntriesWord>),
}
pub(crate) struct PollStates { // no longer stores a usize
    entries: PollStateEntries,
}
#[test]
fn check_bit_sizes() {
    const BYTE: usize = 8;
    assert_eq!(std::mem::size_of::<Vec<EntriesWord>>() * BYTE, 192); // ✅ 3 words
    assert_eq!(std::mem::size_of::<PollStateEntries>() * BYTE, 192); // ✅ 3 words
    assert_eq!(std::mem::size_of::<PollStates>() * BYTE, 192); // ✅ 3 words
}
I'm starting to think that storing PollStates as bit masks doesn't quite carry its complexity weight (and it's not actually faster, at least in the micro benchmarks). I'll post a much simpler version when I find the time.
The length of
Closing in favor of the much simpler #78.
Closes #42. Not very polished yet but it should get the job done.