Speed up leb128 encoding and decoding for unsigned values. #46919
Conversation
r? @aidanhs (rust_highfive has picked a reviewer for you, use r? to override)
@bors try
⌛ Trying commit 43ad4fd with merge 2d54fb61881368133d872d8f878adcde8621da7f...
☀️ Test successful - status-travis
r? @sfackler
src/libserialize/leb128.rs
macro_rules! impl_read_unsigned_leb128 {
    ($fn_name:ident, $int_ty:ident) => (
        #[inline]
        pub fn $fn_name(data: &[u8], start_position: usize) -> ($int_ty, usize) {
This was here before, but why does this take a slice and an offset instead of just a slice?
Good question. Maybe there was a reason in the past. I'll change it.
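For reference, a reader that takes just a slice could return the decoded value together with the number of bytes consumed, so the caller advances its own cursor. A minimal sketch of that shape for u32 follows; the name read_leb128_u32 and the exact signature are illustrative, not the PR's actual code.

// Decode an unsigned LEB128 value from the start of `data`.
// Returns the value and the number of bytes consumed.
// Sketch only: assumes well-formed, non-truncated input.
#[inline]
pub fn read_leb128_u32(data: &[u8]) -> (u32, usize) {
    let mut result: u32 = 0;
    let mut shift = 0;
    let mut position = 0;
    loop {
        let byte = data[position];
        position += 1;
        result |= ((byte & 0x7F) as u32) << shift;
        if byte & 0x80 == 0 {
            return (result, position);
        }
        shift += 7;
    }
}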
Perf run queued.
Huh, that's very different from what the microbenchmarks showed. Seems like I need to iterate some more on this.
ping @michaelwoerister, just wanna make sure this doesn't fall off your radar!
I'm still in the process of setting up some good benchmarks that work with real-world data: https://github.com/michaelwoerister/encoding-bench. It's a bit of a side-project, so it will take a while.
Force-pushed from 43ad4fd to 53c2f44.
@bors try
Speed up leb128 encoding and decoding for unsigned values. Make the implementation for some leb128 functions potentially faster. @Mark-Simulacrum, could you please trigger a perf.rlo run?
☀️ Test successful - status-travis
@Mark-Simulacrum, could you do another perf run please?
The try commit is done, we're waiting for perf to collect data for the previous auto branch commit -- should be next in queue.
So ... this looks very good in all cases, except for
@bors try
@bors try
@bors retry
☀️ Test successful - status-travis
OK, success. Let's see how it does now.
Alright, queued. Should be a couple hours.
Posting the link for later. Doesn't work yet.
@Mark-Simulacrum Hm, the link doesn't seem to be working. Did I do something wrong?
The perf.rlo link works now. Numbers look good, I think. re-r? @sfackler
macro_rules! impl_write_unsigned_leb128 {
    ($fn_name:ident, $int_ty:ident) => (
        #[inline]
        pub fn $fn_name(out: &mut Vec<u8>, start_position: usize, mut value: $int_ty) -> usize {
It would be more verbose, but another strategy I've seen for this is just branching on the size of the value and avoiding the loop. Not sure which would be faster in rustc though.
So, running some tests shows that the following implementation for u32 is 10% faster when encoding metadata (while showing no improvement for the query-cache and the dep-graph):

// Branch on the number of LEB128 bytes needed instead of looping.
#[inline]
pub fn write_leb128_u32(out: &mut Vec<u8>, start_position: usize, value: u32) -> usize {
    if value < (1 << 7) {
        write_to_vec(out, start_position, value as u8);
        1
    } else if value < (1 << 14) {
        write_to_vec(out, start_position, (value as u8) | 0x80);
        write_to_vec(out, start_position + 1, (value >> 7) as u8);
        2
    } else if value < (1 << 21) {
        write_to_vec(out, start_position, (value as u8) | 0x80);
        write_to_vec(out, start_position + 1, ((value >> 7) as u8) | 0x80);
        write_to_vec(out, start_position + 2, (value >> 14) as u8);
        3
    } else if value < (1 << 28) {
        write_to_vec(out, start_position, (value as u8) | 0x80);
        write_to_vec(out, start_position + 1, ((value >> 7) as u8) | 0x80);
        write_to_vec(out, start_position + 2, ((value >> 14) as u8) | 0x80);
        write_to_vec(out, start_position + 3, (value >> 21) as u8);
        4
    } else {
        write_to_vec(out, start_position, (value as u8) | 0x80);
        write_to_vec(out, start_position + 1, ((value >> 7) as u8) | 0x80);
        write_to_vec(out, start_position + 2, ((value >> 14) as u8) | 0x80);
        write_to_vec(out, start_position + 3, ((value >> 21) as u8) | 0x80);
        write_to_vec(out, start_position + 4, (value >> 28) as u8);
        5
    }
}
A similar implementation for usize does a lot worse than the one from the PR. Not sure if it's worth the trouble since my test data is only from one crate.
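For comparison, a loop-based encoder in the spirit of what the generic macro produces could look like the sketch below. The write_to_vec helper here is an assumed stand-in for the one in src/libserialize/leb128.rs (append at the end, or overwrite in place), not a copy of it, and the name write_leb128_u32_loop is likewise just for illustration.

// Assumed stand-in for the write_to_vec helper used above: write `byte`
// at `position`, growing the vector by one when writing just past the end.
#[inline]
fn write_to_vec(out: &mut Vec<u8>, position: usize, byte: u8) {
    if position == out.len() {
        out.push(byte);
    } else {
        out[position] = byte;
    }
}

// Loop-based unsigned LEB128 encoder for u32, for comparison with the
// branching variant above. Emits 7 bits per byte, setting the high bit
// on every byte except the last; returns the number of bytes written.
#[inline]
pub fn write_leb128_u32_loop(out: &mut Vec<u8>, start_position: usize, mut value: u32) -> usize {
    let mut position = start_position;
    loop {
        let mut byte = (value & 0x7F) as u8;
        value >>= 7;
        if value != 0 {
            byte |= 0x80; // more bytes follow
        }
        write_to_vec(out, position, byte);
        position += 1;
        if value == 0 {
            return position - start_position;
        }
    }
}

The branching version trades code size for the absence of a loop-carried dependency, which is presumably where the metadata-encoding win comes from.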
Cool, thanks for checking it out!
@bors r+
📌 Commit 53c2f44 has been approved by
☀️ Test successful - status-appveyor, status-travis