Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf: fix encode_varint{32,64} #709

Closed
wants to merge 3 commits into from

Conversation

morrisonlevi
Copy link
Contributor

@morrisonlevi morrisonlevi commented Jan 31, 2024

The previous code was full of jumps that aren't easily predictable. This version is less code, generates less code, and should perform better (well, on x86_64 anyway, I am not an aarch64 expert). All while using safe Rust.

You can compare assembly on godbolt. The current version of encode_varint64 is 99 lines long and has ~10 labels, the new version is 27 lines long with only 3 labels.

I haven't included benchmarking numbers because I wasn't sure what characteristics you'd be looking for. Based on the x86_64 assembly, it should definitely out-perform. Anything in particular you are looking for there?

Edit: I also simplified encoded_varint64_len while I was at it. It gets nicer assembly as a result as well but obviously matters less.

The previous code was full of jumps that aren't easily predictable.
This version is less code, generates less code, and performs better
(well, on x86_64 anyway, I am not an aarch64 expert). All while using
safe Rust.
@morrisonlevi
Copy link
Contributor Author

morrisonlevi commented Jan 31, 2024

Interestingly, the miri failure is in std:

https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=92348df8e81c457c63c78c0845bf9e3b.

fn main() {
    match 2u64.checked_pow(64) {
        Some(n) => println!("2^64: {n}"),
        None => {}
    };
}

Given this is a checked_* operation, I would not expect an overflow related issue:

   Compiling playground v0.0.1 (/playground)
    Finished dev [unoptimized + debuginfo] target(s) in 0.33s
     Running `/playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/bin/cargo-miri runner target/miri/x86_64-unknown-linux-gnu/debug/playground`
error: Undefined Behavior: overflowing shift by 64 in `unchecked_shl`
    --> /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/num/mod.rs:1181:5
     |
1181 | /     uint_impl! {
1182 | |         Self = u64,
1183 | |         ActualT = u64,
1184 | |         SignedT = i64,
...    |
1198 | |         bound_condition = "",
1199 | |     }
     | |_____^ overflowing shift by 64 in `unchecked_shl`
     |
     = help: this indicates a bug in the program: it performed an invalid operation, and caused Undefined Behavior
     = help: see https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html for further information
     = note: BACKTRACE:
     = note: inside `core::num::<impl u64>::checked_pow` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/num/uint_macros.rs:1382:31: 1385:18
note: inside `main`
    --> src/main.rs:2:11
     |
2    |     match 2u64.checked_pow(64) {
     |           ^^^^^^^^^^^^^^^^^^^^
     = note: this error originates in the macro `uint_impl` (in Nightly builds, run with -Z macro-backtrace for more info)

note: some details are omitted, run with `MIRIFLAGS=-Zmiri-backtrace=full` for a verbose backtrace

error: aborting due to 1 previous error

I have not reported this upstream nor looked for existing issues yet.

@morrisonlevi
Copy link
Contributor Author

I opened an issue: rust-lang/rust#120537.

This greatly cleans up the assembly as well, but this function is not
as likely to matter from a perf perspective.
@stepancheg
Copy link
Owner

Merged the part about encoded_varint64_len. Can you please rebase and confirm miri is no longer an issue?

@morrisonlevi
Copy link
Contributor Author

morrisonlevi commented Feb 26, 2024

I merged in master, and Miri looks good now 👍🏻

@stepancheg
Copy link
Owner

OK, if this is still an issue, some benchmark is needed. Smaller machine code does not mean code is faster.

(Project is not very well maintained now, so long responses, sorry.)

@morrisonlevi
Copy link
Contributor Author

Re-analyzing this, I made a mistake previously: the jumps are actually quite predictable. With this in mind, I would expect this PR to be a slight regression (since it isn't unrolled), but it gave me some ideas to improve it...

...but then I realized that someone already beat me to these ideas: https://github.com/as-com/varint-simd. I think it's worth considering using their implementation on x86_64, but there's no point in doing it in this PR.

So for now, we'll just count the encoded_varint64_len as a win and not merge the rest.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants