Display impl for u128 and i128 is slow #44583
The implementation is copied from |
I guess I haven't looked at the two implementations carefully. Can we not inline |
It’s possible, but we probably don’t want to do that. |
For platforms that don't have native 128-bit division, it may be faster to break it into parts that can do native division from there. Bases 2, 8, and 16 can split directly to 64-bit parts, and base-10 can split into three base-10^13 parts. If some 32-bit targets don't have native 64-bit division, they may do even better with 5 base-10^8 parts stored in 32-bit, etc. |
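To make the splitting idea above concrete, here is a minimal sketch, not the std implementation. The function names `to_decimal_limbs` and `to_hex_halves` are made up for illustration. The decimal version still pays for two 128-bit divisions to produce the limbs, but every digit after that comes from 64-bit arithmetic.

```rust
/// Hypothetical helper: decimal formatting via three base-10^13 limbs.
/// u128::MAX has 39 digits, so three 13-digit limbs always suffice.
fn to_decimal_limbs(n: u128) -> String {
    const BASE: u128 = 10_000_000_000_000; // 10^13

    // Two 128-bit div/rem pairs produce limbs that each fit in a u64.
    let lo = (n % BASE) as u64;
    let rest = n / BASE;
    let mid = (rest % BASE) as u64;
    let hi = (rest / BASE) as u64;

    // Only the leading limb is printed without zero padding.
    if hi != 0 {
        format!("{}{:013}{:013}", hi, mid, lo)
    } else if mid != 0 {
        format!("{}{:013}", mid, lo)
    } else {
        format!("{}", lo)
    }
}

/// Hypothetical helper: base 16 needs no division at all,
/// because a 64-bit half is exactly 16 hex digits.
fn to_hex_halves(n: u128) -> String {
    let hi = (n >> 64) as u64;
    let lo = n as u64; // truncating cast keeps the low 64 bits
    if hi != 0 {
        format!("{:x}{:016x}", hi, lo)
    } else {
        format!("{:x}", lo)
    }
}
```

Both helpers go through `format!` for the 64-bit pieces, so this only illustrates the split itself; a real implementation would write digits into a buffer directly.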
Hmm, I just read the implementation, and it already chunks it by 10000, so that much is probably pretty good already. |
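For readers who have not looked at that code, chunking by 10000 roughly means one division per four output digits instead of one per digit. The sketch below is an illustration of that technique on `u64`, under the assumption that the general shape matches; it is not copied from std, and `format_chunked` is a made-up name.

```rust
/// Hypothetical helper: peel off four decimal digits per wide division.
fn format_chunked(mut n: u64) -> String {
    let mut buf = [b'0'; 20]; // u64::MAX has 20 decimal digits
    let mut pos = buf.len();

    // Each iteration does one 64-bit div/rem and emits exactly four digits.
    while n >= 10_000 {
        let mut rem = (n % 10_000) as u32;
        n /= 10_000;
        for _ in 0..4 {
            pos -= 1;
            buf[pos] = b'0' + (rem % 10) as u8;
            rem /= 10;
        }
    }

    // The leading chunk has 1 to 4 digits and is not zero padded.
    let mut rem = n as u32;
    loop {
        pos -= 1;
        buf[pos] = b'0' + (rem % 10) as u8;
        rem /= 10;
        if rem == 0 {
            break;
        }
    }

    String::from_utf8(buf[pos..].to_vec()).expect("ASCII digits are valid UTF-8")
}
```

The inner per-digit work here uses cheap 32-bit arithmetic; the expensive operation is the wide division, which is why the chunk size matters so much for 128-bit values.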
I haven’t tried it, but I believe the current implementation is going to be pretty terrible on platforms without u64. If I understand this code correctly, the |
I have yet another implementation in dtolnay/itoa#12. This one is 13x faster than std::fmt on my machine. |
Triage: this appears to have gotten even more extreme over time:
The reproduction involves cargo features and so cannot be done on the playground, details below.
[package]
name = "itoatest"
version = "0.1.0"
authors = ["Steve Klabnik <[email protected]>"]
edition = "2018"
# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
[dependencies.itoa]
version = "*"
features = ["std", "i128"]
and in the benchmark source:
#![feature(test)]
extern crate test;
extern crate itoa;
use test::Bencher;
#[bench]
fn stdlib(b: &mut Bencher) {
    b.iter(|| {
        let s = format!("{}", u128::max_value());
        std::hint::black_box(s);
    });
}
#[bench]
fn itoa(b: &mut Bencher) {
    b.iter(|| {
        let mut s = String::new();
        itoa::fmt(&mut s, u128::max_value()).unwrap();
        std::hint::black_box(s);
    });
} |
Use less divisions in display u128/i128
This PR is an absolute mess, and I need to test if it improves the speed of fmt::Display for u128/i128, but I think it's correct. It hopefully is more efficient by cutting u128 into at most 2 u64s, and also chunks by 1e16 instead of just 1e4.
Also I specialized the implementations for uints to always be non-false because it bothered me that it was checked at all.
Do not merge until I benchmark it and also clean up the god awful mess of spaghetti.
Based on prior work in rust-lang#44583
cc: `@Dylan-DPC`
Due to work on `itoa` and suggestion in original issue: r? `@dtolnay`
Is this still an open issue? I see a PR for this has been merged. |
Using @steveklabnik example (with added
I think there is still some room for improvements but it's much better now. |
That's likely the best std::fmt can do. The remaining 20ns discrepancy is not algorithmic, it is the overhead of the Formatter machinery. |
@henninglive has a significantly faster one in dtolnay/itoa#10 (comment). Let's provide the faster one in std::fmt!