Use less divisions in display u128/i128 #76017
}

fn udiv_1e9(n: u128) -> (u128, u64) {
    const DIV: u64 = 1e19 as u64;
You can remove this hard "as" cast from your code like this (needs const_int_pow feature):
const DIV: u64 = 10_u64.pow(19);
Do you mean this specific one or all the `as` casts? In the other cases I inlined the constants, and while technically `.pow(..)` should be const-propagated, to me the cast should always occur at compile time, more so than `.pow(..)`.
This is also just a convention, as both should be equivalent.
I meant this specific cast.
(I'd like to see Rustc devs and stdlib devs start getting into the habit of minimizing the number of "as" casts in the rustc codebase, because such casts are sharp blades that sometimes cut you.)
Regarding the performance, using a const like I've suggested is zero-cost at run-time (I think).
In what case can these casts cause unexpected behaviour? In this case the constant is definitely in the u64 range.
If it's defined as `const VAR: u64 = { expr }`, then there is definitely no runtime cost. Inlining it elsewhere might incur some runtime cost.
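To make the "no runtime cost" point concrete, here is a minimal sketch comparing the two spellings discussed above. The names `DIV_CAST` and `DIV_POW` are made up for illustration, and it assumes a toolchain where `u64::pow` can be called in a const context without a feature gate.

```rust
// Both items are `const`, so the compiler must evaluate them at compile time;
// nothing is computed when the program runs.

/// The spelling used in the PR: a float literal narrowed with `as`.
const DIV_CAST: u64 = 1e19 as u64;

/// The spelling suggested in review: integer exponentiation, no cast.
const DIV_POW: u64 = 10_u64.pow(19);

// Compile-time check that the two spellings agree; if they differed,
// the build would fail instead of the program misbehaving at runtime.
const _: () = assert!(DIV_CAST == DIV_POW);
```

Either constant can then be used inside the division helper. The difference between them is purely about readability and avoiding the cast, not about generated code.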
In what case can these casts cause unexpected behaviour? In this case the constant is definitely in the u64 range.
(That specific "as" doesn't cause unexpected behaviour. But being willing to use "as" everywhere is like being against the presence of unsafe{} in the Rust language on the basis that the code inside one specific unsafe{} block is evidently safe. Even if a specific usage of "as" is safe, it's a sharp tool, and it may lead to problems in other cases, so it's better to minimize its usage in a codebase. I call this language blindness: what a language puts in front of you is visible and taken care of; what a language doesn't care about, the programmer doesn't care about either, regardless of the trouble it could cause. But eventually the unsafety of using "as" will surface in Rust culture. The recently introduced rustc suggestions to use try_into are a step forward.)
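As an illustration of why "as" is called a sharp tool here, the sketch below contrasts a narrowing `as` cast with the checked conversion. Both function names are hypothetical and exist only for this example.

```rust
use std::convert::TryFrom;

/// Hypothetical helper: narrows with `as`, which silently keeps only the
/// low 64 bits when the value does not fit.
fn narrow_with_as(n: u128) -> u64 {
    n as u64
}

/// Hypothetical helper: narrows with `try_from`, which reports an
/// out-of-range value instead of truncating it.
fn narrow_checked(n: u128) -> Option<u64> {
    u64::try_from(n).ok()
}
```

For a value such as `(1u128 << 64) + 5`, the `as` version quietly returns 5 while the checked version returns `None`, which is the kind of silent surprise the comment is warning about.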
Marked this as waiting on author. When you are done and ready for review, let me know :)
I am going to unsubscribe until this is ready to look at. Please re-request review when needed.
r? @Dylan-DPC (I'll keep myself as a placeholder till this is ready)
beb5fef to 036d693
Seems that this increases the speed of fmt noticeably. I will clean up the code and prep it for review.
d4632f8 to ed49201
Related: #39078
library/core/src/fmt/num.rs
Outdated
}

/// Partition of `n` into n > 1e19 and rem <= 1e19
fn udiv_1e9(n: u128) -> (u128, u64) {
Nit: the name says `1e9`, the implementation does `1e19`.
I find it surprising that LLVM does not strength-reduce the division by constants for 128-bit integers into `shr (mul $CONST1) $CONST2`, like it does for 64-bit ones. I wonder if it's actually something infeasible, or if it's just them having a wrong conditional in a wrong place.
(If you wanted to look into this, http://gmplib.org/~tege/divcnst-pldi94.pdf describes the mechanism to strength-reduce.)
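For readers unfamiliar with the `shr (mul $CONST1) $CONST2` shape, here is a small sketch of the 64-bit case, using division by 10 as the example divisor. The helper name and the constant name are made up for illustration; the constant is ceil(2^67 / 10), the commonly used multiplier for this divisor. This shows the general idea, not necessarily what LLVM emits verbatim.

```rust
use std::convert::TryFrom;

/// Reciprocal of 10 scaled by 2^67 and rounded up: ceil(2^67 / 10).
const MAGIC_DIV10: u64 = 0xCCCC_CCCC_CCCC_CCCD;

/// Hypothetical helper: computes `n / 10` with a multiply and a shift
/// instead of a division instruction.
fn div10_by_mul(n: u64) -> u64 {
    // Widen so the full 128-bit product is available.
    let product = u128::from(n) * u128::from(MAGIC_DIV10);
    // Shifting by 64 takes the upper half of the product; the extra 3 bits
    // undo the remaining scaling (64 + 3 = 67).
    let quotient = product >> 67;
    u64::try_from(quotient).expect("quotient of a u64 fits in u64")
}
```

Note that even the 64-bit version needs the upper half of a 128-bit product, which is why the 128-bit analogue needs a 256-bit product.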
So, I think the reason this happens is that the algorithm requires the upper half of the multiplication result, which for 128-bit division means the upper 128 bits of the result of multiplying two 128-bit integers.
Since we don’t have a 256-bit multiplication algorithm, this is not something that's super feasible to implement here.
(It would still be faster than the iterative algorithm presented here)
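To show what "the upper 128 bits of the result" would involve, here is a hedged sketch that builds it from 64-bit halves with schoolbook multiplication. The name `mulhi_u128` is hypothetical; this is not something the PR or the standard library provides, only an illustration of the missing building block.

```rust
/// Mask selecting the low 64 bits of a u128.
const LOW_MASK: u128 = (1u128 << 64) - 1;

/// Hypothetical helper: upper 128 bits of the 256-bit product `a * b`.
fn mulhi_u128(a: u128, b: u128) -> u128 {
    // Split each operand into 64-bit halves; every half fits in 64 bits,
    // so each partial product below fits in a u128.
    let (a_lo, a_hi) = (a & LOW_MASK, a >> 64);
    let (b_lo, b_hi) = (b & LOW_MASK, b >> 64);

    let lo_lo = a_lo * b_lo;
    let hi_lo = a_hi * b_lo;
    let lo_hi = a_lo * b_hi;
    let hi_hi = a_hi * b_hi;

    // Middle column: carry out of the low product plus the low halves of
    // the two cross products. At most three 64-bit values, so no overflow.
    let mid = (lo_lo >> 64) + (hi_lo & LOW_MASK) + (lo_hi & LOW_MASK);

    // The true upper half fits in a u128, so this sum cannot overflow.
    hi_hi + (hi_lo >> 64) + (lo_hi >> 64) + (mid >> 64)
}
```

Four 64-bit multiplications plus a handful of additions replace one 128-bit division, which is the trade-off being weighed in this thread.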
By this algorithm, you mean the division into u128 and u64?
I've taken that component from `itoa`, so I'm afraid my understanding of it is lacking.
After doing some googling to figure out what strength reduction is, I'm not sure if this is the place to do it, as it would seem pretty one-off as you say, but I do wonder if it could be added somewhere upstream.
I'm not sure if this is the place to do such, as it would seem pretty one-off
My problem with this is that once this algorithm lands, it is exceedingly easy to forget to remove this function once something faster and better is implemented upstream. It would be less of an issue if this were as fast as theoretically possible, because then, even if the upstream (LLVM) is fixed, this code won't pessimize "just" because it implements the algorithm manually.
Would it be alright to update this, not resolve the issues, and leave a comment/marker on the corresponding issues to update in the future?
This is ready for review now. r? @dtolnay
I just had a lot of PRs stack up, so I need to reassign this in order to focus on ones that contain public API change. r? @nagisa since you have already been looking into u128 division
No worries, this is not important at all, just a minor optimization, so if there are more pressing things those should take priority.
Still somewhat uncomfortable with us encoding a division algorithm manually, but I guess we can remove it later once LLVM does the right thing. @bors r+
📌 Commit 1087590 has been approved by
@bors r- failed in #77240 (comment)
ecb86fb to 605d520
☔ The latest upstream changes (presumably #77302) made this pull request unmergeable. Please resolve the merge conflicts. Note that reviewers usually do not review pull requests until merge conflicts are resolved! Once you resolve the conflicts, you should change the labels applied by bors to indicate that your PR is ready for review. Post this as a comment to change the labels:
4f09f3f to 0f6781a
Add zero padding
Add benchmarks for fmt u128
This tests both when there is the max amount of work (all characters used) and the least amount of work (1 character used).
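The two benchmark cases described in the commit message can be approximated outside the compiler's own harness. The sketch below is not the benchmark added in the PR; it assumes plain `std::time::Instant` timing, and the helper name `time_display` is made up for illustration.

```rust
use std::fmt::Write;
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical helper: formats `n` `iters` times into a reused buffer and
/// returns the elapsed time plus the length of the last rendering.
fn time_display(n: u128, iters: u32) -> (Duration, usize) {
    let mut buf = String::with_capacity(40);
    let start = Instant::now();
    for _ in 0..iters {
        buf.clear();
        // `black_box` keeps the optimizer from treating `n` as a known constant.
        write!(buf, "{}", black_box(n)).expect("writing to a String cannot fail");
    }
    (start.elapsed(), buf.len())
}
```

Calling it with `u128::MAX` exercises the longest output and calling it with `0` exercises the shortest, mirroring the "max amount of work" and "least amount of work" cases.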
0f6781a to 3f1d2aa
@bors r+ rollup=iffy
📌 Commit 3f1d2aa has been approved by
☀️ Test successful - checks-actions, checks-azure
}
doit! { i8 i16 i32 i64 i128 isize u8 u16 u32 u64 u128 usize }
macro_rules! impl_uint {
Why two macros when they're doing the same thing?
Ah, I thought it made sense to separate the two cases explicitly, since I was changing the code to handle ints separately from uints, but I think I removed that change before the final revision.
All good. It just caused minor confusion when guessing the intention behind the duplication here.
This PR is an absolute mess, and I need to test if it improves the speed of fmt::Display for u128/i128, but I think it's correct.
It hopefully is more efficient by cutting u128 into at most 2 u64s, and also chunks by 1e16 instead of just 1e4.
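The chunking idea can be sketched as follows. This is not the code from the PR: it assumes the `1e19` divisor seen in the reviewed helper rather than the `1e16` chunks mentioned here, the names `split_1e19` and `format_u128` are hypothetical, and it builds a `String` instead of writing digits into a byte buffer.

```rust
use std::convert::TryFrom;

/// Largest power of ten that fits in a u64.
const DIV: u128 = 10_u128.pow(19);

/// Hypothetical stand-in for the PR's helper: quotient and remainder of `n / 1e19`.
fn split_1e19(n: u128) -> (u128, u64) {
    let rem = u64::try_from(n % DIV).expect("remainder is below 1e19");
    (n / DIV, rem)
}

/// Hypothetical formatter: peels off 19-digit chunks so that the per-digit
/// work happens on u64 values, with at most two 128-bit splits.
fn format_u128(n: u128) -> String {
    let (high, low) = split_1e19(n);
    if high == 0 {
        return low.to_string();
    }
    let (top, mid) = split_1e19(high);
    if top == 0 {
        // Lower chunks are zero-padded to 19 digits so no digits are lost.
        format!("{}{:019}", mid, low)
    } else {
        format!("{}{:019}{:019}", top, mid, low)
    }
}
```

With a `1e19` divisor the first quotient can still exceed `u64::MAX`, so a value near `u128::MAX` needs three chunks rather than two, and the zero padding is what keeps the lower chunks aligned.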
Also I specialized the implementations for uints to always be non-false because it bothered me that it was checked at all
Do not merge until I benchmark it and also clean up the god awful mess of spaghetti.
Based on prior work in #44583
cc: @Dylan-DPC
Due to work on `itoa` and suggestion in original issue:
r? @dtolnay