Use less divisions in display u128/i128 #76017

JulianKnodt · 2020-08-28T08:16:14Z

This PR is an absolute mess, and I need to test if it improves the speed of fmt::Display for u128/i128, but I think it's correct.
It hopefully is more efficient by cutting u128 into at most 2 u64s, and also chunks by 1e16 instead of just 1e4.

Also, I specialized the implementations for uints to always be treated as non-negative, because it bothered me that it was checked at all.

Do not merge until I benchmark it and also clean up the god awful mess of spaghetti.
Based on prior work in #44583

cc: @Dylan-DPC

Due to work on itoa and suggestion in original issue:
r? @dtolnay
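The chunking idea in the description can be illustrated with a minimal sketch. This is not the code from this PR: the function name `u128_to_decimal` and the constant `CHUNK` are hypothetical, it uses the built-in `/` and `%` operators rather than a hand-written division, and it builds a `String` instead of writing into a byte buffer. It only shows why a u128 needs at most two 19-digit chunks plus one leading digit, and why the lower chunks must be zero padded.

```rust
/// 10^19, the largest power of ten that fits in a u64 (hypothetical name).
const CHUNK: u128 = 10_000_000_000_000_000_000;

/// Formats `n` in decimal by peeling off 19-digit chunks that each fit in a u64.
/// u128::MAX has 39 digits, so there are at most two full chunks and one extra digit.
fn u128_to_decimal(n: u128) -> String {
    // Each remainder is below 10^19, so the narrowing casts cannot truncate.
    let lo = (n % CHUNK) as u64;
    let rest = n / CHUNK;
    if rest == 0 {
        return lo.to_string();
    }
    let mid = (rest % CHUNK) as u64;
    let hi = (rest / CHUNK) as u64; // at most a single digit
    if hi == 0 {
        // The low chunk is zero padded to 19 digits so no digits are lost.
        format!("{}{:019}", mid, lo)
    } else {
        format!("{}{:019}{:019}", hi, mid, lo)
    }
}
```

Calling `u128_to_decimal(u128::MAX)` should produce the same string as `u128::MAX.to_string()`; the speedup discussed in this PR comes from doing most of the digit extraction on u64 values instead of u128 values.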

leonardo-m · 2020-08-28T09:18:05Z

library/core/src/fmt/num.rs

+}
+
+fn udiv_1e9(n: u128) -> (u128, u64) {
+    const DIV: u64 = 1e19 as u64;


You can remove this hard "as" cast from your code like this (needs const_int_pow feature):

const DIV: u64 = 10_u64.pow(19);

Do you mean this specific one or all the "as" casts? In the other cases I inlined the constants, and while technically they should be const-propagated, to me the cast should always occur at compile time, more so than doing .pow(..).

This is also a convention as both should be equivalent.

I meant this specific cast.

(I'd like to see Rustc devs and stdlib devs start getting into the habit of minimizing the number of "as" casts in the rustc codebase, because such casts are sharp blades that sometimes cut you.)

Regarding the performance, using a const like I've suggested is zero-cost at run-time (I think).

In what case can these casts cause unexpected behaviour? In this case the constant is definitely in the u64 range.

If it's defined as const VAR: u64 = { expr }, then there is definitely no runtime cost. Inlining it elsewhere might incur some runtime cost.

> In what case can these casts cause unexpected behaviour? In this case the constant is definitely in the u64 range.

(That specific "as" doesn't cause unexpected behaviour. But being willing to use "as" everywhere is like arguing against the presence of unsafe{} in the Rust language on the basis that the code inside one specific unsafe{} block is evidently safe. Even if a specific usage of "as" is safe, it's a sharp tool that may lead to problems in other cases, so it's better to minimize its usage in a codebase. I call this language blindness: what a language puts in front of the programmer is visible and taken care of, and what a language doesn't care about, the programmer doesn't care about either, regardless of the trouble it could cause. Eventually the unsafety of "as" will surface in Rust culture. The recently introduced rustc suggestions to use try_into are a step forward.)
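To make the two positions in this thread concrete, here is a small sketch. All names in it (`DIV_CAST`, `DIV_POW`, `cast_f64`, `narrow_with_as`, `narrow_checked`) are hypothetical and not taken from the PR. It assumes a toolchain where `pow` is usable in constants without a feature gate, which was not yet the case when this thread was written (it needed `const_int_pow`). It shows that the two constant definitions agree, and where "as" silently changes a value while `try_from` reports the problem.

```rust
use std::convert::TryFrom;

// Both forms are evaluated at compile time and hold the same value.
const DIV_CAST: u64 = 1e19 as u64;
const DIV_POW: u64 = 10_u64.pow(19);

/// Float-to-int "as" casts saturate: an out-of-range input becomes u64::MAX.
fn cast_f64(x: f64) -> u64 {
    x as u64
}

/// Integer narrowing with "as" truncates silently (keeps only the low 8 bits).
fn narrow_with_as(x: u32) -> u8 {
    x as u8
}

/// The checked alternative makes the out-of-range case explicit.
fn narrow_checked(x: u32) -> Option<u8> {
    u8::try_from(x).ok()
}
```

Here `DIV_CAST == DIV_POW` holds, so the cast under review is harmless, but `narrow_with_as(300)` yields 44 without any diagnostic, while `narrow_checked(300)` yields `None`.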

Dylan-DPC-zz · 2020-08-28T20:01:36Z

Marked this as waiting on author. When you are done and ready for review, let me know :)

dtolnay

I am going to unsubscribe until this is ready to look at. Please re-request review when needed.

Dylan-DPC-zz · 2020-08-28T23:47:35Z

r? @Dylan-DPC

(i'll keep myself as a placeholder till this is ready )

JulianKnodt · 2020-08-29T03:42:29Z

Seems that this increases the speed of fmt noticeably, will clean up the code and prep for review.

leonardo-m · 2020-08-29T09:00:54Z

Related: #39078

nagisa · 2020-08-29T14:20:23Z

library/core/src/fmt/num.rs

+}
+
+/// Partition of `n` into n > 1e19 and rem <= 1e19
+fn udiv_1e9(n: u128) -> (u128, u64) {


Nit: the name says 1e9, the implementation does 1e19.

I find it surprising that LLVM does not strength-reduce the division by constants for 128-bit integers into shr (mul $CONST1) $CONST2, like it does for 64-bit ones. I wonder if it's actually something infeasible, or if it's just them having a wrong conditional in a wrong place.

(If you wanted to look into this http://gmplib.org/~tege/divcnst-pldi94.pdf describes the mechanism to strength-reduce)
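For readers unfamiliar with the transformation, here is a minimal sketch of strength-reducing a division by a constant for the 64-bit case, where the wider product is easy to obtain. The names `MAGIC` and `div10` are hypothetical; the constant is the usual rounded-up reciprocal ceil(2^67 / 10), and the code is an illustration of the technique rather than what LLVM emits verbatim.

```rust
/// ceil(2^67 / 10): a fixed-point reciprocal of 10 (hypothetical name).
const MAGIC: u64 = 0xCCCC_CCCC_CCCC_CCCD;

/// Computes n / 10 with one widening multiply and shifts instead of a division.
fn div10(n: u64) -> u64 {
    // The full 128-bit product; only its upper 64 bits are needed.
    let wide = (n as u128) * (MAGIC as u128);
    // Taking the upper half is a shift by 64; the remaining shift by 3
    // completes the division by 2^67.
    ((wide >> 64) as u64) >> 3
}
```

The key requirement is access to the upper half of a product twice as wide as the operands, which is the part that becomes awkward when the operands are already 128 bits.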

So, I think the reason this happens is that the algorithm requires the upper half of the multiplication result, which for 128-bit division means the upper 128 bits of the 256-bit product of two 128-bit integers.

Since we don’t have a 256-bit multiplication algorithm, this is not something that's super feasible to implement here.

(It would still be faster than the iterative algorithm presented here)
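As an illustration of what "the upper 128 bits of the result" involves, the following sketch computes the high half of a 128-bit by 128-bit product from four 64-bit partial products. The function name `mulhi_u128` is hypothetical and this is not code from the PR; it only shows that the operation can be written by hand in the absence of a native 256-bit multiply.

```rust
/// Returns the upper 128 bits of the 256-bit product x * y (hypothetical helper).
fn mulhi_u128(x: u128, y: u128) -> u128 {
    // Split each operand into 64-bit halves, kept in u128 so products cannot overflow.
    let x_lo = x as u64 as u128;
    let x_hi = x >> 64;
    let y_lo = y as u64 as u128;
    let y_hi = y >> 64;

    let lo_lo = x_lo * y_lo;
    let hi_lo = x_hi * y_lo;
    let lo_hi = x_lo * y_hi;
    let hi_hi = x_hi * y_hi;

    // Sum everything that lands in bits 64..128 of the product to find the
    // carry into the upper half. Three values below 2^64 cannot overflow a u128.
    let mid = (lo_lo >> 64) + (hi_lo as u64 as u128) + (lo_hi as u64 as u128);

    // The upper half: hi_hi plus the high parts of the cross terms plus the carry.
    hi_hi + (hi_lo >> 64) + (lo_hi >> 64) + (mid >> 64)
}
```

With such a helper, a division by a constant could in principle be written as a `mulhi_u128` by a precomputed reciprocal followed by a shift, mirroring the 64-bit case.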

By this algorithm, you mean the division into u128 and u64?
I've taken that component from itoa, so I'm afraid my understanding of it is lacking.
After doing some googling to figure out what strength reduction is, I'm not sure if this is the place to do it, as it would seem pretty one-off as you say, but I do wonder if it could be added somewhere upstream.

I'm not sure if this is the place to do such, as it would seem pretty one-off

My problem with this is that once this algorithm lands, it is exceedingly easy to forget to remove this function once something faster, better is implemented upstream. It would be less of an issue if this was as fast as theoretically possible, as even if the upstream (LLVM) is fixed, this code won’t pessimize "just" because it implements the algorithm manually.

Would it be alright to update this, not resolve the issues, and leave a comment/marker on the corresponding issues to update in the future?

Dylan-DPC-zz · 2020-08-31T12:17:59Z

this is ready for review now

r? @dtolnay

dtolnay · 2020-09-19T03:56:40Z

I just had a lot of PRs stack up, so I need to reassign this in order to focus on ones that contain public API change.

r? @nagisa since you have already been looking into u128 division

JulianKnodt · 2020-09-19T06:58:25Z

No worries, this is not important at all, just a minor optimization, so if there are more pressing things those should take priority.

nagisa · 2020-09-26T13:28:20Z

Still somewhat uncomfortable with us encoding a division algorithm manually, but I guess we can remove it later once LLVM does the right thing.

@bors r+

bors · 2020-09-26T13:28:21Z

📌 Commit 1087590 has been approved by nagisa

jonas-schievink · 2020-09-26T21:04:32Z

@bors r- failed in #77240 (comment)

bors · 2020-09-28T20:28:13Z

☔ The latest upstream changes (presumably #77302) made this pull request unmergeable. Please resolve the merge conflicts.

Note that reviewers usually do not review pull requests until merge conflicts are resolved! Once you resolve the conflicts, you should change the labels applied by bors to indicate that your PR is ready for review. Post this as a comment to change the labels:

@rustbot modify labels: +S-waiting-on-review -S-waiting-on-author

Add zero padding
Add benchmarks for fmt u128
This tests both when there is the max amount of work (all characters used)
and the least amount of work (1 character used)

nagisa · 2020-10-03T16:00:33Z

@bors r+ rollup=iffy

bors · 2020-10-03T16:00:34Z

📌 Commit 3f1d2aa has been approved by nagisa

bors · 2020-10-04T02:24:27Z

⌛ Testing commit 3f1d2aa with merge 4cf3dc1...

bors · 2020-10-04T04:33:17Z

☀️ Test successful - checks-actions, checks-azure
Approved by: nagisa
Pushing 4cf3dc1 to master...

tesuji · 2020-10-04T04:45:35Z

library/core/src/fmt/num.rs

 }
-doit! { i8 i16 i32 i64 i128 isize u8 u16 u32 u64 u128 usize }
+macro_rules! impl_uint {


Why two macros when they're doing the same thing?

Ah, here I was thinking it made sense to explicitly separate the two cases, as I was explicitly changing to recognize ints separately from uints, but I think I removed that change before the final revision.

All's good. It just caused some minor confusion when guessing the intent behind the duplication here.
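The reason one might split a single macro into a signed and an unsigned variant can be sketched as follows. The trait `IsNonNegative` and the bodies of both macros are hypothetical and are not the contents of `impl_int!` or `impl_uint!` in this PR; the sketch only shows a case where the two expansions would differ, namely skipping the sign check for unsigned types.

```rust
/// Hypothetical trait standing in for the sign check done before formatting.
trait IsNonNegative {
    fn is_nonnegative(&self) -> bool;
}

// Signed integers need a real comparison.
macro_rules! impl_int {
    ($($t:ident)*) => { $(
        impl IsNonNegative for $t {
            fn is_nonnegative(&self) -> bool { *self >= 0 }
        }
    )* };
}

// Unsigned integers are always non-negative, so no comparison is emitted.
macro_rules! impl_uint {
    ($($t:ident)*) => { $(
        impl IsNonNegative for $t {
            fn is_nonnegative(&self) -> bool { true }
        }
    )* };
}

impl_int! { i8 i16 i32 i64 i128 isize }
impl_uint! { u8 u16 u32 u64 u128 usize }
```

If the two macro bodies end up identical, as the reply above says happened in the final revision, a single macro invoked with all twelve types would do the same job.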

rust-highfive assigned dtolnay Aug 28, 2020

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Aug 28, 2020

JulianKnodt force-pushed the fmt_fast branch from a65f6e3 to b0c7077 Compare August 28, 2020 08:29

leonardo-m reviewed Aug 28, 2020


JulianKnodt force-pushed the fmt_fast branch from b0c7077 to 770523c Compare August 28, 2020 19:01

Dylan-DPC-zz added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Aug 28, 2020

dtolnay reviewed Aug 28, 2020


dtolnay removed their assignment Aug 28, 2020

rust-highfive assigned Dylan-DPC-zz Aug 28, 2020

JulianKnodt force-pushed the fmt_fast branch 3 times, most recently from beb5fef to 036d693 Compare August 29, 2020 01:48

JulianKnodt force-pushed the fmt_fast branch 4 times, most recently from d4632f8 to ed49201 Compare August 29, 2020 06:45

JulianKnodt marked this pull request as ready for review August 29, 2020 06:49

nagisa reviewed Aug 29, 2020


JulianKnodt force-pushed the fmt_fast branch from ed49201 to 1087590 Compare August 29, 2020 22:44

rust-highfive assigned dtolnay and unassigned Dylan-DPC-zz Aug 31, 2020

nagisa mentioned this pull request Sep 13, 2020

Fast algorithm for u128 (and i128) divided by small constant #54867

Open

crlf0710 added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Sep 18, 2020

rust-highfive assigned nagisa and unassigned dtolnay Sep 19, 2020

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Sep 26, 2020

jonas-schievink mentioned this pull request Sep 26, 2020

Rollup of 12 pull requests #77240

Closed

bors added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Sep 26, 2020

JulianKnodt force-pushed the fmt_fast branch 2 times, most recently from ecb86fb to 605d520 Compare September 28, 2020 04:22

JulianKnodt force-pushed the fmt_fast branch 3 times, most recently from 4f09f3f to 0f6781a Compare September 28, 2020 20:38

Use more efficient scheme for display u128/i128

3f1d2aa


JulianKnodt force-pushed the fmt_fast branch from 0f6781a to 3f1d2aa Compare September 28, 2020 20:38

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Oct 3, 2020

bors added the merged-by-bors This PR was explicitly merged by bors. label Oct 4, 2020

bors merged commit 4cf3dc1 into rust-lang:master Oct 4, 2020

rustbot added this to the 1.49.0 milestone Oct 4, 2020

tesuji reviewed Oct 4, 2020

