Optimise and bug-fix to_digit
#475
Comments
other than changing the checks for |
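ok, actually not premature optimization, the current
    pub fn to_digit(digit: char, radix: u32) -> Option<u32> {
        assert!(radix >= 2 && radix <= 36);
        // Setting bit 0x20 maps ASCII uppercase letters onto lowercase.
        let lc_digit = digit as u32 | 0x20;
        let digit = if digit > '9' && radix > 10 {
            // Letters map to 10..=35. The add saturates so that '@' and '`'
            // (which become 'a' - 1 after the mask) cannot wrap around to 9.
            lc_digit.wrapping_sub('a' as u32).saturating_add(10)
        } else {
            // Non-digits yield a value of at least 10 here, which is rejected below.
            (digit as u32).wrapping_sub('0' as u32)
        };
        if digit < radix {
            Some(digit)
        } else {
            None
        }
    }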
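A rewrite like this is cheap to check exhaustively, since there are only 35 valid radices and about 1.1 million `char` values. The following is a minimal sketch, not code from the PR: `candidate_to_digit` and `first_mismatch` are hypothetical names, and the candidate is a free-function copy of the idea above that assumes `saturating_add` in the letter branch. It compares the candidate against the standard library's `char::to_digit` for every character and every radix in 2..=36.

```rust
/// Hypothetical free-function candidate (assumption: saturating add in the
/// letter branch, so values just below 'a' cannot wrap into the digit range).
fn candidate_to_digit(c: char, radix: u32) -> Option<u32> {
    assert!((2..=36).contains(&radix));
    let value = if c > '9' && radix > 10 {
        // Fold ASCII uppercase onto lowercase, then map 'a'..='z' to 10..=35.
        (c as u32 | 0x20).wrapping_sub('a' as u32).saturating_add(10)
    } else {
        // Digits map to 0..=9; everything else ends up at 10 or above.
        (c as u32).wrapping_sub('0' as u32)
    };
    if value < radix { Some(value) } else { None }
}

/// Returns the first `(char, radix)` pair on which `candidate` disagrees
/// with `char::to_digit`, or `None` if they agree everywhere.
fn first_mismatch(candidate: fn(char, u32) -> Option<u32>) -> Option<(char, u32)> {
    for c in '\0'..=char::MAX {
        for radix in 2..=36u32 {
            if candidate(c, radix) != c.to_digit(radix) {
                return Some((c, radix));
            }
        }
    }
    None
}
```

`first_mismatch(candidate_to_digit)` comes back as `None`. Swapping `saturating_add` for `wrapping_add` makes it report `('@', 11)`, because `'@' | 0x20` is one below `'a'` and the sum wraps around to 9.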
as far as I know, this would be the only instance of a safe method containing "unchecked" in its name? what about a private |
also, "unchecked" also feels wrong because it still contains the |
The function has a simple check of “digit or letter?” as spaghetti code. I needed to look three times to exactly get it. I am simplifying that. IMHO that’s a win on its own.
I am not experienced with branchless, which is why I offer three alternate simplifications. If none is a clear winner, they might need to be benchmarked on various CPUs. That’s not something I’m in a position to do. OTOH, if the compiler turns the latter two into the same as what we had, just choose the prettiest!
I’m also not an expert on compiler optimisation. If it can really eliminate an inlined check many statements after a similar one (near the start of
@programmerjake As for your new implementation, it’s essentially the same as my 3rd variant. Except you added an unnecessary |
that's necessary to eliminate the check for |
We talked about this in today's libs-api meeting. We would love a PR for the bugfix to check for radix >= 2, and we'd love a PR switching to the branchless version that LLVM can optimize better. Beyond that, adding annotations and internal functions to make sure LLVM can properly inline this and hoist the checks would be fine as well. We'd prefer not to make that internal version public unless there are still substantial performance differences after those PRs; we'd want to see specific performance benchmarks motivating any further change. |
which version do you mean by that? the first, second, or third one in the top comment, or the one I wrote? |
Whatever improves on the current version and solves the problem; not judging which of the implementations is best. |
@programmerjake I’m not a contributor and wouldn’t currently have time to get in that deep. So I’m happy to let you do the PR. However my 3rd version retains the comments, so it would be a better basis. I like your idea of pulling out the lc into a variable. It would make it clearer, if you had not pulled it out of its if-branch. That again unnecessarily gives a reader something to contemplate… |
ok, created a PR: rust-lang/rust#132709 |
I'm going to close this ACP on the basis that the optimization PR doesn't need an ACP supporting it, and can be evaluated without one. |
…it, r=joshtriplett optimize char::to_digit and assert radix is at least 2 approved by t-libs: rust-lang/libs-team#475 (comment) let me know if this needs an assembly test or similar.
Proposal
Problem statement
The implementation of `to_digit` is convoluted and does more work than it needs to. Yet it still accepts the radices `0` and `1`. While there is no such system as nullary, unary is a totally different scheme with only the digit `1`, and this function does not implement it. All numbers Rust deals with, in any base, are in positional notation. There, `0` is always the smallest digit, and incrementing the biggest digit gives `10`. The smallest radix for which this is possible is two.
Motivating examples or use cases
It is inefficient to reassert the radix for each digit again and again. I propose to split this function into a wrapper `to_digit` that does its due diligence and a worker `to_digit_unchecked`. I have checked all callers of `to_digit` to see where it is already safe, or can be made safe, to switch to `to_digit_unchecked`. Maybe each time a comment should be added to the place that guarantees a valid radix, to avert accidentally eliminating the guarantee in future:
bounds already checked outside of loop, so can switch:
literal base in a variable, only valid values, so can switch:
in a loop, should add bounds check and switch:
literal radices, where the compiler can hopefully eliminate the `assert`s, so no need, but might do it to save compile time:
Solution sketch
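As a rough illustration of the wrapper/worker split, and not one of the three variants, here is a minimal sketch using free functions. The names `to_digit` and `to_digit_unchecked` come from the proposal; `parse_u32`, the panic message, and the plain `match` body of the worker are assumptions made up for the example.

```rust
/// Worker (name taken from the proposal): the caller must already have
/// ensured that `radix` lies in 2..=36.
fn to_digit_unchecked(c: char, radix: u32) -> Option<u32> {
    debug_assert!((2..=36).contains(&radix));
    // Deliberately plain placeholder body; a branchless variant could be
    // substituted here without changing the callers.
    let value = match c {
        '0'..='9' => c as u32 - '0' as u32,
        'a'..='z' => c as u32 - 'a' as u32 + 10,
        'A'..='Z' => c as u32 - 'A' as u32 + 10,
        _ => return None,
    };
    (value < radix).then_some(value)
}

/// Wrapper: does the due diligence once, then delegates to the worker.
fn to_digit(c: char, radix: u32) -> Option<u32> {
    assert!((2..=36).contains(&radix), "radix must be in 2..=36");
    to_digit_unchecked(c, radix)
}

/// Hypothetical caller with the radix check hoisted out of the loop.
fn parse_u32(s: &str, radix: u32) -> Option<u32> {
    // This assert is what guarantees a valid radix for the loop below.
    assert!((2..=36).contains(&radix), "radix must be in 2..=36");
    if s.is_empty() {
        return None;
    }
    let mut acc: u32 = 0;
    for c in s.chars() {
        let digit = to_digit_unchecked(c, radix)?;
        // Checked arithmetic: overflow yields None instead of wrapping.
        acc = acc.checked_mul(radix)?.checked_add(digit)?;
    }
    Some(acc)
}
```

With this shape, `parse_u32("ff", 16)` validates the radix once and yields `Some(255)`, while `to_digit('0', 1)` panics instead of pretending that unary is a positional system.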
I propose three variants, so you can choose whichever style and presumed efficiency you see fit:
Links and related work
rust-lang/rust#132428
This discussion claims that the new bounds check might break backwards compatibility. I doubt that code would rely on a broken implementation. But if that is a concern, the new `assert` could be activated with edition 2024. OTOH, with that rationale one could never fix bugs…