-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Unify parsing part of BigInteger with CoreLib #85978
Conversation
Tagging subscribers to this area: @dotnet/area-system-numerics Issue DetailsPart of #28657. This is my third attempt of working on this. I'd like to keep formatting and further cleanup to follow-up PRs to help reviewing. This PR does not refactor the algorithm part. It just adapts corelib-style patterns for BigInteger. Reviewing commit by commit is recommended.
|
Also asking a question about formatting code here: the following pattern is heavily used in formatting code to share between UTF8 and UTF16: private static void FormatNumber<TChar>(ref ValueListBuilder<TChar> vlb, ref NumberBuffer number, int nMaxDigits, NumberFormatInfo info) where TChar : unmanaged, IUtfChar<TChar> What's the best approach to share those code out of CoreLib? Using #ifdef? |
It's better to review and merge #84792 first. |
Merge conflicts - and then this is reviewable? Looks like the other one went in. |
@tannergooding wanna to discuss how to deal with this. #86875 uses The approach I tried looks like this: #if !SYSTEM_PRIVATE_CORELIB
using TChar = System.Char;
#pragma warning disable SA1121 // Use built-in type alias
#endif
#if SYSTEM_PRIVATE_CORELIB
internal static unsafe TChar* UInt32ToDecChars<TChar>(TChar* bufferEnd, uint value, int digits) where TChar : unmanaged, IUtfChar<TChar>
#else
internal static unsafe char* UInt32ToDecChars(char* bufferEnd, uint value, int digits)
#endif
#if SYSTEM_PRIVATE_CORELIB
private static ReadOnlySpan<TChar> NegativeSign<TChar>(NumberFormatInfo info)
where TChar : unmanaged, IUtfChar<TChar>
=> info.NegativeSignTChar<TChar>();
#else
private static ReadOnlySpan<char> NegativeSign<TChar>(NumberFormatInfo info) => info.NegativeSign;
#endif The workaround just works, but isn't expandable if we want to support UTF8 for BigInteger. What do you think about the best approach? Should I ask Stephen or someone else? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please excuse me for our Team not providing any review for so long. We have been hit by a wonderful talent redeployment quite hard, and this caused the delay.
Big thanks for removing the code duplication!
I've added some comments and I made it clear which can be ignored (or addressed in separate PR).
Overall the PR looks good, it's very thoughtful. However there are merge conflict so I am going to hit "request changes".
@huoyaoyuan thank you for your contribution!
src/libraries/System.Private.CoreLib/src/System.Private.CoreLib.Shared.projitems
Show resolved
Hide resolved
int exp = 0; | ||
do | ||
{ | ||
// Check if we are about to overflow past our limit of 9 digits |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This refactoring change introduces a behavior change: https://www.diffchecker.com/ycRc6AAy/
I used git blame to verify that this is most likely desired, as it was introduced in #73643 as a bug fix with no breaking change label. More than a year has passed, so I assume it's safe.
{ | ||
result = default; | ||
return ParsingStatus.Failed; | ||
throw e; // TryParse still throws ArgumentException on invalid NumberStyles |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Throwing an exception here will prevent from TryParseBigInteger
getting inlined.
How about moving the throw to a private helper method?
static void Throw(Exception e) => throw e;
If you don't have the time to check it right now I am fine with that.
{ | ||
throw e; // TryParse still throws ArgumentException on invalid NumberStyles | ||
buffer = stackalloc byte[value.Length + 1 + 1]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC it's recommended to use constant-size stack allocations for small buffers.
@jkotas please correct me if I am wrong
{ | ||
result = default; | ||
return ParsingStatus.Failed; | ||
buffer = arrayFromPool = ArrayPool<byte>.Shared.Rent(value.Length + 1 + 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's the second time 1 + 1 is being used and as a person who reads the code for the first time it's not obvious to me. Could you please add a comment? Or introduce a const with-self-describing-name and use it in both places?
@@ -503,6 +512,10 @@ private static ParsingStatus HexNumberToBigInteger(ref BigNumberBuffer number, o | |||
ArrayPool<uint>.Shared.Return(arrayFromPool); | |||
} | |||
} | |||
|
|||
FailExit: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is using goto
actually beneficial for the performance here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't actually remember where I take this pattern from.
The hex parsing itself can be rewritten with HexConverter in a follow-up PR. I will switch to simplest code.
{ | ||
Span<uint> currentBuffer; | ||
int[]? arrayFromPoolForMultiplier = null; | ||
try | ||
{ | ||
totalDigitCount = Math.Min(number.digits.Length - 1, numberScale); | ||
totalDigitCount = Math.Min(number.DigitsCount, numberScale); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is DigitsCount
equal to number.digits.Length - 1
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tests are passing so I assume nothing is wrong. digits
has a trailing zero in the comment.
ACK for this PR: Conflicts are coming from new features/refactors after creation of this PR. I'll redo some moving work to ensure no changes are lost. Currently I will focus on my another PR first. |
This pull request has been automatically marked |
Updated. It really helps that BigInteger hasn't adopt UTF-8, otherwise this will be totally invalidated. |
BenchmarkDotNet v0.13.11-nightly.20231126.107, Windows 11 (10.0.22631.2715/23H2/2023Update/SunValley3) PowerPlanMode=00000000-0000-0000-0000-000000000000 Arguments=/p:EnableUnsafeBinaryFormatterSerialization=true IterationTime=250.0000 ms
Result for corelib integers:
Should be in acceptable noise range. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thank you for your contribution and patience @huoyaoyuan !
PS. I am jealous of your hardware ;)
src/libraries/System.Private.CoreLib/src/System.Private.CoreLib.Shared.projitems
Show resolved
Hide resolved
Failures are unrelated (#95298), merging! |
Well earlier today I noticed the UTF-8/16 unification of CoreLib is not shared. This reveals the pain I faced for formatting. Will open a follow-up PR for that. |
Opened #95402 as follow up. |
Part of #28657. This is my third attempt of working on this. I'd like to keep formatting and further cleanup to follow-up PRs to help reviewing.
This PR does not refactor the algorithm part. It just adapts corelib-style patterns for BigInteger. Reviewing commit by commit is recommended.