Unify parsing part of BigInteger with CoreLib #85978

Merged
20 commits merged into dotnet:main on Nov 29, 2023

huoyaoyuan
Member

Part of #28657. This is my third attempt at this. I'd like to leave formatting and further cleanup to follow-up PRs to make reviewing easier.

This PR does not refactor the algorithm part. It just adopts CoreLib-style patterns for BigInteger. Reviewing commit by commit is recommended.

@ghost added the community-contribution label (Indicates that the PR has been added by a community member) May 9, 2023
@ghost

ghost commented May 9, 2023

Tagging subscribers to this area: @dotnet/area-system-numerics
See info in area-owners.md if you want to be subscribed.

Author: huoyaoyuan
Assignees: -
Labels:

area-System.Numerics

Milestone: -

@huoyaoyuan
Member Author

Also asking a question about formatting code here: the following pattern is heavily used in formatting code to share between UTF8 and UTF16:

private static void FormatNumber<TChar>(ref ValueListBuilder<TChar> vlb, ref NumberBuffer number, int nMaxDigits, NumberFormatInfo info) where TChar : unmanaged, IUtfChar<TChar>

What's the best approach for sharing that code outside CoreLib? Using #if?

@huoyaoyuan
Member Author

It's better to review and merge #84792 first.

@danmoseley
Member

Merge conflicts - and then this is reviewable? Looks like the other one went in.

@huoyaoyuan
Member Author

There's also a potentially massive conflict with #85392/#86875, and a minor dependency on other numeric PRs. Can someone determine an order to review these?

@huoyaoyuan
Member Author

@tannergooding I'd like to discuss how to deal with this.

#86875 uses IUtfChar in parsing, which brings the same problem as formatting. IUtfChar can't be used in System.Runtime.Numerics, so the majority of the code needs to be updated.

The approach I tried looks like this:

#if !SYSTEM_PRIVATE_CORELIB
using TChar = System.Char;
#pragma warning disable SA1121 // Use built-in type alias
#endif

#if SYSTEM_PRIVATE_CORELIB
        internal static unsafe TChar* UInt32ToDecChars<TChar>(TChar* bufferEnd, uint value, int digits) where TChar : unmanaged, IUtfChar<TChar>
#else
        internal static unsafe char* UInt32ToDecChars(char* bufferEnd, uint value, int digits)
#endif


#if SYSTEM_PRIVATE_CORELIB
        private static ReadOnlySpan<TChar> NegativeSign<TChar>(NumberFormatInfo info)
            where TChar : unmanaged, IUtfChar<TChar>
            => info.NegativeSignTChar<TChar>();
#else
        private static ReadOnlySpan<char> NegativeSign<TChar>(NumberFormatInfo info) => info.NegativeSign;
#endif

The workaround works, but it doesn't extend if we want to support UTF-8 for BigInteger. What do you think is the best approach? Should I ask Stephen or someone else?

@adamsitnik self-assigned this Oct 20, 2023
Member

@adamsitnik left a comment

Please excuse our team for not providing any review for so long. We have been hit quite hard by a wonderful talent redeployment, and this caused the delay.

Big thanks for removing the code duplication!

I've added some comments and I made it clear which can be ignored (or addressed in separate PR).

Overall the PR looks good, it's very thoughtful. However, there are merge conflicts, so I am going to hit "request changes".

@huoyaoyuan thank you for your contribution!

int exp = 0;
do
{
// Check if we are about to overflow past our limit of 9 digits
Member

This refactoring change introduces a behavior change: https://www.diffchecker.com/ycRc6AAy/

I used git blame to verify that this is most likely desired, as it was introduced in #73643 as a bug fix with no breaking change label. More than a year has passed, so I assume it's safe.

cc @tannergooding

{
result = default;
return ParsingStatus.Failed;
throw e; // TryParse still throws ArgumentException on invalid NumberStyles
Member

Throwing an exception here will prevent TryParseBigInteger from getting inlined.

How about moving the throw to a private helper method?

static void Throw(Exception e) => throw e;

If you don't have the time to check it right now I am fine with that.
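
A minimal sketch of the throw-helper pattern suggested above, not the code from this PR. `ThrowHelperSketch`, `TryParseInt32`, and `ThrowInvalidStyle` are hypothetical names, and the helper here constructs the exception itself rather than taking it as a parameter. The point is only that the `throw` statement lives in a separate method, so the body of the `TryParse`-style method stays small.

```csharp
using System;
using System.Diagnostics.CodeAnalysis;
using System.Globalization;

static class ThrowHelperSketch
{
    // Hypothetical stand-in for a TryParse-style entry point; not the BigInteger code from the PR.
    public static bool TryParseInt32(ReadOnlySpan<char> value, NumberStyles style, out int result)
    {
        // Hex parsing only permits the flags contained in NumberStyles.HexNumber.
        if ((style & NumberStyles.AllowHexSpecifier) != 0 && (style & ~NumberStyles.HexNumber) != 0)
        {
            // The throw statement itself is kept out of this method body.
            ThrowInvalidStyle();
        }

        return int.TryParse(value, style, CultureInfo.InvariantCulture, out result);
    }

    // Marked as non-returning so flow analysis knows control never comes back.
    [DoesNotReturn]
    private static void ThrowInvalidStyle() =>
        throw new ArgumentException("Invalid combination of NumberStyles.", "style");
}
```

Whether this actually changes the JIT's inlining decision for a given method would need to be checked, for example by inspecting the generated code.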

{
throw e; // TryParse still throws ArgumentException on invalid NumberStyles
buffer = stackalloc byte[value.Length + 1 + 1];
Member

IIRC it's recommended to use constant-size stack allocations for small buffers.

@jkotas please correct me if I am wrong

{
result = default;
return ParsingStatus.Failed;
buffer = arrayFromPool = ArrayPool<byte>.Shared.Rent(value.Length + 1 + 1);
Member

This is the second time `1 + 1` is used, and as a person reading the code for the first time, it's not obvious to me what it means. Could you please add a comment? Or introduce a const with a self-describing name and use it in both places?
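
A rough sketch of how the two buffer suggestions in this review (a constant-size stack allocation for small inputs, and a named constant instead of a repeated `1 + 1`) might look together. Everything here is an assumption for illustration: `BufferSketch`, `CountDigits`, `ExtraBytes`, and the 256-byte threshold are hypothetical, and the thread does not say what each `+ 1` stands for.

```csharp
#nullable enable
using System;
using System.Buffers;

static class BufferSketch
{
    // Hypothetical name; the thread does not state what the two extra bytes are for.
    private const int ExtraBytes = 1 + 1;

    // Assumed threshold; small inputs use a fixed-size stack buffer.
    private const int StackAllocThreshold = 256;

    // Copies the ASCII digits of value into a temporary byte buffer and returns how many were copied.
    public static int CountDigits(ReadOnlySpan<char> value)
    {
        int length = value.Length + ExtraBytes;
        byte[]? rented = null;

        // Constant-size stackalloc for small inputs, pooled array otherwise.
        Span<byte> buffer = length <= StackAllocThreshold
            ? stackalloc byte[StackAllocThreshold]
            : (rented = ArrayPool<byte>.Shared.Rent(length));
        buffer = buffer.Slice(0, length);

        try
        {
            int count = 0;
            foreach (char c in value)
            {
                if (c is >= '0' and <= '9')
                {
                    buffer[count++] = (byte)(c - '0');
                }
            }
            return count;
        }
        finally
        {
            // Only return the array if one was actually rented.
            if (rented is not null)
            {
                ArrayPool<byte>.Shared.Return(rented);
            }
        }
    }
}
```

The buffer is sliced to the requested length in both branches, so the rest of the method does not need to know which allocation path was taken.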

@@ -503,6 +512,10 @@ private static ParsingStatus HexNumberToBigInteger(ref BigNumberBuffer number, o
ArrayPool<uint>.Shared.Return(arrayFromPool);
}
}

FailExit:
Member

Is using goto actually beneficial for performance here?

Member Author

I don't actually remember where I took this pattern from.
The hex parsing itself can be rewritten with HexConverter in a follow-up PR. I will switch to the simplest code.

{
Span<uint> currentBuffer;
int[]? arrayFromPoolForMultiplier = null;
try
{
totalDigitCount = Math.Min(number.digits.Length - 1, numberScale);
totalDigitCount = Math.Min(number.DigitsCount, numberScale);
Member

Is DigitsCount equal to number.digits.Length - 1 here?

Member Author

Tests are passing, so I assume nothing is wrong. According to the comment, digits has a trailing zero.

@ghost added the needs-author-action label (An issue or pull request that requires more info or actions from the author.) Oct 30, 2023
@huoyaoyuan
Member Author

ACK for this PR:

The conflicts come from new features and refactors that landed after this PR was created. I'll redo some of the moving work to ensure no changes are lost.

For now I will focus on my other PR first.

@ghost removed the needs-author-action label (An issue or pull request that requires more info or actions from the author.) Nov 7, 2023
@adamsitnik added the needs-author-action label (An issue or pull request that requires more info or actions from the author.) Nov 7, 2023
@ghost added the no-recent-activity label Nov 21, 2023
@ghost

ghost commented Nov 21, 2023

This pull request has been automatically marked no-recent-activity because it has not had any activity for 14 days. It will be closed if no further activity occurs within 14 more days. Any new comment (by anyone, not necessarily the author) will remove no-recent-activity.

@ghost removed the needs-author-action (An issue or pull request that requires more info or actions from the author.) and no-recent-activity labels Nov 28, 2023
@huoyaoyuan
Member Author

Updated. It really helps that BigInteger hasn't adopted UTF-8; otherwise this would be totally invalidated.

@huoyaoyuan
Member Author

huoyaoyuan commented Nov 28, 2023

BenchmarkDotNet v0.13.11-nightly.20231126.107, Windows 11 (10.0.22631.2715/23H2/2023Update/SunValley3)
13th Gen Intel Core i9-13900K, 1 CPU, 32 logical and 24 physical cores
.NET SDK 8.0.100
[Host] : .NET 8.0.0 (8.0.23.53103), X64 RyuJIT AVX2
Job-UKWEZQ : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX2
Job-RXOVAY : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX2

PowerPlanMode=00000000-0000-0000-0000-000000000000 Arguments=/p:EnableUnsafeBinaryFormatterSerialization=true IterationTime=250.0000 ms
MaxIterationCount=20 MinIterationCount=15 WarmupCount=1

| Method | Job | Toolchain | numberString | Mean | Error | StdDev | Median | Min | Max | Ratio | Gen0 | Allocated | Alloc Ratio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Parse | Job-UKWEZQ | \PR\corerun.exe | -2147483648 | 39.18 ns | 0.604 ns | 0.565 ns | 39.00 ns | 38.49 ns | 40.28 ns | 0.85 | 0.0016 | 32 B | 0.24 |
| Parse | Job-RXOVAY | \main\corerun.exe | -2147483648 | 45.89 ns | 0.420 ns | 0.372 ns | 45.80 ns | 45.40 ns | 46.75 ns | 1.00 | 0.0072 | 136 B | 1.00 |
| Parse | Job-UKWEZQ | \PR\corerun.exe | 123 | 20.52 ns | 0.149 ns | 0.139 ns | 20.49 ns | 20.31 ns | 20.77 ns | 0.72 | - | - | 0.00 |
| Parse | Job-RXOVAY | \main\corerun.exe | 123 | 28.36 ns | 0.384 ns | 0.340 ns | 28.41 ns | 27.57 ns | 28.92 ns | 1.00 | 0.0055 | 104 B | 1.00 |
| Parse | Job-UKWEZQ | \PR\corerun.exe | 123456789012(...)901234567890 [200] | 612.07 ns | 4.446 ns | 4.159 ns | 612.18 ns | 605.97 ns | 619.63 ns | 0.73 | 0.0049 | 112 B | 0.11 |
| Parse | Job-RXOVAY | \main\corerun.exe | 123456789012(...)901234567890 [200] | 833.89 ns | 9.352 ns | 8.748 ns | 830.38 ns | 825.52 ns | 853.02 ns | 1.00 | 0.0504 | 984 B | 1.00 |

Result for corelib integers:

| Method | Job | Toolchain | value | Mean | Error | StdDev | Median | Min | Max | Ratio | RatioSD | Allocated | Alloc Ratio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Parse | Job-NKUDXL | \PR\corerun.exe | -9223372036854775808 | 10.961 ns | 0.1929 ns | 0.1710 ns | 10.951 ns | 10.556 ns | 11.252 ns | 1.02 | 0.02 | - | NA |
| Parse | Job-PUBWNA | \main\corerun.exe | -9223372036854775808 | 10.702 ns | 0.1391 ns | 0.1233 ns | 10.732 ns | 10.406 ns | 10.894 ns | 1.00 | 0.00 | - | NA |
| TryParse | Job-NKUDXL | \PR\corerun.exe | -9223372036854775808 | 10.570 ns | 0.1169 ns | 0.0976 ns | 10.555 ns | 10.405 ns | 10.811 ns | 0.98 | 0.01 | - | NA |
| TryParse | Job-PUBWNA | \main\corerun.exe | -9223372036854775808 | 10.736 ns | 0.1037 ns | 0.0866 ns | 10.758 ns | 10.560 ns | 10.865 ns | 1.00 | 0.00 | - | NA |
| ParseSpan | Job-NKUDXL | \PR\corerun.exe | -9223372036854775808 | 11.077 ns | 0.1187 ns | 0.1110 ns | 11.094 ns | 10.866 ns | 11.249 ns | 0.99 | 0.01 | - | NA |
| ParseSpan | Job-PUBWNA | \main\corerun.exe | -9223372036854775808 | 11.171 ns | 0.1055 ns | 0.0987 ns | 11.155 ns | 11.037 ns | 11.343 ns | 1.00 | 0.00 | - | NA |
| TryParseSpan | Job-NKUDXL | \PR\corerun.exe | -9223372036854775808 | 10.851 ns | 0.1555 ns | 0.1454 ns | 10.829 ns | 10.632 ns | 11.183 ns | 1.01 | 0.02 | - | NA |
| TryParseSpan | Job-PUBWNA | \main\corerun.exe | -9223372036854775808 | 10.708 ns | 0.1344 ns | 0.1257 ns | 10.692 ns | 10.487 ns | 10.941 ns | 1.00 | 0.00 | - | NA |
| Parse | Job-NKUDXL | \PR\corerun.exe | 12345 | 4.501 ns | 0.0587 ns | 0.0521 ns | 4.498 ns | 4.430 ns | 4.617 ns | 0.97 | 0.01 | - | NA |
| Parse | Job-PUBWNA | \main\corerun.exe | 12345 | 4.633 ns | 0.0326 ns | 0.0305 ns | 4.637 ns | 4.588 ns | 4.677 ns | 1.00 | 0.00 | - | NA |
| TryParse | Job-NKUDXL | \PR\corerun.exe | 12345 | 4.586 ns | 0.0285 ns | 0.0252 ns | 4.581 ns | 4.561 ns | 4.639 ns | 1.00 | 0.01 | - | NA |
| TryParse | Job-PUBWNA | \main\corerun.exe | 12345 | 4.570 ns | 0.0223 ns | 0.0174 ns | 4.573 ns | 4.536 ns | 4.600 ns | 1.00 | 0.00 | - | NA |
| ParseSpan | Job-NKUDXL | \PR\corerun.exe | 12345 | 4.700 ns | 0.0385 ns | 0.0360 ns | 4.702 ns | 4.644 ns | 4.766 ns | 1.01 | 0.01 | - | NA |
| ParseSpan | Job-PUBWNA | \main\corerun.exe | 12345 | 4.665 ns | 0.0268 ns | 0.0250 ns | 4.660 ns | 4.626 ns | 4.714 ns | 1.00 | 0.00 | - | NA |
| TryParseSpan | Job-NKUDXL | \PR\corerun.exe | 12345 | 4.656 ns | 0.0173 ns | 0.0145 ns | 4.655 ns | 4.634 ns | 4.683 ns | 1.00 | 0.01 | - | NA |
| TryParseSpan | Job-PUBWNA | \main\corerun.exe | 12345 | 4.645 ns | 0.0240 ns | 0.0213 ns | 4.646 ns | 4.616 ns | 4.685 ns | 1.00 | 0.00 | - | NA |
| Parse | Job-NKUDXL | \PR\corerun.exe | 9223372036854775807 | 10.011 ns | 0.0631 ns | 0.0590 ns | 10.007 ns | 9.923 ns | 10.092 ns | 1.00 | 0.01 | - | NA |
| Parse | Job-PUBWNA | \main\corerun.exe | 9223372036854775807 | 9.978 ns | 0.0602 ns | 0.0563 ns | 9.959 ns | 9.907 ns | 10.076 ns | 1.00 | 0.00 | - | NA |
| TryParse | Job-NKUDXL | \PR\corerun.exe | 9223372036854775807 | 10.058 ns | 0.0749 ns | 0.0701 ns | 10.058 ns | 9.916 ns | 10.198 ns | 1.01 | 0.02 | - | NA |
| TryParse | Job-PUBWNA | \main\corerun.exe | 9223372036854775807 | 9.951 ns | 0.1824 ns | 0.1706 ns | 9.916 ns | 9.725 ns | 10.278 ns | 1.00 | 0.00 | - | NA |
| ParseSpan | Job-NKUDXL | \PR\corerun.exe | 9223372036854775807 | 9.992 ns | 0.0587 ns | 0.0520 ns | 9.995 ns | 9.897 ns | 10.076 ns | 1.02 | 0.01 | - | NA |
| ParseSpan | Job-PUBWNA | \main\corerun.exe | 9223372036854775807 | 9.794 ns | 0.0796 ns | 0.0744 ns | 9.789 ns | 9.697 ns | 9.910 ns | 1.00 | 0.00 | - | NA |
| TryParseSpan | Job-NKUDXL | \PR\corerun.exe | 9223372036854775807 | 9.867 ns | 0.0668 ns | 0.0625 ns | 9.847 ns | 9.782 ns | 9.951 ns | 0.99 | 0.01 | - | NA |
| TryParseSpan | Job-PUBWNA | \main\corerun.exe | 9223372036854775807 | 10.004 ns | 0.0328 ns | 0.0256 ns | 10.005 ns | 9.949 ns | 10.040 ns | 1.00 | 0.00 | - | NA |

The differences for CoreLib integers should be within an acceptable noise range.

Member

@adamsitnik left a comment

LGTM, thank you for your contribution and patience @huoyaoyuan !

PS. I am jealous of your hardware ;)

@adamsitnik
Member

Failures are unrelated (#95298), merging!

@adamsitnik merged commit f66c1c1 into dotnet:main Nov 29, 2023
172 of 176 checks passed
@adamsitnik added this to the 9.0.0 milestone Nov 29, 2023
@huoyaoyuan deleted the numerics-format-1 branch November 29, 2023 12:36
@huoyaoyuan
Member Author

Well, earlier today I noticed that the UTF-8/UTF-16 unification in CoreLib is not shared. This explains the pain I faced with formatting. I will open a follow-up PR for that.

@huoyaoyuan
Member Author

Opened #95402 as follow up.

Labels
area-System.Numerics, community-contribution (Indicates that the PR has been added by a community member)

3 participants