-
Notifications
You must be signed in to change notification settings - Fork 2.7k
Porting NumberToDouble to managed code. #20080
Porting NumberToDouble to managed code. #20080
Conversation
This is related to #19999, which ported the formatting code. CoreRT already has a similar port here: https://github.com/dotnet/corert/blob/master/src/System.Private.CoreLib/src/System/Number.CoreRT.cs#L302 |
This does not attempt to fix any of the known bugs that exist in the parsing code. |
} | ||
else | ||
{ | ||
return number.sign ? -0.0 : 0.0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did some minor cleanup here to remove the goto
.
src/System.Private.CoreLib/shared/System/Number.NumberToDouble.cs
Outdated
Show resolved
Hide resolved
src/System.Private.CoreLib/shared/System/Number.NumberToDouble.cs
Outdated
Show resolved
Hide resolved
namespace System | ||
{ | ||
internal unsafe partial class Number | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there anything useful you are able to say about the algorithm used, for the benefit of anyone maintaining it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Possibly, but there wasn't any comments like that in the native code, so I can really only put what I am interpreting the code to be doing.
Given that the algorithm is currently known to be incorrect, I would think this should be replaced with a "correct" parsing algorithm soonish (hopefully) and we can add some proper algorithm comments then.
src/System.Private.CoreLib/shared/System/Number.NumberToDouble.cs
Outdated
Show resolved
Hide resolved
CC. @jkotas, since you helped review the previous PR as well. |
As with the previous PR... I locally ran the Roslyn RealParser suite, as well as did a basic benchmark on both float and double, covering 267,386,880 values in the input range (including both denormal and normal inputs). Benchmarking was done with Tiered Jitting disabled. The below benchmarks are just for |
Is this fair benchmark to use? This benchmark is not dominated by number formatting that you are touching. |
NumberToDouble is the parsing code, which is overall dominant code being tested (ParseNumber and NumberToDouble itself, as well as much less costly intermediate calls). I could remove all of the formatting calls but then we wouldn't have a benchmark that covers a wide range of inputs (currently testing 2.67m inputs) and I would think just testing a few values thousands of times would be overall less fair, given that the algorithm has to do more or less work based on the input string length. |
The numbers you posted say native NumberToDouble took 4.597s (exclusive?), managed NumberToDouble 5.094s, the total time is ~120s and the top unrelated formatting methods took 44s that is ~10x more than NumberToDouble. These numbers say to me that NumberToDouble regressed about 10% and that the actual time spent in NumberToDouble is a small fraction of the total. I think that the right number to quote for this change is regression range and average for calling
What is the distribution of these inputs? This would be fair only if distribution of these inputs represents what one would expect distribution of inputs for parsing to be in real world apps. |
Yes, that looks like the case. Given that the only code that changed between them was changing
That may not the right number to be looking at, you probably want to look at CPU time (which covers the time actually spent executing the program, and not any collection overhead):
We do not have any CoreFX microbenchmarks for double parsing yet (or if we do, I couldn't find them, because they aren't in the same assembly as the formatting tests or other double/single tests). It is one of the items I am working on.
It is 2.67m inputs evenly distributed across the finite input range of |
Should we add some before changing this?
How are the string lengths of these inputs distributed? I expect that real-world programs frequently parse short strings that do not use full precision - we should cover that case. |
I'm working on adding some now.
The code is doing: double d1 = BitConverter.Int64BitsToDouble((long)(bits));
string s1 = d1.ToString();
d1 = double.Parse(s1);
double d2 = -d1;
string s2 = d2.ToString();
d2 = double.Parse(s2); Which gives us a range of string input lengths:
The following additional metrics might be interesting:
|
Numbers from the CoreFX benchmarks I am adding are:
This is in comparison to: dotnet/corefx#32392 (comment)
|
src/System.Private.CoreLib/shared/System/Number.NumberToDouble.cs
Outdated
Show resolved
Hide resolved
LGTM otherwise |
* Porting NumberToDouble to managed code. * Deleting bcltype/number.cpp and bcltype/number.h * Fixing NumberToDouble to call Int64BitsToDouble, rather than DoubleToInt64Bits * Some minor code cleanup in NumberToDouble for better readability. * Some additional code cleanup in the Number.NumberToDouble.cs code Signed-off-by: dotnet-bot <[email protected]>
* Porting NumberToDouble to managed code. * Deleting bcltype/number.cpp and bcltype/number.h * Fixing NumberToDouble to call Int64BitsToDouble, rather than DoubleToInt64Bits * Some minor code cleanup in NumberToDouble for better readability. * Some additional code cleanup in the Number.NumberToDouble.cs code Signed-off-by: dotnet-bot <[email protected]>
* Porting NumberToDouble to managed code. * Deleting bcltype/number.cpp and bcltype/number.h * Fixing NumberToDouble to call Int64BitsToDouble, rather than DoubleToInt64Bits * Some minor code cleanup in NumberToDouble for better readability. * Some additional code cleanup in the Number.NumberToDouble.cs code Signed-off-by: dotnet-bot <[email protected]>
* Porting NumberToDouble to managed code. * Deleting bcltype/number.cpp and bcltype/number.h * Fixing NumberToDouble to call Int64BitsToDouble, rather than DoubleToInt64Bits * Some minor code cleanup in NumberToDouble for better readability. * Some additional code cleanup in the Number.NumberToDouble.cs code Signed-off-by: dotnet-bot <[email protected]>
@tannergooding I am not sure if it's a bug or not but (I am testing your managed impl in mono):
str is
|
This ports the double/single parsing code to be implemented in managed.