This repository has been archived by the owner on Jan 23, 2023. It is now read-only.

Porting NumberToDouble to managed code. #20080

Merged
merged 5 commits into from
Sep 22, 2018

tannergooding
Member

This ports the double/single parsing code (NumberToDouble) from native to managed code.

@tannergooding
Member Author

This is related to #19999, which ported the formatting code.

CoreRT already has a similar port here: https://github.com/dotnet/corert/blob/master/src/System.Private.CoreLib/src/System/Number.CoreRT.cs#L302

@tannergooding
Member Author

This does not attempt to fix any of the known bugs that exist in the parsing code.

}
else
{
return number.sign ? -0.0 : 0.0;

Member Author

I did some minor cleanup here to remove the goto.

namespace System
{
internal unsafe partial class Number
{

Member

Is there anything useful you are able to say about the algorithm used, for the benefit of anyone maintaining it?

Member Author

Possibly, but there weren't any comments like that in the native code, so I can only describe what I interpret the code to be doing.

Given that the algorithm is currently known to be incorrect, I would expect it to be replaced with a "correct" parsing algorithm soon (hopefully), and we can add proper algorithm comments then.

@tannergooding
Member Author

CC @jkotas, since you helped review the previous PR as well.

@tannergooding
Member Author

As with the previous PR...

I ran the Roslyn RealParser suite locally and also did a basic benchmark on both float and double, covering 267,386,880 values across the input range (including both denormal and normal inputs).

Benchmarking was done with Tiered Jitting disabled.

The below benchmarks are just for double and show a small 1.09% regression in elapsed time.

Native: [screenshot]

Managed: [screenshot]

@jkotas
Member

jkotas commented Sep 21, 2018

The below benchmarks are just for double and show a small 1.09% regression in elapsed time.

Is this a fair benchmark to use? This benchmark is dominated by number formatting, which you are not touching.

@tannergooding
Member Author

NumberToDouble is parsing code, and parsing (ParseNumber and NumberToDouble itself, plus much less costly intermediate calls) is the dominant code being tested overall.

I could remove all of the formatting calls, but then we wouldn't have a benchmark that covers a wide range of inputs (currently testing 267m inputs). Testing just a few values thousands of times seems less fair overall, given that the algorithm does more or less work depending on the input string length.

@jkotas
Member

jkotas commented Sep 21, 2018

The numbers you posted say native NumberToDouble took 4.597s (exclusive?) and managed NumberToDouble took 5.094s, while the total time is ~120s and the top unrelated formatting methods took 44s, which is ~10x more than NumberToDouble. These numbers say to me that NumberToDouble regressed by about 10% and that the actual time spent in NumberToDouble is a small fraction of the total.
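A quick arithmetic check of the figures quoted in this comment (every input is a number stated here; the ~120s total is itself approximate):

$$
\begin{aligned}
\frac{5.094\,\text{s} - 4.597\,\text{s}}{4.597\,\text{s}} &= \frac{0.497}{4.597} \approx 0.108 \quad (\approx 10.8\%\ \text{regression in NumberToDouble}) \\
\frac{44\,\text{s}}{4.597\,\text{s}} &\approx 9.6 \quad (\approx 10\times) \\
\frac{5.094\,\text{s}}{120\,\text{s}} &\approx 0.042 \quad (\approx 4\%\ \text{of the total time})
\end{aligned}
$$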

I think that the right number to quote for this change is regression range and average for calling Double.TryParse. What do the CoreFX microbenchmarks for double parsing say?

benchmark that covers a wide range of inputs (currently testing 267m inputs)

What is the distribution of these inputs? This would be fair only if the distribution of these inputs matches the distribution of inputs one would expect real-world apps to parse.

@tannergooding
Member Author

The numbers you posted say native NumberToDouble took 4.597s (exclusive?), managed NumberToDouble 5.094s,

Yes, that looks to be the case. Given that the only change between the two runs was moving NumberToDouble from native to managed code, I had only pulled the regression numbers for total CPU time.

the total time is ~120s

That may not be the right number to look at. You probably want to look at CPU time, which covers the time actually spent executing the program and not any collection overhead:

  • Elapsed time is the wall time from the beginning to the end of collection.
  • CPU Time is time during which the CPU is actively executing your application.

What do the CoreFX microbenchmarks for double parsing say?

We do not have any CoreFX microbenchmarks for double parsing yet (or if we do, I couldn't find them, because they aren't in the same assembly as the formatting tests or other double/single tests). It is one of the items I am working on.

What is the distribution of these inputs?

It is 267m inputs evenly distributed across the finite input range of double. Due to the way double values are distributed, the majority of them will be in an expected user-input range, and a few of them will be extremely large or extremely small (subnormal) numbers.

@jkotas
Member

jkotas commented Sep 21, 2018

We do not have any CoreFX microbenchmarks for double parsing

Should we add some before changing this?

It is 267m inputs evenly distributed across the finite input range of double.

How are the string lengths of these inputs distributed? I expect that real-world programs frequently parse short strings that do not use full precision - we should cover that case.

@tannergooding
Member Author

Should we add some before changing this?

I'm working on adding some now.

How are the string lengths of these inputs distributed? I expect that real-world programs frequently parse short strings that do not use full precision - we should cover that case.

The code is doing:

double d1 = BitConverter.Int64BitsToDouble((long)(bits));
string s1 = d1.ToString();
d1 = double.Parse(s1);

double d2 = -d1;
string s2 = d2.ToString();
d2 = double.Parse(s2);

Which gives us the following distribution of input string lengths ([length]: count):

[01]: 11
[02]: 99
[03]: 1000
[04]: 10020
[05]: 100230
[06]: 1002360
[07]: 2745872
[08]: 4212994
[09]: 4920747
[10]: 5021742
[11]: 4928777
[12]: 5020955
[13]: 4994085
[14]: 4986497
[15]: 5076973
[16]: 6055297
[17]: 5406024
[18]: 4503349
[19]: 13433937
[20]: 102880126
[21]: 92085785
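Reading each `[NN]` entry as a string length and its value as the number of inputs of that length (an interpretation; the thread does not spell it out), and writing $n_\ell$ for the count listed for length $\ell$, the counts sum to the total reported below. They also show how strongly the sample leans toward long, full-precision strings rather than short ones:

$$
\begin{aligned}
\sum_{\ell=1}^{21} n_\ell &= 267{,}386{,}880 \\
\frac{n_{20} + n_{21}}{\sum_\ell n_\ell} &= \frac{102{,}880{,}126 + 92{,}085{,}785}{267{,}386{,}880} = \frac{194{,}965{,}911}{267{,}386{,}880} \approx 0.729 \\
\frac{\sum_{\ell=1}^{5} n_\ell}{\sum_\ell n_\ell} &= \frac{111{,}360}{267{,}386{,}880} \approx 0.00042
\end{aligned}
$$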

The following additional metrics might be interesting:

Total Numbers:              267,386,880
Positive Numbers:           133,693,440
Negative Numbers:           133,693,440
Numbers with a decimal:     32,885,022
Numbers with an exponent:   201,181,932 (all also contain a decimal point)
Integer Numbers:            33,319,926
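The sign counts and the three format categories each add up to the total. That is consistent with reading "Numbers with a decimal" as "decimal point but no exponent" (an assumption; the thread does not say so), in which case the three categories partition the inputs:

$$
\begin{aligned}
133{,}693{,}440 + 133{,}693{,}440 &= 267{,}386{,}880 \\
32{,}885{,}022 + 201{,}181{,}932 + 33{,}319{,}926 &= 267{,}386{,}880
\end{aligned}
$$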

@tannergooding
Member Author

Numbers from the CoreFX benchmarks I am adding are:

   System.Runtime.Performance.Tests.dll                                                                                                             | Metric   | Unit | Iterations |    Average |    STDEV.S |        Min |     Max
  :------------------------------------------------------------------------------------------------------------------------------------------------ |:-------- |:----:|:----------:| ----------:| ----------:| ----------:| -------:
   System.Tests.Perf_Double.DefaultTryParse(input: "0", innerIterations: 10000000)                                                                  | Duration | msec |     17     |    588.690 |      4.277 |    581.576 | 595.027
   System.Tests.Perf_Double.DefaultTryParse(input: "-0.0", innerIterations: 10000000)                                                               | Duration | msec |     16     |    633.134 |      6.304 |    626.540 | 651.333
   System.Tests.Perf_Double.DefaultTryParse(input: "1", innerIterations: 1000000)                                                                   | Duration | msec |    150     |     66.872 |      1.420 |     64.905 |  73.088
   System.Tests.Perf_Double.DefaultTryParse(input: "-1", innerIterations: 1000000)                                                                  | Duration | msec |    142     |     70.627 |      4.002 |     68.318 | 108.493
   System.Tests.Perf_Double.DefaultTryParse(input: "1.7976931348623157E+308", innerIterations: 100000)                                              | Duration | msec |    686     |     14.585 |      1.116 |     13.486 |  25.640
   System.Tests.Perf_Double.DefaultTryParse(input: "-1.7976931348623157E+308", innerIterations: 100000)                                             | Duration | msec |    665     |     15.041 |      1.631 |     13.949 |  28.559
   System.Tests.Perf_Double.DefaultTryParse(input: "2.2250738585072009E-308", innerIterations: 100000)                                              | Duration | msec |    671     |     14.911 |      0.642 |     14.233 |  21.663
   System.Tests.Perf_Double.DefaultTryParse(input: "-2.2250738585072009E-308", innerIterations: 100000)                                             | Duration | msec |    659     |     15.177 |      0.523 |     14.538 |  23.675
   System.Tests.Perf_Double.DefaultTryParse(input: "2.2250738585072014E-308", innerIterations: 100000)                                              | Duration | msec |    676     |     14.809 |      0.665 |     14.110 |  20.489
   System.Tests.Perf_Double.DefaultTryParse(input: "-2.2250738585072014E-308", innerIterations: 100000)                                             | Duration | msec |    657     |     15.232 |      1.374 |     14.411 |  35.720
   System.Tests.Perf_Double.DefaultTryParse(input: "2.7182818284590451", innerIterations: 1000000)                                                  | Duration | msec |     73     |    137.142 |     10.991 |    129.995 | 190.404
   System.Tests.Perf_Double.DefaultTryParse(input: "-2.7182818284590451", innerIterations: 1000000)                                                 | Duration | msec |     74     |    135.912 |      2.717 |    133.242 | 151.757
   System.Tests.Perf_Double.DefaultTryParse(input: "3.1415926535897931", innerIterations: 1000000)                                                  | Duration | msec |     75     |    134.697 |      8.483 |    128.895 | 174.130
   System.Tests.Perf_Double.DefaultTryParse(input: "-3.1415926535897931", innerIterations: 1000000)                                                 | Duration | msec |     75     |    134.653 |      1.419 |    132.088 | 138.892
   System.Tests.Perf_Double.DefaultTryParse(input: "4.94065645841247E-324", innerIterations: 100000)                                                | Duration | msec |    704     |     14.204 |      1.110 |     13.444 |  30.665
   System.Tests.Perf_Double.DefaultTryParse(input: "-4.94065645841247E-324", innerIterations: 100000)                                               | Duration | msec |    690     |     14.495 |      1.047 |     13.730 |  27.376
   System.Tests.Perf_Double.DefaultTryParse(input: "∞", innerIterations: 10000000)                                                                  | Duration | msec |     16     |    640.379 |     12.881 |    620.160 | 661.798
   System.Tests.Perf_Double.DefaultTryParse(input: "-∞", innerIterations: 10000000)                                                                 | Duration | msec |     16     |    648.858 |      4.701 |    643.843 | 662.045
   System.Tests.Perf_Double.DefaultTryParse(input: "NaN", innerIterations: 10000000)                                                                | Duration | msec |     17     |    603.244 |      4.212 |    595.898 | 612.024
   System.Tests.Perf_Single.DefaultTryParse(input: "0", innerIterations: 10000000)                                                                  | Duration | msec |     17     |    603.986 |     12.243 |    584.432 | 635.513
   System.Tests.Perf_Single.DefaultTryParse(input: "-0.0", innerIterations: 10000000)                                                               | Duration | msec |     16     |    645.467 |      7.618 |    632.851 | 658.606
   System.Tests.Perf_Single.DefaultTryParse(input: "1", innerIterations: 1000000)                                                                   | Duration | msec |    150     |     67.108 |      2.127 |     64.419 |  80.196
   System.Tests.Perf_Single.DefaultTryParse(input: "-1", innerIterations: 1000000)                                                                  | Duration | msec |    139     |     72.015 |      2.983 |     67.907 |  84.758
   System.Tests.Perf_Single.DefaultTryParse(input: "1.17549421E-38", innerIterations: 100000)                                                       | Duration | msec |    882     |     11.345 |      0.705 |     10.592 |  21.740
   System.Tests.Perf_Single.DefaultTryParse(input: "-1.17549421E-38", innerIterations: 100000)                                                      | Duration | msec |    852     |     11.737 |      0.512 |     11.025 |  15.971
   System.Tests.Perf_Single.DefaultTryParse(input: "1.17549435E-38", innerIterations: 100000)                                                       | Duration | msec |    887     |     11.272 |      0.530 |     10.586 |  17.378
   System.Tests.Perf_Single.DefaultTryParse(input: "-1.17549435E-38", innerIterations: 100000)                                                      | Duration | msec |    857     |     11.672 |      0.536 |     10.988 |  15.727
   System.Tests.Perf_Single.DefaultTryParse(input: "1.401298E-45", innerIterations: 100000)                                                         | Duration | msec |    931     |     10.700 |      0.528 |      9.916 |  15.130
   System.Tests.Perf_Single.DefaultTryParse(input: "-1.401298E-45", innerIterations: 100000)                                                        | Duration | msec |    908     |     11.014 |      0.535 |     10.361 |  15.990
   System.Tests.Perf_Single.DefaultTryParse(input: "2.71828175", innerIterations: 1000000)                                                          | Duration | msec |    101     |     99.531 |      2.497 |     96.171 | 114.522
   System.Tests.Perf_Single.DefaultTryParse(input: "-2.71828175", innerIterations: 1000000)                                                         | Duration | msec |     97     |    103.742 |      1.853 |    100.555 | 108.782
   System.Tests.Perf_Single.DefaultTryParse(input: "3.14159274", innerIterations: 1000000)                                                          | Duration | msec |    100     |    100.383 |      2.902 |     96.441 | 112.945
   System.Tests.Perf_Single.DefaultTryParse(input: "-3.14159274", innerIterations: 1000000)                                                         | Duration | msec |     95     |    105.349 |      5.387 |    100.270 | 130.419
   System.Tests.Perf_Single.DefaultTryParse(input: "3.40282347E+38", innerIterations: 100000)                                                       | Duration | msec |    893     |     11.193 |      0.606 |     10.418 |  15.594
   System.Tests.Perf_Single.DefaultTryParse(input: "-3.40282347E+38", innerIterations: 100000)                                                      | Duration | msec |    854     |     11.704 |      0.804 |     10.824 |  19.799
   System.Tests.Perf_Single.DefaultTryParse(input: "∞", innerIterations: 10000000)                                                                  | Duration | msec |     16     |    635.425 |     18.877 |    617.934 | 678.445
   System.Tests.Perf_Single.DefaultTryParse(input: "-∞", innerIterations: 10000000)                                                                 | Duration | msec |     16     |    644.618 |      4.971 |    638.880 | 653.558
   System.Tests.Perf_Single.DefaultTryParse(input: "NaN", innerIterations: 10000000)                                                                | Duration | msec |     17     |    605.072 |     10.871 |    594.756 | 630.581

This is in comparison to: dotnet/corefx#32392 (comment)

  • The "worst" case regression (for double) looks to be 25.8% for double.MaxValue (from 11.6ms avg to 14.6ms avg -- 100k inner iterations).
  • The "best" case regression (for double) looks to be 1.07% for -0.0 (from 626.4ms avg to 633.1ms avg -- 10m inner iterations).
  • The "average" regression (for all double) looks to be 4.64% (from 3739.317ms ttl to 3912.856ms ttl)
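These percentages are relative changes with the native time as the baseline. Plugging in the stated averages and totals reproduces the best-case and average figures:

$$
\begin{aligned}
\text{regression} &= \frac{t_{\text{managed}} - t_{\text{native}}}{t_{\text{native}}} \\
\frac{633.1 - 626.4}{626.4} &= \frac{6.7}{626.4} \approx 0.0107 \quad (1.07\%) \\
\frac{3912.856 - 3739.317}{3739.317} &= \frac{173.539}{3739.317} \approx 0.0464 \quad (4.64\%)
\end{aligned}
$$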

@jkotas
Member

jkotas commented Sep 21, 2018

LGTM otherwise

@tannergooding tannergooding merged commit 09cc49e into dotnet:master Sep 22, 2018
dotnet-maestro-bot pushed a commit to dotnet-maestro-bot/corert that referenced this pull request Sep 22, 2018
* Porting NumberToDouble to managed code.

* Deleting bcltype/number.cpp and bcltype/number.h

* Fixing NumberToDouble to call Int64BitsToDouble, rather than DoubleToInt64Bits

* Some minor code cleanup in NumberToDouble for better readability.

* Some additional code cleanup in the Number.NumberToDouble.cs code

Signed-off-by: dotnet-bot
dotnet-maestro-bot pushed a commit to dotnet-maestro-bot/corefx that referenced this pull request Sep 23, 2018
jkotas pushed a commit to dotnet/corert that referenced this pull request Sep 23, 2018
jkotas pushed a commit to dotnet/corefx that referenced this pull request Sep 23, 2018
@EgorBo
Member

EgorBo commented Sep 23, 2018

@tannergooding, I am not sure whether this is a bug (I am testing your managed implementation in Mono):

var str = (-1234d).ToString("#,,", nfi);

str is "-"; is that expected?

nfi is from some test case:

        var nfi = new NumberFormatInfo();

        nfi.NaNSymbol = "NaN";
        nfi.PositiveSign = "+";
        nfi.NegativeSign = "-";
        nfi.PerMilleSymbol = "x";
        nfi.PositiveInfinitySymbol = "Infinity";
        nfi.NegativeInfinitySymbol = "-Infinity";

        nfi.NumberDecimalDigits = 5;
        nfi.NumberDecimalSeparator = ".";
        nfi.NumberGroupSeparator = ",";
        nfi.NumberGroupSizes = new int[] { 3 };
        nfi.NumberNegativePattern = 2;

        nfi.CurrencyDecimalDigits = 2;
        nfi.CurrencyDecimalSeparator = ".";
        nfi.CurrencyGroupSeparator = ",";
        nfi.CurrencyGroupSizes = new int[] { 3 };
        nfi.CurrencyNegativePattern = 8;
        nfi.CurrencyPositivePattern = 3;
        nfi.CurrencySymbol = "$";

        nfi.PercentDecimalDigits = 5;
        nfi.PercentDecimalSeparator = ".";
        nfi.PercentGroupSeparator = ",";
        nfi.PercentGroupSizes = new int[] { 3 };
        nfi.PercentNegativePattern = 0;
        nfi.PercentPositivePattern = 0;
        nfi.PercentSymbol = "%";

@tannergooding
Member Author

@EgorBo, that's a bug, but not from this change. It's from #19775.

I'll take a look and see if I can resolve the issue.
