Fix up the majority of IEEE compliance bugs for Double/Single ToString and Parse #19905

tannergooding · 2018-09-12T04:07:57Z

This resolves the majority of the issues covered by https://github.com/dotnet/coreclr/issues/19802:

Ensures that double.Parse for -0 works correctly
Ensures that recovering NaN, PositiveInfinity, and NegativeInfinity is done case insensitively
Ensures that "R" always roundtrips to the required number of digits
Ensures that the "precision" specifier works up to 50 digits
- 50 is the current internal limit for DoubleToNumber
- IEEE requires at least 20 for Double and 12 for Single

This should resolve (validated locally, can close after CoreFX tests are added):

https://github.com/dotnet/coreclr/issues/16783

This should partially resolve, both of which deal with ToString("R"), but which also involves parsing the result back as well:

Not Resolved:

https://github.com/dotnet/coreclr/issues/1316
https://github.com/dotnet/coreclr/issues/13615
- double.MinValue and double.MaxValue require 17 digits to roundtrip, otherwise an overflow occurs. The current default number of digits is 15, for any format that isn't "R"
https://github.com/dotnet/coreclr/issues/17467

This does not handle:

-nan and +nan, which IEEE requires we support parsing, but which we can parse to "Regular" NaN
snan, which IEEE requires we support parsing; but which we can parse to "regular" NaN
- support for preceding - and + is also required for snan
- IEEE does not require us to support formatting snan, we are free to format to "regular" NaN.
NaN and SNaN payloads, which are optional
+Infinity (case insensitive), which IEEE requires we support parsing

tannergooding · 2018-09-12T04:08:09Z

CC. @jkotas, @danmosemsft, @eerhardt

tannergooding · 2018-09-12T04:17:06Z

src/classlibnative/bcltype/number.cpp

@@ -304,13 +304,20 @@ void DoubleToNumber(double value, int precision, NUMBER* number)
    WRAPPER_NO_CONTRACT
    _ASSERTE(number != NULL);

+    if (precision > NUMBER_MAXDIGITS)
+    {
+        precision = NUMBER_MAXDIGITS;


Given that the format specifier is currently limited to xx (that is 0-99), maybe we should just set NUMBER_MAXDIGITS to 99. Thoughts?

Also, can users go over 99 digits with custom format specifiers? If so, we may need to tweak the algorithm to pad 0's past the "internal limit" (as per the IEEE requirement).

Edit: To clarify, I mean tweaking the NumberToString algorithm. DoubleToNumber can still be internally limited however we need, provided it is at least 20 for double and 12 for single.

I pushed a commit which changes NUMBER_MAXDIGITS to 99

tannergooding · 2018-09-12T04:30:59Z

src/System.Private.CoreLib/shared/System/Number.Formatting.cs

+                fmt = 'G';
+                digits = DoublePrecision;
+            }
+            else if (digits > 0)


G results in digits = -1 and G0 results in digits = 0.

In both cases, this results in the default precision (17 for double and 9 for single).

For G0, this matches the current behavior. The IEEE spec indicates that the lower limit for significant digits converted should be 1. So, we could also just make G0 be treated as G1.

Thoughts/Opinions?

tannergooding · 2018-09-12T04:32:10Z

We will need to update some of the various documentation for numeric format strings and the like: https://docs.microsoft.com/en-us/dotnet/standard/base-types/standard-numeric-format-strings

tannergooding · 2018-09-12T04:54:16Z

This should also measurable increase perf for "R". Will try to get numbers tomorrow.

src/System.Private.CoreLib/shared/System/Number.Formatting.cs

tannergooding · 2018-09-12T16:13:39Z

src/System.Private.CoreLib/shared/System/Number.Formatting.cs

+            {
+                // This ensures that, for the default case, we return a string containing the old
+                // number of digits (15), which results in "prettier" numbers for most cases.
+                digits = DefaultDoubleDigits;


Now, (0.1).ToString() will print 0.1 (as before). But requesting more than 15 digits will return the requested number of digits (up to 50).

For example, previously (0.1).ToString("G20") returned "0.10000000000000001" (which is the same as "G17"), but now will return "0.10000000000000000555" (as is required by the IEEE spec).

As mentioned here. The internal buffer for DoubleToNumber is set to 50, but the precision specifier supports up to 99. So it might be worth resizing it so that any requested number of digits works.

tannergooding · 2018-09-12T17:26:35Z

CoreFX has 393 failures from this, most of which are XML failures due to the "R" fix (XmlWriter serializes using "R": https://source.dot.net/#System.Private.Xml/System/Xml/XmlConvert.cs,780)

danmoseley · 2018-09-12T17:32:55Z

Can you give some examples of the values causing the failures? Looks like they use Assert.True not Assert.Equals unfortunately so it's not in the log.

tannergooding · 2018-09-12T17:44:56Z

One example is...

Expected:

<RectangleF xmlns=\"http://schemas.datacontract.org/2004/07/System.Drawing\" xmlns:i=\"http://www.w3.org/2001/XMLSchema-instance\">
  <height>-2.5</height>
  <width>1.5</width>
  <x>1.50001</x>
  <y>-2.5</y>
</RectangleF>

//

Actual:

<RectangleF xmlns=\"http://schemas.datacontract.org/2004/07/System.Drawing\" xmlns:i=\"http://www.w3.org/2001/XMLSchema-instance\">
  <height>-2.5</height>
  <width>1.5</width>
  <x>1.50001001</x>
  <y>-2.5</y>
</RectangleF>

Most of the other examples are similar. That is, they are numbers where the string returned did not have enough digits to always be an "exact" conversion and which could potentially result in a number that did not round-trip to the same bitvalue.

tannergooding · 2018-09-12T18:05:19Z

src/System.Private.CoreLib/shared/System/Number.Formatting.cs

+            else
+            {
+                // IEEE requires we correctly round to the requested number of digits
+                precision = digits;


Had to update this logic so that we preserve the fact that the user didn't explicitly specify a digit count, or that the user specified 0 for the digit count.

This makes a difference for things like double.ToString("N") which i had accidentally made to be treated like double.ToString("N15").

Now, digits is always preserved (except for "R", where it is explicitly documented to be ignored) and precision is updated accordingly.

This brought the CoreFX failure count down to 354, and they are now all XML and JSON serialization differences plus the UTF8Parser differences (which need fixup and for which the code lives in CoreFX).

…ity, and NegativeInfinity

…igits.

… non-roundtripping

…precision specifier allowed for standard numeric format strings

… of digits.

tannergooding · 2018-09-12T19:54:14Z

Dropped the commit that

Re-enables some tests temporarily disabled by #19775

The CoreFX side change hasn't flowed back into CoreCLR yet.

…tripping behavior.

tannergooding · 2018-09-12T21:34:23Z

I believe I have all the relevant tests disabled now and almost have corresponding fixes for all of them on the CoreFX side.

tannergooding · 2018-09-13T01:18:08Z

CoreFX Side PR fixing up all the tests is here: dotnet/corefx#32268

@jkotas, @danmosemsft, @eerhardt. This should be ready for final review now.

jkotas · 2018-09-13T05:30:32Z

#1316

I do not think that these changes are fixing #1316 (and other issues in your list related to parsing)

tannergooding · 2018-09-13T05:41:14Z

The big issue in #1316 was actually mostly fixed in #19775.

For x64, most of the issues listed in #1316 were fixed (except for -0.0) back when we moved to the Grisu3/Dragon4 algorithms (which ensured proper rounding and parsing).
For x86, the parsing issues remained until #19775 when I removed the custom x86 implementation that was using the x87 FPU stack.

The remaining issues related to parsing finite values should just be -0.0, which is being handled by this PR.

tannergooding · 2018-09-13T05:44:49Z

The other parsing issues also fall into the same category as #1316 and could be considered dupes.

I had tested a number of these scenarios locally already and was planning on porting the RealParserTests from Roslyn before finally closing all the bugs.

tannergooding · 2018-09-13T05:50:59Z

However, https://github.com/dotnet/coreclr/issues/13615 in particular is no longer fixed by this PR.

We would need to special-case double.MinValue and double.MaxValue to print at 17 digits for the default format specifier ("G").

jkotas · 2018-09-13T06:04:26Z

The big issue in #1316

I am not sure what you mean by the big issue #1316. The repro code in the top post in #1316 still prints wrong values:

Console.WriteLine("{0:G17}  {0:R}", Double.Parse("0.6822871999174000000"));
Console.WriteLine("{0:G17}  {0:R}", Double.Parse("0.6822871999174000001"));

Actual result:

0.68228719991740006  0.68228719991740006
0.68228719991739994  0.68228719991739994

Expected result:

0.68228719991740006  0.68228719991740006
0.68228719991740006  0.68228719991740006

jkotas · 2018-09-13T06:10:32Z

src/System.Private.CoreLib/shared/System/Number.Formatting.cs

-        private const int FloatPrecision = 7;
-        private const int DoublePrecision = 15;
+
+        // IEEE Requires at least 9 digits for roundtripping. However, using a lower number


Which paragraph does require this? I am not able to find it.

"5.12.2 External decimal character sequences representing finite numbers"

― for binary32, Pmin (binary32) = 9
― for binary64, Pmin (binary64) = 17

Conversions from a supported binary format bf to an external character sequence and back again
results in a copy of the original number so long as there are at least Pmin (bf ) significant digits
specified and the rounding-direction attributes in effect during the two conversions are round to
nearest rounding-direction attributes.

I do not read this as requirement. This paragraph just explains consequences of what is written above.

This says that round-tripping can be achieved by always having at least Pmin(bf) significant digits. It does not say that this is the only way to achieve round-tripping.

jkotas · 2018-09-13T06:14:55Z

Have you looked at what other languages do for floating point number round-tripping by any chance? I would like to know whether there is pattern that most other environments follow.

tannergooding · 2018-09-13T16:15:58Z

The repro code in the top post in #1316 still prints wrong values:

Ah, ok, I think I see the problem. It looks like (at the very least), we have a bug in our mantissa normalization logic here: https://github.com/dotnet/coreclr/blob/master/src/classlibnative/bcltype/number.cpp#L662

~~The "naive" algorithm for normalization would be:~~ Edit: Nevermind, this isn't the right algorithm either.

while (((val & 0xC000000000000000) != 0x8000000000000000) && ((val & 0xC000000000000000) != 0x4000000000000000))
{
    val <<= 1;
    exp--;
}

Which we look to be following for everything except the last shift (where we incorrectly shift if the bit is zero).

danmoseley · 2018-09-13T16:26:44Z

I wish there was a way of writing 0xC000000000000000 in C++ that meant I didn't have to laboriously count zeros. In C# of course we have 0xC000_0000_0000_0000 but I see many places in Corelib we don't do that.

jkotas · 2018-09-13T16:30:30Z

The "naive" algorithm for normalization would be:

I do not understand why we would need to care about the two top bits vs. just one that we care about today.

mikedn · 2018-09-13T16:37:31Z

I wish there was a way of writing 0xC000000000000000 in C++

0xC000'0000'0000'0000

danmoseley · 2018-09-13T16:46:09Z

@mikedn thanks. Incidentally I see that separately the VS debugger claims to support back ticks but in a quick try, it doesn't work for me.

mikedn · 2018-09-13T16:57:19Z

@danmosemsft That debugger things is non-standard. C++ does support digit separators since C++14: https://en.cppreference.com/w/cpp/language/integer_literal

danmoseley · 2018-09-13T17:23:57Z

@mikedn, right, but it wsan't working in the debugger. Anyway I'll stop distracting from the actual PR discussion... 😊

tannergooding · 2018-09-13T18:30:00Z

I do not understand why we would need to care about the two top bits vs. just one that we care about today.

@jkotas, you're right. Not sure what I was thinking for that....

In any case, the current normalization logic is doing the following:

if ((val & I64(0xFFFFFFFF00000000)) == 0) { val <<= 32; exp -= 32; }
if ((val & I64(0xFFFF000000000000)) == 0) { val <<= 16; exp -= 16; }
if ((val & I64(0xFF00000000000000)) == 0) { val <<= 8; exp -= 8; }
if ((val & I64(0xF000000000000000)) == 0) { val <<= 4; exp -= 4; }
if ((val & I64(0xC000000000000000)) == 0) { val <<= 2; exp -= 2; }
if ((val & I64(0x8000000000000000)) == 0) { val <<= 1; exp -= 1; }

index = absscale & 15;
if (index) {
    INT multexp = rgexp64Power10[index-1];
    // the exponents are shared between the inverted and regular table
    exp += (scale < 0) ? (-multexp + 1) : multexp;

    UINT64 multval = rgval64Power10[index + ((scale < 0) ? 15 : 0) - 1];
    val = Mul64Lossy(val, multval, &exp);
}

index = absscale >> 4;
if (index) {
    INT multexp = rgexp64Power10By16[index-1];
    // the exponents are shared between the inverted and regular table
    exp += (scale < 0) ? (-multexp + 1) : multexp;

    UINT64 multval = rgval64Power10By16[index + ((scale < 0) ? 21 : 0) - 1];
    val = Mul64Lossy(val, multval, &exp);
}

Before the normalization occurs, val contains the decimal representation of the double-precision number (6822871999174)

After the normalization is finished, val contains the following bit string:

1 01011101 01010100 10111111 01110100 00111111 11010001 1011 1 000 0000000
x yyyyyyyy yyyyyyyy yyyyyyyy yyyyyyyy yyyyyyyy yyyyyyyy yyyy z www wwwwwww

* x is the implicit bit
* y is the explicit bits
* z is the rounding bit
* w is the remaining bits (which are unused)

However, the correct bit string is:

1 01011101 01010100 10111111 01110100 00111111 11010001 1011 0 111 1111100
x yyyyyyyy yyyyyyyy yyyyyyyy yyyyyyyy yyyyyyyy yyyyyyyy yyyy z www wwwwwww

jkotas · 2018-09-13T18:37:19Z

Right, I think it is because of the Mul64Lossy is a wrong optimization. It needs to be a full precision multiplication instead.

tannergooding · 2018-09-13T19:33:08Z

Right, I think it is because of the Mul64Lossy is a wrong optimization. It needs to be a full precision multiplication instead.

That might be the case. Mul64Lossy has a comment that "the error won't propagate to any of the 53 significant bits of the result". However, we need to ensure that the 54th bit is accurate as well, for rounding purposes.

However, Mul64Precise looks to compute an incorrect result as well (same as before, but with the last w bit set to 1 now):

1 01011101 01010100 10111111 01110100 00111111 11010001 1011 1 000 0000001
x yyyyyyyy yyyyyyyy yyyyyyyy yyyyyyyy yyyyyyyy yyyyyyyy yyyy z www wwwwwww

It looks like there is a comment right above the pre-computed tables indicating that Mul64Precise may also be lossy:

//
// precomputed tables with powers of 10. These allows us to do at most
// two Mul64 during the conversion. This is important not only
// for speed, but also for precision because of Mul64 computes with 1 bit error.
//

tannergooding · 2018-09-13T21:51:26Z

Ok, so Mul64Precise is effectively trying to do 6822871999174 * 1e-13. However, it is doing that by doing simulating a double * double operation in software.

The result it returns is correct for double * double, but is incorrect for computing the "infinitely precise result, and then rounding" (this is because 1e-13 is not "exactly representable").

@jkotas, I would think the appropriate direction here is:

Merge this PR, which is a net improvement for several scenarios (provided you have no additional feedback on the current code)
Open a separate PR porting the roslyn "Real Parser" to CoreFX (Real parser roslyn#5871 and Correct conversion of float and double literals roslyn#5379).

From the Roslyn PR:

This is a port of real conversion routines used in the C++ front end written by @JamesMcNellis. There is a new separate Windows-specific solution for running a large number of random tests compared against the C standard library, which is based on David Gay's code (the "gold standard").

jkotas · 2018-09-13T22:01:31Z

Could you please update the list of issues that this PR is fixing at the top to match reality?

tannergooding · 2018-09-13T22:11:59Z

Could you please update the list of issues that this PR is fixing at the top to match reality?

Fixed in the original post.

This fully resolves #16783, as -0 will now fully roundtrip:

This partially resolves #3313 and #13106, as the numbers will format correctly under "R", but they may not parse correctly when reading them back.

tannergooding · 2018-09-13T22:27:40Z

Have you looked at what other languages do for floating point number round-tripping by any chance? I would like to know whether there is pattern that most other environments follow.

I looked at a number of other languages (Java, C/C++, Python, Rust) and didn't see any that had an explicit "round-trip" format specifier like .NET has.

For the other specifiesr they all seem to have some documented default (which is not the IEEE value) and which can be explicitly overridden.

For example, in C/C++, they default to 6 digits of precision, but otherwise return the number requested (trimming trailing zeros for some specifiers, but not others).

printf("%f\n", 1.1); // 1.100000
printf("%g\n", 1.1); // 1.1
printf("%e\n", 1.1); // 1.100000e+00
printf("%a\n", 1.1); // 0x1.199999999999ap+0

printf("%.15f\n", 1.1); // 1.100000000000000
printf("%.15g\n", 1.1); // 1.1
printf("%.15e\n", 1.1); // 1.100000000000000e+00
printf("%.15a\n", 1.1); // 0x1.199999999999a00p+0

printf("%.17f\n", 1.1); // 1.10000000000000009
printf("%.17g\n", 1.1); // 1.1000000000000001
printf("%.17e\n", 1.1); // 1.10000000000000009e+00
printf("%.17a\n", 1.1); // 0x1.199999999999a0000p+0

jkotas · 2018-09-13T23:54:01Z

Java's toString(double) seems to be speced to return both pretty and roundtripable number. From https://docs.oracle.com/javase/7/docs/api/java/lang/Double.html: There must be at least one digit to represent the fractional part, and beyond that as many, but only as many, more digits as are needed to uniquely distinguish the argument value.

jkotas · 2018-09-13T23:55:17Z

Python's default conversion to string is described as roundtripable. From https://docs.python.org/3/library/functions.html#repr: attempt to return a string that would yield an object with the same value when passed to eval(). It returns pretty strings too based on a quick test.

jkotas · 2018-09-13T23:58:00Z

Are Java and Python really both pretty and roundripable by default?

tannergooding · 2018-09-14T01:24:21Z

Briefly testing out a range of numbers in Java, it looks like yes. They looked to be correctly rounded as well.

I'm not sure some of the numbers would qualify as "pretty" however. And I am unsure of the performance of their algorithm.

jkotas · 2018-09-14T01:31:53Z

the numbers would qualify as "pretty" however

as many, but only as many, more digits as are needed is a good definition of "pretty" roundtripable formatted floating point number, I think.

tannergooding · 2018-09-14T01:51:18Z

Hmm, actually, looking around on the internet, and testing some additional values, it looks like Java does not always return the shortest string. In some cases, it will return more than 17 digits, and the result is not always correctly rounded.

A decent list of these issues can be found here: https://bugs.java.com/bugdatabase/view_bug.do?bug_id=4511638, which also includes some papers describing how to "properly" do this (and do it efficiently).

tannergooding · 2018-09-24T23:11:20Z

Closing for the time being. This needs some more cleanup and I want to do a bit more investigation into printing the shortest roundtrippable string.

tannergooding commented Sep 12, 2018

View reviewed changes

src/System.Private.CoreLib/shared/System/Number.Formatting.cs Show resolved Hide resolved

tannergooding commented Sep 12, 2018

View reviewed changes

tannergooding added 6 commits September 12, 2018 12:53

Updating Number.Parsing to properly recover -0

0785a5a

Updating Double/Single to ignore case when parsing NaN, PositiveInfin…

6ff3240

…ity, and NegativeInfinity

Updating Single/Double to always roundtrip to the correct number of d…

351abbb

…igits.

Updating Single/Double to keep the "prettier" default digit count for…

786929e

… non-roundtripping

Updating DoubleToNumber to support up to 99 digits, which is the max …

a909896

…precision specifier allowed for standard numeric format strings

Fixing FormatDouble and FormatSingle to preserve the requested number…

db591d5

… of digits.

Disabling various CoreFX tests that were dependent on incorrect round…

571e2ab

…tripping behavior.

tannergooding mentioned this pull request Sep 13, 2018

Fixing up various roundtripping/serialization tests that are impacted by the double/single tostring and parse changes dotnet/corefx#32268

Closed

jkotas reviewed Sep 13, 2018

View reviewed changes

tannergooding closed this Sep 24, 2018

Fix up the majority of IEEE compliance bugs for Double/Single ToString and Parse #19905

Fix up the majority of IEEE compliance bugs for Double/Single ToString and Parse #19905

Conversation

tannergooding commented Sep 12, 2018 • edited Loading

tannergooding commented Sep 12, 2018

Choose a reason for hiding this comment

tannergooding Sep 12, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tannergooding commented Sep 12, 2018

tannergooding commented Sep 12, 2018

Choose a reason for hiding this comment

tannergooding commented Sep 12, 2018

danmoseley commented Sep 12, 2018

tannergooding commented Sep 12, 2018

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tannergooding commented Sep 12, 2018

tannergooding commented Sep 12, 2018

tannergooding commented Sep 13, 2018

jkotas commented Sep 13, 2018 • edited Loading

tannergooding commented Sep 13, 2018

tannergooding commented Sep 13, 2018

tannergooding commented Sep 13, 2018

jkotas commented Sep 13, 2018

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jkotas commented Sep 13, 2018

tannergooding commented Sep 13, 2018 • edited Loading

danmoseley commented Sep 13, 2018

jkotas commented Sep 13, 2018

mikedn commented Sep 13, 2018

danmoseley commented Sep 13, 2018

mikedn commented Sep 13, 2018

danmoseley commented Sep 13, 2018

tannergooding commented Sep 13, 2018

jkotas commented Sep 13, 2018

tannergooding commented Sep 13, 2018 • edited Loading

tannergooding commented Sep 13, 2018

jkotas commented Sep 13, 2018

tannergooding commented Sep 13, 2018

tannergooding commented Sep 13, 2018

jkotas commented Sep 13, 2018

jkotas commented Sep 13, 2018 • edited Loading

jkotas commented Sep 13, 2018

tannergooding commented Sep 14, 2018

jkotas commented Sep 14, 2018 • edited Loading

tannergooding commented Sep 14, 2018

tannergooding commented Sep 24, 2018

tannergooding commented Sep 12, 2018 •

edited

Loading

tannergooding Sep 12, 2018 •

edited

Loading

jkotas commented Sep 13, 2018 •

edited

Loading

tannergooding commented Sep 13, 2018 •

edited

Loading

tannergooding commented Sep 13, 2018 •

edited

Loading

jkotas commented Sep 13, 2018 •

edited

Loading

jkotas commented Sep 14, 2018 •

edited

Loading