Skip to content
This repository has been archived by the owner on Jan 23, 2023. It is now read-only.

Fix up the majority of IEEE compliance bugs for Double/Single ToString and Parse #19905

Closed
wants to merge 7 commits into from
Closed

Fix up the majority of IEEE compliance bugs for Double/Single ToString and Parse #19905

wants to merge 7 commits into from

Conversation

tannergooding
Copy link
Member

@tannergooding tannergooding commented Sep 12, 2018

This resolves the majority of the issues covered by https://github.com/dotnet/coreclr/issues/19802:

  • Ensures that double.Parse for -0 works correctly
  • Ensures that recovering NaN, PositiveInfinity, and NegativeInfinity is done case insensitively
  • Ensures that "R" always roundtrips to the required number of digits
  • Ensures that the "precision" specifier works up to 50 digits
    • 50 is the current internal limit for DoubleToNumber
    • IEEE requires at least 20 for Double and 12 for Single

This should resolve (validated locally, can close after CoreFX tests are added):

This should partially resolve, both of which deal with ToString("R"), but which also involves parsing the result back as well:

Not Resolved:

This does not handle:

  • -nan and +nan, which IEEE requires we support parsing, but which we can parse to "Regular" NaN
  • snan, which IEEE requires we support parsing; but which we can parse to "regular" NaN
    • support for preceding - and + is also required for snan
    • IEEE does not require us to support formatting snan, we are free to format to "regular" NaN.
  • NaN and SNaN payloads, which are optional
  • +Infinity (case insensitive), which IEEE requires we support parsing

@tannergooding
Copy link
Member Author

CC. @jkotas, @danmosemsft, @eerhardt

@@ -304,13 +304,20 @@ void DoubleToNumber(double value, int precision, NUMBER* number)
WRAPPER_NO_CONTRACT
_ASSERTE(number != NULL);

if (precision > NUMBER_MAXDIGITS)
{
precision = NUMBER_MAXDIGITS;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that the format specifier is currently limited to xx (that is 0-99), maybe we should just set NUMBER_MAXDIGITS to 99. Thoughts?

Copy link
Member Author

@tannergooding tannergooding Sep 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, can users go over 99 digits with custom format specifiers? If so, we may need to tweak the algorithm to pad 0's past the "internal limit" (as per the IEEE requirement).

Edit: To clarify, I mean tweaking the NumberToString algorithm. DoubleToNumber can still be internally limited however we need, provided it is at least 20 for double and 12 for single.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed a commit which changes NUMBER_MAXDIGITS to 99

fmt = 'G';
digits = DoublePrecision;
}
else if (digits > 0)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

G results in digits = -1 and G0 results in digits = 0.

In both cases, this results in the default precision (17 for double and 9 for single).

For G0, this matches the current behavior. The IEEE spec indicates that the lower limit for significant digits converted should be 1. So, we could also just make G0 be treated as G1.

  • Thoughts/Opinions?

@tannergooding
Copy link
Member Author

We will need to update some of the various documentation for numeric format strings and the like: https://docs.microsoft.com/en-us/dotnet/standard/base-types/standard-numeric-format-strings

@tannergooding
Copy link
Member Author

This should also measurable increase perf for "R". Will try to get numbers tomorrow.

{
// This ensures that, for the default case, we return a string containing the old
// number of digits (15), which results in "prettier" numbers for most cases.
digits = DefaultDoubleDigits;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now, (0.1).ToString() will print 0.1 (as before). But requesting more than 15 digits will return the requested number of digits (up to 50).

For example, previously (0.1).ToString("G20") returned "0.10000000000000001" (which is the same as "G17"), but now will return "0.10000000000000000555" (as is required by the IEEE spec).

As mentioned here. The internal buffer for DoubleToNumber is set to 50, but the precision specifier supports up to 99. So it might be worth resizing it so that any requested number of digits works.

@tannergooding
Copy link
Member Author

CoreFX has 393 failures from this, most of which are XML failures due to the "R" fix (XmlWriter serializes using "R": https://source.dot.net/#System.Private.Xml/System/Xml/XmlConvert.cs,780)

@danmoseley
Copy link
Member

Can you give some examples of the values causing the failures? Looks like they use Assert.True not Assert.Equals unfortunately so it's not in the log.

@tannergooding
Copy link
Member Author

One example is...

Expected:

<RectangleF xmlns=\"http://schemas.datacontract.org/2004/07/System.Drawing\" xmlns:i=\"http://www.w3.org/2001/XMLSchema-instance\">
  <height>-2.5</height>
  <width>1.5</width>
  <x>1.50001</x>
  <y>-2.5</y>
</RectangleF>

//

Actual:

<RectangleF xmlns=\"http://schemas.datacontract.org/2004/07/System.Drawing\" xmlns:i=\"http://www.w3.org/2001/XMLSchema-instance\">
  <height>-2.5</height>
  <width>1.5</width>
  <x>1.50001001</x>
  <y>-2.5</y>
</RectangleF>

Most of the other examples are similar. That is, they are numbers where the string returned did not have enough digits to always be an "exact" conversion and which could potentially result in a number that did not round-trip to the same bitvalue.

else
{
// IEEE requires we correctly round to the requested number of digits
precision = digits;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had to update this logic so that we preserve the fact that the user didn't explicitly specify a digit count, or that the user specified 0 for the digit count.

This makes a difference for things like double.ToString("N") which i had accidentally made to be treated like double.ToString("N15").

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now, digits is always preserved (except for "R", where it is explicitly documented to be ignored) and precision is updated accordingly.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This brought the CoreFX failure count down to 354, and they are now all XML and JSON serialization differences plus the UTF8Parser differences (which need fixup and for which the code lives in CoreFX).

@tannergooding
Copy link
Member Author

Dropped the commit that

Re-enables some tests temporarily disabled by #19775

The CoreFX side change hasn't flowed back into CoreCLR yet.

@tannergooding
Copy link
Member Author

I believe I have all the relevant tests disabled now and almost have corresponding fixes for all of them on the CoreFX side.

@tannergooding
Copy link
Member Author

CoreFX Side PR fixing up all the tests is here: dotnet/corefx#32268

@jkotas, @danmosemsft, @eerhardt. This should be ready for final review now.

@jkotas
Copy link
Member

jkotas commented Sep 13, 2018

#1316

I do not think that these changes are fixing #1316 (and other issues in your list related to parsing)

@tannergooding
Copy link
Member Author

The big issue in #1316 was actually mostly fixed in #19775.

For x64, most of the issues listed in #1316 were fixed (except for -0.0) back when we moved to the Grisu3/Dragon4 algorithms (which ensured proper rounding and parsing).
For x86, the parsing issues remained until #19775 when I removed the custom x86 implementation that was using the x87 FPU stack.

The remaining issues related to parsing finite values should just be -0.0, which is being handled by this PR.

@tannergooding
Copy link
Member Author

The other parsing issues also fall into the same category as #1316 and could be considered dupes.

I had tested a number of these scenarios locally already and was planning on porting the RealParserTests from Roslyn before finally closing all the bugs.

@tannergooding
Copy link
Member Author

However, https://github.com/dotnet/coreclr/issues/13615 in particular is no longer fixed by this PR.

We would need to special-case double.MinValue and double.MaxValue to print at 17 digits for the default format specifier ("G").

@jkotas
Copy link
Member

jkotas commented Sep 13, 2018

The big issue in #1316

I am not sure what you mean by the big issue #1316. The repro code in the top post in #1316 still prints wrong values:

Console.WriteLine("{0:G17}  {0:R}", Double.Parse("0.6822871999174000000"));
Console.WriteLine("{0:G17}  {0:R}", Double.Parse("0.6822871999174000001"));

Actual result:

0.68228719991740006  0.68228719991740006
0.68228719991739994  0.68228719991739994

Expected result:

0.68228719991740006  0.68228719991740006
0.68228719991740006  0.68228719991740006

private const int FloatPrecision = 7;
private const int DoublePrecision = 15;

// IEEE Requires at least 9 digits for roundtripping. However, using a lower number
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which paragraph does require this? I am not able to find it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"5.12.2 External decimal character sequences representing finite numbers"

― for binary32, Pmin (binary32) = 9
― for binary64, Pmin (binary64) = 17

Conversions from a supported binary format bf to an external character sequence and back again
results in a copy of the original number so long as there are at least Pmin (bf ) significant digits
specified and the rounding-direction attributes in effect during the two conversions are round to
nearest rounding-direction attributes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not read this as requirement. This paragraph just explains consequences of what is written above.

This says that round-tripping can be achieved by always having at least Pmin(bf) significant digits. It does not say that this is the only way to achieve round-tripping.

@jkotas
Copy link
Member

jkotas commented Sep 13, 2018

Have you looked at what other languages do for floating point number round-tripping by any chance? I would like to know whether there is pattern that most other environments follow.

@tannergooding
Copy link
Member Author

tannergooding commented Sep 13, 2018

The repro code in the top post in #1316 still prints wrong values:

Ah, ok, I think I see the problem. It looks like (at the very least), we have a bug in our mantissa normalization logic here: https://github.com/dotnet/coreclr/blob/master/src/classlibnative/bcltype/number.cpp#L662

The "naive" algorithm for normalization would be: Edit: Nevermind, this isn't the right algorithm either.

while (((val & 0xC000000000000000) != 0x8000000000000000) && ((val & 0xC000000000000000) != 0x4000000000000000))
{
    val <<= 1;
    exp--;
}

Which we look to be following for everything except the last shift (where we incorrectly shift if the bit is zero).

@danmoseley
Copy link
Member

I wish there was a way of writing 0xC000000000000000 in C++ that meant I didn't have to laboriously count zeros. In C# of course we have 0xC000_0000_0000_0000 but I see many places in Corelib we don't do that.

@jkotas
Copy link
Member

jkotas commented Sep 13, 2018

The "naive" algorithm for normalization would be:

I do not understand why we would need to care about the two top bits vs. just one that we care about today.

@mikedn
Copy link

mikedn commented Sep 13, 2018

I wish there was a way of writing 0xC000000000000000 in C++

0xC000'0000'0000'0000

@danmoseley
Copy link
Member

@mikedn thanks. Incidentally I see that separately the VS debugger claims to support back ticks but in a quick try, it doesn't work for me.

@mikedn
Copy link

mikedn commented Sep 13, 2018

@danmosemsft That debugger things is non-standard. C++ does support digit separators since C++14: https://en.cppreference.com/w/cpp/language/integer_literal

@danmoseley
Copy link
Member

@mikedn, right, but it wsan't working in the debugger. Anyway I'll stop distracting from the actual PR discussion... 😊

@tannergooding
Copy link
Member Author

I do not understand why we would need to care about the two top bits vs. just one that we care about today.

@jkotas, you're right. Not sure what I was thinking for that....

In any case, the current normalization logic is doing the following:

if ((val & I64(0xFFFFFFFF00000000)) == 0) { val <<= 32; exp -= 32; }
if ((val & I64(0xFFFF000000000000)) == 0) { val <<= 16; exp -= 16; }
if ((val & I64(0xFF00000000000000)) == 0) { val <<= 8; exp -= 8; }
if ((val & I64(0xF000000000000000)) == 0) { val <<= 4; exp -= 4; }
if ((val & I64(0xC000000000000000)) == 0) { val <<= 2; exp -= 2; }
if ((val & I64(0x8000000000000000)) == 0) { val <<= 1; exp -= 1; }

index = absscale & 15;
if (index) {
    INT multexp = rgexp64Power10[index-1];
    // the exponents are shared between the inverted and regular table
    exp += (scale < 0) ? (-multexp + 1) : multexp;

    UINT64 multval = rgval64Power10[index + ((scale < 0) ? 15 : 0) - 1];
    val = Mul64Lossy(val, multval, &exp);
}

index = absscale >> 4;
if (index) {
    INT multexp = rgexp64Power10By16[index-1];
    // the exponents are shared between the inverted and regular table
    exp += (scale < 0) ? (-multexp + 1) : multexp;

    UINT64 multval = rgval64Power10By16[index + ((scale < 0) ? 21 : 0) - 1];
    val = Mul64Lossy(val, multval, &exp);
}

Before the normalization occurs, val contains the decimal representation of the double-precision number (6822871999174)

After the normalization is finished, val contains the following bit string:

1 01011101 01010100 10111111 01110100 00111111 11010001 1011 1 000 0000000
x yyyyyyyy yyyyyyyy yyyyyyyy yyyyyyyy yyyyyyyy yyyyyyyy yyyy z www wwwwwww

* x is the implicit bit
* y is the explicit bits
* z is the rounding bit
* w is the remaining bits (which are unused)

However, the correct bit string is:

1 01011101 01010100 10111111 01110100 00111111 11010001 1011 0 111 1111100
x yyyyyyyy yyyyyyyy yyyyyyyy yyyyyyyy yyyyyyyy yyyyyyyy yyyy z www wwwwwww

@jkotas
Copy link
Member

jkotas commented Sep 13, 2018

Right, I think it is because of the Mul64Lossy is a wrong optimization. It needs to be a full precision multiplication instead.

@tannergooding
Copy link
Member Author

tannergooding commented Sep 13, 2018

Right, I think it is because of the Mul64Lossy is a wrong optimization. It needs to be a full precision multiplication instead.

That might be the case. Mul64Lossy has a comment that "the error won't propagate to any of the 53 significant bits of the result". However, we need to ensure that the 54th bit is accurate as well, for rounding purposes.

However, Mul64Precise looks to compute an incorrect result as well (same as before, but with the last w bit set to 1 now):

1 01011101 01010100 10111111 01110100 00111111 11010001 1011 1 000 0000001
x yyyyyyyy yyyyyyyy yyyyyyyy yyyyyyyy yyyyyyyy yyyyyyyy yyyy z www wwwwwww

It looks like there is a comment right above the pre-computed tables indicating that Mul64Precise may also be lossy:

//
// precomputed tables with powers of 10. These allows us to do at most
// two Mul64 during the conversion. This is important not only
// for speed, but also for precision because of Mul64 computes with 1 bit error.
//

@tannergooding
Copy link
Member Author

Ok, so Mul64Precise is effectively trying to do 6822871999174 * 1e-13. However, it is doing that by doing simulating a double * double operation in software.

The result it returns is correct for double * double, but is incorrect for computing the "infinitely precise result, and then rounding" (this is because 1e-13 is not "exactly representable").

@jkotas, I would think the appropriate direction here is:

  1. Merge this PR, which is a net improvement for several scenarios (provided you have no additional feedback on the current code)
  2. Open a separate PR porting the roslyn "Real Parser" to CoreFX (Real parser roslyn#5871 and Correct conversion of float and double literals roslyn#5379).

From the Roslyn PR:

This is a port of real conversion routines used in the C++ front end written by @JamesMcNellis. There is a new separate Windows-specific solution for running a large number of random tests compared against the C standard library, which is based on David Gay's code (the "gold standard").

@jkotas
Copy link
Member

jkotas commented Sep 13, 2018

Could you please update the list of issues that this PR is fixing at the top to match reality?

@tannergooding
Copy link
Member Author

Could you please update the list of issues that this PR is fixing at the top to match reality?

Fixed in the original post.

This fully resolves #16783, as -0 will now fully roundtrip:

This partially resolves #3313 and #13106, as the numbers will format correctly under "R", but they may not parse correctly when reading them back.

@tannergooding
Copy link
Member Author

Have you looked at what other languages do for floating point number round-tripping by any chance? I would like to know whether there is pattern that most other environments follow.

I looked at a number of other languages (Java, C/C++, Python, Rust) and didn't see any that had an explicit "round-trip" format specifier like .NET has.

For the other specifiesr they all seem to have some documented default (which is not the IEEE value) and which can be explicitly overridden.

For example, in C/C++, they default to 6 digits of precision, but otherwise return the number requested (trimming trailing zeros for some specifiers, but not others).

printf("%f\n", 1.1); // 1.100000
printf("%g\n", 1.1); // 1.1
printf("%e\n", 1.1); // 1.100000e+00
printf("%a\n", 1.1); // 0x1.199999999999ap+0

printf("%.15f\n", 1.1); // 1.100000000000000
printf("%.15g\n", 1.1); // 1.1
printf("%.15e\n", 1.1); // 1.100000000000000e+00
printf("%.15a\n", 1.1); // 0x1.199999999999a00p+0

printf("%.17f\n", 1.1); // 1.10000000000000009
printf("%.17g\n", 1.1); // 1.1000000000000001
printf("%.17e\n", 1.1); // 1.10000000000000009e+00
printf("%.17a\n", 1.1); // 0x1.199999999999a0000p+0

@jkotas
Copy link
Member

jkotas commented Sep 13, 2018

Java's toString(double) seems to be speced to return both pretty and roundtripable number. From https://docs.oracle.com/javase/7/docs/api/java/lang/Double.html: There must be at least one digit to represent the fractional part, and beyond that as many, but only as many, more digits as are needed to uniquely distinguish the argument value.

@jkotas
Copy link
Member

jkotas commented Sep 13, 2018

Python's default conversion to string is described as roundtripable. From https://docs.python.org/3/library/functions.html#repr: attempt to return a string that would yield an object with the same value when passed to eval(). It returns pretty strings too based on a quick test.

@jkotas
Copy link
Member

jkotas commented Sep 13, 2018

Are Java and Python really both pretty and roundripable by default?

@tannergooding
Copy link
Member Author

Briefly testing out a range of numbers in Java, it looks like yes. They looked to be correctly rounded as well.

I'm not sure some of the numbers would qualify as "pretty" however. And I am unsure of the performance of their algorithm.

@jkotas
Copy link
Member

jkotas commented Sep 14, 2018

the numbers would qualify as "pretty" however

as many, but only as many, more digits as are needed is a good definition of "pretty" roundtripable formatted floating point number, I think.

@tannergooding
Copy link
Member Author

Hmm, actually, looking around on the internet, and testing some additional values, it looks like Java does not always return the shortest string. In some cases, it will return more than 17 digits, and the result is not always correctly rounded.

A decent list of these issues can be found here: https://bugs.java.com/bugdatabase/view_bug.do?bug_id=4511638, which also includes some papers describing how to "properly" do this (and do it efficiently).

@tannergooding
Copy link
Member Author

Closing for the time being. This needs some more cleanup and I want to do a bit more investigation into printing the shortest roundtrippable string.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants