Fix up the majority of IEEE compliance bugs for Double/Single ToString and Parse #19905
Conversation
```
@@ -304,13 +304,20 @@ void DoubleToNumber(double value, int precision, NUMBER* number)
    WRAPPER_NO_CONTRACT
    _ASSERTE(number != NULL);

    if (precision > NUMBER_MAXDIGITS)
    {
        precision = NUMBER_MAXDIGITS;
```
Given that the format specifier is currently limited to `xx` (that is, 0-99), maybe we should just set `NUMBER_MAXDIGITS` to `99`. Thoughts?
Also, can users go over 99 digits with custom format specifiers? If so, we may need to tweak the algorithm to pad 0's past the "internal limit" (as per the IEEE requirement).
Edit: To clarify, I mean tweaking the `NumberToString` algorithm. `DoubleToNumber` can still be internally limited however we need, provided it is at least 20 for double and 12 for single.
I pushed a commit which changes `NUMBER_MAXDIGITS` to `99`.
```
    fmt = 'G';
    digits = DoublePrecision;
}
else if (digits > 0)
```
`G` results in `digits = -1` and `G0` results in `digits = 0`.

In both cases, this results in the default precision (`17` for double and `9` for single).

For `G0`, this matches the current behavior. The IEEE spec indicates that the lower limit for significant digits converted should be 1, so we could also just make `G0` be treated as `G1`.

Thoughts/Opinions?
We will need to update some of the documentation for numeric format strings and the like: https://docs.microsoft.com/en-us/dotnet/standard/base-types/standard-numeric-format-strings

This should also measurably increase perf for "R". Will try to get numbers tomorrow.
```
{
    // This ensures that, for the default case, we return a string containing the old
    // number of digits (15), which results in "prettier" numbers for most cases.
    digits = DefaultDoubleDigits;
```
Now, `(0.1).ToString()` will print `0.1` (as before). But requesting more than `15` digits will return the requested number of digits (up to `50`).

For example, previously `(0.1).ToString("G20")` returned `"0.10000000000000001"` (which is the same as `"G17"`), but now it will return `"0.10000000000000000555"` (as is required by the IEEE spec).
As mentioned here, the internal buffer for `DoubleToNumber` is set to `50`, but the precision specifier supports up to `99`. So it might be worth resizing it so that any requested number of digits works.
CoreFX has 393 failures from this, most of which are XML failures due to the "R" fix (XmlWriter serializes using "R": https://source.dot.net/#System.Private.Xml/System/Xml/XmlConvert.cs,780)

Can you give some examples of the values causing the failures? Looks like they use `Assert.True`, not `Assert.Equals`, unfortunately, so it's not in the log.
One example is...

Expected:

```
<RectangleF xmlns="http://schemas.datacontract.org/2004/07/System.Drawing" xmlns:i="http://www.w3.org/2001/XMLSchema-instance">
  <height>-2.5</height>
  <width>1.5</width>
  <x>1.50001</x>
  <y>-2.5</y>
</RectangleF>
```

Actual:

```
<RectangleF xmlns="http://schemas.datacontract.org/2004/07/System.Drawing" xmlns:i="http://www.w3.org/2001/XMLSchema-instance">
  <height>-2.5</height>
  <width>1.5</width>
  <x>1.50001001</x>
  <y>-2.5</y>
</RectangleF>
```

Most of the other examples are similar. That is, they are numbers where the string returned did not have enough digits to always be an "exact" conversion and which could potentially result in a number that did not round-trip to the same bit value.
```
else
{
    // IEEE requires we correctly round to the requested number of digits
    precision = digits;
```
Had to update this logic so that we preserve the fact that the user didn't explicitly specify a digit count, or that the user specified `0` for the digit count.

This makes a difference for things like `double.ToString("N")`, which I had accidentally made to be treated like `double.ToString("N15")`.
Now, `digits` is always preserved (except for `"R"`, where it is explicitly documented to be ignored) and `precision` is updated accordingly.
This brought the CoreFX failure count down to 354, and they are now all XML and JSON serialization differences plus the UTF8Parser differences (which need fixup and for which the code lives in CoreFX).
Dropped the commit that …

The CoreFX side change hasn't flowed back into CoreCLR yet.
I believe I have all the relevant tests disabled now and almost have corresponding fixes for all of them on the CoreFX side.

CoreFX side PR fixing up all the tests is here: dotnet/corefx#32268

@jkotas, @danmosemsft, @eerhardt: this should be ready for final review now.
I do not think that these changes are fixing #1316 (and the other issues in your list related to parsing).
The big issue in #1316 was actually mostly fixed in #19775. For x64, most of the issues listed in #1316 were fixed (except for …). The remaining issues related to parsing finite values should just be …
The other parsing issues also fall into the same category as #1316 and could be considered dupes. I had tested a number of these scenarios locally already and was planning on porting the RealParserTests from Roslyn before finally closing all the bugs.

However, https://github.com/dotnet/coreclr/issues/13615 in particular is no longer fixed by this PR. We would need to special-case …
I am not sure what you mean by "the big issue" in #1316. The repro code in the top post in #1316 still prints wrong values:

```
Console.WriteLine("{0:G17} {0:R}", Double.Parse("0.6822871999174000000"));
Console.WriteLine("{0:G17} {0:R}", Double.Parse("0.6822871999174000001"));
```

Actual result: …

Expected result: …
```
private const int FloatPrecision = 7;
private const int DoublePrecision = 15;

// IEEE Requires at least 9 digits for roundtripping. However, using a lower number
```
Which paragraph does require this? I am not able to find it.
From "5.12.2 External decimal character sequences representing finite numbers":

> ― for binary32, Pmin(binary32) = 9
> ― for binary64, Pmin(binary64) = 17
>
> Conversions from a supported binary format bf to an external character sequence and back again results in a copy of the original number so long as there are at least Pmin(bf) significant digits specified and the rounding-direction attributes in effect during the two conversions are round to nearest rounding-direction attributes.
I do not read this as a requirement. This paragraph just explains the consequences of what is written above.

This says that round-tripping can be achieved by always having at least Pmin(bf) significant digits. It does not say that this is the only way to achieve round-tripping.
Have you looked at what other languages do for floating-point number round-tripping, by any chance? I would like to know whether there is a pattern that most other environments follow.
Ah, ok, I think I see the problem. It looks like (at the very least) we have a bug in our mantissa normalization logic here: https://github.com/dotnet/coreclr/blob/master/src/classlibnative/bcltype/number.cpp#L662

```
while (((val & 0xC000000000000000) != 0x8000000000000000) && ((val & 0xC000000000000000) != 0x4000000000000000))
{
    val <<= 1;
    exp--;
}
```

Which we look to be following for everything except the last shift (where we incorrectly shift if the bit is zero).
I wish there was a way of writing …
I do not understand why we would need to care about the two top bits vs. just the one that we care about today.
@mikedn thanks. Incidentally, I see that separately the VS debugger claims to support backticks, but in a quick try it doesn't work for me.
@danmosemsft That debugger thing is non-standard. C++ does support digit separators since C++14: https://en.cppreference.com/w/cpp/language/integer_literal
@mikedn, right, but it wasn't working in the debugger. Anyway, I'll stop distracting from the actual PR discussion... 😊
@jkotas, you're right. Not sure what I was thinking for that....

In any case, the current normalization logic is doing the following:

```
if ((val & I64(0xFFFFFFFF00000000)) == 0) { val <<= 32; exp -= 32; }
if ((val & I64(0xFFFF000000000000)) == 0) { val <<= 16; exp -= 16; }
if ((val & I64(0xFF00000000000000)) == 0) { val <<= 8; exp -= 8; }
if ((val & I64(0xF000000000000000)) == 0) { val <<= 4; exp -= 4; }
if ((val & I64(0xC000000000000000)) == 0) { val <<= 2; exp -= 2; }
if ((val & I64(0x8000000000000000)) == 0) { val <<= 1; exp -= 1; }

index = absscale & 15;
if (index) {
    INT multexp = rgexp64Power10[index-1];
    // the exponents are shared between the inverted and regular table
    exp += (scale < 0) ? (-multexp + 1) : multexp;

    UINT64 multval = rgval64Power10[index + ((scale < 0) ? 15 : 0) - 1];
    val = Mul64Lossy(val, multval, &exp);
}

index = absscale >> 4;
if (index) {
    INT multexp = rgexp64Power10By16[index-1];
    // the exponents are shared between the inverted and regular table
    exp += (scale < 0) ? (-multexp + 1) : multexp;

    UINT64 multval = rgval64Power10By16[index + ((scale < 0) ? 21 : 0) - 1];
    val = Mul64Lossy(val, multval, &exp);
}
```

Before the normalization occurs, …

After the normalization is finished, …

However, the correct bit string is: …
Right. I think it is because `Mul64Lossy` is a wrong optimization; it needs to be a full-precision multiplication instead.
That might be the case. However, …

It looks like there is a comment right above the pre-computed tables indicating that …
Ok, so … The result it returns is correct for …

@jkotas, I would think the appropriate direction here is: …
From the Roslyn PR: …
Could you please update the list of issues that this PR is fixing at the top to match reality?
Fixed in the original post. This fully resolves #16783, as …

This partially resolves #3313 and #13106, as the numbers will format correctly under "R", but they may not parse correctly when reading them back.
I looked at a number of other languages (Java, C/C++, Python, Rust) and didn't see any that had an explicit "round-trip" format specifier like .NET has. For the other specifiers, they all seem to have some documented default (which is not the IEEE value) and which can be explicitly overridden.

For example, in C/C++, they default to 6 digits of precision, but otherwise return the number requested (trimming trailing zeros for some specifiers, but not others):

```
printf("%f\n", 1.1);      // 1.100000
printf("%g\n", 1.1);      // 1.1
printf("%e\n", 1.1);      // 1.100000e+00
printf("%a\n", 1.1);      // 0x1.199999999999ap+0

printf("%.15f\n", 1.1);   // 1.100000000000000
printf("%.15g\n", 1.1);   // 1.1
printf("%.15e\n", 1.1);   // 1.100000000000000e+00
printf("%.15a\n", 1.1);   // 0x1.199999999999a00p+0

printf("%.17f\n", 1.1);   // 1.10000000000000009
printf("%.17g\n", 1.1);   // 1.1000000000000001
printf("%.17e\n", 1.1);   // 1.10000000000000009e+00
printf("%.17a\n", 1.1);   // 0x1.199999999999a0000p+0
```
Java's …
Python's default conversion to string is described as round-trippable. From https://docs.python.org/3/library/functions.html#repr: "attempt to return a string that would yield an object with the same value when passed to `eval()`".
Are Java and Python really both pretty and round-trippable by default?
Briefly testing out a range of numbers in Java, it looks like yes. They looked to be correctly rounded as well. I'm not sure some of the numbers would qualify as "pretty", however, and I am unsure of the performance of their algorithm.

Hmm, actually, looking around on the internet and testing some additional values, it looks like Java does not always return the shortest string. In some cases it will return more than 17 digits, and the result is not always correctly rounded. A decent list of these issues can be found here: https://bugs.java.com/bugdatabase/view_bug.do?bug_id=4511638, which also includes some papers describing how to "properly" do this (and do it efficiently).
Closing for the time being. This needs some more cleanup and I want to do a bit more investigation into printing the shortest roundtrippable string. |
This resolves the majority of the issues covered by https://github.com/dotnet/coreclr/issues/19802:

- `-0` works correctly
- Parsing `NaN`, `PositiveInfinity`, and `NegativeInfinity` is done case insensitively
- Formatting supports up to `50` digits (`50` is the current internal limit for `DoubleToNumber`; at least `20` is needed for Double and `12` for Single)

This should resolve (validated locally, can close after CoreFX tests are added): …

This should partially resolve …, both of which deal with `ToString("R")` but which also involve parsing the result back as well.

Not Resolved:

- `double.MinValue` and `double.MaxValue` require 17 digits to roundtrip, otherwise an overflow occurs. The current default number of digits is 15, for any format that isn't "R".

This does not handle:

- `-nan` and `+nan`, which IEEE requires we support parsing, but which we can parse to "regular" `NaN`
- `snan`, which IEEE requires we support parsing, but which we can parse to "regular" `NaN` (`-` and `+` are also required for `snan`; for `snan`, we are free to format to "regular" `NaN`)
- `NaN` and `SNaN` payloads, which are optional
- `+Infinity` (case insensitive), which IEEE requires we support parsing