-
Notifications
You must be signed in to change notification settings - Fork 908
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix exponent overflow in strings-to-double conversion #15517
Fix exponent overflow in strings-to-double conversion #15517
Conversation
@@ -102,6 +102,7 @@ __device__ inline double stod(string_view const& d_str) | |||
ch = *in_ptr++; | |||
if (ch < '0' || ch > '9') break; | |||
exp_ten = (exp_ten * 10) + (int)(ch - '0'); | |||
if (exp_ten >= 1e8) { break; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like there's a logical flaw here. We know the exponent of a finite double value can only be as large as std::numeric_limits<double>::max_exponent10 == 308
, so why check against 100'000'000
? Anything else would go to infinity, unless I'm missing something.
Maybe this is what I am expecting to see:
if (exp_ten >= 1e8) { break; } | |
if (exp_ten >= (exp_sign == 1 ? std::numeric_limits<double>::max_exponent10 : std::numeric_limits<double>::min_exponent10)) { break; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I'm mistaken about the logic above, let's at least use integer-integer comparisons (1e8 is not an integer).
if (exp_ten >= 1e8) { break; } | |
if (exp_ten >= 100'000'000) { break; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The exceeding max_exponent
is actually handled below to map to infinity
(or -infinity
).
The check here is to make sure we don't overflow the integer which is UB.
I had assumed the compiler would convert 1e8 to an integer but it appears that is incorrect.
I'll change it to an integer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is an exp_off
to store the extra exponent part of the mantissa, and it will be added to exp_ten
after this while loop. So for a large number (which we expect to be an infinity), if the mantissa is very long and the exp_ten
is not large enough, the final exp_ten
could be wrong. So we need to pick a limit that is as large as possible.
Even if we set it to 100'000'000
, the above case would still happen for a string of length more than 1e8. It's a very edge case, just want to point it out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
E.g.: a very very long number(length is Int.max_value):
0.00...[There are about Int.max_value zeros].....1E999999999
Because of the the following adjustment of exp_ten
, the exp_len
will be wrong.
Propose to use long
to save exp_ten
as currently max string length is Int.max_value.
And not sure If cuDF will support Long.max_value length string in future.
exp_ten *= exp_sign;
exp_ten += exp_off;
exp_ten += num_digits - 1;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A int.max long string would only be a one row column in libcudf since we have a max column size of int.max right now.
Regardless, I don't feel we need to increase the register size of this function to handle such a case. Likewise, a 100M length string would only be about 20 rows. I think this is a reasonable limit and could even be convinced a lower value is more practical.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor comment about nullptr
-> ""
in tests. Looks good otherwise considering the discussion with Bradley above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's add more detail to the comment. Otherwise LGTM.
/merge |
Description
Adds a check when computing the exponent in the strings-to-double conversion to prevent an integer overflow.
Closes #15508
Checklist