[BUG] The result of casting string to decimal is not identical to Apache Spark #2019
Labels
bug
Something isn't working
cudf_dependency
An issue or PR with this label depends on a new feature in cudf
Describe the bug
Currently, casting a string to a decimal type on the GPU may produce results that differ slightly from the correct ones when the string represents a number exceeding the maximum precision that the CAST_STRING_TO_FLOAT operation can preserve. For instance, the GPU returns 99999999999999987 for the input string "99999999999999999".
The cause of the divergence is that we cannot directly cast strings containing scientific notation to decimal via the cuDF API. Instead, we have to cast strings to floats first and then cast the floats to decimals. The first step may lose precision.
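As a rough CPU-side illustration of why the intermediate float step loses digits, here is a minimal Python sketch. It does not use cuDF or the plugin; `cast_via_float` and `cast_direct` are hypothetical helper names for this example only, and the wrong value it yields differs from the one the GPU returns, since the GPU's string-to-float parsing is not the same as Python's.

```python
from decimal import Decimal


def cast_via_float(s: str) -> Decimal:
    # Two-step path: string -> 64-bit float -> decimal.
    # A double carries only about 15-17 significant decimal digits,
    # so longer inputs are rounded before the decimal is built.
    return Decimal(float(s))


def cast_direct(s: str) -> Decimal:
    # One-step path: parse the string straight into a decimal,
    # keeping every digit (scientific notation such as "1.5e3" also parses).
    return Decimal(s)


s = "99999999999999999"  # 17 significant digits
direct = cast_direct(s)
via_float = cast_via_float(s)

print(direct)               # 99999999999999999
print(via_float)            # 100000000000000000
print(direct == via_float)  # False
```

Any input with more significant digits than a double can hold is a candidate for this kind of mismatch, which is why a direct string-to-decimal cast is needed to match Spark exactly.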
Steps/Code to reproduce bug
Here are samples to produce inconsistent results:
Expected behavior
The GPU runtime should produce exactly the same results as Apache Spark.
Additional context
This issue is related to issue #1625 and PR #1999.