casting double to string does not match Spark #4204
Comments
Note that this behavior is expected and documented in the current release. See the spark.rapids.sql.castFloatToString.enabled documentation, which states that the result does not always match Spark. That is why this behavior is not enabled by default and the user must explicitly enable it once they are sure it will not affect their application.
I agree that the CPU and GPU differ in precision. Apart from that, in RAPIDS, the result contains
Yup, this is a bug in our code where we clean things up. It looks like we are looking for
spark-rapids/sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuCast.scala, lines 729 to 733 in b18492e
We should be as consistent as we can be.
I am a little curious about the way this is tested.
spark-rapids/tests/src/test/scala/com/nvidia/spark/rapids/GpuExpressionTestSuite.scala, lines 145 to 173 in b18492e
It depends on your definition of "equal." The purpose of that test is to verify that if someone tried to turn the string back into a float, it would be "close enough" to the Spark CPU version. It is not intended to check whether we produce the exact same string as Spark, as we already know we don't, simply because of precision errors. That's one of many reasons why this feature is disabled by default.
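To make the "close enough" idea concrete, here is a minimal Java sketch of that kind of check. It is not the code from GpuExpressionTestSuite; the class name, the `closeEnough` helper, and the tolerance value are all hypothetical and chosen only for illustration. It parses both strings back to doubles and compares the values rather than the text.

```java
final class CloseEnoughCheck {
    // Hypothetical helper (not the plugin's test code): parse both strings
    // back to doubles and compare the values with a relative tolerance.
    static boolean closeEnough(String cpu, String gpu, double relTol) {
        double a = Double.parseDouble(cpu);
        double b = Double.parseDouble(gpu);
        if (Double.isNaN(a) || Double.isNaN(b)) {
            return Double.isNaN(a) && Double.isNaN(b);
        }
        if (a == b) {
            return true; // exact matches, including equal infinities
        }
        if (Double.isInfinite(a) || Double.isInfinite(b)) {
            return false; // an infinity only matches the same infinity
        }
        double scale = Math.max(Math.abs(a), Math.abs(b));
        return Math.abs(a - b) <= relTol * scale;
    }

    public static void main(String[] args) {
        // The strings differ, but the parsed values are identical.
        System.out.println("5.0E-10".equals("5.0e-10"));
        System.out.println(closeEnough("5.0E-10", "5.0e-10", 1e-9));
    }
}
```

Under a check like this, "5.0E-10" and "5.0e-10" count as equal even though a plain string comparison fails, which is why such a test would not flag the exponent-case difference reported in this issue.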
Updated the documentation to clarify that more than just precision can differ in the resulting string. This is unlikely to be fixed until we add a custom kernel for casting floating point to string that is compatible with Java/Spark, which would remove the need for the castFloatToString config entirely.
Here is another example, which @andygrove found while testing
Describe the bug
I tried to cast 5.0e-10 to string. On Spark 3.2, I got "5.0E-10"; on spark-rapids, I got "5.0e-10".
Steps/Code to reproduce bug
Spark result:
spark-rapids result:
Expected behavior
I expect spark-rapids to give "5.0E-10", matching Spark.
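For reference, a small Java sketch of the expected formatting, assuming Spark's CPU cast follows Java's Double.toString (as the mention of "Java/Spark" compatibility in the comments suggests). The `upperCaseExponent` helper is hypothetical and is not the plugin's cleanup code; it only illustrates the kind of post-processing that would make the exponent marker match.

```java
final class ExponentFormat {
    // Hypothetical cleanup step: upper-case the first exponent marker that
    // directly follows a digit, so "5.0e-10" becomes "5.0E-10".
    // Strings without such a marker ("NaN", "Infinity", "0.001") are unchanged.
    static String upperCaseExponent(String s) {
        int i = s.indexOf('e');
        if (i > 0 && Character.isDigit(s.charAt(i - 1))) {
            return s.substring(0, i) + 'E' + s.substring(i + 1);
        }
        return s;
    }

    public static void main(String[] args) {
        // Java prints an upper-case exponent marker for small magnitudes.
        System.out.println(Double.toString(5.0e-10));     // 5.0E-10
        // The lower-case form reported in this issue, after the sketch's cleanup.
        System.out.println(upperCaseExponent("5.0e-10")); // 5.0E-10
    }
}
```

This only addresses the case of the exponent marker. It does nothing about the precision differences discussed in the comments.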
Environment details (please complete the following information)
Spark 3.2.0
rapids 22.02.0
cudf 22.02.0
using spark-shell on my desktop
setConf("spark.rapids.sql.castFloatToString.enabled", "true")
Additional context
This issue is related to #4028.