-
Notifications
You must be signed in to change notification settings - Fork 236
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support float case of format_number with format_float kernel #9790
Conversation
Signed-off-by: Haoyang Li <[email protected]>
Signed-off-by: Haoyang Li <[email protected]>
Signed-off-by: Haoyang Li <[email protected]>
@revans2 please take a look on this too if you have time, thanks! |
build |
Signed-off-by: Haoyang Li <[email protected]>
docs/compatibility.md
Outdated
This configuration is enabled by default. To disable this operation on the GPU set | ||
[`spark.rapids.sql.castFloatToString.enabled`](additional-functionality/advanced_configs.md#sql.castFloatToString.enabled) to `false`. | ||
|
||
The `format_number` function also use ryu as the solution when formatting floating-point data types to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: also uses ryu
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
'format_number(a, 5)').collect(), conf = float_format_number_conf) | ||
gpu = with_gpu_session(lambda spark: unary_op_df(spark, gen).selectExpr('*', | ||
'format_number(a, 5)').collect(), conf = float_format_number_conf) | ||
mismatched = sum(x[0] != x[1] for x in zip(cpu, gpu)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I preferred the version that checked that when we parsed them back to a float the numbers were within the error bounds instead of saying that we cannot be wrong more than some set percentage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
The test code is really not strong enough, I made a fix from the kernel side: NVIDIA/spark-rapids-jni#1676 |
Signed-off-by: Haoyang Li <[email protected]>
build |
Closes #9173
This PR adds better support for the float case of format_number, which made it good enough to be enabled by default.
It will have known compatibility issues from ryu, as the same way as float to string.
Depends on: NVIDIA/spark-rapids-jni#1572
performance test results
10000000 random number generated by BigDataGen:
test code:
I plan to move special case checking to the kernel next, but personally I think it is a nit for this pr.done and it runs much faster than before the change.results before move nan/inf replacement to kernel