-
Notifications
You must be signed in to change notification settings - Fork 236
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FEA] Support format_number #9173
Comments
This is fun because we essentially have to match java DecimalFormat code https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/text/DecimalFormat.html Because that is what it uses. Happily it looks like the formatting is hard coded to the |
@viadea Is it sufficient that second parameter is literal integer as first step for the customer's request? We can fully support the string pattern (maybe in cuDF or spark-rapids-jni) later. |
Since much of the complexity happens in parsing the format string in Java Most logic for format string parsing can follow or call I plan to work on supporting integer as 2nd parameter first to verify my solution, most code can be reused for string format case. |
Above is just my example.
where somecol is a double type. |
Unfortunately I think the float/double type for first parameter is also unable to be fully supported on plugin side, because casting float/double to string doesn't match Spark/Java's result, see compatibility doc. In normal way, we should first convert a float/double to string correctly before formatting it, and I also didn't find workaround to get enough information I want. I will create a PR to support other types and part support float/double soon. We may need a custom kernel for float to string casting, see #4204. |
Hi @viadea, do we have more information about the range/precision of the double customer will use? It could be difficult to exactly match Spark's behavior for doubles with high precision, but it will be easy to match them in a limited precision. |
Let me check that and will update you internally once i get the answer. |
To implement float to string part in the Instead I found https://github.com/ulfjack/ryu which is a popular solution of float to string and it has a C implementation in Apache license. If we can get approval to use some code from it, the custom kernel development will be easier. However It also can't fully match Java's results and the mismatched part is because Java's result is not good enough. (Actually Java switched to a new solution for float to string in higher version of JDK to fix those issues.) Since we can't use Java's code because of the license, it will be difficult to match Java's bug perfectly. @revans2 Do you think it's ok to use this solution for the float to string casting kernel? |
@thirtiseven I am fine with a different implementation for float/double to string and string to float/double, so long as
It would also be nice if we could be consistent in how we round trip the data float -> string -> float, double -> string -> double each produce the same value as the input (bit for bit). But that is not a requirement in any way. |
I wish we can support format_number function.
eg:
The text was updated successfully, but these errors were encountered: