
Handle all special cases for Spark SQL conv #1314 #1346

Open
gerashegalov opened this issue Aug 15, 2023 · 0 comments
gerashegalov commented Aug 15, 2023

In #1314 we provide only limited support for conv with bases 10 and 16, based on current libcudf functionality.

This issue tracks a dedicated kernel that will implement
Spark/Hive's conv for all bases/radices in [2, 36]; the to_base can also be negative.
The input alphabet consists of

  • at most ten numerals [0-9] and at most twenty-six case-insensitive Latin characters [a-z], optionally prefixed by - for negative numbers

The output consists of

  • at most ten numerals [0-9] and at most twenty-six Latin characters [A-Z], optionally prefixed by - for negative numbers if to_base is negative

It should check for overflow and, based on the ANSI flag, either throw an exception or replace the output value with the value corresponding to the unsigned interpretation of -1 if to_base is positive (e.g. 'FFFFFFFFFFFFFFFF' in base 16), or with '-1' if to_base is negative.
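As a quick sanity check in plain Python (not the kernel itself), the "unsigned interpretation of -1" on 64 bits is 2**64 - 1, which also explains the large values in the base-10 example further down:

```python
# Unsigned 64-bit interpretation of -1 is 2**64 - 1.
umax = (1 << 64) - 1
assert format(umax, 'X') == 'FFFFFFFFFFFFFFFF'   # the base-16 rendering
assert (1 << 64) - 510 == 18446744073709551106   # what conv('-510', 10, 10) yields
```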

A leading prefix of spaces ('\x20') is ignored. Then optionally consume a '-' and the longest consecutive run of digits that are valid w.r.t. from_base. If the first non-space character is neither a valid digit nor a '-' character, stop and produce 0.
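The parsing and conversion rules above can be sketched in plain Python. This is a hypothetical illustration of the intended semantics, not the libcudf kernel; the exact ANSI behavior is an assumption (here non-ANSI overflow clamps to the unsigned -1 value, whereas ANSI mode would raise instead):

```python
UMAX = (1 << 64) - 1  # unsigned 64-bit maximum, i.e. the unsigned view of -1

def conv_sketch(s: str, from_base: int, to_base: int) -> str:
    """Sketch of Spark/Hive conv semantics for radices [2, 36]."""
    assert 2 <= from_base <= 36 and 2 <= abs(to_base) <= 36
    i, n = 0, len(s)
    while i < n and s[i] == ' ':          # ignore the leading-space (0x20) prefix
        i += 1
    neg = i < n and s[i] == '-'
    if neg:
        i += 1
    in_digits = "0123456789abcdefghijklmnopqrstuvwxyz"[:from_base]
    value, seen = 0, False
    while i < n and s[i].lower() in in_digits:   # longest valid digit run
        value = value * from_base + in_digits.index(s[i].lower())
        if value > UMAX:
            value = UMAX                  # non-ANSI overflow: clamp to unsigned -1
        seen = True
        i += 1
    if not seen:                          # first non-space char was invalid
        return "0"
    if neg:
        value = ((1 << 64) - value) & UMAX   # unsigned 64-bit wraparound
    out_neg = False
    base = abs(to_base)
    if to_base < 0 and value > UMAX // 2:    # negative to_base: reinterpret as signed
        value = (1 << 64) - value
        out_neg = True
    out_alphabet = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    digits = []
    while value:
        digits.append(out_alphabet[value % base])
        value //= base
    if not digits:
        return "0"
    return ("-" if out_neg else "") + "".join(reversed(digits))
```

For example, `conv_sketch('   -510garbage', 10, -10)` returns `'-510'` and `conv_sketch('garbage-510garabage', 10, 10)` returns `'0'`, matching the Spark transcript below.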

Some examples:

>>> from pyspark.sql import functions as f
>>> spark.createDataFrame([('\x20\x20\x20-510garbage',), ('-510',), ('garbage-510garabage',)], 'a string').select('a', f.conv('a', 10, -10)).show()
+-------------------+----------------+
|                  a|conv(a, 10, -10)|
+-------------------+----------------+
|        -510garbage|            -510|
|               -510|            -510|
|garbage-510garabage|               0|
+-------------------+----------------+
>>> spark.createDataFrame([('\x20\x20\x20-510garbage',), ('-510',), ('garbage-510garabage',)], 'a string').select('a', f.conv('a', 10, 10)).show()
+-------------------+--------------------+
|                  a|     conv(a, 10, 10)|
+-------------------+--------------------+
|        -510garbage|18446744073709551106|
|               -510|18446744073709551106|
|garbage-510garabage|                   0|
+-------------------+--------------------+

Originally posted by @revans2 in #1314 (comment)
