[FEA] Reimplement $ transpilation using cuDF new line terminator support #11554
Comments
We should start by adding Update: PR #17139 After this, we can begin migrating the Spark Regex APIs. This involves updating the transpiler function in the |
This PR introduces the necessary changes to the cuDF JNI to support the issue described in [NVIDIA/spark-rapids#11554](NVIDIA/spark-rapids#11554). For further information, refer to the details in the [comment](NVIDIA/spark-rapids#11554 (comment)).
Issue #15961 adds support for handling multiple line delimiters. This PR extends that functionality to JNI, which was previously missing, and also includes a test to validate the changes.
Authors:
- Suraj Aralihalli (https://github.com/SurajAralihalli)
Approvers:
- MithunR (https://github.com/mythrocks)
- Robert (Bobby) Evans (https://github.com/revans2)
URL: #17139
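For reference, the CPU behavior that a transpiled $ has to reproduce can be checked with plain java.util.regex. The sketch below is only an illustration: it does not call the spark-rapids transpiler or any cuDF API, and the class and helper names (DollarTerminatorDemo, dollarMatches, codePoints) are hypothetical. It assumes the default Pattern flags (no MULTILINE, no UNIX_LINES) and checks whether "a$" matches when the input ends with each of the line terminators that Java recognizes, including the two-character \r\n combination.

```java
import java.util.regex.Pattern;

public class DollarTerminatorDemo {
    // Line terminators recognized by java.util.regex.Pattern with default flags.
    public static final String[] TERMINATORS = {
        "\n", "\r", "\r\n", "\u0085", "\u2028", "\u2029"
    };

    // Hypothetical helper: does the pattern a$ (default flags) find a match?
    public static boolean dollarMatches(String input) {
        return Pattern.compile("a$").matcher(input).find();
    }

    // Render a string as code points so invisible terminators are readable.
    public static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            sb.append(String.format("U+%04X ", (int) c));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // One trailing terminator of each kind after the letter a.
        for (String t : TERMINATORS) {
            System.out.println("a + " + codePoints(t) + " -> " + dollarMatches("a" + t));
        }
        // Two trailing line feeds: only one final terminator is tolerated.
        System.out.println("a + two line feeds -> " + dollarMatches("a\n\n"));
    }
}
```

With default flags, Java's $ matches at the end of input or just before a single final line terminator, so each of the six inputs in the loop matches while the input with two trailing line feeds does not. The \r\n case is the one that needs separate handling, since it is a two-character sequence that counts as one terminator.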
Is your feature request related to a problem? Please describe.
cuDF added support for multiple new-line characters in rapidsai/cudf#15961, which allows support for the different Java Unicode line terminator characters. This requires passing a flag to the cuDF regex APIs to enable this mode, and updating the transpiler to a more simplified implementation of $ (which only needs to add support for the \r\n combination in addition to the individual characters already supported by cuDF):
- \n line-feed (already supported)
- \r carriage-return
- \u0085 next line (NEL)
- \u2028 line separator
- \u2029 paragraph separator
Additional context
This might fix failing tests here:
- test_re_replace_all fails with a corner case #9731
Also, can look into a possible solution for: