[FEA] Rework GpuSubstringIndex to use cudf::slice_strings #8750
Comments
A Spark UT failed related to
It is from following case:
Spark-shell reproduce:
Since |
rapidsai/cudf#13373 |
cudf::strings::find()/rfind() would have to be called multiple times to find the right position, so I now prefer to put the logic in a new kernel, substringIndex. |
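To make the repeated find()/rfind() idea concrete, here is a minimal CPU-side sketch in plain Python. It is not the cuDF kernel or the plugin code: `substring_index` is a hypothetical helper, and the edge-case behavior (empty result for an empty delimiter or a zero count, whole string when there are fewer than `count` delimiters, search resuming one character past the previous match) is an assumption about Spark's semantics.

```python
def substring_index(s: str, delim: str, count: int) -> str:
    """Hypothetical reference for substring_index(s, delim, count).

    Assumed semantics: an empty delimiter or count == 0 yields "";
    if the delimiter occurs fewer than |count| times, s is returned unchanged.
    """
    if not delim or count == 0:
        return ""

    if count > 0:
        # Walk forward: call find() once per requested occurrence.
        idx = -1
        for _ in range(count):
            idx = s.find(delim, idx + 1)
            if idx < 0:
                return s  # not enough delimiters
        return s[:idx]

    # Walk backward: call rfind() once per requested occurrence.
    # The end bound lets a match start no later than idx - 1.
    idx = len(s) - len(delim) + 1
    for _ in range(-count):
        idx = s.rfind(delim, 0, idx - 1 + len(delim))
        if idx < 0:
            return s  # not enough delimiters
    return s[idx + len(delim):]


if __name__ == "__main__":
    print(substring_index("www.apache.org", ".", 2))
    print(substring_index("a::b::c", "::", -1))
```

Each call scans the string once per counted occurrence, which is the cost a single dedicated kernel would avoid.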
closed by |
When `GpuSubstringIndex` was first implemented, cuDF didn't have support for something like this. We filed rapidsai/cudf#5158 and it got implemented in cuDF a long time ago, but we haven't gone back to `GpuSubstringIndex` and used the new API :(

The current implementation in the plugin relies on a regular expression that takes `delim` and `count` into account, but it doesn't work for multi-character delimiters, for example, and we throw in that case (which is good). That said, it would be great to move to the cuDF version of this.

Note that we need to make sure we have unit and integration tests for this, especially ones showing that we can support multi-character delimiters after the change to `cudf::slice_strings`.
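As a rough idea of what multi-character delimiter coverage could look like, the sketch below lists a few cases and a small harness that runs them against any implementation. Everything here is hypothetical: `CASES`, `run_cases`, and `split_reference` are illustrative names, not plugin or cuDF APIs, and the expected values assume the usual Spark semantics. The split-based reference may differ from Spark when delimiter occurrences overlap, so the cases deliberately avoid that.

```python
from typing import Callable, List, Tuple

# (input, delimiter, count, expected) -- expected values are assumptions
# about substring_index semantics, not output captured from Spark.
CASES: List[Tuple[str, str, int, str]] = [
    ("www.apache.org", ".", 2, "www.apache"),
    ("a::b::c", "::", 1, "a"),
    ("a::b::c", "::", 2, "a::b"),
    ("a::b::c", "::", -1, "c"),
    ("a::b::c", "::", -2, "b::c"),
    ("a::b::c", "::", 5, "a::b::c"),  # fewer delimiters than count
    ("a::b::c", "::", 0, ""),         # zero count
    ("a::b::c", "", 1, ""),           # empty delimiter
    ("abc", "::", 1, "abc"),          # delimiter absent
]


def split_reference(s: str, delim: str, count: int) -> str:
    """Simple split/join reference; valid for non-overlapping delimiters."""
    if not delim or count == 0:
        return ""
    parts = s.split(delim)
    if abs(count) >= len(parts):
        return s
    return delim.join(parts[:count] if count > 0 else parts[count:])


def run_cases(fn: Callable[[str, str, int], str]) -> List[str]:
    """Run every case against fn and return descriptions of any failures."""
    failures = []
    for s, delim, count, expected in CASES:
        actual = fn(s, delim, count)
        if actual != expected:
            failures.append(f"{s!r}, {delim!r}, {count}: {actual!r} != {expected!r}")
    return failures


if __name__ == "__main__":
    print(run_cases(split_reference))
```

The same table of cases could be fed to the GPU implementation and to CPU Spark, and the two results compared.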