[FEA] provide a method to check "if-contains-key" for a map column #8120
Comments
@sameerz can you help clarify this issue?
Hi @harrism , this is a use of the Spark expression `element_at` from one customer. Our current implementation is based on the Spark 3.0.0 behaviour.

The goal of this issue is to match the Spark 3.1.1 behaviour, which requires the ability to detect whether the key exists or not. I'm new to cuDF and not familiar with the whole API surface; I only found `map_lookup`. Normally I could say: when I get a null, I throw an exception. However, Spark allows a map like `{"a": null}`, so when I have a "null" in my hand, I don't know whether I should throw an exception, because that null can also be a legitimate value for the key. I appreciate any suggestions.
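The ambiguity described above can be illustrated with a small plain-Python sketch (the `lookup` helper here is hypothetical, not a cuDF API): a lookup that returns null for a missing key is indistinguishable from one that finds the key mapped to a null value.

```python
def lookup(row_map, key):
    # Mimics a map lookup that returns null (None) when the key
    # is absent -- the behaviour described for map_lookup.
    return row_map.get(key, None)

# Key "a" exists but maps to null:
print(lookup({"a": None}, "a"))  # None
# Key "b" does not exist at all:
print(lookup({"a": None}, "b"))  # None
# Both calls return None, so the caller cannot decide whether
# to raise an exception for a missing key.
```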
libcudf doesn't have a notion of a "map" column, and I don't know how the current Java side handles this. It seems like the API in cudf/cpp/include/cudf/search.hpp (line 138 in 611cabd) could be relevant here.
Hi @nvdbaranec , I looked more into the existing `map_lookup` implementation. I am trying to extract part of it into a new JNI method like "mapContains". Does this approach make sense? I put the draft code in #8209 . Could you help take a look at it and provide more insights? Thanks a lot!
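As a rough illustration of the idea (not the actual JNI/libcudf code): a cuDF map column is stored as a LIST of STRUCT<key, value> rows, so a `mapContains`-style operation can ignore the values entirely and just check each row's list of keys for membership. A plain-Python sketch of those semantics:

```python
def map_contains(map_column, key):
    """Return one boolean per row: True iff the row's map has `key`.

    Each row is modeled as a list of (key, value) pairs, mirroring
    the LIST<STRUCT<key, value>> layout cuDF uses for maps.
    """
    return [any(k == key for k, _ in row) for row in map_column]

rows = [
    [("a", None)],          # key "a" present, value is null
    [("b", 1), ("c", 2)],   # key "a" absent
    [],                     # empty map
]
print(map_contains(rows, "a"))  # [True, False, False]
```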
Added some comments in #8209
To close #8120

As required in Spark 3.1.1, when ANSI mode is enabled, `GetMapValue` should throw an exception when the key is not found in the map for a row. The plugin side therefore needs to check whether a map column contains the specific key in all rows. The new method `mapContains` added in this PR returns a column of booleans, where _false_ means the key was not found.

Authors:
- Allen Xu (https://github.com/wjxiz1992)

Approvers:
- Jason Lowe (https://github.com/jlowe)

URL: #8209
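How the plugin side would consume such a boolean column can be sketched per row in plain Python (this models the ANSI-mode semantics described above, not the actual spark-rapids code): the "contains" check distinguishes a missing key, which must raise, from a key that is present but mapped to null, which legitimately yields null.

```python
def get_map_value_ansi(row_map, key):
    # ANSI-mode GetMapValue semantics from Spark 3.1.1:
    # a missing key is an error; a key mapped to null yields null.
    contains = key in row_map          # the mapContains check
    if not contains:
        raise KeyError(f"Key {key!r} does not exist in the map")
    return row_map.get(key)            # may legitimately be None

print(get_map_value_ansi({"a": None}, "a"))  # None (key exists)
try:
    get_map_value_ansi({"a": None}, "b")
except KeyError as exc:
    print("raised:", exc)
```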
Is your feature request related to a problem? Please describe.
Spark 3.1.1 introduced an ansi_mode operation for `GetMapValue` and `ElementAt`. When dealing with this case, I need to know if the map contains the specific key. This is needed for NVIDIA/spark-rapids#2272.

The current cuDF API `map_lookup` returns "null" when the key is not found in the map. This causes a confusing result when the map actually contains the specific key but that key maps to a value of "null" (e.g. `{"a": null}`): given a ColumnVector that contains "null", I cannot tell what the "null" means.

Describe the solution you'd like
Provide a method like `map_contains` that returns a column vector containing `bool` values for each row.

Describe alternatives you've considered
Any solution that helps me identify the meaning of `null` is fine.