Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add GpuMapConcat support for nested-type values #5686

Merged
merged 10 commits into from
Jun 25, 2022

Conversation

HaoYang670
Copy link
Collaborator

Signed-off-by: remzi [email protected]
Re #5559.

Changes in this PR:

  1. Support concatenating map columns with nested-type values (including Array, Struct, and Map)
  2. Simplify the GpuMapConcat function by concatenating the map columns directly.
  3. Add nested-type data gen into map_gens_sample

simplify
shrink the test size

Signed-off-by: remzi <[email protected]>
@HaoYang670
Copy link
Collaborator Author

build

@sameerz sameerz added the feature request New feature or request label May 31, 2022
Signed-off-by: remzi <[email protected]>
@HaoYang670
Copy link
Collaborator Author

build

@HaoYang670 HaoYang670 self-assigned this Jun 13, 2022
@HaoYang670
Copy link
Collaborator Author

build

jlowe
jlowe previously approved these changes Jun 13, 2022
TypeSig.MAP.nested(TypeSig.all),
repeatingParamCheck = Some(RepeatingParamCheck("input",
TypeSig.MAP.nested(TypeSig.commonCudfTypes + TypeSig.DECIMAL_128 +
TypeSig.NULL),
TypeSig.NULL + TypeSig.ARRAY + TypeSig.STRUCT + TypeSig.MAP),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem here is that the checks don't distinguish between keys and values in a map. This check is adding nested support for both keys and values, but the code can only support nested types for values, not keys.

This comment was marked as resolved.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated!

Copy link
Collaborator Author

@HaoYang670 HaoYang670 Jun 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know how could we test the nested key type cases, as MapGen does not support ArrayGen as the keys. (List in Python is unhashable)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a psNote to the param checks so it is clearly documented that keys don't support arrays? As for the tests, when we do support arrays for map keys we can write the tests in scala or we can work around it by creating a list of key/value structs and then converting it to a map, do the computation, and then convert the resulting map back to a list of key/value structs after we are done. I think a scala test is preferable but either would work.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated! And I have added 2 cases to test the falling back.

@HaoYang670
Copy link
Collaborator Author

build

revans2
revans2 previously approved these changes Jun 14, 2022
Copy link
Collaborator

@revans2 revans2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a nit for documentation

TypeSig.MAP.nested(TypeSig.all),
repeatingParamCheck = Some(RepeatingParamCheck("input",
TypeSig.MAP.nested(TypeSig.commonCudfTypes + TypeSig.DECIMAL_128 +
TypeSig.NULL),
TypeSig.NULL + TypeSig.ARRAY + TypeSig.STRUCT + TypeSig.MAP),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a psNote to the param checks so it is clearly documented that keys don't support arrays? As for the tests, when we do support arrays for map keys we can write the tests in scala or we can work around it by creating a list of key/value structs and then converting it to a map, do the computation, and then convert the resulting map back to a list of key/value structs after we are done. I think a scala test is preferable but either would work.

Signed-off-by: remzi <[email protected]>
Signed-off-by: remzi <[email protected]>
@HaoYang670 HaoYang670 requested a review from revans2 June 15, 2022 07:55
@HaoYang670
Copy link
Collaborator Author

build

@HaoYang670
Copy link
Collaborator Author

There is a weird failure: https://github.com/NVIDIA/spark-rapids/runs/6897078626?check_suite_focus=true#step:2:28678

 pyspark.sql.utils.AnalysisException: java.lang.RuntimeException: The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: rwxr-xr-x

@revans2
Copy link
Collaborator

revans2 commented Jun 15, 2022

build

@revans2
Copy link
Collaborator

revans2 commented Jun 15, 2022

That is a really odd error. I think it has to be a transient error, but I am not sure on it.

@HaoYang670 HaoYang670 requested a review from jlowe June 21, 2022 12:28
@HaoYang670
Copy link
Collaborator Author

@revans2 @jlowe @ttnghia Could you please help to review? Thanks a lot!.

revans2
revans2 previously approved these changes Jun 21, 2022
Copy link
Collaborator

@revans2 revans2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: would be nice to have a partial support tag on the param checks, but that is minor.

Signed-off-by: remzi <[email protected]>
@HaoYang670
Copy link
Collaborator Author

build

@HaoYang670
Copy link
Collaborator Author

nit: would be nice to have a partial support tag on the param checks, but that is minor.

Hi @revans2, sorry I am a little confused about what the partial support tag is. Is it the function tagExprForGpu?

@jlowe
Copy link
Member

jlowe commented Jun 22, 2022

I am a little confused about what the partial support tag is.

This is the withPsNote method of TypeSig where we can add a note about something being only partially supported. See callers of this method for some examples.

@HaoYang670 HaoYang670 merged commit 34c1761 into NVIDIA:branch-22.08 Jun 25, 2022
@HaoYang670 HaoYang670 deleted the map_concat_nested_value branch June 25, 2022 02:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants