Test ORC predicate pushdown (PPD) with timestamps decimals booleans #9068

Merged
thirtiseven merged 6 commits into NVIDIA:branch-23.10 from orc_ppd_test on Aug 22, 2023

Conversation

thirtiseven
Collaborator

Closes #8823

Corresponding tests in Spark

Note: The test "Support for pushing down filters for timestamp types" currently fails. I think the plugin does not support ORC PPD with timestamps, but I don't know whether this is a known issue. I will investigate.

@revans2
Collaborator

revans2 commented Aug 17, 2023

This is a really scary failure. I did some debugging and it looks like we are not writing out the proper metrics for predicate push down to work. If I switch the write to the CPU it works.

Could we extend the tests so that they cover all of the following?

  1. Write on CPU, read on GPU.
  2. Write on GPU, read on CPU.
  3. Write on GPU, read on GPU.
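
The three combinations above can be enumerated mechanically. The following is a minimal sketch, not the code from this PR: the `engine_conf` and `write_read_matrix` helpers and the `"cpu"`/`"gpu"` labels are hypothetical, and the sketch assumes that toggling `spark.rapids.sql.enabled` is how a test would switch between CPU and GPU execution.

```python
from itertools import product

# RAPIDS Accelerator setting that turns GPU SQL execution on or off.
GPU_ENABLED_KEY = "spark.rapids.sql.enabled"


def engine_conf(engine):
    """Return Spark conf overrides for a hypothetical 'cpu' or 'gpu' label."""
    if engine not in ("cpu", "gpu"):
        raise ValueError(f"unknown engine: {engine!r}")
    return {GPU_ENABLED_KEY: "true" if engine == "gpu" else "false"}


def write_read_matrix():
    """List (writer, reader) pairs, skipping CPU/CPU, which exercises no GPU code."""
    engines = ("cpu", "gpu")
    return [
        (writer, reader)
        for writer, reader in product(engines, repeat=2)
        if (writer, reader) != ("cpu", "cpu")
    ]


for writer, reader in write_read_matrix():
    print(f"write with {engine_conf(writer)}, read with {engine_conf(reader)}")
```

A test could apply the writer's conf while producing the ORC file and the reader's conf while running the filtered query, then compare the rows against a CPU-only run.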

@revans2
Collaborator

revans2 commented Aug 17, 2023

I filed rapidsai/cudf#13899 to fix the timestamp issue. I think for now we either mark it as xfail or comment it out and point to this issue. I dug into it because I was scared that we could have data corruption, but this is just a performance issue, so we are okay.
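
To illustrate why missing statistics cost performance but not correctness, here is a simplified model of statistics-based skipping. It is a sketch under stated assumptions: the `Stripe` class and both functions are hypothetical, and it does not reproduce the ORC file format or the cuDF writer. It assumes a reader that skips a stripe only when min/max statistics prove that no row can match, and reads the stripe otherwise.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Stripe:
    """Hypothetical stand-in for one chunk of a columnar file."""
    rows: List[int]
    min_value: Optional[int]  # None models a writer that omitted the statistic
    max_value: Optional[int]


def may_contain_greater_than(stripe: Stripe, threshold: int) -> bool:
    """Return False only when the statistics prove no row exceeds threshold."""
    if stripe.min_value is None or stripe.max_value is None:
        return True  # no statistics: nothing can be proven, so the stripe is read
    return stripe.max_value > threshold


def scan_greater_than(stripes: List[Stripe], threshold: int) -> Tuple[List[int], int]:
    """Return the matching rows and the number of stripes actually read."""
    matches: List[int] = []
    stripes_read = 0
    for stripe in stripes:
        if not may_contain_greater_than(stripe, threshold):
            continue  # pushdown skipped this stripe
        stripes_read += 1
        # The filter is still applied row by row after the read.
        matches.extend(v for v in stripe.rows if v > threshold)
    return matches, stripes_read


with_stats = [Stripe([1, 2, 3], 1, 3), Stripe([10, 11, 12], 10, 12)]
without_stats = [Stripe(s.rows, None, None) for s in with_stats]

print(scan_greater_than(with_stats, 5))     # ([10, 11, 12], 1)
print(scan_greater_than(without_stats, 5))  # ([10, 11, 12], 2)
```

Both scans return the same rows; the file without statistics simply forces every stripe to be read.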

@sameerz sameerz added the test Only impacts tests label Aug 17, 2023
Signed-off-by: Haoyang Li <[email protected]>
@thirtiseven
Collaborator Author

Thanks @revans2 !

I verified that the tests fail only on files written by the GPU, and commented out the failing cases.

Signed-off-by: Haoyang Li <[email protected]>
@thirtiseven
Collaborator Author

build

@thirtiseven thirtiseven marked this pull request as ready for review August 18, 2023 10:09
@thirtiseven thirtiseven self-assigned this Aug 18, 2023
revans2
revans2 previously approved these changes Aug 18, 2023
Collaborator

@revans2 revans2 left a comment


Can we have a follow-on issue to enable the tests, and another one to update all of the tests to do what I described in #9068 (comment)?

Signed-off-by: Haoyang Li <[email protected]>
@thirtiseven
Collaborator Author

I filed follow-on issue #9075 to enable the failing cases, and updated the other cases to also test CPU write with GPU read and GPU write with CPU read.

Signed-off-by: Haoyang Li <[email protected]>
@thirtiseven
Collaborator Author

thirtiseven commented Aug 22, 2023

Decimals have a similar issue: the decimal tests also fail on files written by the GPU.

I filed rapidsai/cudf#13933 in cuDF and updated the decimal tests and issue #9075.

@thirtiseven
Collaborator Author

build

@thirtiseven thirtiseven merged commit 3c399be into NVIDIA:branch-23.10 Aug 22, 2023
26 of 27 checks passed
@thirtiseven thirtiseven deleted the orc_ppd_test branch August 22, 2023 15:16
mythrocks pushed a commit to mythrocks/spark-rapids that referenced this pull request Aug 24, 2023
…VIDIA#9068)

* Test predicate pushdown (PPD) with timestamps decimals booleans

Signed-off-by: Haoyang Li <[email protected]>

* extend the timestamp test

Signed-off-by: Haoyang Li <[email protected]>

* style fix

Signed-off-by: Haoyang Li <[email protected]>

* update other cases

Signed-off-by: Haoyang Li <[email protected]>

* style fix

Signed-off-by: Haoyang Li <[email protected]>

* comment out two decimal tests

Signed-off-by: Haoyang Li <[email protected]>

---------

Signed-off-by: Haoyang Li <[email protected]>
Labels
test Only impacts tests
Development

Successfully merging this pull request may close these issues.

Test predicate pushdown (PPD) with timestamps, decimals, booleans, etc. Refer to OrcQuerySuite.scala#L464.
3 participants