[REVIEW] Fix incorrect output from averages with filters in partial only mode #612

Merged 1 commit into NVIDIA:branch-0.2 on Aug 27, 2020

Conversation

kuhushukla
Collaborator

Fixes #155.

This is WIP for a couple of reasons, and also because I want more comments on the approach shown in the PR.

As the associated issue explains in some detail, nulls end up being sent down to the CPU's final aggregation. The fix tries to circumvent this by special-casing how averages are handled by the filter's case-when. We need this special case because we don't have a clean way to pass the filter down to the GpuAverage DeclarativeAggregate itself. When a null comes into the filter, we want it not to be passed down any further but instead to default to (0.0, 0). This seems to be specific to averages, while other aggregates like count require nulls. I had to use another case-when with isnotnull to make this happen.

I have some concerns about the else condition added in this PR and would like to know what others think is a better way to do this. I tried reusing initialValues in the filter, but it was not straightforward what to do with their data types. In the end, I am also open to morphing this PR into a solution that brings the hammer down on filters + avg + partial_only_conf by falling back to the CPU, but that is not my first preference. Tagging @abellina for some discussion to get this PR going.
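
For readers following the discussion, here is a rough sketch in plain Python of the idea described above. It is not the plugin code: the function names are hypothetical, the filter is modeled as a case-when that turns rejected rows into null, and the way the final merge reacts to a null buffer is an assumption made only for illustration.

```python
from typing import List, Optional, Tuple

# Partial average buffer for one input row: (sum, count).
Partial = Tuple[Optional[float], Optional[int]]


def partial_avg_naive(value: Optional[float], keep: bool) -> Partial:
    """Filter modeled as CASE WHEN keep THEN value ELSE NULL.

    A filtered-out (null) input leaks through as a null buffer.
    """
    filtered = value if keep else None
    if filtered is None:
        return (None, None)
    return (float(filtered), 1)


def partial_avg_fixed(value: Optional[float], keep: bool) -> Partial:
    """Same filter, plus a second case-when on "is not null".

    A null input contributes the neutral buffer (0.0, 0) instead of null.
    """
    filtered = value if keep else None
    if filtered is not None:
        return (float(filtered), 1)
    return (0.0, 0)


def final_avg(partials: List[Partial]) -> Optional[float]:
    """Final merge of (sum, count) buffers.

    ASSUMPTION: a null buffer makes the merged result null, the way
    null + x evaluates to null in SQL arithmetic.
    """
    total, count = 0.0, 0
    for s, c in partials:
        if s is None or c is None:
            return None
        total += s
        count += c
    return total / count if count else None


# Three rows; the middle one is rejected by the filter.
rows = [(10.0, True), (20.0, False), (30.0, True)]
naive_result = final_avg([partial_avg_naive(v, k) for v, k in rows])
fixed_result = final_avg([partial_avg_fixed(v, k) for v, k in rows])
```

With the neutral (0.0, 0) buffer, the rejected row adds nothing to either the sum or the count, so the final average covers only the rows that passed the filter. An aggregate such as count would need different handling, as the description notes.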

@sameerz sameerz added the bug (Something isn't working) and SQL (part of the SQL/Dataframe plugin) labels Aug 26, 2020
revans2
revans2 previously approved these changes Aug 27, 2020
Collaborator

@abellina abellina left a comment

One smallish comment, otherwise lgtm.

@kuhushukla
Collaborator Author

@abellina, I addressed the review comments and additionally took out a test that was redundant after this fix. Please take a look.

@kuhushukla kuhushukla changed the title [WIP] Fix incorrect output from averages with filters in partial only mode [REVIEW] Fix incorrect output from averages with filters in partial only mode Aug 27, 2020
@kuhushukla
Collaborator Author

build

@kuhushukla kuhushukla added the P0 Must have for release label Aug 27, 2020
@kuhushukla kuhushukla added this to the Aug 17 - Aug 28 milestone Aug 27, 2020
abellina
abellina previously approved these changes Aug 27, 2020
@abellina
Collaborator

build

@kuhushukla
Collaborator Author

kuhushukla commented Aug 27, 2020

Waiting for approvals, after which I will rebase, squash, and sign off.

@kuhushukla
Collaborator Author

Squashed and signed off, re-running CI.

@kuhushukla
Collaborator Author

build

@kuhushukla kuhushukla merged commit a15b228 into NVIDIA:branch-0.2 Aug 27, 2020
nartal1 pushed a commit to nartal1/spark-rapids that referenced this pull request Jun 9, 2021
tgravescs pushed a commit to tgravescs/spark-rapids that referenced this pull request Nov 30, 2023
…IDIA#612)

Signed-off-by: spark-rapids automation <[email protected]>

Labels
bug Something isn't working P0 Must have for release SQL part of the SQL/Dataframe plugin
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] Incorrect output from averages with filters in partial only mode
4 participants