Fix struct flattening to add a validity column only when the input column has null element #8374
Locally this fixes the issue reported by @gerashegalov.
LGTM!
@gpucibot merge
Can you add a test case (ok if in another PR) so we don't defer to spark-rapids to catch these issues?
Sure. But since we are about to do code freeze, can that test be targeted to branch 21.08?
I think it's reasonable since @abellina confirmed the fix with a Spark test run.
Codecov Report
@@ Coverage Diff @@
## branch-21.06 #8374 +/- ##
===============================================
Coverage ? 82.86%
===============================================
Files ? 106
Lines ? 17874
Branches ? 0
===============================================
Hits ? 14811
Misses ? 3063
Partials ? 0
This PR adds a simple test case for struct binary search. The test performs a binary search on two structs columns, one of which has a bitmask (but no null elements) while the other has no bitmask at all. Reference: #8374 | #8187
Authors: - Nghia Truong (https://github.com/ttnghia)
Approvers: - Conor Hoekstra (https://github.com/codereport)
URL: #8396
Currently, struct flattening adds a validity column whenever the input column has a null mask (by calling `nullable()`). When comparing two structs columns that contain no nulls but where one column has a null mask, flattening them results in two tables with different numbers of columns.

This PR fixes that problem by using `has_nulls()` instead of `nullable()`. As a result, the validity column is added to the flattening result only when the input structs column has nulls.

Note that when comparing two structs columns in which one column has nulls while the other doesn't, we must check for (nested) null existence and pass in `column_nullability::FORCE` for flattening both columns. This ensures that the flattening results are tables with the same number of columns.

Closes #8187.
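
To make the column-count mismatch concrete, here is a minimal, self-contained sketch of the idea. It is a toy model, not the cudf implementation: `ToyStructColumn`, `ValidityRule`, and `flattened_width` are hypothetical names invented for illustration, and the `force` flag only loosely mirrors the role of `column_nullability::FORCE`.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Toy stand-in for a structs column (hypothetical, NOT the cudf API).
struct ToyStructColumn {
  std::size_t num_children;                    // number of child (field) columns
  std::size_t num_rows;                        // number of rows
  std::optional<std::vector<bool>> null_mask;  // true = valid row; nullopt = no mask allocated

  // Models nullable(): a mask is allocated, whatever it contains.
  bool nullable() const { return null_mask.has_value(); }

  // Models has_nulls(): at least one row is actually null.
  bool has_nulls() const {
    if (!null_mask) return false;
    assert(null_mask->size() == num_rows);
    for (bool valid : *null_mask) {
      if (!valid) return true;
    }
    return false;
  }
};

// Which predicate decides whether a validity column is emitted.
enum class ValidityRule { NULLABLE, HAS_NULLS };

// Width of the flattened table: the children plus an optional validity column.
// `force` always emits the validity column (assumed analogue of FORCE).
std::size_t flattened_width(const ToyStructColumn& col, ValidityRule rule, bool force = false) {
  const bool add_validity =
      force || (rule == ValidityRule::NULLABLE ? col.nullable() : col.has_nulls());
  return col.num_children + (add_validity ? 1 : 0);
}
```

In this model, a column with an all-valid mask and a column with no mask flatten to widths 3 and 2 under `ValidityRule::NULLABLE`, but both flatten to width 2 under `ValidityRule::HAS_NULLS`. When one column really contains a null, computing `force = lhs.has_nulls() || rhs.has_nulls()` and passing it for both columns keeps the two widths equal.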