Fix `segmented_gather()` for null LIST rows #9537
`segmented_gather()` currently assumes that null LIST rows also have a size of `0` (as defined by the difference of adjacent offsets). This might not hold, for example, for LIST columns that are members of STRUCT columns, when the STRUCT's null mask is superimposed on its children. In that case, a non-empty list row can be marked null without its offsets being compacted. This leads to errors when fetching elements of a list row, as seen in NVIDIA/spark-rapids/pull/3770. This commit adds handling of uncompacted null LIST rows in `segmented_gather()`.
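The failure mode described above can be illustrated with a small host-side model. This is a sketch, not libcudf code: `ListColumn`, `superimpose_parent_nulls_sketch()`, `effective_size()` and `segmented_gather_sketch()` are hypothetical names, the column is a plain `std::vector`-based stand-in for a LIST<INT32> column, and only non-negative gather indices are modeled. It assumes that superimposing a parent STRUCT's nulls only ANDs the validity mask and leaves offsets untouched. It handles uncompacted null rows by treating every null row as having size 0 regardless of its offsets, so any index into a null row is nullified.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical host-side stand-in for a LIST<INT32> column (not the libcudf API).
struct ListColumn {
  std::vector<int32_t> offsets;  // num_rows + 1 entries
  std::vector<bool> valid;       // per-row validity
  std::vector<int32_t> child;    // flattened element values
};

// Assumed behavior: the parent STRUCT's validity is ANDed into the child
// LIST's validity, but offsets are NOT compacted. A row nulled this way
// keeps its original, possibly non-zero, extent in `offsets`.
ListColumn superimpose_parent_nulls_sketch(ListColumn col,
                                           std::vector<bool> const& parent_valid) {
  for (std::size_t row = 0; row < col.valid.size(); ++row) {
    col.valid[row] = col.valid[row] && parent_valid[row];
  }
  return col;
}

// Effective size of a row: null rows count as empty even when their
// offsets still span elements (an uncompacted null row).
int32_t effective_size(ListColumn const& col, std::size_t row) {
  if (!col.valid[row]) return 0;
  return col.offsets[row + 1] - col.offsets[row];
}

// For each row, fetch the elements named by that row's gather indices.
// Indices outside [0, effective_size) yield std::nullopt (nullified).
std::vector<std::vector<std::optional<int32_t>>> segmented_gather_sketch(
    ListColumn const& col, std::vector<std::vector<int32_t>> const& gather_map) {
  std::vector<std::vector<std::optional<int32_t>>> out;
  std::size_t const num_rows = col.valid.size();
  for (std::size_t row = 0; row < num_rows; ++row) {
    int32_t const size = effective_size(col, row);
    std::vector<std::optional<int32_t>> gathered;
    for (int32_t idx : gather_map[row]) {
      if (idx >= 0 && idx < size) {
        gathered.push_back(col.child[col.offsets[row] + idx]);
      } else {
        gathered.push_back(std::nullopt);
      }
    }
    out.push_back(std::move(gathered));
  }
  return out;
}
```

With offsets `{0, 2, 5, 6}`, child values `{10, 11, 20, 21, 22, 30}`, and a parent validity of `{true, false, true}`, row 1 becomes null while still spanning three elements. Gathering index 0 from it yields a null instead of `20`, which is the value a size computed purely from offset differences would have allowed through.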
lgtm
Codecov Report
@@ Coverage Diff @@
## branch-21.12 #9537 +/- ##
================================================
- Coverage 10.79% 10.66% -0.13%
================================================
Files 116 117 +1
Lines 18869 19735 +866
================================================
+ Hits 2036 2104 +68
- Misses 16833 17631 +798
Rerun tests
Could you run this with
Same for the extract tests too.
Thank you for suggesting the
rerun tests
Rerun tests
rerun tests
@gpucibot merge
@davidwendt, @codereport, @ttnghia: Thank you for the reviews. I have merged this change.