Fix offset of the string dictionary length stream #8515
LGTM, maybe we can add a comment on how the RLE and non-RLE data is laid out.
Added a few comments; hopefully they help in understanding the layout.
rerun tests
Codecov Report
@@              Coverage Diff               @@
##             branch-21.08    #8515   +/-   ##
===============================================
  Coverage                ?   82.95%
===============================================
  Files                   ?      110
  Lines                   ?    18181
  Branches                ?        0
===============================================
  Hits                    ?    15082
  Misses                  ?     3099
  Partials                ?        0
@gpucibot merge
Fixes #8514
String dictionary length is RLE encoded, and `rle_data_size` and `non_rle_data_size` take this into account. However, when computing chunk stream offsets, these streams were treated as non-RLE and `non_rle_data_size` was not added. This caused a discrepancy between the non-RLE stream sizes and the available space, leading to overlap between chunk streams. Applied `non_rle_data_size` to the offset to correct the discrepancy, and added a test that uses decimal columns to increase the size of the non-RLE encoded data and trigger the overflow.