Add column index writer for parquet #1935
Codecov Report
@@ Coverage Diff @@
## master #1935 +/- ##
==========================================
+ Coverage 83.41% 83.54% +0.12%
==========================================
Files 214 221 +7
Lines 57004 57395 +391
==========================================
+ Hits 47551 47949 +398
+ Misses 9453 9446 -7
@Ted-Jiang PTAL
Do we need to add an option to control this feature in the |
This is looking very nice. I think it would be good to maintain a clearer separation between the file-level metadata and the index metadata. Mixing the two not only leads to the mutability issues you've run into, it also makes it hard to reason about which fields are populated when. Perhaps we could have something like a ColumnChunkIndex?
@tustvold reset, PTAL
Love it, some minor nits and then this can go in 😄
null_pages: Vec<bool>,
min_values: Vec<Vec<u8>>,
max_values: Vec<Vec<u8>>,
// TODO: calc the order for all pages in this column
👍 This is useful for checking whether to use the page index.
If the data in the pages is sorted in ascending or descending order, we can use binary search
to accelerate page filtering.
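For reference, the order mentioned in the TODO could be derived from the collected per-page min/max values roughly as sketched here. This is only an illustration, not the code in this PR: the `BoundaryOrder` enum is a local stand-in, `boundary_order` is a hypothetical helper, values are assumed to compare correctly as raw bytes, and null pages are simply skipped.

```rust
/// Local stand-in for the three boundary orders (not the crate's own type).
#[derive(Debug, PartialEq, Clone, Copy)]
enum BoundaryOrder {
    Unordered,
    Ascending,
    Descending,
}

/// Hypothetical helper: derive the boundary order from per-page statistics.
/// Assumption: byte-wise comparison matches the column's sort order.
fn boundary_order(
    null_pages: &[bool],
    min_values: &[Vec<u8>],
    max_values: &[Vec<u8>],
) -> BoundaryOrder {
    // Only pages that contain at least one non-null value take part.
    let pages: Vec<usize> = (0..null_pages.len())
        .filter(|&i| !null_pages[i])
        .collect();

    // Ascending: both the mins and the maxs never decrease from page to page.
    let ascending = pages.windows(2).all(|w| {
        min_values[w[0]] <= min_values[w[1]] && max_values[w[0]] <= max_values[w[1]]
    });
    // Descending: both the mins and the maxs never increase from page to page.
    let descending = pages.windows(2).all(|w| {
        min_values[w[0]] >= min_values[w[1]] && max_values[w[0]] >= max_values[w[1]]
    });

    // When both hold (e.g. zero or one page), this sketch reports Ascending.
    if ascending {
        BoundaryOrder::Ascending
    } else if descending {
        BoundaryOrder::Descending
    } else {
        BoundaryOrder::Unordered
    }
}
```

Preferring `Ascending` when both checks pass is a choice made for this sketch; a real writer would also need type-aware comparison rather than raw bytes.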
If the boundary order is UNORDERED, we need to check the pages one by one.
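To make the difference concrete, here is a minimal sketch of how a reader might pick candidate pages for an equality predicate. It is an illustration only, not the reader in this repository: `candidate_pages` and the local `BoundaryOrder` enum are hypothetical, the statistics are simplified to `i64` instead of encoded bytes, and null pages are ignored.

```rust
/// Local stand-in for the three boundary orders (not the crate's own type).
#[derive(Debug, PartialEq, Clone, Copy)]
enum BoundaryOrder {
    Unordered,
    Ascending,
    Descending,
}

/// Hypothetical helper: indexes of pages whose [min, max] range may contain `target`.
fn candidate_pages(
    order: BoundaryOrder,
    mins: &[i64],
    maxs: &[i64],
    target: i64,
) -> Vec<usize> {
    match order {
        BoundaryOrder::Ascending => {
            // Binary search: skip leading pages whose max is below the target...
            let start = maxs.partition_point(|&max| max < target);
            // ...and stop at the first page whose min is above the target.
            let end = mins.partition_point(|&min| min <= target);
            (start..end).collect()
        }
        BoundaryOrder::Descending => {
            // Mirror image: leading pages have mins above the target.
            let start = mins.partition_point(|&min| min > target);
            let end = maxs.partition_point(|&max| max >= target);
            (start..end).collect()
        }
        // No ordering guarantee: every page has to be checked individually.
        BoundaryOrder::Unordered => (0..mins.len())
            .filter(|&i| mins[i] <= target && target <= maxs[i])
            .collect(),
    }
}
```

The ordered branches do two binary searches, so they cost O(log n) in the number of pages, while the unordered branch is a linear scan.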
Thank you 🥇
Which issue does this PR close?
part of #1777
Rationale for this change
What changes are included in this PR?
Are there any user-facing changes?