Expose SortingColumn in parquet files #3103
Could we get an end-to-end test of this, i.e. write a parquet file to a Vec<u8>, then read it back and verify that the sorting columns were round-tripped correctly?
parquet/src/file/metadata.rs
value: Option<Vec<SortingColumnMetaData>>,
) -> Self {
self.sorting_columns = value;
- value: Option<Vec<SortingColumnMetaData>>,
- ) -> Self {
- self.sorting_columns = value;
+ value: Vec<SortingColumnMetaData>,
+ ) -> Self {
+ self.sorting_columns = Some(value);
This is consistent with set_page_offset
I got the opposite review feedback earlier. The reason given by that reviewer was that if we remove Option from the signature, the function can no longer be used to set None for this field. I am going to keep this as is.
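To make the trade-off in this thread concrete, here is a minimal sketch of the two setter shapes side by side. The types are simplified stand-ins, not the parquet crate's actual definitions: `SortingColumn`, `RowGroupBuilder`, and the method name `set_sorting_columns_vec` are hypothetical names chosen for illustration only.

```rust
// Simplified stand-in for a sorting column entry (illustrative assumption,
// not the parquet crate's real type).
#[derive(Debug, Clone, PartialEq)]
pub struct SortingColumn {
    pub column_idx: i32,
    pub descending: bool,
    pub nulls_first: bool,
}

// Hypothetical builder holding the optional sorting columns.
#[derive(Debug, Default)]
pub struct RowGroupBuilder {
    sorting_columns: Option<Vec<SortingColumn>>,
}

impl RowGroupBuilder {
    /// Option-taking setter (the shape kept in this PR): callers can set a
    /// value, or pass None to clear a previously set value.
    pub fn set_sorting_columns(mut self, value: Option<Vec<SortingColumn>>) -> Self {
        self.sorting_columns = value;
        self
    }

    /// Vec-taking alternative (the shape suggested in review): a shorter
    /// call site, but it cannot express "no sorting columns recorded".
    pub fn set_sorting_columns_vec(mut self, value: Vec<SortingColumn>) -> Self {
        self.sorting_columns = Some(value);
        self
    }

    /// Read access for inspection.
    pub fn sorting_columns(&self) -> Option<&Vec<SortingColumn>> {
        self.sorting_columns.as_ref()
    }
}
```

With the Option-taking setter, a caller that copies a builder and wants to drop the sort order can call `set_sorting_columns(None)`; with the Vec-taking form the closest available value is an empty Vec, which is stored as `Some(vec![])` rather than `None`.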
Getting close 😄
Thank you 👍
Benchmark runs are scheduled for baseline = 8bb2917 and contender = 371ec57. 371ec57 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
* Expose SortColumn from parquet file
* fix formatting issues
* empty commit
* fix PR comments
* formatting fix
* add parquet round trip test
* fix clippy error
* update the test based on PR comment
Co-authored-by: askoa <askoa@local>
Which issue does this PR close?
Closes #3090
Reading from file:
The function footer.rs#decode_metadata reads sorting_columns from the file. However, the function RowGroupMetaData::from_thrift was not reading the field into RowGroupMetaData. The function was modified to read sorting_columns into RowGroupMetaData.
arrow-rs/parquet/src/file/footer.rs, lines 74 to 82 in 430eb84
Writing into file:
The function format.rs#RowGroup#write_to_out_protocol writes sorting_columns to the file. However, the function metadata.rs#RowGroupMetaData#to_thrift was not writing the field to RowGroup. The function was modified to write sorting_columns from RowGroupMetaData to RowGroup.
arrow-rs/parquet/src/format.rs, lines 4155 to 4162 in 430eb84
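The read and write fixes described above amount to carrying one optional field across the Thrift boundary in both directions. The sketch below illustrates that idea with simplified stand-in types; `ThriftRowGroup` and `RowGroupMeta` are hypothetical names and do not reproduce the real `RowGroup` or `RowGroupMetaData` definitions, which contain many more fields.

```rust
// Simplified stand-in for a sorting column entry (illustrative assumption).
#[derive(Debug, Clone, PartialEq)]
pub struct SortingColumn {
    pub column_idx: i32,
    pub descending: bool,
    pub nulls_first: bool,
}

// Hypothetical stand-in for the Thrift-generated row group struct.
#[derive(Debug, Clone, PartialEq)]
pub struct ThriftRowGroup {
    pub num_rows: i64,
    pub sorting_columns: Option<Vec<SortingColumn>>,
}

// Hypothetical stand-in for the in-memory row group metadata.
#[derive(Debug, Clone, PartialEq)]
pub struct RowGroupMeta {
    num_rows: i64,
    sorting_columns: Option<Vec<SortingColumn>>,
}

impl RowGroupMeta {
    /// Read path: copy the field out of the Thrift struct instead of
    /// dropping it.
    pub fn from_thrift(rg: ThriftRowGroup) -> Self {
        RowGroupMeta {
            num_rows: rg.num_rows,
            sorting_columns: rg.sorting_columns,
        }
    }

    /// Write path: copy the field back into the Thrift struct so it
    /// reaches the file footer.
    pub fn to_thrift(&self) -> ThriftRowGroup {
        ThriftRowGroup {
            num_rows: self.num_rows,
            sorting_columns: self.sorting_columns.clone(),
        }
    }

    /// Accessor exposing the sorting columns to readers of the metadata.
    pub fn sorting_columns(&self) -> Option<&Vec<SortingColumn>> {
        self.sorting_columns.as_ref()
    }
}
```

A round-trip check in the spirit of the test requested in review would build a `ThriftRowGroup` with sorting columns, convert it with `from_thrift`, convert back with `to_thrift`, and compare the result with the original value.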