[DISCUSSION] Parquet Metadata Improvements #6129
Comments
Here is a down payment for documentation: #6184
Thanks @alamb! IMHO the current EDITED: I just saw #6197, which feels relevant to the first one. For the last one, there's a Type struct in the code which seems similar to the C++ Node. I'm currently not sure how complex it would be, or whether it is worth the effort, to support a field tree with parent info in the current code. But a simple way might be maintaining a
I agree. We are also looking for help with the reading portion; see the comments on #6002. cc @adriangb. There is something similar here: https://docs.rs/parquet/latest/parquet/arrow/fn.parquet_column.html, but adding a real API that handles the field resolution logic for nested fields would be very nice. Perhaps you can file a ticket requesting this feature (I have found that clearly worded tickets are very often picked up by people in this community).
I am not familiar with the use case for finding the parent of a field, so I don't have much to add to this.
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
As we work on various features of Parquet metadata it is becoming clear that working with the current code organization is challenging.
I just wanted to write down some of my thoughts about how it all fits together.
Here are some challenges:
ParquetMetadataWriter allow ad-hoc encoding of ParquetMetadata #6000
file::metadata and the thrift structures in format::metadata,
Describe the solution you'd like
I would like to propose
file::metadata and format::metadata
Maybe this is clear to others but it is not to me
Here is how I see the structures involved:
I would like to focus on improving the API for going back/forth between bytes and the file::metadata structures
Describe alternatives you've considered
I think we probably need at least two different APIs:
Reading
[u8] buffered in memory (decode_footer and decode_metadata)
AsyncReader or something equivalent (MetadataLoader is enough / needs some more information)
Writing
[u8] (API for encoding/decoding ParquetMetadata with more control #6002)
AsyncWriter perhaps
Additional context
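To make the "[u8] buffered in memory" reading and writing cases concrete, here is a minimal std-only sketch of the fixed 8-byte trailer that ends a Parquet file: a 4-byte little-endian metadata length followed by the `PAR1` magic. The names `decode_footer_len`, `encode_footer`, `metadata_bytes`, and `FooterError` are hypothetical and chosen for illustration; they are not the parquet crate's API, and the sketch does not decode the Thrift metadata itself or handle encrypted footers.

```rust
// Hypothetical std-only sketch; these names are NOT the parquet crate's API.

/// Fixed trailer of a Parquet file: 4-byte little-endian metadata length + 4-byte magic.
const FOOTER_SIZE: usize = 8;
const PARQUET_MAGIC: [u8; 4] = *b"PAR1";

#[derive(Debug, PartialEq)]
enum FooterError {
    /// The last four bytes were not the expected magic.
    BadMagic([u8; 4]),
    /// The buffer does not hold as many bytes as required.
    TooShort { needed: usize, available: usize },
}

/// Reading: validate the magic and return the length of the encoded metadata.
fn decode_footer_len(footer: &[u8; FOOTER_SIZE]) -> Result<usize, FooterError> {
    let magic = [footer[4], footer[5], footer[6], footer[7]];
    if magic != PARQUET_MAGIC {
        return Err(FooterError::BadMagic(magic));
    }
    let len = u32::from_le_bytes([footer[0], footer[1], footer[2], footer[3]]);
    Ok(len as usize)
}

/// Writing: build the trailer for metadata that is `metadata_len` bytes long.
fn encode_footer(metadata_len: u32) -> [u8; FOOTER_SIZE] {
    let mut out = [0u8; FOOTER_SIZE];
    out[..4].copy_from_slice(&metadata_len.to_le_bytes());
    out[4..].copy_from_slice(&PARQUET_MAGIC);
    out
}

/// Given the tail of a file held in memory, return the slice that holds the
/// still-encoded metadata (the bytes immediately before the trailer).
fn metadata_bytes(file_tail: &[u8]) -> Result<&[u8], FooterError> {
    if file_tail.len() < FOOTER_SIZE {
        return Err(FooterError::TooShort {
            needed: FOOTER_SIZE,
            available: file_tail.len(),
        });
    }
    let (rest, trailer) = file_tail.split_at(file_tail.len() - FOOTER_SIZE);
    let mut footer = [0u8; FOOTER_SIZE];
    footer.copy_from_slice(trailer);
    let len = decode_footer_len(&footer)?;
    if len > rest.len() {
        // The caller fetched too small a tail and needs to read more bytes.
        return Err(FooterError::TooShort {
            needed: len + FOOTER_SIZE,
            available: file_tail.len(),
        });
    }
    Ok(&rest[rest.len() - len..])
}
```

In this sketch a caller would append `encode_footer(len)` after writing the encoded metadata, and a reader would pass the last N bytes of the file to `metadata_bytes`. The `TooShort` error reports how many bytes are needed, which is the kind of information an async reader would use to issue a second, larger fetch.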