API to copy an existing RowGroup, including metadata from one parquet file to another #4823
Labels: enhancement, parquet
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
In DataFusion, @devinjdangelo is using the `append_column` API to write parquet files in parallel (apache/datafusion#7562).
However, passing the `RowGroupMetaData` to that API in order to copy any bloom filters, page offsets, or other structures is awkward.
Describe the solution you'd like
I would like a way to call the `append_column` API given a `RowGroupMetaData` object from the existing file.
Ideally there would be an API that produced a `ColumnCloseResult` from a `RowGroupMetaData`, or some convenience API that took a reader + `RowGroupMetaData` from another file and did the necessary copy.
Perhaps something like
https://docs.rs/parquet/latest/parquet/file/writer/struct.SerializedRowGroupWriter.html#method.append_column
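To make the shape of the requested convenience API more concrete, here is a minimal sketch. It uses no types from the parquet crate: `ColumnChunkMeta`, `RowGroupMeta`, `FileWriter`, and `append_row_group` are all hypothetical stand-ins invented for illustration, and the source file is modelled as a plain byte slice. The only point is the idea of copying each column chunk's bytes verbatim and rewriting its offset for the destination file. A real implementation would presumably need to do the same for bloom filters and page indexes, whose locations are also recorded as file offsets.

```rust
// Simplified, hypothetical stand-ins. These are NOT the parquet crate's
// real types or signatures; they only illustrate the requested behaviour.

/// Stand-in for column chunk metadata: where the chunk's bytes live in a file.
#[derive(Debug, Clone, PartialEq)]
struct ColumnChunkMeta {
    name: String,
    offset: usize, // byte offset of the chunk within its file
    len: usize,    // byte length of the chunk
    num_values: u64,
}

/// Stand-in for row group metadata.
#[derive(Debug, Clone, PartialEq)]
struct RowGroupMeta {
    num_rows: u64,
    columns: Vec<ColumnChunkMeta>,
}

/// Stand-in for a file writer that accumulates bytes and row group metadata.
struct FileWriter {
    buf: Vec<u8>,
    row_groups: Vec<RowGroupMeta>,
}

impl FileWriter {
    fn new() -> Self {
        FileWriter { buf: Vec::new(), row_groups: Vec::new() }
    }

    /// Hypothetical convenience API: copy every column chunk described by
    /// `rg` out of `src` without decoding it, and record metadata whose
    /// offsets point into the destination buffer instead of the source.
    fn append_row_group(&mut self, src: &[u8], rg: &RowGroupMeta) -> Result<(), String> {
        // Stage everything first so a failure leaves the writer untouched.
        let mut staged: Vec<u8> = Vec::new();
        let mut columns = Vec::with_capacity(rg.columns.len());

        for col in &rg.columns {
            let end = col
                .offset
                .checked_add(col.len)
                .ok_or_else(|| format!("column {}: offset overflow", col.name))?;
            let bytes = src
                .get(col.offset..end)
                .ok_or_else(|| format!("column {}: range out of bounds", col.name))?;

            // The chunk's new offset is wherever it lands in the destination.
            let new_offset = self.buf.len() + staged.len();
            staged.extend_from_slice(bytes);
            columns.push(ColumnChunkMeta { offset: new_offset, ..col.clone() });
        }

        // Commit the copied bytes and the rewritten metadata together.
        self.buf.extend_from_slice(&staged);
        self.row_groups.push(RowGroupMeta { num_rows: rg.num_rows, columns });
        Ok(())
    }
}
```

With something along these lines, a caller holding a reader and a row group's metadata from another file could make a single call such as `writer.append_row_group(&source_bytes, &row_group)` rather than rebuilding each column's close result by hand.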
Describe alternatives you've considered
Additional context