alamb changed the title from "Write out page and column statistics when" to "Create page and column statistics when a parquet file is written in parallel" on Sep 18, 2023
Is your feature request related to a problem or challenge?
In #7562 @devinjdangelo added the (really neat) feature to write a single parquet file in parallel.
This feature is enabled by a feature flag (`allow_single_file_parallelism`) that defaults to off.
We haven't turned it on by default yet because the resulting parquet files lack the index structures needed for high performance: bloom filters, the `column_index`, and the `offset_index`. See the details in this conversation: https://github.com/apache/arrow-datafusion/pull/7562/files#r1327037733
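To make "page and column statistics" concrete, here is a minimal sketch of the bookkeeping involved for a single `i64` column. It assumes each page's min/max/null count has already been recorded while encoding, and that a page whose `min` is `None` contains only nulls. `PageStats`, `ColumnIndexSketch`, and `build_index` are hypothetical names for illustration, not the arrow-rs API, and bloom filters are not covered.

```rust
/// Hypothetical per-page statistics for an i64 column, as an encoder
/// might record them while writing one column chunk.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct PageStats {
    pub min: Option<i64>, // assumption: None means the page holds only nulls
    pub max: Option<i64>,
    pub null_count: u64,
}

/// Simplified column index: one entry per page, loosely modeled on the
/// Parquet ColumnIndex (null_pages, min/max values, null counts).
#[derive(Debug, Default, PartialEq)]
pub struct ColumnIndexSketch {
    pub null_pages: Vec<bool>,
    pub min_values: Vec<Option<i64>>,
    pub max_values: Vec<Option<i64>>,
    pub null_counts: Vec<u64>,
}

/// Build the per-page column index and fold the same page statistics into
/// column-chunk statistics (min of mins, max of maxes, total null count).
pub fn build_index(pages: &[PageStats]) -> (ColumnIndexSketch, PageStats) {
    let mut index = ColumnIndexSketch::default();
    let mut chunk = PageStats { min: None, max: None, null_count: 0 };
    for p in pages {
        // Page-level entries, in page order.
        index.null_pages.push(p.min.is_none());
        index.min_values.push(p.min);
        index.max_values.push(p.max);
        index.null_counts.push(p.null_count);
        // Chunk-level aggregation; all-null pages do not affect min/max.
        chunk.min = match (chunk.min, p.min) {
            (Some(a), Some(b)) => Some(a.min(b)),
            (a, b) => a.or(b),
        };
        chunk.max = match (chunk.max, p.max) {
            (Some(a), Some(b)) => Some(a.max(b)),
            (a, b) => a.or(b),
        };
        chunk.null_count += p.null_count;
    }
    (index, chunk)
}
```

In a parallel writer, the open question is where this state lives: if each column chunk is encoded in a separate task, the per-page statistics have to travel back with the encoded bytes so the file writer can emit them.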
Describe the solution you'd like
I would like the created parquet files to have the necessary index structures. apache/arrow-rs#4823 tracks adding an API upstream in arrow-rs that would make this possible.
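One detail any such API has to handle is the offset index: a column chunk encoded into its own buffer records page offsets relative to that buffer, and those offsets must be shifted to absolute file positions when the buffer is appended. The sketch below is an assumption about that one step, not the design tracked in apache/arrow-rs#4823. `PageLocation` mirrors the fields of the Parquet format's PageLocation but is defined locally; `EncodedChunk` and `append_chunk` are hypothetical.

```rust
/// Local mirror of the Parquet PageLocation fields, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct PageLocation {
    pub offset: i64,               // byte offset of the page in the file
    pub compressed_page_size: i32, // page size in bytes, including header
    pub first_row_index: i64,      // index of the page's first row within the row group
}

/// Hypothetical output of a parallel encoding task: the serialized chunk
/// plus page locations relative to the start of `bytes`.
pub struct EncodedChunk {
    pub bytes: Vec<u8>,
    pub pages: Vec<PageLocation>,
}

/// Append an independently encoded chunk to `out` and return its offset
/// index with page offsets rebased to absolute positions in `out`.
pub fn append_chunk(out: &mut Vec<u8>, chunk: &EncodedChunk) -> Vec<PageLocation> {
    let base = out.len() as i64;
    out.extend_from_slice(&chunk.bytes);
    chunk
        .pages
        .iter()
        // Only the byte offset moves; first_row_index is row-group relative.
        .map(|p| PageLocation { offset: base + p.offset, ..*p })
        .collect()
}
```

Because `first_row_index` is relative to the row group rather than the file, it needs no adjustment when a chunk is appended.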
Describe alternatives you've considered
No response
Additional context
No response