Thanks for noticing this @crepererum. It's related to #225, which is blocked by the inability to slice structs and lists correctly.
We want to use the slice facility to limit the row group sizes. I was thinking about this a few hours ago: maybe we could track the offset and length in parquet instead of relying on arrow.
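To make the offset-and-length idea concrete, here is a minimal sketch of how the row range of each row group could be computed from a batch's row count. The helper name `row_group_slices` is hypothetical and not part of the parquet crate; it only illustrates the bookkeeping and does not touch arrow arrays or the actual writer.

```rust
/// Compute the (offset, length) pair of each row group for a batch of
/// `num_rows` rows, with at most `max_row_group_size` rows per group.
/// Hypothetical helper for illustration only; not an API of the parquet crate.
fn row_group_slices(num_rows: usize, max_row_group_size: usize) -> Vec<(usize, usize)> {
    assert!(max_row_group_size > 0, "max_row_group_size must be positive");
    let mut slices = Vec::new();
    let mut offset = 0;
    while offset < num_rows {
        // The last group may be shorter than the configured maximum.
        let length = max_row_group_size.min(num_rows - offset);
        slices.push((offset, length));
        offset += length;
    }
    slices
}
```

Under these assumptions, `row_group_slices(3, 1)` yields `[(0, 1), (1, 1), (2, 1)]`, so a batch of 3 rows with a maximum row group size of 1 would be written as three single-row groups.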
Describe the bug
This property (that can be set via the `WriterPropertiesBuilder`):
arrow-rs/parquet/src/file/properties.rs
Line 99 in 508f25c
can only be retrieved using this getter:
arrow-rs/parquet/src/file/properties.rs
Lines 132 to 135 in 508f25c
but this getter is never used. In fact, quickly trying out this property shows that it has no effect. I think it should probably be wired up here:
arrow-rs/parquet/src/arrow/arrow_writer.rs
Lines 80 to 101 in 508f25c
where the incoming `RecordBatch` is split into batches of the configured size that will then be fed into individual record batches.
To Reproduce
Steps to reproduce the behavior:
1. Create a `RecordBatch` with 3 rows.
2. Set `WriterProperties.max_row_group_size` to 1.
Expected behavior
Record batches created from arrow should respect `WriterProperties.max_row_group_size`.
Additional context
Commit in question is 508f25c10032857da34ea88cc8166f0741616a32.