Fix Parquet Reader's Arrow Schema Inference #1682

Merged 8 commits on May 13, 2022
Changes from 2 commits
848 changes: 234 additions & 614 deletions parquet/src/arrow/array_reader/builder.rs

Large diffs are not rendered by default.

13 changes: 11 additions & 2 deletions parquet/src/arrow/array_reader/list_array.rs
@@ -584,16 +584,25 @@ mod tests {

         let mut array_reader = build_array_reader(
             file_reader.metadata().file_metadata().schema_descr_ptr(),
-            Arc::new(arrow_schema.clone()),
+            Arc::new(arrow_schema),
             vec![0usize].into_iter(),
             Box::new(file_reader),
         )
         .unwrap();
 
         let batch = array_reader.next_batch(100).unwrap();
+        assert_eq!(batch.data_type(), array_reader.get_data_type());
         assert_eq!(
             batch.data_type(),
-            &ArrowType::Struct(arrow_schema.fields().clone())
+            &ArrowType::Struct(vec![Field::new(
+                "table_info",
+                ArrowType::List(Box::new(Field::new(
+                    "table_info",
+                    ArrowType::Struct(vec![Field::new("name", ArrowType::Binary, false)]),
+                    false
+                ))),
+                false
+            )])
         );
         assert_eq!(batch.len(), 0);
     }
35 changes: 35 additions & 0 deletions parquet/src/arrow/arrow_reader.rs
@@ -1050,6 +1050,41 @@ mod tests {
         for batch in record_batch_reader {
             batch.unwrap();
         }
+
+        let projected_reader = arrow_reader
Contributor Author

This is a test for #1654 and #1652

Contributor

Suggested change
-        let projected_reader = arrow_reader
+        // Test for https://github.com/apache/arrow-rs/issues/1654 and
+        // https://github.com/apache/arrow-rs/issues/1652
+        let projected_reader = arrow_reader

+            .get_record_reader_by_columns(vec![3, 8, 10], 60)
+            .unwrap();
+        let projected_schema = arrow_reader
+            .get_schema_by_columns(vec![3, 8, 10], true)
+            .unwrap();
+
+        let expected_schema = Schema::new(vec![
+            Field::new(
+                "roll_num",
+                ArrowDataType::Struct(vec![Field::new(
+                    "count",
+                    ArrowDataType::UInt64,
+                    false,
+                )]),
+                false,
+            ),
+            Field::new(
+                "PC_CUR",
+                ArrowDataType::Struct(vec![
+                    Field::new("mean", ArrowDataType::Int64, false),
+                    Field::new("sum", ArrowDataType::Int64, false),
+                ]),
+                false,
+            ),
+        ]);
+
+        assert_eq!(projected_reader.schema().as_ref(), &projected_schema);
+        assert_eq!(expected_schema, projected_schema);
+
+        for batch in projected_reader {
+            let batch = batch.unwrap();
+            assert_eq!(batch.schema().as_ref(), &projected_schema);
+        }
     }
 
     #[test]
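
The test added above appears to check that projecting a few leaf columns yields a schema in which each parent struct keeps only the selected children. As a rough illustration of that idea only, here is a std-only sketch. The `DataType`, `Field`, `project_field`, and `project_schema` names below are hypothetical stand-ins written for this example; they are not the arrow or parquet crate APIs, and the depth-first leaf numbering is an assumption modeled on how Parquet numbers leaf columns.

```rust
/// Hypothetical, simplified stand-in for an Arrow data type.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int64,
    UInt64,
    Struct(Vec<Field>),
}

/// Hypothetical, simplified stand-in for an Arrow field (nullability omitted).
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: DataType,
}

impl Field {
    fn new(name: &str, data_type: DataType) -> Self {
        Field { name: name.to_string(), data_type }
    }
}

/// Prune `field` down to the leaves whose index appears in `keep`.
/// `next_leaf` numbers leaves in depth-first order (an assumption of this sketch).
/// Returns `None` when no selected leaf lives under `field`.
fn project_field(field: &Field, keep: &[usize], next_leaf: &mut usize) -> Option<Field> {
    match &field.data_type {
        DataType::Struct(children) => {
            let mut kept = Vec::new();
            for child in children {
                if let Some(projected) = project_field(child, keep, next_leaf) {
                    kept.push(projected);
                }
            }
            if kept.is_empty() {
                None
            } else {
                // Keep the parent, but only with the surviving children.
                Some(Field::new(&field.name, DataType::Struct(kept)))
            }
        }
        _ => {
            // Every primitive field consumes one leaf index, selected or not.
            let index = *next_leaf;
            *next_leaf += 1;
            if keep.contains(&index) {
                Some(field.clone())
            } else {
                None
            }
        }
    }
}

/// Project a whole schema (a list of top-level fields) onto the selected leaves.
fn project_schema(fields: &[Field], keep: &[usize]) -> Vec<Field> {
    let mut next_leaf = 0usize;
    let mut projected = Vec::new();
    for field in fields {
        if let Some(kept) = project_field(field, keep, &mut next_leaf) {
            projected.push(kept);
        }
    }
    projected
}
```

In this toy model, a `roll_num` struct with two children projected onto its first leaf comes back as a `roll_num` struct with one child, mirroring the shape of `expected_schema` in the test. The real reader also has to handle lists, maps, and nullability, which this sketch leaves out.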
2 changes: 1 addition & 1 deletion parquet/src/arrow/arrow_writer.rs
@@ -1058,7 +1058,7 @@ mod tests {
         let stocks_field = Field::new(
             "stocks",
             DataType::Map(
-                Box::new(Field::new("entries", entries_struct_type, false)),
+                Box::new(Field::new("entries", entries_struct_type, true)),
                 false,
             ),
             true,

tustvold marked this conversation as resolved.