Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ArrowReaderOptions to ParquetFileArrowReader, add option to skip decoding arrow metadata from parquet (#1459) #1558

Merged
merged 3 commits into from
Apr 14, 2022

Conversation

tustvold
Copy link
Contributor

Which issue does this PR close?

Closes #1459
Closes #1557
Part of #1556

Rationale for this change

See tickets

What changes are included in this PR?

See tickets

Are there any user-facing changes?

Yes

Fix inference from null logical type (apache#1557)

Replace some `&Option<T>` with `Option<&T>` (apache#1556)
@github-actions github-actions bot added the parquet Changes to the parquet crate label Apr 13, 2022
/// Returns the key value metadata, returns `None` if [`ArrowReaderOptions::skip_arrow_metadata`]
fn get_kv_metadata(&self) -> Option<&Vec<KeyValue>> {
if self.options.skip_arrow_metadata {
return None;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can't be written without the change in #1556 to move away from &Option<T>

@@ -690,6 +690,8 @@ impl ParquetTypeConverter<'_> {
t.unit
))),
},
// https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#unknown-always-null
(Some(LogicalType::UNKNOWN(_)), _) => Ok(DataType::Null),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the fix for #1557

assert_eq!(schema.field(0), &arrow_field);
}

#[test]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the closest I could get to a test of #1459 as we always write and decode the LogicalType, even when technically PARQUET_1_0 doesn't support it. The nature of thrift means this isn't actually a bug I don't think

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM -- thanks @tustvold

parquet/src/arrow/arrow_reader.rs Outdated Show resolved Hide resolved
@codecov-commenter
Copy link

Codecov Report

Merging #1558 (54755f5) into master (c9549bb) will increase coverage by 0.02%.
The diff coverage is 98.46%.

@@            Coverage Diff             @@
##           master    #1558      +/-   ##
==========================================
+ Coverage   82.83%   82.85%   +0.02%     
==========================================
  Files         190      190              
  Lines       54957    55042      +85     
==========================================
+ Hits        45521    45606      +85     
  Misses       9436     9436              
Impacted Files Coverage Δ
parquet/src/file/serialized_reader.rs 95.65% <ø> (ø)
parquet/src/arrow/arrow_reader.rs 94.48% <98.30%> (+0.69%) ⬆️
parquet/src/arrow/array_reader.rs 87.07% <100.00%> (ø)
parquet/src/arrow/schema.rs 85.71% <100.00%> (+0.03%) ⬆️
parquet/src/file/metadata.rs 94.42% <100.00%> (ø)
parquet/src/encodings/encoding.rs 93.37% <0.00%> (-0.19%) ⬇️
arrow/src/json/reader.rs 84.48% <0.00%> (+0.02%) ⬆️
arrow/src/datatypes/field.rs 54.10% <0.00%> (+0.30%) ⬆️
arrow/src/ipc/reader.rs 88.28% <0.00%> (+0.48%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c9549bb...54755f5. Read the comment docs.

@tustvold tustvold added the api-change Changes to the arrow API label Apr 14, 2022
@tustvold tustvold merged commit d701939 into apache:master Apr 14, 2022
@alamb alamb changed the title Add option to skip decoding arrow metadata from parquet (#1459) Add ArrowReaderOptions to ParquetFileArrowReader, add option to skip decoding arrow metadata from parquet (#1459) Apr 15, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-change Changes to the arrow API parquet Changes to the parquet crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Error Infering Schema for LogicalType::UNKNOWN Handle Parquet Files With Inconsistent Timestamp Units
3 participants