Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Separate Page IO from Page Decode #1810

Merged
merged 1 commit into from
Jun 13, 2022

Conversation

tustvold
Copy link
Contributor

@tustvold tustvold commented Jun 7, 2022

Which issue does this PR close?

Part of #1163

Rationale for this change

This is part of reworking the read path to use less custom abstractions, in particular it splits reading the page bytes from interpreting those bytes.

What changes are included in this PR?

Splits the decode logic out of SerializedPageReader into free functions, and then uses this within ParquetRecordBatchStream to elide a copy.

Are there any user-facing changes?

No

@github-actions github-actions bot added the parquet Changes to the parquet crate label Jun 7, 2022
@codecov-commenter
Copy link

codecov-commenter commented Jun 7, 2022

Codecov Report

Merging #1810 (17412bf) into master (1d31d30) will decrease coverage by 0.04%.
The diff coverage is 61.00%.

❗ Current head 17412bf differs from pull request most recent head dfedade. Consider uploading reports for the commit dfedade to get more accurate results

@@            Coverage Diff             @@
##           master    #1810      +/-   ##
==========================================
- Coverage   83.44%   83.40%   -0.05%     
==========================================
  Files         199      199              
  Lines       56651    56690      +39     
==========================================
+ Hits        47272    47281       +9     
- Misses       9379     9409      +30     
Impacted Files Coverage Δ
parquet/src/arrow/async_reader.rs 0.00% <0.00%> (ø)
parquet/src/file/serialized_reader.rs 94.95% <88.40%> (+0.49%) ⬆️
arrow/src/datatypes/datatype.rs 65.42% <0.00%> (-0.38%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1d31d30...dfedade. Read the comment docs.

@tustvold tustvold force-pushed the zero-copy-page-decode branch from fd6dfa4 to dfedade Compare June 8, 2022 08:58
@tustvold tustvold marked this pull request as ready for review June 8, 2022 08:58
Copy link
Contributor

@nevi-me nevi-me left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@tustvold tustvold merged commit fcf655e into apache:master Jun 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
parquet Changes to the parquet crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants