This repository has been archived by the owner on Feb 18, 2024. It is now read-only.

Added support to read parquet asynchronously #260

Merged (4 commits) on Aug 11, 2021

Conversation

jorgecarleitao
Owner

@jorgecarleitao jorgecarleitao commented Aug 9, 2021

This adds support to read parquet asynchronously.

New APIs

  • pub async fn read_metadata_async -> Result<FileMetadata>
  • pub async fn get_page_stream -> Result<impl Stream<Item = Result<CompressedDataPage>>>
  • pub async fn page_stream_to_array -> Result<Box<dyn Array>>

These allow reading a file's metadata and pages into arrow asynchronously (i.e. via .await). For now, this is only implemented for basic types; follow-up PRs will extend support to the remaining types.
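
As a rough illustration of the flow these three functions suggest (read the metadata, pull pages one at a time, assemble them into an array), here is a minimal std-only sketch. Every name in it (`read_metadata_stub`, `next_page_stub`, `pages_to_array_stub`, the struct fields, and the tiny `block_on` executor) is a hypothetical stand-in invented for this example; it does not reproduce the PR's actual signatures, and the "file" is just an in-memory list of pages.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

type Result<T> = std::result::Result<T, String>;

// Hypothetical stand-ins for the real metadata and page types.
struct FileMetadata {
    num_pages: usize,
}
struct CompressedDataPage {
    values: Vec<i32>,
}

// Stand-in for reading the file metadata asynchronously.
async fn read_metadata_stub(file: &[Vec<i32>]) -> Result<FileMetadata> {
    Ok(FileMetadata { num_pages: file.len() })
}

// Stand-in for one step of a page stream: yields `None` once the pages are exhausted.
async fn next_page_stub(file: &[Vec<i32>], index: usize) -> Result<Option<CompressedDataPage>> {
    Ok(file.get(index).map(|v| CompressedDataPage { values: v.clone() }))
}

// Stand-in for turning a stream of pages into a single array.
async fn pages_to_array_stub(file: &[Vec<i32>]) -> Result<Vec<i32>> {
    let metadata = read_metadata_stub(file).await?;
    let mut array = Vec::new();
    let mut index = 0;
    while let Some(page) = next_page_stub(file, index).await? {
        array.extend(page.values);
        index += 1;
    }
    // Every page announced by the metadata should have been consumed.
    assert_eq!(index, metadata.num_pages);
    Ok(array)
}

// A waker that does nothing; enough for futures that never actually wait.
struct NoopWaker;
impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

// Minimal executor: polls in a loop until the future is ready.
// A real program would use an async runtime instead.
fn block_on<F: Future>(future: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut future = pin!(future);
    loop {
        if let Poll::Ready(output) = future.as_mut().poll(&mut cx) {
            return output;
        }
    }
}

fn main() {
    // Two "pages" of an in-memory "column".
    let file = vec![vec![1, 2], vec![3]];
    let array = block_on(pages_to_array_stub(&file)).unwrap();
    println!("{:?}", array);
}
```

The point of the sketch is only the shape of the control flow: each I/O step is an `.await`, so a caller reading from a remote store is not blocked while waiting for bytes.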

This PR also adds an example demonstrating how to use these APIs against S3.

@codecov

codecov bot commented Aug 9, 2021

Codecov Report

Merging #260 (df104cd) into main (331f7ef) will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##             main     #260   +/-   ##
=======================================
  Coverage   77.03%   77.03%           
=======================================
  Files         241      241           
  Lines       20446    20446           
=======================================
  Hits        15750    15750           
  Misses       4696     4696           
Impacted Files                              Coverage Δ
src/io/parquet/read/binary/basic.rs         81.33% <ø> (ø)
src/io/parquet/read/boolean/basic.rs        88.13% <ø> (ø)
src/io/parquet/read/fixed_size_binary.rs    43.00% <ø> (ø)
src/io/parquet/read/mod.rs                  86.08% <ø> (ø)
src/io/parquet/read/primitive/mod.rs        84.37% <ø> (ø)
src/io/parquet/write/utf8/nested.rs         95.65% <ø> (ø)

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@jorgecarleitao jorgecarleitao changed the title from "Added support for async parquet read" to "Added support to read parquet asynchronously" Aug 9, 2021
@jorgecarleitao jorgecarleitao force-pushed the async_parquet branch 6 times, most recently from a826f27 to bd077fa Compare August 10, 2021 14:56
Labels: feature (A new feature)