Provide an async ParquetReader for arrow #111
Comments
There is a PR in arrow2 for such functionality, jorgecarleitao/arrow2#260, which may serve as an inspiration.
The approach that @jorgecarleitao took in jorgecarleitao/arrow2#260 is quite clever. Rather than a single struct that can read parquet files synchronously and asynchronously, I think he effectively added a second API for reading the required portions of the files into memory buffers and then uses shared encoding/decoding logic with the serialized reader. Thus, one idea for adding async support to the Something like
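To illustrate the pattern described in this comment (an async path that only fetches the required bytes into memory, then hands them to the same decoding logic the synchronous reader uses), here is a minimal, standard-library-only sketch. It is not the code from the comment or from arrow2#260. Every name in it (`AsyncByteSource`, `InMemorySource`, `decode_u32_le`, `read_sync`, `read_async`, `block_on`, `example`) is hypothetical, and decoding little-endian `u32` values stands in for real Parquet page decoding.

```rust
use std::future::Future;
use std::ops::Range;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Boxed, Send future so the fetch could run on a multi-threaded executor.
type BoxFuture<'a, T> = Pin<Box<dyn Future<Output = T> + Send + 'a>>;

/// Hypothetical second API: asynchronously fetch a byte range into memory.
trait AsyncByteSource {
    fn fetch(&self, range: Range<usize>) -> BoxFuture<'_, Vec<u8>>;
}

/// Stand-in for remote storage; a real source would perform async I/O here.
struct InMemorySource {
    data: Vec<u8>,
}

impl AsyncByteSource for InMemorySource {
    fn fetch(&self, range: Range<usize>) -> BoxFuture<'_, Vec<u8>> {
        Box::pin(async move { self.data[range].to_vec() })
    }
}

/// Shared, purely synchronous decoding logic (placeholder for page decoding).
fn decode_u32_le(buf: &[u8]) -> Vec<u32> {
    buf.chunks_exact(4)
        .map(|c| u32::from_le_bytes([c[0], c[1], c[2], c[3]]))
        .collect()
}

/// Synchronous path: slice the bytes directly, then decode.
fn read_sync(data: &[u8], range: Range<usize>) -> Vec<u32> {
    decode_u32_le(&data[range])
}

/// Async path: await only the byte fetch, then reuse the same decoder.
async fn read_async<S: AsyncByteSource>(src: &S, range: Range<usize>) -> Vec<u32> {
    let buf = src.fetch(range).await;
    decode_u32_le(&buf)
}

/// Toy executor for this sketch only; real code would use an async runtime.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        // Busy-polls until ready; acceptable here because fetch never pends.
        if let Poll::Ready(value) = fut.as_mut().poll(&mut cx) {
            return value;
        }
    }
}

/// Reads the second and third values through both paths.
fn example() -> (Vec<u32>, Vec<u32>) {
    let data: Vec<u8> = [1u32, 2, 3].iter().flat_map(|v| v.to_le_bytes()).collect();
    let sync_vals = read_sync(&data, 4..12);
    let src = InMemorySource { data };
    let async_vals = block_on(read_async(&src, 4..12));
    (sync_vals, async_vals)
}
```

Both paths in `example()` go through `decode_u32_le`, so only the byte-fetching step differs between the sync and async readers. That is the appeal of the design: the decoding code does not need to become async.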
Here is the current read API: cc @yjshen
FYI @yjshen @neverchanje, @tustvold has created a proof of concept of a
Add Sync + Send bounds to parquet crate
Note: migrated from original JIRA: https://issues.apache.org/jira/browse/ARROW-10307
The aim of this issue is to discuss and try to implement async in the Parquet crate for read traits.
It focuses on the read part to limit the complexity and impact of the changes. The design choices should also make sense for the write part.
Related issues:
ARROW-9275 is a more generic and abstract discussion about async. This issue focuses on Parquet reads.
ARROW-9464 focuses on threading in DataFusion but overlaps with this issue when DataFusion reads from Parquet.