Parquet related changes for remote read #707

Closed
wants to merge 1 commit into from

yjshen
Member

@yjshen yjshen commented Aug 22, 2021

No description provided.

@github-actions github-actions bot added the parquet Changes to the parquet crate label Aug 22, 2021
@houqp
Member

houqp commented Aug 22, 2021

For those who are interested in this change, there are more discussions and relevant context in this design doc: https://docs.google.com/document/d/1ZEZqvdohrot0ewtTNeaBtqczOIJ1Q0OnX9PqMMxpOF8/edit#

@@ -36,6 +36,9 @@ pub trait TryClone: Sized {
pub trait ParquetReader: Read + Seek + Length + TryClone {}
impl<T: Read + Seek + Length + TryClone> ParquetReader for T {}

pub trait ThreadSafeParquetReader: ParquetReader + Send + Sync + 'static {}
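To make the intent of the new marker trait concrete, here is a minimal sketch using only the standard library. It assumes a blanket impl mirroring the one for `ParquetReader` (the hunk above does not show one), and it uses simplified stand-ins for `Length` and `TryClone`. `InMemorySource` and `read_tail_on_worker` are hypothetical names introduced only for illustration.

```rust
use std::io::{self, Cursor, Read, Seek, SeekFrom};
use std::sync::Arc;
use std::thread;

// Simplified stand-ins for the parquet crate's helper traits (assumption).
pub trait Length {
    fn len(&self) -> u64;
}
pub trait TryClone: Sized {
    fn try_clone(&self) -> io::Result<Self>;
}

pub trait ParquetReader: Read + Seek + Length + TryClone {}
impl<T: Read + Seek + Length + TryClone> ParquetReader for T {}

pub trait ThreadSafeParquetReader: ParquetReader + Send + Sync + 'static {}
// Assumed blanket impl, following the same pattern as ParquetReader.
impl<T: ParquetReader + Send + Sync + 'static> ThreadSafeParquetReader for T {}

// Hypothetical in-memory source; clones share the same bytes.
pub struct InMemorySource(Cursor<Arc<[u8]>>);

impl InMemorySource {
    pub fn new(bytes: Vec<u8>) -> Self {
        InMemorySource(Cursor::new(Arc::from(bytes)))
    }
}

impl Read for InMemorySource {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        self.0.read(buf)
    }
}
impl Seek for InMemorySource {
    fn seek(&mut self, pos: SeekFrom) -> io::Result<u64> {
        self.0.seek(pos)
    }
}
impl Length for InMemorySource {
    fn len(&self) -> u64 {
        self.0.get_ref().len() as u64
    }
}
impl TryClone for InMemorySource {
    // The clone gets its own cursor, positioned at the start.
    fn try_clone(&self) -> io::Result<Self> {
        Ok(InMemorySource(Cursor::new(Arc::clone(self.0.get_ref()))))
    }
}

// Reads the last `n` bytes on a worker thread. The Send + 'static bounds
// from ThreadSafeParquetReader are what allow the clone to be moved there.
pub fn read_tail_on_worker<R: ThreadSafeParquetReader>(
    reader: &R,
    n: u64,
) -> io::Result<Vec<u8>> {
    let mut clone = reader.try_clone()?;
    let handle = thread::spawn(move || -> io::Result<Vec<u8>> {
        let start = clone.len().saturating_sub(n);
        clone.seek(SeekFrom::Start(start))?;
        let mut buf = Vec::new();
        clone.read_to_end(&mut buf)?;
        Ok(buf)
    });
    handle.join().expect("worker thread panicked")
}
```

With this shape, any reader that is already `Send + Sync + 'static` picks up the marker trait automatically, so callers only need to tighten the bound where a reader actually crosses threads.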
Contributor

An alternative approach that we could use in DataFusion would be to implement something like ThreadSafeFileSource.

I think the core change needed to work with remote S3-like object stores is to allow for an async API in the parquet reader.

e.g. #111

Here is one potential way of doing it from @jorgecarleitao's arrow2 crate: jorgecarleitao/arrow2#260
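As a rough illustration of what an async read path could look like, the sketch below defines a byte-range trait that returns boxed futures. It is not the API from #111 or from arrow2#260. `AsyncChunkReader`, `FakeObjectStore`, `read_footer_magic`, and the tiny `block_on` helper are all hypothetical names, and the example uses only the standard library so it stays self-contained.

```rust
use std::future::Future;
use std::io;
use std::ops::Range;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

type BoxFuture<'a, T> = Pin<Box<dyn Future<Output = T> + Send + 'a>>;

// Hypothetical async reader: callers ask for byte ranges instead of
// driving Read + Seek, which maps naturally onto object-store range GETs.
pub trait AsyncChunkReader: Send + Sync {
    fn len(&self) -> BoxFuture<'_, io::Result<u64>>;
    fn get_range(&self, range: Range<u64>) -> BoxFuture<'_, io::Result<Vec<u8>>>;
}

// Stand-in for a remote store; a real one would issue network requests.
pub struct FakeObjectStore {
    pub bytes: Vec<u8>,
}

impl AsyncChunkReader for FakeObjectStore {
    fn len(&self) -> BoxFuture<'_, io::Result<u64>> {
        Box::pin(async move { Ok(self.bytes.len() as u64) })
    }

    fn get_range(&self, range: Range<u64>) -> BoxFuture<'_, io::Result<Vec<u8>>> {
        Box::pin(async move {
            let (start, end) = (range.start as usize, range.end as usize);
            self.bytes
                .get(start..end)
                .map(|s| s.to_vec())
                .ok_or_else(|| {
                    io::Error::new(io::ErrorKind::UnexpectedEof, "range out of bounds")
                })
        })
    }
}

// Fetches the trailing 4 bytes, where a Parquet file stores its "PAR1" magic.
pub async fn read_footer_magic<R: AsyncChunkReader>(reader: &R) -> io::Result<Vec<u8>> {
    let len = reader.len().await?;
    reader.get_range(len.saturating_sub(4)..len).await
}

// Minimal busy-polling executor, only to run the sketch without a runtime.
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

pub fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
        std::thread::yield_now();
    }
}
```

In practice the futures would be driven by an async runtime rather than the busy loop shown here. The point of the sketch is only that a range-based async trait removes the need for a blocking `Read + Seek` handle.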

Contributor

@alamb alamb left a comment

The more I think about what we are trying to do in DataFusion, the more I am convinced that what is needed is some async way to read parquet files - #111

I have some ideas about how to do this -- I will try to write them up / draft them in the next few days

@yjshen
Member Author

yjshen commented Aug 31, 2021

Hi @alamb, have you had a chance to work on making the parquet reader async?

Labels
parquet Changes to the parquet crate
3 participants