S3 support for Parquet #40
Any thoughts on the approach, @ankane?
@ankane any interest in accepting a change like this? If so, I can rebase and add some test coverage for the changes.
I'd love to have this available too! Happy to contribute if help is needed.
In #37, @ankane mentioned that he was hesitant about pulling a Rust TLS library into the pola.rs build 🤷‍♀️ I tried to do it in "userspace" on the Ruby side, but there was a lot of clever stuff around the range queries and pushdown that would have involved quite a lot of surgery to the Magnus bindings, so I didn't end up persevering.
I think a) this lib should be on par, feature-wise, with all the other libs, so 👍 for this PR
This looks awesome! Would love it if this got rebased. I would use this feature 100%!
Any advice on the state of this one? @catkins, would you be willing to update this with the current state of the rest of the project (i.e., rebase or merge in main)?
No updates, but if @ankane is keen on it, I can rebase and add some basic test coverage?
resolves #37
This is a rough cut of adding the extra required plumbing to work with Parquet files in S3, leveraging the smarts of object_store and polars to do predicate pushdown and only retrieve the required byte ranges from S3.
Example
For the most part, I borrowed the existing patterns from py-polars.
https://github.com/pola-rs/polars/blob/main/py-polars/src/lazyframe.rs/#L258-L313
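For readers unfamiliar with the idea, here is a minimal conceptual sketch of how predicate pushdown can reduce the bytes fetched from an object store. It is not the code in this PR, and it does not use the polars or object_store APIs. The `RowGroup` class, the `ranges_to_fetch` and `range_header` helpers, and all offsets and statistics are hypothetical, chosen only to show the technique: compare a filter against per-row-group min/max statistics, then request only the byte ranges of the row groups that might match.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RowGroup:
    """Toy stand-in for Parquet row-group metadata (hypothetical, simplified)."""
    offset: int      # byte offset of the row group within the object
    length: int      # size of the row group in bytes
    min_value: int   # minimum of the filtered column in this row group
    max_value: int   # maximum of the filtered column in this row group


def ranges_to_fetch(row_groups, lower, upper):
    """Return inclusive (start, end) byte ranges for row groups whose
    [min_value, max_value] interval overlaps the filter lower <= x <= upper."""
    ranges = []
    for rg in row_groups:
        # Skip row groups that cannot contain a matching value.
        if rg.max_value < lower or rg.min_value > upper:
            continue
        ranges.append((rg.offset, rg.offset + rg.length - 1))
    return ranges


def range_header(start, end):
    """Format an inclusive byte range as an HTTP Range header value."""
    return f"bytes={start}-{end}"


# Made-up metadata for three row groups in one object.
row_groups = [
    RowGroup(offset=4, length=1000, min_value=0, max_value=99),
    RowGroup(offset=1004, length=1200, min_value=100, max_value=199),
    RowGroup(offset=2204, length=900, min_value=200, max_value=299),
]

# Filter: 150 <= x <= 180. Only the second row group can match.
selected = ranges_to_fetch(row_groups, lower=150, upper=180)
headers = [range_header(start, end) for start, end in selected]
print(headers)
```

In this toy setup, only one of the three row groups overlaps the filter, so a reader would issue a single ranged request instead of downloading the whole object. A real reader also has to fetch the file footer first to obtain the metadata, and typically narrows the ranges further to the selected columns.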