AWS, GCP: Allow access to underlying storage client in FileIO #8208

Merged into apache:master on Aug 13, 2023

@bryanck bryanck commented Aug 2, 2023

This PR makes the client() methods in S3FileIO and GCSFileIO public, which allows more flexibility in using a file IO instance for storage operations that might not be defined in the interface.
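
To illustrate the pattern, here is a minimal, self-contained sketch. It does not use the Iceberg or cloud SDK classes: `StorageClient`, `SimpleFileIO`, and `SketchFileIO` are hypothetical stand-ins for a storage SDK client, a narrow file IO interface, and an implementation such as S3FileIO or GCSFileIO. The only point it tries to show is that a public `client()` accessor lets callers reuse an already configured client for an operation the narrow interface does not define.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ClientAccessSketch {

  // Hypothetical stand-in for a storage SDK client; backed by an in-memory map.
  static class StorageClient {
    private final Map<String, byte[]> objects = new TreeMap<>();

    void put(String key, byte[] data) {
      objects.put(key, data);
    }

    byte[] get(String key) {
      return objects.get(key);
    }

    // An operation that the narrow file IO interface below does not offer.
    List<String> listKeys(String prefix) {
      List<String> keys = new ArrayList<>();
      for (String key : objects.keySet()) {
        if (key.startsWith(prefix)) {
          keys.add(key);
        }
      }
      return keys;
    }
  }

  // Hypothetical narrow interface: read and write only, no filesystem-like view.
  interface SimpleFileIO {
    byte[] read(String location);

    void write(String location, byte[] data);
  }

  // Hypothetical implementation that owns the configured client.
  static class SketchFileIO implements SimpleFileIO {
    private final StorageClient client = new StorageClient();

    @Override
    public byte[] read(String location) {
      return client.get(location);
    }

    @Override
    public void write(String location, byte[] data) {
      client.put(location, data);
    }

    // Public accessor for the underlying client (the idea behind this PR).
    public StorageClient client() {
      return client;
    }
  }

  public static void main(String[] args) {
    SketchFileIO io = new SketchFileIO();
    io.write("warehouse/db/t/data-1.parquet", new byte[] {1});
    io.write("warehouse/db/t/data-2.parquet", new byte[] {2});
    io.write("warehouse/other/x.parquet", new byte[] {3});

    // Reuse the configured client for an operation outside the interface.
    List<String> keys = io.client().listKeys("warehouse/db/t/");
    System.out.println(keys);
  }
}
```

In real code the returned object would be the SDK client configured by the file IO instance, so custom operations share its credentials and settings instead of building a second client by hand.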

rdblue commented Aug 2, 2023

I think this is a good idea. For context, we've been talking about how to use FileIO for things it wasn't intended for. We want to use FileIO because it is already configured to work with tables and credentials, but it purposely doesn't expose a filesystem-like view. The solution is to be able to get the configured client and use it for more custom operations. I think it's reasonable to make the client available like this.

+1 from me, but I'll leave this open for a while so others can comment. FYI @jackye1995, @amogh-jahagirdar.

bryanck changed the title from "AWS, GCS: Allow access to underlying storage client in FileIO" to "AWS, GCP: Allow access to underlying storage client in FileIO" on Aug 2, 2023
rdblue merged commit 36ca2ff into apache:master on Aug 13, 2023
rdblue commented Aug 13, 2023

Thanks, @bryanck!

nastra pushed a commit to nastra/iceberg that referenced this pull request Aug 15, 2023
* Spark: Update antlr4 to match Spark 3.4 (apache#7824)

* Parquet: Revert workaround for resource usage with zstd (apache#7834)

* GCP: fix single byte read in GCSInputStream (apache#8071)

* GCP: fix byte read in GCSInputStream

* add test

* Parquet: Cache codecs by name and level (apache#8182)

* GCP: Add prefix and bulk operations to GCSFileIO (apache#8168)

* AWS, GCS: Allow access to underlying storage client (apache#8208)

* spotless