Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature: Add s3:// as a supported storage root for batch files. #1070

Closed
jamielxcarter opened this issue Oct 13, 2022 · 2 comments · Fixed by #1087
Closed

Feature: Add s3:// as a supported storage root for batch files. #1070

jamielxcarter opened this issue Oct 13, 2022 · 2 comments · Fixed by #1087

Comments

@jamielxcarter
Copy link
Contributor

Feature scope

Targets (data type handling, batching, SQL object generation, etc.)

Description

Extend the Batch feature to support AWS s3 as a supported storage root.

Other changes that would be good to ship with this feature:

  • Add cleanup_files=true|false option for Batch config on the target side. False is currently the default behavior for Batch messages. Making it configurable would allow users flexibility on whether they want to keep the batch files after processing them.
@aaronsteers aaronsteers changed the title [Feature]: Add s3:// as a supported storage root for batch files. Feature: Add s3:// as a supported storage root for batch files. Oct 13, 2022
@aaronsteers
Copy link
Contributor

aaronsteers commented Oct 13, 2022

@jamielxcarter - Is this a feature you might be interested in contributing?

I just spoke with @edgarrmondragon, and at minimum you'd want to add https://github.com/PyFilesystem/s3fs as a (probably optional or 'extra') python dependency, plus adding some tests.

One possible implementation is to assume that the S3/AWS creds are already discoverable via the underlying S3 libraries (via AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY / AWS_DEFAULT_REGION, or another method) and then just pass the S3 path to the target using the same convention.

The underlying PyFilesystem library does have S3 capabilities - so in theory the minimum implementation just need to auth successfully and have the extra libraries installed. However, if additional args or configs are needing to be sent, then we'd probably need to iterate a bit on how to send those optional args.

@jamielxcarter
Copy link
Contributor Author

@aaronsteers - Yes very much so. I'll open up a PR and start contributing when I can! Thanks for the guidance, I'll start there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants