Add a masks option to filter files in s3 datapipe #880

Closed
wants to merge 1 commit into from

sebathomas
Contributor

Add a new `masks` option to the constructor of S3FileListerIterDataPipe that filters the listed files by pattern, using the existing filter function match_masks.

I added a unit test for the s3 datapipe and I tested it on my machine with a real S3 bucket.

Fixes #737.
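
For illustration, here is a minimal standard-library sketch of the filtering behaviour described above. It is not the code in this PR: `matches_any_mask` and `filter_urls` are hypothetical stand-ins for match_masks and the datapipe's iteration, the bucket and keys are made up, and matching the pattern against the full URL with an empty default that accepts everything is an assumption.

```python
from fnmatch import fnmatchcase
from typing import Iterable, Iterator, List, Union


def matches_any_mask(name: str, masks: Union[str, List[str]]) -> bool:
    """Hypothetical stand-in for match_masks: shell-style pattern matching."""
    # Assumption: an empty mask (or empty list) accepts every name.
    if not masks:
        return True
    if isinstance(masks, str):
        masks = [masks]
    # fnmatchcase is used so the result does not depend on the OS.
    return any(fnmatchcase(name, mask) for mask in masks)


def filter_urls(
    urls: Iterable[str], masks: Union[str, List[str]] = ""
) -> Iterator[str]:
    """Yield only the URLs that match at least one mask."""
    for url in urls:
        # Assumption: the pattern is tested against the full URL.
        # In fnmatch, "*" also matches "/", so "*.csv" works on full URLs.
        if matches_any_mask(url, masks):
            yield url


# Made-up listing, standing in for what an S3 prefix listing might return.
listed = [
    "s3://example-bucket/train/part-0.csv",
    "s3://example-bucket/train/part-1.json",
    "s3://example-bucket/train/README.txt",
]

csv_only = list(filter_urls(listed, "*.csv"))
csv_or_json = list(filter_urls(listed, ["*.csv", "*.json"]))
```

Passing a single pattern or a list of patterns both work in this sketch, and leaving `masks` empty keeps the pre-existing behaviour of returning every listed file.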

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Nov 4, 2022
@sebathomas sebathomas marked this pull request as ready for review November 4, 2022 10:37
Contributor

@NivekT NivekT left a comment

The implementation looks fine. Do we want to add an extra argument to this DataPipe?

@ejguan WDYT?

Contributor

@ejguan ejguan left a comment

LGTM, thank you

@facebook-github-bot
Contributor

@NivekT has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@NivekT
Contributor

NivekT commented Nov 4, 2022

We will need to update some things to make the test compatible internally. We will keep you posted.

Development

Successfully merging this pull request may close these issues.

Support file mask on list_files_by_s3 like list_files
4 participants