Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update tutorial about shuffling before sharding #709

Closed
ejguan opened this issue Aug 2, 2022 · 2 comments
Closed

Update tutorial about shuffling before sharding #709

ejguan opened this issue Aug 2, 2022 · 2 comments
Assignees
Labels
documentation Improvements or additions to documentation

Comments

@ejguan
Copy link
Contributor

ejguan commented Aug 2, 2022

📚 The doc issue

The tutorial needs to update the actual reason about shuffling before sharding. It's not accurate.
Shuffling before sharding is required to achieve global shuffling rather than only shuffling inside each shard.

Suggest a potential alternative/fix

No response

@ejguan ejguan self-assigned this Aug 2, 2022
@NivekT
Copy link
Contributor

NivekT commented Aug 2, 2022

As in the following sentence is inaccurate?

In order for DataPipe sharding to work with DataLoader, we need to add the following. It is crucial to add ShardingFilter after Shuffler to ensure that all worker processes have the same order of data for sharding.

@ejguan
Copy link
Contributor Author

ejguan commented Aug 2, 2022

Without shuffler, each worker process has the same order of data as well. So, it's not the essential reason.

@NivekT NivekT added topic: documentation topic category documentation Improvements or additions to documentation and removed topic: documentation topic category labels Aug 3, 2022
ejguan added a commit to ejguan/data that referenced this issue Aug 5, 2022
Summary:
Fixes pytorch#709

Per title

Pull Request resolved: pytorch#715

Reviewed By: NivekT

Differential Revision: D38432061

Pulled By: ejguan

fbshipit-source-id: a8853a86efa9ca7ed6a9e76f0d51470d34513f48
ejguan added a commit that referenced this issue Aug 5, 2022
Summary:
Fixes #709

Per title

Pull Request resolved: #715

Reviewed By: NivekT

Differential Revision: D38432061

Pulled By: ejguan

fbshipit-source-id: a8853a86efa9ca7ed6a9e76f0d51470d34513f48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants