[3/n] DataLoader2 initial support for randomness control #801

Closed
wants to merge 1 commit into from

ejguan commented Sep 29, 2022

Fixes #885

Add support for DataLoader2 to control randomness across the whole data pipeline:

  • Implement SeedGenerator
    • `spawn` to create sub-SeedGenerators for distributed workers
    • `generate_seed` to generate unique seeds
    • `generate_shared_seed` to generate seeds shared across distributed workers
  • Change the ReadingService API to take the seed generator from DataLoader2, so that DataLoader2's SeedGenerator becomes the single source of truth for randomness in the whole data pipeline.
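A minimal sketch of the three `SeedGenerator` operations listed above may help make their contract concrete. It assumes a hash-based, counter-style derivation in place of the Philox engine this PR adds in `_philox.py`; `MiniSeedGenerator` and `_derive` are hypothetical names, and this is not torchdata's implementation.

```python
import hashlib
from typing import Optional


def _derive(*parts: object) -> int:
    """Hash a tuple of labels/integers into a deterministic 64-bit seed."""
    digest = hashlib.sha256(repr(parts).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "little")


class MiniSeedGenerator:
    """Hypothetical sketch of SeedGenerator semantics (not torchdata's code)."""

    def __init__(self, seed: int, worker_id: Optional[int] = None) -> None:
        self._seed = seed            # base seed, identical on every worker
        self._worker_id = worker_id  # None for the main process
        self._shared_calls = 0       # position in the shared stream
        self._local_calls = 0        # position in the per-worker stream

    def spawn(self, worker_id: int) -> "MiniSeedGenerator":
        # The child keeps the parent's position in the shared stream so that
        # shared seeds stay in lockstep across all spawned workers.
        child = MiniSeedGenerator(self._seed, worker_id)
        child._shared_calls = self._shared_calls
        return child

    def generate_shared_seed(self) -> int:
        # Depends only on the base seed and the call count: same on every worker.
        out = _derive("shared", self._seed, self._shared_calls)
        self._shared_calls += 1
        return out

    def generate_seed(self) -> int:
        # Mixes in the worker id: unique per worker, yet reproducible.
        out = _derive("local", self._seed, self._worker_id, self._local_calls)
        self._local_calls += 1
        return out


root = MiniSeedGenerator(seed=2022)
w0, w1 = root.spawn(0), root.spawn(1)
assert w0.generate_shared_seed() == w1.generate_shared_seed()  # agree
assert w0.generate_seed() != w1.generate_seed()                # differ
```

Because every seed is a pure function of (base seed, stream, worker id, call index), rerunning with the same base seed reproduces the same seeds on every worker, which is the property a counter-based generator such as Philox is typically chosen for.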

Online documentation on determinism will be added in a separate PR.

Differential Revision: D38947827

Last step for #885
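The ReadingService change can be illustrated with a sketch in which the reading service owns no RNG of its own and draws every seed from the generator handed in by the loader. `HypotheticalSeedSource`, `HypotheticalReadingService`, `begin_epoch` and `run_epoch` are illustrative names, not torchdata's API, and Python's `random.Random` stands in for the real engine. The shared seed drives the global shuffle before sharding, so all ranks agree on the order and their shards stay disjoint; the per-rank seed drives the local shuffle after sharding, the case raised in #885.

```python
import random
from typing import List


class HypotheticalSeedSource:
    """Stand-in for the generator DataLoader2 owns (illustrative only)."""

    def __init__(self, seed: int, rank: int) -> None:
        # Shared stream: seeded identically on every rank.
        self._shared = random.Random(seed)
        # Per-rank stream: `random` hashes string seeds deterministically.
        self._local = random.Random(f"{seed}-rank{rank}")

    def generate_shared_seed(self) -> int:
        return self._shared.getrandbits(64)

    def generate_seed(self) -> int:
        return self._local.getrandbits(64)


class HypotheticalReadingService:
    """Draws all of its randomness from the generator passed in by the loader."""

    def __init__(self, rank: int, world_size: int) -> None:
        self.rank = rank
        self.world_size = world_size
        self._shuffle_seed = 0
        self._local_seed = 0

    def begin_epoch(self, seeds: HypotheticalSeedSource) -> None:
        # No private RNG: every seed comes from the loader's generator.
        self._shuffle_seed = seeds.generate_shared_seed()
        self._local_seed = seeds.generate_seed()

    def iterate(self, data: List[int]) -> List[int]:
        order = list(data)
        # Global shuffle before sharding must match on every rank,
        # otherwise shards could overlap or miss samples.
        random.Random(self._shuffle_seed).shuffle(order)
        shard = order[self.rank :: self.world_size]  # sharding_filter equivalent
        # Local shuffle after sharding may differ per rank but stays reproducible.
        random.Random(self._local_seed).shuffle(shard)
        return shard


def run_epoch(seed: int, world_size: int, data: List[int]) -> List[List[int]]:
    shards = []
    for rank in range(world_size):
        rs = HypotheticalReadingService(rank, world_size)
        rs.begin_epoch(HypotheticalSeedSource(seed, rank))
        shards.append(rs.iterate(data))
    return shards
```

Running `run_epoch(42, 2, list(range(20)))` twice yields identical shards, and together the two shards cover every sample exactly once.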

facebook-github-bot added the CLA Signed and fb-exported labels Sep 29, 2022
@facebook-github-bot

This pull request was exported from Phabricator. Differential Revision: D38947827

ejguan marked this pull request as draft September 29, 2022 20:32
ejguan added a commit to ejguan/data that referenced this pull request Sep 29, 2022
Summary:
Pull Request resolved: pytorch#801

Add the initial support for DataLoader2 to control randomness over the pipeline:
- Implement `SeedGenerator`
- Change API of `ReadingService` to take seed generator from DataLoader2

Differential Revision: D38947827

fbshipit-source-id: 17db1e13fe8685f6b2817f72c0e199edfaf3a3a1
ejguan added a commit to ejguan/data that referenced this pull request Sep 29, 2022
Summary:
Pull Request resolved: pytorch#801

Add the initial support for DataLoader2 to control randomness over the pipeline:
- Implement `SeedGenerator`
- Change API of `ReadingService` to take seed generator from DataLoader2

Differential Revision: D38947827

fbshipit-source-id: 5ae5065ab7aceb35e9f966c3d6bc585eb07c8ba5
ejguan marked this pull request as ready for review September 30, 2022 15:07
ejguan changed the title "DataLoader2 initial support for randomness control" to "[1/n] DataLoader2 initial support for randomness control" Sep 30, 2022
ejguan commented Sep 30, 2022

I may need to create a new PR via ghexport to support a stack of diffs.

ejguan added a commit to ejguan/data that referenced this pull request Oct 4, 2022
…vice (pytorch#801)

Summary:
Pull Request resolved: pytorch#801

Add the initial support for DataLoader2 to control randomness over the pipeline:
- Implement `SeedGenerator`
- Change API of `ReadingService` to take seed generator from DataLoader2

Reviewed By: Miiira

Differential Revision: D38947827

fbshipit-source-id: 932cabdf1df5e0feafa44a3d2bc50c290360d323
ejguan added a commit to ejguan/data that referenced this pull request Oct 4, 2022
…vice (pytorch#801)

Summary:
Pull Request resolved: pytorch#801

Add the initial support for DataLoader2 to control randomness over the pipeline:
- Implement `SeedGenerator`
- Change API of `ReadingService` to take seed generator from DataLoader2

Differential Revision: D38947827

fbshipit-source-id: 38cfc46ce3fbda6872a988fa27c072ff80d79c3c
ejguan added a commit to ejguan/data that referenced this pull request Oct 4, 2022
…vice (pytorch#801)

Summary:
Pull Request resolved: pytorch#801

Add the initial support for DataLoader2 to control randomness over the pipeline:
- Implement `SeedGenerator`
- Change API of `ReadingService` to take seed generator from DataLoader2

Differential Revision: D38947827

fbshipit-source-id: fab10a21fecf76e9b5f5c2296fbf930c3af14d2d
ejguan added a commit to ejguan/data that referenced this pull request Oct 5, 2022
…vice (pytorch#801)

Summary:
Pull Request resolved: pytorch#801

Add the initial support for DataLoader2 to control randomness over the pipeline:
- Implement `SeedGenerator`
- Change API of `ReadingService` to take seed generator from DataLoader2

Differential Revision: D38947827

fbshipit-source-id: c3018a408b78dd8d2e2858350edbb762ece10d37
torchdata/dataloader2/random/_philox.py (resolved)
torchdata/dataloader2/random/seed_generator.py (outdated, resolved)
torchdata/dataloader2/reading_service.py (outdated, resolved)
torchdata/dataloader2/reading_service.py (outdated, resolved)
torchdata/dataloader2/dataloader2.py (outdated, resolved)
ejguan added a commit to ejguan/data that referenced this pull request Oct 6, 2022
…vice (pytorch#801)

Summary:
Pull Request resolved: pytorch#801

Add the initial support for DataLoader2 to control randomness over the pipeline:
- Implement `SeedGenerator`
- Change API of `ReadingService` to take seed generator from DataLoader2

Reviewed By: NivekT

Differential Revision: D38947827

fbshipit-source-id: 21761db17cab2f1c9ef89058b6a53f53abe0590f
ejguan changed the title "[1/n] DataLoader2 initial support for randomness control" to "[3/n] DataLoader2 initial support for randomness control" Dec 29, 2022
ejguan added a commit to ejguan/data that referenced this pull request Jan 17, 2023
…vice (pytorch#801)

Summary:
Fixes pytorch#885

Pull Request resolved: pytorch#801

Add the support for DataLoader2 to control randomness over the pipeline:
- Implement `SeedGenerator`
  - `spawn` to generate sub-SeedGenerators for distributed workers
  - `generate_seed` to generate unique seeds
  - `generate_shared_seed` to generate distributed shared seeds
- Change API of `ReadingService` to take seed generator from DataLoader2. Then, the SeedGenerator of `DataLoader2` becomes the source of truth of randomness within the whole data pipeline.

A separate PR will be added for online doc regarding determinism.

Reviewed By: NivekT

Differential Revision: D38947827

fbshipit-source-id: e1a434460b4a5d43461e982debe875808b4241db
@facebook-github-bot

@ejguan has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

ejguan added a commit to ejguan/data that referenced this pull request Jan 17, 2023
Summary:
Fixes pytorch#885

Add the support for DataLoader2 to control randomness over the pipeline:
- Implement SeedGenerator
  - `spawn` to generate sub-SeedGenerators for distributed workers
  - `generate_seed` to generate unique seeds
  - `generate_shared_seed` to generate distributed shared seeds
- Change API of ReadingService to take seed generator from DataLoader2. Then, the SeedGenerator of DataLoader2 becomes the source of truth of randomness within the whole data pipeline.

A separate PR will be added for online doc regarding determinism.

Last step for pytorch#885

Pull Request resolved: pytorch#801

Reviewed By: NivekT

Differential Revision: D38947827

Pulled By: ejguan

fbshipit-source-id: 006bf17cbb51b2d5a39d647ca86401b0483c7812
ejguan added a commit to ejguan/data that referenced this pull request Jan 17, 2023
Summary:
Fixes pytorch#885

Add the support for DataLoader2 to control randomness over the pipeline:
- Implement SeedGenerator
  - `spawn` to generate sub-SeedGenerators for distributed workers
  - `generate_seed` to generate unique seeds
  - `generate_shared_seed` to generate distributed shared seeds
- Change API of ReadingService to take seed generator from DataLoader2. Then, the SeedGenerator of DataLoader2 becomes the source of truth of randomness within the whole data pipeline.

A separate PR will be added for online doc regarding determinism.

Last step for pytorch#885

Pull Request resolved: pytorch#801

Reviewed By: NivekT

Differential Revision: D38947827

Pulled By: ejguan

fbshipit-source-id: b6fa81de133a0613e8c96ce17b136d897ca80201
Summary:
Fixes pytorch#885

Add the support for DataLoader2 to control randomness over the pipeline:
- Implement SeedGenerator
  - `spawn` to generate sub-SeedGenerators for distributed workers
  - `generate_seed` to generate unique seeds
  - `generate_shared_seed` to generate distributed shared seeds
- Change API of ReadingService to take seed generator from DataLoader2. Then, the SeedGenerator of DataLoader2 becomes the source of truth of randomness within the whole data pipeline.

A separate PR will be added for online doc regarding determinism.

Last step for pytorch#885

Pull Request resolved: pytorch#801

Reviewed By: NivekT

Differential Revision: D38947827

Pulled By: ejguan

fbshipit-source-id: 2f852b89cb1d638e1b9222df838786eb8855afa4
@facebook-github-bot

@ejguan merged this pull request in 38e0d03.

Labels: CLA Signed, fb-exported, Merged
Projects: none
Development: successfully merging this pull request may close this issue:
Determinism about Local shuffle/random_op after sharding_filter
3 participants