Determinism of local shuffle/random_op after sharding_filter #885

Closed
ejguan opened this issue Nov 4, 2022 · 0 comments
Comments

@ejguan
Contributor

ejguan commented Nov 4, 2022

🐛 Describe the bug

Current state of determinism

Using DataLoader2 + PrototypeMultiProcessingReadingService as an example:

  1. Before each iteration starts, a distributed shared seed is generated (link).
  2. With multiprocessing, at the beginning of each iteration, each subprocess resets all shuffle operations to the same random seed, derived from the distributed shared seed in step 1 (link).
  3. torch, numpy, and python.random each get a different process-local seed in every subprocess (link).
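
As a rough illustration of how such seeds can be derived deterministically (a minimal sketch; `derive_seed` is a hypothetical helper, not a torchdata API):

```py
import hashlib

def derive_seed(base_seed: int, *keys: int) -> int:
    # Deterministically derive a child seed by hashing the base seed
    # together with integer keys (e.g. a purpose tag and a worker id).
    h = hashlib.sha256()
    for v in (base_seed, *keys):
        h.update(v.to_bytes(8, "little", signed=False))
    return int.from_bytes(h.digest()[:8], "little")

shared_seed = 12345  # step 1: broadcast to all ranks/workers

# Step 2: every worker seeds its pre-sharding shuffles identically.
shuffle_seed = derive_seed(shared_seed, 0)

# Step 3: torch/numpy/random get a process-local seed per worker.
worker_id = 3  # e.g. rank * num_workers + local_worker_id
local_seed = derive_seed(shared_seed, 1, worker_id)
```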

Additional feature

For step 2 in the previous section, we set the same shuffle seed across distributed/multiprocessing workers because we want the shuffled data to be sharded in a mutually exclusive and collectively exhaustive manner.
An additional feature is needed to make sure all random operations after `sharding_filter` have different seeds across workers, so that full data randomization is preserved.

Let's say we have a pipeline such as:

```py
data_source.shuffle().sharding_filter().map(fn).batch(8).shuffle()
```

The random state will be shared across workers for the first shuffle, but different for the second shuffle. Both sets of states should be generated deterministically so that the run is reproducible.
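
For concreteness, a runnable version of that pipeline might look like the following (a sketch; `IterableWrapper`, the data range, and `fn` are placeholders):

```py
from torchdata.datapipes.iter import IterableWrapper

def fn(x):
    return x * 2  # placeholder transform

dp = (
    IterableWrapper(range(64))
    .shuffle()          # seeded identically across workers (shared seed)
    .sharding_filter()  # each worker keeps a disjoint shard
    .map(fn)
    .batch(8)
    .shuffle()          # should be seeded differently per worker
)
```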

Versions

main branch

cc: @msaroufim @VitalyFedyunin

facebook-github-bot pushed a commit that referenced this issue Nov 11, 2022
Summary:
Add a `list_dps` function to list `DataPipes` from the graph.
- It's similar to [`get_all_graph_pipes`](https://github.com/pytorch/pytorch/blob/896fa8c5c9b0191c9621e04ab5e20057614d48ad/torch/utils/data/graph_settings.py#L19) from pytorch core
- It takes an extra argument, `exclude_dps`, to exclude a `DataPipe` and its prior graph from the result.

Reason to add this function:
- It's required to set random states differently for DataPipes before/after `sharding_filter`
```py
# Module paths are assumed here; adjust for your torchdata version.
from torchdata.dataloader2.graph import find_dps, list_dps, traverse_dps
from torchdata.datapipes.iter import ShardingFilter

# datapipe: the terminal DataPipe of the pipeline
graph = traverse_dps(datapipe)
sf_dps = find_dps(graph, ShardingFilter)

# DataPipes prior to `sharding_filter`
# (traverse_dps(sf_dp) includes sf_dp itself)
p_dps = []
for sf_dp in sf_dps:
    p_dps.extend(list_dps(traverse_dps(sf_dp)))

# DataPipes after `sharding_filter`
a_dps = list_dps(graph, exclude_dps=sf_dps)
```

Step 1 for #885

Pull Request resolved: #888

Reviewed By: VitalyFedyunin, NivekT

Differential Revision: D41099171

Pulled By: ejguan

fbshipit-source-id: d9d6e7beb498fea3921d8a3a1020649dd3955ce2
ejguan added a commit to ejguan/data that referenced this issue Jan 17, 2023
Summary:
Fixes pytorch#885

Add support for `DataLoader2` to control randomness over the pipeline:
- Implement `SeedGenerator`
  - `spawn` to generate sub-SeedGenerators for distributed workers
  - `generate_seed` to generate unique seeds
  - `generate_shared_seed` to generate distributed shared seeds
- Change the API of `ReadingService` to take a seed generator from `DataLoader2`. The `SeedGenerator` of `DataLoader2` then becomes the source of truth for randomness within the whole data pipeline.

A separate PR will be added for the online docs regarding determinism.
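
As a rough sketch of how this seed generator could be used (the constructor signature and the `worker_id` argument are assumptions, not the confirmed API; only the method names come from the summary above):

```py
# Hypothetical usage of the SeedGenerator described above.
seed_gen = SeedGenerator(seed=42)           # source of truth in DataLoader2

shared = seed_gen.generate_shared_seed()    # same on every rank: seeds the
                                            # shuffles before sharding_filter

worker_gen = seed_gen.spawn(worker_id=3)    # per-worker sub-generator
local = worker_gen.generate_seed()          # unique seed for random ops
                                            # after sharding_filter
```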

Last step for pytorch#885

Pull Request resolved: pytorch#801

Reviewed By: NivekT

Differential Revision: D38947827

Pulled By: ejguan

fbshipit-source-id: 006bf17cbb51b2d5a39d647ca86401b0483c7812