[DataLoader2] Saving and restoring initial seed generator #1123

Closed

NivekT commented Apr 6, 2023

Stack from ghstack:

Reland of #998 with added guard while loading randomness state in DataLoader2 for backward compatibility

Changes to DataLoader2:

  • Modifying state_dict to store randomness_state, which includes:
    • _seed: int
    • _reset_seed: bool - flag indicating whether _seed needs to be set
    • _seed_generator - the latest version at the time when state_dict is called
    • _initial_seed_generator - the version that is saved at the beginning of every epoch
  • Modifying from_state and load_state_dict to restore randomness_state
  • Adding a method _restore_checkpoint_beginning_of_epoch
    • This sets self._seed_generator = self._initial_seed_generator, allowing users to re-create an epoch from the beginning.
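
The changes listed above can be illustrated with a minimal sketch. This is not the torchdata implementation: `ToyLoader` is a hypothetical stand-in, it uses Python's `random.Random` in place of the real seed generator, and the layout of `randomness_state` as a dict is an assumption made for readability. Only the four field names and the method name `_restore_checkpoint_beginning_of_epoch` come from the description.

```python
import copy
import random


class ToyLoader:
    """Hypothetical stand-in for DataLoader2 (not the torchdata code)."""

    def __init__(self, data, seed=0):
        self.data = list(data)
        self._seed = seed
        self._reset_seed = False
        self._seed_generator = random.Random(seed)
        self._initial_seed_generator = copy.deepcopy(self._seed_generator)

    def __iter__(self):
        # Snapshot the generator before this epoch draws anything from it.
        self._initial_seed_generator = copy.deepcopy(self._seed_generator)
        epoch_seed = self._seed_generator.getrandbits(32)
        order = list(self.data)
        random.Random(epoch_seed).shuffle(order)
        return iter(order)

    def state_dict(self):
        # Assumed layout: a dict keyed by the four field names from the PR.
        return {
            "randomness_state": {
                "_seed": self._seed,
                "_reset_seed": self._reset_seed,
                "_seed_generator": copy.deepcopy(self._seed_generator),
                "_initial_seed_generator": copy.deepcopy(
                    self._initial_seed_generator
                ),
            }
        }

    def load_state_dict(self, state):
        # Guard: checkpoints written before this change have no randomness state.
        rs = state.get("randomness_state")
        if rs is None:
            return
        self._seed = rs["_seed"]
        self._reset_seed = rs["_reset_seed"]
        self._seed_generator = copy.deepcopy(rs["_seed_generator"])
        self._initial_seed_generator = copy.deepcopy(rs["_initial_seed_generator"])

    def _restore_checkpoint_beginning_of_epoch(self):
        # Rewind to the snapshot taken at the start of the saved epoch.
        self._seed_generator = copy.deepcopy(self._initial_seed_generator)


dl = ToyLoader(range(8), seed=123)
first_epoch = list(dl)
checkpoint = dl.state_dict()

resumed = ToyLoader(range(8))
resumed.load_state_dict(checkpoint)
resumed._restore_checkpoint_beginning_of_epoch()
replayed = list(resumed)  # same order as first_epoch
```

In this sketch, skipping the call to `_restore_checkpoint_beginning_of_epoch` makes the resumed loader continue with the next epoch's ordering instead, which is the default the description refers to.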

Considerations

Storing the randomness states provides more flexibility for users to restore as they see fit. The decision to do that should not be controversial.

I decided to add a new method for checkpointing at the beginning of the epoch, to ensure that users are not confused about what randomness is restored by default.

The basic idea is that we want to allow users to restore dl2._seed_generator to the previously saved version. From that point on, they can create a new __iter__ and continue from the beginning of the epoch.

  • Note that since _seed and _reset_seed are also saved, if the users were planning to use a different seed or if there was a need to re-seed, those remain valid after restoring the checkpoint.
  • Finally, if users change their mind at any point (after restoring) and want to manually set a seed, that seed will override any other behavior and will be used.
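
The override rule in the two notes above can be sketched as follows. This is a toy under stated assumptions, not DataLoader2 itself: the class `SeedOverrideSketch`, its `seed` and `start_epoch` methods, and the use of `random.Random` are all hypothetical; only the `_seed`, `_reset_seed`, and `_seed_generator` names come from the description.

```python
import random


class SeedOverrideSketch:
    """Hypothetical toy showing how a manual seed could take precedence."""

    def __init__(self):
        self._seed = None
        self._reset_seed = False
        self._seed_generator = random.Random(0)

    def seed(self, seed):
        # Record the request; it is applied when the next epoch starts.
        self._seed = seed
        self._reset_seed = True

    def start_epoch(self):
        if self._reset_seed:
            # A pending manual seed replaces whatever generator was restored.
            self._seed_generator = random.Random(self._seed)
            self._reset_seed = False
        return self._seed_generator.getrandbits(32)


restored = SeedOverrideSketch()
restored._seed_generator = random.Random(42)  # pretend this came from a checkpoint
restored.seed(7)                              # user changes their mind after restoring

fresh = SeedOverrideSketch()
fresh.seed(7)
```

Because the flag is checked at the start of the epoch, the restored generator is discarded in favor of the manual seed, so both loaders in the sketch draw the same epoch seed.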

facebook-github-bot added the CLA Signed label Apr 6, 2023

NivekT commented Apr 6, 2023

Closing as I have squashed two PRs

NivekT closed this Apr 6, 2023
facebook-github-bot deleted the gh/NivekT/118/head branch May 6, 2023 14:21