-
Notifications
You must be signed in to change notification settings - Fork 153
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[DataLoader2] Saving and restoring initial seed generator #998
Conversation
[ghstack-poisoned]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See PR description. I will add some test if the API design makes sense
[ghstack-poisoned]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have two main comments:
SeedGenerator
is pickable. No need to add those supporting functions.- IIUC, initial state should be the state at the beginning of each epoch but not the beginning of all epochs
Yea, that makes sense. I can take those out.
I'm expecting Does that look right to you? |
I am not sure if it's needed to differentiate initial seed and current state of seed generator. After SG seeds the graph, the state of SG is not going to evolve during iteration. All random states should be self-preserved by random DataPipes. During |
Changes to `DataLoader2`: - Modifying `state_dict` to store the `initial_seed_generator` that is saved at the beginning of an epoch. - Modifying `from_state` and `load_state_dict` to restore `initial_seed_generator` if the user sets the parameter to `True` - Within `__iter__, skips over the re-seeding process if no manual seed has been specified AND the `seed_generator` was explicitly restored. --- ### Consideration I decided to make modification to the existing APIs. Alternatively, we can create a new method. The basic idea is that we want to allow users to restore `dl2._seed_generator` to the previously saved version, at the same time, we need to skip over the logic that re-do seeding in `__iter__` (hence the new variable `_skip_iteration_seeding` is needed. I see 2 main scenarios: 1. Users want to restore DataPipe and ReadingService but not the initial state of RNG - I think lots of current users (including some internals) are in this category. - This should work by default because `restore_initial_seed_generator=False` unless user explicitly change it 2. Users actively want to restore DP, RS, and initial state of RNG - Users will need to set an extra variable to `True` and we will make sure `_skip_iteration_seeding=True` so no re-seeding will happen in the first subsequent call of `__iter__` Finally, if users change their mind at any point (after restoring) and want to manual set `seed`. That `seed` will override any other behavior and the `seed` will be used. [ghstack-poisoned]
Changes to `DataLoader2`: - Modifying `state_dict` to store the `initial_seed_generator` that is saved at the beginning of an epoch. - Modifying `from_state` and `load_state_dict` to restore `initial_seed_generator` if the user sets the parameter to `True` - Within `__iter__, skips over the re-seeding process if no manual seed has been specified AND the `seed_generator` was explicitly restored. --- ### Consideration I decided to make modification to the existing APIs. Alternatively, we can create a new method. The basic idea is that we want to allow users to restore `dl2._seed_generator` to the previously saved version, at the same time, we need to skip over the logic that re-do seeding in `__iter__` (hence the new variable `_skip_iteration_seeding` is needed. I see 2 main scenarios: 1. Users want to restore DataPipe and ReadingService but not the initial state of RNG - I think lots of current users (including some internals) are in this category. - This should work by default because `restore_initial_seed_generator=False` unless user explicitly change it 2. Users actively want to restore DP, RS, and initial state of RNG - Users will need to set an extra variable to `True` and we will make sure `_skip_iteration_seeding=True` so no re-seeding will happen in the first subsequent call of `__iter__` Finally, if users change their mind at any point (after restoring) and want to manual set `seed`. That `seed` will override any other behavior and the `seed` will be used. [ghstack-poisoned]
Changes to `DataLoader2`: - Modifying `state_dict` to store the `initial_seed_generator` that is saved at the beginning of an epoch. - Modifying `from_state` and `load_state_dict` to restore `initial_seed_generator` if the user sets the parameter to `True` - Within `__iter__, skips over the re-seeding process if no manual seed has been specified AND the `seed_generator` was explicitly restored. --- ### Consideration I decided to make modification to the existing APIs. Alternatively, we can create a new method. The basic idea is that we want to allow users to restore `dl2._seed_generator` to the previously saved version, at the same time, we need to skip over the logic that re-do seeding in `__iter__` (hence the new variable `_skip_iteration_seeding` is needed. I see 2 main scenarios: 1. Users want to restore DataPipe and ReadingService but not the initial state of RNG - I think lots of current users (including some internals) are in this category. - This should work by default because `restore_initial_seed_generator=False` unless user explicitly change it 2. Users actively want to restore DP, RS, and initial state of RNG - Users will need to set an extra variable to `True` and we will make sure `_skip_iteration_seeding=True` so no re-seeding will happen in the first subsequent call of `__iter__` Finally, if users change their mind at any point (after restoring) and want to manual set `seed`. That `seed` will override any other behavior and the `seed` will be used. [ghstack-poisoned]
Changes to `DataLoader2`: - Modifying `state_dict` to store the `initial_seed_generator` that is saved at the beginning of an epoch. - Modifying `from_state` and `load_state_dict` to restore `initial_seed_generator` if the user sets the parameter to `True` - Within `__iter__, skips over the re-seeding process if no manual seed has been specified AND the `seed_generator` was explicitly restored. --- ### Consideration I decided to make modification to the existing APIs. Alternatively, we can create a new method. The basic idea is that we want to allow users to restore `dl2._seed_generator` to the previously saved version, at the same time, we need to skip over the logic that re-do seeding in `__iter__` (hence the new variable `_skip_iteration_seeding` is needed. I see 2 main scenarios: 1. Users want to restore DataPipe and ReadingService but not the initial state of RNG - I think lots of current users (including some internals) are in this category. - This should work by default because `restore_initial_seed_generator=False` unless user explicitly change it 2. Users actively want to restore DP, RS, and initial state of RNG - Users will need to set an extra variable to `True` and we will make sure `_skip_iteration_seeding=True` so no re-seeding will happen in the first subsequent call of `__iter__` Finally, if users change their mind at any point (after restoring) and want to manual set `seed`. That `seed` will override any other behavior and the `seed` will be used. [ghstack-poisoned]
Changes to `DataLoader2`: - Modifying `state_dict` to store the `initial_seed_generator` that is saved at the beginning of an epoch. - Modifying `from_state` and `load_state_dict` to restore `initial_seed_generator` if the user sets the parameter to `True` - Within `__iter__, skips over the re-seeding process if no manual seed has been specified AND the `seed_generator` was explicitly restored. --- ### Consideration I decided to make modification to the existing APIs. Alternatively, we can create a new method. The basic idea is that we want to allow users to restore `dl2._seed_generator` to the previously saved version, at the same time, we need to skip over the logic that re-do seeding in `__iter__` (hence the new variable `_skip_iteration_seeding` is needed. I see 2 main scenarios: 1. Users want to restore DataPipe and ReadingService but not the initial state of RNG - I think lots of current users (including some internals) are in this category. - This should work by default because `restore_initial_seed_generator=False` unless user explicitly change it 2. Users actively want to restore DP, RS, and initial state of RNG - Users will need to set an extra variable to `True` and we will make sure `_skip_iteration_seeding=True` so no re-seeding will happen in the first subsequent call of `__iter__` Finally, if users change their mind at any point (after restoring) and want to manual set `seed`. That `seed` will override any other behavior and the `seed` will be used. [ghstack-poisoned]
Changes to `DataLoader2`: - Modifying `state_dict` to store the `initial_seed_generator` that is saved at the beginning of an epoch. - Modifying `from_state` and `load_state_dict` to restore `initial_seed_generator` if the user sets the parameter to `True` - Within `__iter__, skips over the re-seeding process if no manual seed has been specified AND the `seed_generator` was explicitly restored. --- ### Consideration I decided to make modification to the existing APIs. Alternatively, we can create a new method. The basic idea is that we want to allow users to restore `dl2._seed_generator` to the previously saved version, at the same time, we need to skip over the logic that re-do seeding in `__iter__` (hence the new variable `_skip_iteration_seeding` is needed. I see 2 main scenarios: 1. Users want to restore DataPipe and ReadingService but not the initial state of RNG - I think lots of current users (including some internals) are in this category. - This should work by default because `restore_initial_seed_generator=False` unless user explicitly change it 2. Users actively want to restore DP, RS, and initial state of RNG - Users will need to set an extra variable to `True` and we will make sure `_skip_iteration_seeding=True` so no re-seeding will happen in the first subsequent call of `__iter__` Finally, if users change their mind at any point (after restoring) and want to manual set `seed`. That `seed` will override any other behavior and the `seed` will be used. [ghstack-poisoned]
Changes to `DataLoader2`: - Modifying `state_dict` to store the `initial_seed_generator` that is saved at the beginning of an epoch. - Modifying `from_state` and `load_state_dict` to restore `initial_seed_generator` if the user sets the parameter to `True` - Within `__iter__, skips over the re-seeding process if no manual seed has been specified AND the `seed_generator` was explicitly restored. --- ### Consideration I decided to make modification to the existing APIs. Alternatively, we can create a new method. The basic idea is that we want to allow users to restore `dl2._seed_generator` to the previously saved version, at the same time, we need to skip over the logic that re-do seeding in `__iter__` (hence the new variable `_skip_iteration_seeding` is needed. I see 2 main scenarios: 1. Users want to restore DataPipe and ReadingService but not the initial state of RNG - I think lots of current users (including some internals) are in this category. - This should work by default because `restore_initial_seed_generator=False` unless user explicitly change it 2. Users actively want to restore DP, RS, and initial state of RNG - Users will need to set an extra variable to `True` and we will make sure `_skip_iteration_seeding=True` so no re-seeding will happen in the first subsequent call of `__iter__` Finally, if users change their mind at any point (after restoring) and want to manual set `seed`. That `seed` will override any other behavior and the `seed` will be used. [ghstack-poisoned]
Changes to `DataLoader2`: - Modifying `state_dict` to store the `initial_seed_generator` that is saved at the beginning of an epoch. - Modifying `from_state` and `load_state_dict` to restore `initial_seed_generator` if the user sets the parameter to `True` - Within `__iter__, skips over the re-seeding process if no manual seed has been specified AND the `seed_generator` was explicitly restored. --- ### Consideration I decided to make modification to the existing APIs. Alternatively, we can create a new method. The basic idea is that we want to allow users to restore `dl2._seed_generator` to the previously saved version, at the same time, we need to skip over the logic that re-do seeding in `__iter__` (hence the new variable `_skip_iteration_seeding` is needed. I see 2 main scenarios: 1. Users want to restore DataPipe and ReadingService but not the initial state of RNG - I think lots of current users (including some internals) are in this category. - This should work by default because `restore_initial_seed_generator=False` unless user explicitly change it 2. Users actively want to restore DP, RS, and initial state of RNG - Users will need to set an extra variable to `True` and we will make sure `_skip_iteration_seeding=True` so no re-seeding will happen in the first subsequent call of `__iter__` Finally, if users change their mind at any point (after restoring) and want to manual set `seed`. That `seed` will override any other behavior and the `seed` will be used. [ghstack-poisoned]
Changes to `DataLoader2`: - Modifying `state_dict` to store the `initial_seed_generator` that is saved at the beginning of an epoch. - Modifying `from_state` and `load_state_dict` to restore `initial_seed_generator` if the user sets the parameter to `True` - Within `__iter__, skips over the re-seeding process if no manual seed has been specified AND the `seed_generator` was explicitly restored. --- ### Consideration I decided to make modification to the existing APIs. Alternatively, we can create a new method. The basic idea is that we want to allow users to restore `dl2._seed_generator` to the previously saved version, at the same time, we need to skip over the logic that re-do seeding in `__iter__` (hence the new variable `_skip_iteration_seeding` is needed. I see 2 main scenarios: 1. Users want to restore DataPipe and ReadingService but not the initial state of RNG - I think lots of current users (including some internals) are in this category. - This should work by default because `restore_initial_seed_generator=False` unless user explicitly change it 2. Users actively want to restore DP, RS, and initial state of RNG - Users will need to set an extra variable to `True` and we will make sure `_skip_iteration_seeding=True` so no re-seeding will happen in the first subsequent call of `__iter__` Finally, if users change their mind at any point (after restoring) and want to manual set `seed`. That `seed` will override any other behavior and the `seed` will be used. [ghstack-poisoned]
Changes to `DataLoader2`: - Modifying `state_dict` to store the `initial_seed_generator` that is saved at the beginning of an epoch. - Modifying `from_state` and `load_state_dict` to restore `initial_seed_generator` if the user sets the parameter to `True` - Within `__iter__, skips over the re-seeding process if no manual seed has been specified AND the `seed_generator` was explicitly restored. --- ### Consideration I decided to make modification to the existing APIs. Alternatively, we can create a new method. The basic idea is that we want to allow users to restore `dl2._seed_generator` to the previously saved version, at the same time, we need to skip over the logic that re-do seeding in `__iter__` (hence the new variable `_skip_iteration_seeding` is needed. I see 2 main scenarios: 1. Users want to restore DataPipe and ReadingService but not the initial state of RNG - I think lots of current users (including some internals) are in this category. - This should work by default because `restore_initial_seed_generator=False` unless user explicitly change it 2. Users actively want to restore DP, RS, and initial state of RNG - Users will need to set an extra variable to `True` and we will make sure `_skip_iteration_seeding=True` so no re-seeding will happen in the first subsequent call of `__iter__` Finally, if users change their mind at any point (after restoring) and want to manual set `seed`. That `seed` will override any other behavior and the `seed` will be used. [ghstack-poisoned]
Changes to `DataLoader2`: - Modifying `state_dict` to store the `initial_seed_generator` that is saved at the beginning of an epoch. - Modifying `from_state` and `load_state_dict` to restore `initial_seed_generator` if the user sets the parameter to `True` - Within `__iter__, skips over the re-seeding process if no manual seed has been specified AND the `seed_generator` was explicitly restored. --- ### Consideration I decided to make modification to the existing APIs. Alternatively, we can create a new method. The basic idea is that we want to allow users to restore `dl2._seed_generator` to the previously saved version, at the same time, we need to skip over the logic that re-do seeding in `__iter__` (hence the new variable `_skip_iteration_seeding` is needed. I see 2 main scenarios: 1. Users want to restore DataPipe and ReadingService but not the initial state of RNG - I think lots of current users (including some internals) are in this category. - This should work by default because `restore_initial_seed_generator=False` unless user explicitly change it 2. Users actively want to restore DP, RS, and initial state of RNG - Users will need to set an extra variable to `True` and we will make sure `_skip_iteration_seeding=True` so no re-seeding will happen in the first subsequent call of `__iter__` Finally, if users change their mind at any point (after restoring) and want to manual set `seed`. That `seed` will override any other behavior and the `seed` will be used. [ghstack-poisoned]
@NivekT has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
Changes to `DataLoader2`: - Modifying `state_dict` to store `randomness_state`, which includes: - `_seed: int` - `_reset_seed: bool` - flag indicating whether `_seed` needs to be set - `_seed_generator` - the latest version at the time when `state_dict` is called - `_initial_seed_generator` - the versopm that is saved at the beginning of very epoch - Modifying `from_state` and `load_state_dict` to restore `randomness_state` - Adding a method `_restore_checkpoint_beginning_of_epoch` - This sets `self._seed_generator = self._initial_seed_generator`, allowing users to re-create an epoch from the beginning. --- ### Considerations Storing the randomness states provide more flexibility for users to restore as they see fit. The decision to do that should not be controversial. I decided to make add a new method for checkpointing at the beginning of the epoch, ensure that users are not confused about what randomness is restored by default. The basic idea is that we want to allow users to restore `dl2._seed_generator` to the previously saved version. From that point on, they can create a new `__iter__` and continue from the beginning of the epoch. - Note that since `_seed` and `_reset_seed` are also saved, if the users were planning to use a different seed or if there was a need to re-seed, those remain valid after restoring the checkpoint. - Finally, if users change their mind at any point (after restoring) and want to manual set `seed`. That `seed` will override any other behavior and the `seed` will be used. [ghstack-poisoned]
Changes to `DataLoader2`: - Modifying `state_dict` to store `randomness_state`, which includes: - `_seed: int` - `_reset_seed: bool` - flag indicating whether `_seed` needs to be set - `_seed_generator` - the latest version at the time when `state_dict` is called - `_initial_seed_generator` - the versopm that is saved at the beginning of very epoch - Modifying `from_state` and `load_state_dict` to restore `randomness_state` - Adding a method `_restore_checkpoint_beginning_of_epoch` - This sets `self._seed_generator = self._initial_seed_generator`, allowing users to re-create an epoch from the beginning. --- ### Considerations Storing the randomness states provide more flexibility for users to restore as they see fit. The decision to do that should not be controversial. I decided to make add a new method for checkpointing at the beginning of the epoch, ensure that users are not confused about what randomness is restored by default. The basic idea is that we want to allow users to restore `dl2._seed_generator` to the previously saved version. From that point on, they can create a new `__iter__` and continue from the beginning of the epoch. - Note that since `_seed` and `_reset_seed` are also saved, if the users were planning to use a different seed or if there was a need to re-seed, those remain valid after restoring the checkpoint. - Finally, if users change their mind at any point (after restoring) and want to manual set `seed`. That `seed` will override any other behavior and the `seed` will be used. [ghstack-poisoned]
Changes to `DataLoader2`: - Modifying `state_dict` to store `randomness_state`, which includes: - `_seed: int` - `_reset_seed: bool` - flag indicating whether `_seed` needs to be set - `_seed_generator` - the latest version at the time when `state_dict` is called - `_initial_seed_generator` - the versopm that is saved at the beginning of very epoch - Modifying `from_state` and `load_state_dict` to restore `randomness_state` - Adding a method `_restore_checkpoint_beginning_of_epoch` - This sets `self._seed_generator = self._initial_seed_generator`, allowing users to re-create an epoch from the beginning. --- ### Considerations Storing the randomness states provide more flexibility for users to restore as they see fit. The decision to do that should not be controversial. I decided to make add a new method for checkpointing at the beginning of the epoch, ensure that users are not confused about what randomness is restored by default. The basic idea is that we want to allow users to restore `dl2._seed_generator` to the previously saved version. From that point on, they can create a new `__iter__` and continue from the beginning of the epoch. - Note that since `_seed` and `_reset_seed` are also saved, if the users were planning to use a different seed or if there was a need to re-seed, those remain valid after restoring the checkpoint. - Finally, if users change their mind at any point (after restoring) and want to manual set `seed`. That `seed` will override any other behavior and the `seed` will be used. [ghstack-poisoned]
ghstack-source-id: 9aa77a0f2de7b83ba406552228880f720e20a3b2 Pull Request resolved: #998
I looked into whether we can refactor |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall LGTM with a few comments below
Changes to `DataLoader2`: - Modifying `state_dict` to store `randomness_state`, which includes: - `_seed: int` - `_reset_seed: bool` - flag indicating whether `_seed` needs to be set - `_seed_generator` - the latest version at the time when `state_dict` is called - `_initial_seed_generator` - the versopm that is saved at the beginning of very epoch - Modifying `from_state` and `load_state_dict` to restore `randomness_state` - Adding a method `_restore_checkpoint_beginning_of_epoch` - This sets `self._seed_generator = self._initial_seed_generator`, allowing users to re-create an epoch from the beginning. --- ### Considerations Storing the randomness states provide more flexibility for users to restore as they see fit. The decision to do that should not be controversial. I decided to make add a new method for checkpointing at the beginning of the epoch, ensure that users are not confused about what randomness is restored by default. The basic idea is that we want to allow users to restore `dl2._seed_generator` to the previously saved version. From that point on, they can create a new `__iter__` and continue from the beginning of the epoch. - Note that since `_seed` and `_reset_seed` are also saved, if the users were planning to use a different seed or if there was a need to re-seed, those remain valid after restoring the checkpoint. - Finally, if users change their mind at any point (after restoring) and want to manual set `seed`. That `seed` will override any other behavior and the `seed` will be used. [ghstack-poisoned]
@NivekT has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
Changes to `DataLoader2`: - Modifying `state_dict` to store `randomness_state`, which includes: - `_seed: int` - `_reset_seed: bool` - flag indicating whether `_seed` needs to be set - `_seed_generator` - the latest version at the time when `state_dict` is called - `_initial_seed_generator` - the versopm that is saved at the beginning of very epoch - Modifying `from_state` and `load_state_dict` to restore `randomness_state` - Adding a method `_restore_checkpoint_beginning_of_epoch` - This sets `self._seed_generator = self._initial_seed_generator`, allowing users to re-create an epoch from the beginning. --- ### Considerations Storing the randomness states provide more flexibility for users to restore as they see fit. The decision to do that should not be controversial. I decided to make add a new method for checkpointing at the beginning of the epoch, ensure that users are not confused about what randomness is restored by default. The basic idea is that we want to allow users to restore `dl2._seed_generator` to the previously saved version. From that point on, they can create a new `__iter__` and continue from the beginning of the epoch. - Note that since `_seed` and `_reset_seed` are also saved, if the users were planning to use a different seed or if there was a need to re-seed, those remain valid after restoring the checkpoint. - Finally, if users change their mind at any point (after restoring) and want to manual set `seed`. That `seed` will override any other behavior and the `seed` will be used. Differential Revision: [D44390519](https://our.internmc.facebook.com/intern/diff/D44390519) [ghstack-poisoned]
@NivekT has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
1 similar comment
@NivekT has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
Changes to `DataLoader2`: - Modifying `state_dict` to store `randomness_state`, which includes: - `_seed: int` - `_reset_seed: bool` - flag indicating whether `_seed` needs to be set - `_seed_generator` - the latest version at the time when `state_dict` is called - `_initial_seed_generator` - the versopm that is saved at the beginning of very epoch - Modifying `from_state` and `load_state_dict` to restore `randomness_state` - Adding a method `_restore_checkpoint_beginning_of_epoch` - This sets `self._seed_generator = self._initial_seed_generator`, allowing users to re-create an epoch from the beginning. --- ### Considerations Storing the randomness states provide more flexibility for users to restore as they see fit. The decision to do that should not be controversial. I decided to make add a new method for checkpointing at the beginning of the epoch, ensure that users are not confused about what randomness is restored by default. The basic idea is that we want to allow users to restore `dl2._seed_generator` to the previously saved version. From that point on, they can create a new `__iter__` and continue from the beginning of the epoch. - Note that since `_seed` and `_reset_seed` are also saved, if the users were planning to use a different seed or if there was a need to re-seed, those remain valid after restoring the checkpoint. - Finally, if users change their mind at any point (after restoring) and want to manual set `seed`. That `seed` will override any other behavior and the `seed` will be used. Differential Revision: [D44390519](https://our.internmc.facebook.com/intern/diff/D44390519) [ghstack-poisoned]
@NivekT has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
…tate for backward compatibility" Follow up to #998 for backward compatibility. Differential Revision: [D44747988](https://our.internmc.facebook.com/intern/diff/D44747988) [ghstack-poisoned]
…d compatibility" Follow up to #998 for backward compatibility. Differential Revision: [D44747988](https://our.internmc.facebook.com/intern/diff/D44747988) [ghstack-poisoned]
…tate for backward compatibility" Follow up to #998 for backward compatibility. Differential Revision: [D44747988](https://our.internmc.facebook.com/intern/diff/D44747988) [ghstack-poisoned]
…d compatibility" Follow up to #998 for backward compatibility. Differential Revision: [D44747988](https://our.internmc.facebook.com/intern/diff/D44747988) [ghstack-poisoned]
Summary: Pull Request resolved: #1124 Reland of #998 with added guard while loading randomness state in `DataLoader2` for backward compatibility Changes to `DataLoader2`: - Modifying `state_dict` to store `randomness_state`, which includes: - `_seed: int` - `_reset_seed: bool` - flag indicating whether `_seed` needs to be set - `_seed_generator` - the latest version at the time when `state_dict` is called - `_initial_seed_generator` - the versopm that is saved at the beginning of very epoch - Modifying `from_state` and `load_state_dict` to restore `randomness_state` - Adding a method `_restore_checkpoint_beginning_of_epoch` - This sets `self._seed_generator = self._initial_seed_generator`, allowing users to re-create an epoch from the beginning. --- ### Considerations Storing the randomness states provide more flexibility for users to restore as they see fit. The decision to do that should not be controversial. I decided to make add a new method for checkpointing at the beginning of the epoch, ensure that users are not confused about what randomness is restored by default. The basic idea is that we want to allow users to restore `dl2._seed_generator` to the previously saved version. From that point on, they can create a new `__iter__` and continue from the beginning of the epoch. - Note that since `_seed` and `_reset_seed` are also saved, if the users were planning to use a different seed or if there was a need to re-seed, those remain valid after restoring the checkpoint. - Finally, if users change their mind at any point (after restoring) and want to manual set `seed`. That `seed` will override any other behavior and the `seed` will be used. Test Plan: Imported from OSS f425956975 Reviewed By: bearzx Differential Revision: D44748514 Pulled By: NivekT fbshipit-source-id: 8713592902b1e0680e46e4db4280c84c708dbf55
Stack from ghstack:
restore_iteration
to ReadingService method for arbitrary checkpointing #1056Changes to
DataLoader2
:state_dict
to storerandomness_state
, which includes:_seed: int
_reset_seed: bool
- flag indicating whether_seed
needs to be set_seed_generator
- the latest version at the time whenstate_dict
is called_initial_seed_generator
- the versopm that is saved at the beginning of very epochfrom_state
andload_state_dict
to restorerandomness_state
_restore_checkpoint_beginning_of_epoch
self._seed_generator = self._initial_seed_generator
, allowing users to re-create an epoch from the beginning.Considerations
Storing the randomness states provide more flexibility for users to restore as they see fit. The decision to do that should not be controversial.
I decided to make add a new method for checkpointing at the beginning of the epoch, ensure that users are not confused about what randomness is restored by default.
The basic idea is that we want to allow users to restore
dl2._seed_generator
to the previously saved version. From that point on, they can create a new__iter__
and continue from the beginning of the epoch._seed
and_reset_seed
are also saved, if the users were planning to use a different seed or if there was a need to re-seed, those remain valid after restoring the checkpoint.seed
. Thatseed
will override any other behavior and theseed
will be used.Differential Revision: D44390519