[DataLoader2] Saving and restoring initial seed generator #998

Closed
wants to merge 19 commits into from

Commits on Feb 8, 2023

  1. [DataLoader2] Saving and restoring initial seed generator

    [ghstack-poisoned]
    NivekT committed Feb 8, 2023
    05c0bf1
  2. 45b6998

Commits on Feb 9, 2023

  1. Update on "[DataLoader2] Saving and restoring initial seed generator"

    Changes to `DataLoader2`:
    - Modifying `state_dict` to store the `initial_seed_generator` that is saved at the beginning of an epoch.
    - Modifying `from_state` and `load_state_dict` to restore `initial_seed_generator` if the user sets the parameter to `True`
    - Within `__iter__`, skipping the re-seeding process if no manual seed has been specified AND the `seed_generator` was explicitly restored.
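
The three changes above can be sketched with a toy class. This is a minimal illustration, not the `DataLoader2` implementation: `ToyLoader`, its use of `random.Random` as the seed generator, and the shuffle standing in for an epoch are all assumptions made for the example. Only the names `state_dict`, `load_state_dict`, `restore_initial_seed_generator`, `_seed_generator`, and `_skip_iteration_seeding` come from the description.

```python
import random


class ToyLoader:
    """Hypothetical stand-in for DataLoader2; only the seeding logic is modeled."""

    def __init__(self, data):
        self.data = list(data)
        self._seed = None  # manual seed, if the user set one
        self._seed_generator = random.Random()  # stands in for the real generator
        self._initial_seed_generator_state = None
        self._skip_iteration_seeding = False

    def seed(self, seed):
        self._seed = seed

    def __iter__(self):
        if self._seed is not None:
            # A manual seed always wins, even right after a restore.
            self._seed_generator.seed(self._seed)
            self._seed = None
        elif not self._skip_iteration_seeding:
            # Default path: re-seed with fresh entropy for this epoch.
            self._seed_generator.seed()
        # The skip flag only covers the first __iter__ after a restore.
        self._skip_iteration_seeding = False
        # Snapshot the generator as it stands at the beginning of the epoch.
        self._initial_seed_generator_state = self._seed_generator.getstate()
        order = self.data[:]
        self._seed_generator.shuffle(order)
        return iter(order)

    def state_dict(self):
        return {"initial_seed_generator": self._initial_seed_generator_state}

    def load_state_dict(self, state, restore_initial_seed_generator=False):
        if restore_initial_seed_generator:
            self._seed_generator.setstate(state["initial_seed_generator"])
            self._skip_iteration_seeding = True
```

With this toy, loading a saved state with `restore_initial_seed_generator=True` and iterating once replays the saved epoch's order. Loading with the default leaves the re-seeding path untouched.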
    
    
    ---
    ### Consideration
    
    I decided to modify the existing APIs. Alternatively, we could create a new method.
    
    The basic idea is that we want to allow users to restore `dl2._seed_generator` to the previously saved version. At the same time, we need to skip the logic that redoes seeding in `__iter__` (hence the new variable `_skip_iteration_seeding`).
    
    I see 2 main scenarios:
    1. Users want to restore DataPipe and ReadingService but not the initial state of RNG
        - I think lots of current users (including some internals) are in this category.
        - This should work by default because `restore_initial_seed_generator=False` unless the user explicitly changes it.
    2. Users actively want to restore DP, RS, and initial state of RNG
        - Users will need to set an extra variable to `True`, and we will make sure `_skip_iteration_seeding=True` so that no re-seeding happens in the first subsequent call of `__iter__`.
    
    Finally, if users change their mind at any point (after restoring) and want to manually set `seed`, that `seed` will override any other behavior and will be used.
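
The precedence described above can be written as a small decision function. This is a sketch of the reasoning only: `seeding_source` and its return labels are hypothetical names for illustration and do not exist in `DataLoader2`.

```python
from typing import Optional


def seeding_source(manual_seed: Optional[int], skip_iteration_seeding: bool) -> str:
    """Return which source of randomness the next `__iter__` call would use."""
    if manual_seed is not None:
        # A manually set seed overrides everything, including a restore.
        return "manual_seed"
    if skip_iteration_seeding:
        # Scenario 2: the restored generator is used as-is, with no re-seeding.
        return "restored_generator"
    # Scenario 1 (the default): re-seed as usual at the start of the epoch.
    return "fresh_reseed"
```

Scenario 1 corresponds to `seeding_source(None, False)`, scenario 2 to `seeding_source(None, True)`, and a manual seed set after restoring corresponds to `seeding_source(some_seed, True)`.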
    
    
    [ghstack-poisoned]
    NivekT committed Feb 9, 2023
    9d6f38d
  2. Update on "[DataLoader2] Saving and restoring initial seed generator"

    [ghstack-poisoned]
    NivekT committed Feb 9, 2023
    389e567

Commits on Feb 10, 2023

  1. Update on "[DataLoader2] Saving and restoring initial seed generator"

    [ghstack-poisoned]
    NivekT committed Feb 10, 2023
    955e412
  2. Update on "[DataLoader2] Saving and restoring initial seed generator"

    [ghstack-poisoned]
    NivekT committed Feb 10, 2023
    90278bf

Commits on Feb 13, 2023

  1. Update on "[DataLoader2] Saving and restoring initial seed generator"

    [ghstack-poisoned]
    NivekT committed Feb 13, 2023
    1a8ebdd

Commits on Feb 15, 2023

  1. Update on "[DataLoader2] Saving and restoring initial seed generator"

    [ghstack-poisoned]
    NivekT committed Feb 15, 2023
    fa1f93f
  2. Update on "[DataLoader2] Saving and restoring initial seed generator"

    [ghstack-poisoned]
    NivekT committed Feb 15, 2023
    4653af3
  3. Update on "[DataLoader2] Saving and restoring initial seed generator"

    [ghstack-poisoned]
    NivekT committed Feb 15, 2023
    15f774e

Commits on Feb 16, 2023

  1. Update on "[DataLoader2] Saving and restoring initial seed generator"

    [ghstack-poisoned]
    NivekT committed Feb 16, 2023
    5de743f

Commits on Feb 28, 2023

  1. Update on "[DataLoader2] Saving and restoring initial seed generator"

    [ghstack-poisoned]
    NivekT committed Feb 28, 2023
    18a5c26
  2. Update on "[DataLoader2] Saving and restoring initial seed generator"

    [ghstack-poisoned]
    NivekT committed Feb 28, 2023
    9f87c01

Commits on Mar 17, 2023

  1. Update on "[DataLoader2] Saving and restoring initial seed generator"

    Changes to `DataLoader2`:
    - Modifying `state_dict` to store `randomness_state`, which includes:
        - `_seed: int`
        - `_reset_seed: bool` - flag indicating whether `_seed` needs to be set
        - `_seed_generator` - the latest version at the time when `state_dict` is called
        - `_initial_seed_generator` - the version that is saved at the beginning of every epoch
    - Modifying `from_state` and `load_state_dict` to restore `randomness_state`
    - Adding a method `_restore_checkpoint_beginning_of_epoch`
        -  This sets `self._seed_generator = self._initial_seed_generator`, allowing users to re-create an epoch from the beginning.
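
A toy version of the revised design might look like the following. It is a sketch under stated assumptions, not the actual `DataLoader2` code: `ToyLoader`, `start_epoch`, `draw`, and the use of `random.Random` are invented for illustration, and the tuple layout of `randomness_state` is a guess at one reasonable encoding. The sketch also copies the initial generator on restore rather than aliasing it, which is a choice made here so the snapshot can be reused.

```python
import copy
import random


class ToyLoader:
    """Hypothetical stand-in for DataLoader2; only randomness state is modeled."""

    def __init__(self):
        self._seed = None  # pending manual seed, if any
        self._reset_seed = True  # whether `_seed` still needs to be applied
        self._seed_generator = random.Random(0)
        self._initial_seed_generator = copy.deepcopy(self._seed_generator)

    def start_epoch(self):
        # Snapshot the generator as it stands at the beginning of the epoch.
        self._initial_seed_generator = copy.deepcopy(self._seed_generator)

    def draw(self):
        # Consuming randomness advances `_seed_generator` only.
        return self._seed_generator.random()

    def state_dict(self):
        return {
            "randomness_state": (
                self._seed,
                self._reset_seed,
                copy.deepcopy(self._seed_generator),  # latest version
                copy.deepcopy(self._initial_seed_generator),  # start of epoch
            )
        }

    def load_state_dict(self, state):
        seed, reset_seed, gen, initial_gen = state["randomness_state"]
        self._seed = seed
        self._reset_seed = reset_seed
        self._seed_generator = copy.deepcopy(gen)
        self._initial_seed_generator = copy.deepcopy(initial_gen)

    def _restore_checkpoint_beginning_of_epoch(self):
        # Copy rather than alias so the snapshot itself is not consumed.
        self._seed_generator = copy.deepcopy(self._initial_seed_generator)
```

After `load_state_dict`, the toy resumes from wherever the generator was when `state_dict` was called. Calling `_restore_checkpoint_beginning_of_epoch` afterwards rewinds it to the start of that epoch, while `_seed` and `_reset_seed` keep their restored values.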
    
    
    
    ---
    ### Considerations
    
    Storing the randomness states provides more flexibility for users to restore as they see fit. The decision to do that should not be controversial.
    
    I decided to add a new method for checkpointing at the beginning of the epoch, to ensure that users are not confused about what randomness is restored by default.
    
    The basic idea is that we want to allow users to restore `dl2._seed_generator` to the previously saved version. From that point on, they can create a new iterator via `__iter__` and continue from the beginning of the epoch.
    - Note that since `_seed` and `_reset_seed` are also saved, if the users were planning to use a different seed or if there was a need to re-seed, those remain valid after restoring the checkpoint.
    - Finally, if users change their mind at any point (after restoring) and want to manually set `seed`, that `seed` will override any other behavior and will be used.
    
    
    [ghstack-poisoned]
    NivekT committed Mar 17, 2023
    3cdd2ea
  2. Update on "[DataLoader2] Saving and restoring initial seed generator"

    [ghstack-poisoned]
    NivekT committed Mar 17, 2023
    ef850ed
  3. Update on "[DataLoader2] Saving and restoring initial seed generator"

    [ghstack-poisoned]
    NivekT committed Mar 17, 2023
    b715066

Commits on Mar 24, 2023

  1. Update on "[DataLoader2] Saving and restoring initial seed generator"

    [ghstack-poisoned]
    NivekT committed Mar 24, 2023
    5e56222
  2. Update on "[DataLoader2] Saving and restoring initial seed generator"

    Differential Revision: [D44390519](https://our.internmc.facebook.com/intern/diff/D44390519)
    
    [ghstack-poisoned]
    NivekT committed Mar 24, 2023
    0433509

Commits on Mar 27, 2023

  1. Update on "[DataLoader2] Saving and restoring initial seed generator"

    [ghstack-poisoned]
    NivekT committed Mar 27, 2023
    0d4854c