Prepare GradScaler for hivemind.Optimizer #413
Conversation
justheuristic commented on Nov 18, 2021 (edited)
- Modified hivemind.GradScaler so that it is compatible with hivemind.Optimizer, in a backwards-compatible way (see the usage sketch after this list)
- Changed TrainingStateAverager to be compatible with hivemind.GradScaler
- Made TrainingStateAverager.main_parameters and parameter_names public for use in optimizer
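For context, a minimal usage sketch of what this change enables. It assumes hivemind.GradScaler keeps the torch.cuda.amp.GradScaler interface (scale, unscale_, step, update) and that optimizer is a hivemind optimizer wrapper; construction of that wrapper (it needs a DHT and run configuration) is elided, and the model and data are toy placeholders.

# Usage sketch only: hivemind.GradScaler is assumed to follow the torch.cuda.amp.GradScaler
# interface, and `optimizer` is assumed to be a hivemind optimizer wrapper (its construction,
# which requires a DHT and run configuration, is elided here).
import torch
from hivemind import GradScaler  # hivemind's AMP-compatible scaler (hivemind/optim/grad_scaler.py)

model = torch.nn.Linear(512, 10).cuda()
optimizer = ...  # a hivemind optimizer wrapper around e.g. torch.optim.Adam(model.parameters())
scaler = GradScaler()

for _ in range(10):
    batch = torch.randn(32, 512, device="cuda")
    targets = torch.randint(0, 10, (32,), device="cuda")
    with torch.cuda.amp.autocast():
        loss = torch.nn.functional.cross_entropy(model(batch), targets)
    scaler.scale(loss).backward()  # scale the loss so fp16 gradients do not underflow
    scaler.unscale_(optimizer)     # unscale gradients before clipping or averaging
    scaler.step(optimizer)         # runs the optimizer step only if all gradients are finite
    scaler.update()                # adjust the loss scale for the next iteration
    optimizer.zero_grad()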
@@ -100,7 +100,7 @@ def __init__(
         self.offload_optimizer = offload_optimizer
         self.custom_gradients = custom_gradients

-        self._main_parameters, self._parameter_names = main_parameters, parameter_names
+        self.main_parameters, self.parameter_names = main_parameters, parameter_names
made them public because these params are needed in hivemind.Optimizer.step
... and they are no more private than, for instance, opt_keys_for_averaging
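To illustrate the point, a hypothetical sketch (not the actual hivemind.Optimizer code): an optimizer wrapper built on top of TrainingStateAverager needs to read main_parameters and parameter_names from its own step(), which is only possible if they are public.

# Hypothetical sketch, not the real hivemind.Optimizer; it only shows that a wrapper over
# TrainingStateAverager needs read access to main_parameters and parameter_names in step().
import torch


class OptimizerSketch:
    def __init__(self, state_averager):
        self.state_averager = state_averager  # a TrainingStateAverager instance

    @torch.no_grad()
    def step(self):
        # read the averager's offloaded ("main") fp32 parameters by name
        for name, param in zip(
            self.state_averager.parameter_names, self.state_averager.main_parameters
        ):
            if param.grad is None:
                continue
            # e.g. inspect per-parameter gradients before the averaging round
            print(name, param.grad.norm().item())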
@@ -378,20 +378,31 @@ def step(
             self.finished_optimizer_step.clear()
         return output

-    def _do(self, optimizer_step: bool, zero_grad: bool, averaging_round: bool, **kwargs):
+    def _do(self, optimizer_step: bool, zero_grad: bool, averaging_round: bool, grad_scaler: Optional[GradScaler], **kwargs):
Maybe rename _do to _step_inner?
Let's discuss this with @borzunov on the main PR
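For reference, a sketch of how the new Optional[GradScaler] argument could be threaded through the optimizer-step branch. This is not the actual _do body; torch.cuda.amp.GradScaler stands in for hivemind's wrapper, and passing grad_scaler=None keeps the pre-PR behavior.

# Sketch of the grad_scaler-aware branch; not the actual _do implementation, only an
# illustration of how an Optional[GradScaler] argument keeps the non-AMP path unchanged.
from typing import Optional

import torch
from torch.cuda.amp import GradScaler  # stand-in for hivemind's GradScaler wrapper


def do_step_sketch(
    optimizer: torch.optim.Optimizer,
    optimizer_step: bool,
    zero_grad: bool,
    averaging_round: bool,
    grad_scaler: Optional[GradScaler] = None,
):
    if optimizer_step:
        if grad_scaler is not None:
            grad_scaler.step(optimizer)  # skips the update if any gradient is non-finite
            grad_scaler.update()
        else:
            optimizer.step()  # plain path, identical to the behavior before this PR
    if zero_grad:
        optimizer.zero_grad()
    if averaging_round:
        ...  # run the collaborative averaging round here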
hivemind/optim/grad_scaler.py (outdated)
-    def unscale_(self, optimizer: Optimizer) -> bool:
-        assert isinstance(optimizer, DecentralizedOptimizerBase)
+    def unscale_(self, optimizer: TorchOptimizer) -> bool:
+        assert hasattr(optimizer, "opt"), "hivemind.GradScaler only supports hivemind optimizer wrappers"
What if there's a non-hivemind wrapper of TorchOptimizer? A more explicit check would use isinstance, IMO.
[done]
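For completeness, a sketch of the isinstance-based check discussed above. HivemindOptimizerWrapper is a placeholder name for whichever base class hivemind's optimizer wrappers actually share, not a real hivemind class.

# Sketch of the more explicit isinstance check; HivemindOptimizerWrapper is a placeholder,
# standing in for the common base class of hivemind's optimizer wrappers.
from torch.optim import Optimizer as TorchOptimizer


class HivemindOptimizerWrapper:
    """Placeholder base class: hivemind wrappers keep the inner torch optimizer as self.opt."""

    def __init__(self, opt: TorchOptimizer):
        self.opt = opt


def check_is_hivemind_wrapper(optimizer: TorchOptimizer) -> None:
    # explicit type check instead of duck-typing via hasattr(optimizer, "opt")
    assert isinstance(
        optimizer, HivemindOptimizerWrapper
    ), "hivemind.GradScaler only supports hivemind optimizer wrappers"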
Codecov Report
@@            Coverage Diff             @@
##           master     #413      +/-   ##
==========================================
- Coverage   84.61%   84.27%    -0.35%
==========================================
  Files          76       76
  Lines        7286     7287        +1
==========================================
- Hits         6165     6141       -24
- Misses       1121     1146       +25