Prepare GradScaler for hivemind.Optimizer #413

Merged 10 commits on Nov 18, 2021

@justheuristic (Member) commented Nov 18, 2021

  • Modified hivemind.GradScaler to make it compatible with hivemind.Optimizer (backwards-compatible)
  • Changed TrainingStateAverager to be compatible with hivemind.GradScaler
  • Made TrainingStateAverager.main_parameters and parameter_names public for use in optimizer

@@ -100,7 +100,7 @@ def __init__(
         self.offload_optimizer = offload_optimizer
         self.custom_gradients = custom_gradients

-        self._main_parameters, self._parameter_names = main_parameters, parameter_names
+        self.main_parameters, self.parameter_names = main_parameters, parameter_names

Member Author:

Made them public because these parameters are needed in hivemind.Optimizer.step,
and they are no more private than, for instance, opt_keys_for_averaging.
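
For illustration, the access pattern this change enables might look like the sketch below. `ToyStateAverager`, `ToyOptimizer`, and `named_main_parameters` are hypothetical stand-ins, not hivemind classes; only the attribute names `main_parameters` and `parameter_names` come from the diff above.

```python
from typing import Dict, Sequence


class ToyStateAverager:
    """Hypothetical stand-in for TrainingStateAverager (illustration only)."""

    def __init__(self, main_parameters: Sequence[float], parameter_names: Sequence[str]):
        assert len(main_parameters) == len(parameter_names)
        # Public attributes (no leading underscore), as in the changed line of the diff
        self.main_parameters, self.parameter_names = list(main_parameters), list(parameter_names)


class ToyOptimizer:
    """Hypothetical outer optimizer that reads the averager's public attributes."""

    def __init__(self, state_averager: ToyStateAverager):
        self.state_averager = state_averager

    def named_main_parameters(self) -> Dict[str, float]:
        # No access to private members of another class is needed here
        averager = self.state_averager
        return dict(zip(averager.parameter_names, averager.main_parameters))
```

With underscore-prefixed names, the outer class would have to reach into another object's private state to do the same thing.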

@@ -378,20 +378,31 @@ def step(
         self.finished_optimizer_step.clear()
         return output

-    def _do(self, optimizer_step: bool, zero_grad: bool, averaging_round: bool, **kwargs):
+    def _do(self, optimizer_step: bool, zero_grad: bool, averaging_round: bool, grad_scaler: Optional[GradScaler], **kwargs):

Member:

_step_inner?

Member Author:

Let's discuss this with @borzunov on the main PR
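
As a rough sketch of why an optional `grad_scaler` argument is useful in a step routine: the caller can fall back to a plain optimizer step when no scaler is given, and otherwise let the scaler decide whether the step is safe. Everything below (`ToyOptimizer`, `ToyGradScaler`, `do_optimizer_step`, the boolean return value) is a hypothetical illustration, not the actual body of `_do`.

```python
from typing import Optional


class ToyOptimizer:
    """Hypothetical optimizer that only counts how many steps were applied."""

    def __init__(self) -> None:
        self.steps = 0

    def step(self) -> None:
        self.steps += 1


class ToyGradScaler:
    """Hypothetical scaler that skips the step if gradients overflowed."""

    def __init__(self, found_inf: bool = False) -> None:
        self.found_inf = found_inf

    def step(self, optimizer: ToyOptimizer) -> bool:
        if self.found_inf:
            return False  # inf/nan gradients: do not touch the parameters
        optimizer.step()
        return True


def do_optimizer_step(optimizer: ToyOptimizer, grad_scaler: Optional[ToyGradScaler] = None) -> bool:
    """Run one optimizer step, routing through the scaler when one is provided."""
    if grad_scaler is None:
        optimizer.step()
        return True
    return grad_scaler.step(optimizer)
```

Passing `None` keeps the full-precision path unchanged, which is one way to stay backwards-compatible with callers that do not use mixed precision.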

hivemind/optim/experimental/state_averager.py: outdated review thread (resolved)

-    def unscale_(self, optimizer: Optimizer) -> bool:
-        assert isinstance(optimizer, DecentralizedOptimizerBase)
+    def unscale_(self, optimizer: TorchOptimizer) -> bool:
+        assert hasattr(optimizer, "opt"), "hivemind.GradScaler only supports hivemind optimizer wrappers"

Member:

What if there's a non-hivemind wrapper of TorchOptimizer? A more explicit check would use isinstance IMO

Member Author:

[done]
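
The concern in this thread can be reproduced with a small, self-contained sketch. All class and function names below are hypothetical stand-ins (no torch or hivemind imports); the point is only that an attribute check accepts any object exposing `.opt`, while a type check accepts only the intended wrapper class.

```python
class TorchOptimizer:
    """Stand-in for torch.optim.Optimizer (hypothetical, illustration only)."""


class HivemindWrapper(TorchOptimizer):
    """Hypothetical hivemind-style wrapper that keeps the wrapped optimizer in .opt."""

    def __init__(self, opt: TorchOptimizer):
        self.opt = opt


class ThirdPartyWrapper(TorchOptimizer):
    """Hypothetical unrelated wrapper that also happens to expose .opt."""

    def __init__(self, opt: TorchOptimizer):
        self.opt = opt


def accepts_by_attribute(optimizer: TorchOptimizer) -> bool:
    # Duck-typed check, as in the hasattr-based assertion under review
    return hasattr(optimizer, "opt")


def accepts_by_type(optimizer: TorchOptimizer) -> bool:
    # Explicit check, as suggested by the reviewer
    return isinstance(optimizer, HivemindWrapper)


inner = TorchOptimizer()
native = HivemindWrapper(inner)
foreign = ThirdPartyWrapper(inner)
```

Here `accepts_by_attribute(foreign)` is true even though `foreign` is not a hivemind wrapper, whereas `accepts_by_type(foreign)` is false.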

@mryab mryab changed the title prepare GradScaler for hivemind.Optimizer Prepare GradScaler for hivemind.Optimizer Nov 18, 2021
codecov bot commented Nov 18, 2021

Codecov Report

Merging #413 (9b7db56) into master (22665fd) will decrease coverage by 0.34%.
The diff coverage is 50.00%.

❗ Current head 9b7db56 differs from pull request most recent head b76b710. Consider uploading reports for the commit b76b710 to get more accurate results

@@            Coverage Diff             @@
##           master     #413      +/-   ##
==========================================
- Coverage   84.61%   84.27%   -0.35%     
==========================================
  Files          76       76              
  Lines        7286     7287       +1     
==========================================
- Hits         6165     6141      -24     
- Misses       1121     1146      +25     

| Impacted Files | Coverage Δ |
|---|---|
| hivemind/optim/collaborative.py | 23.80% <0.00%> (ø) |
| hivemind/optim/grad_scaler.py | 34.54% <58.33%> (+1.21%) ⬆️ |
| hivemind/averaging/matchmaking.py | 78.11% <0.00%> (-6.08%) ⬇️ |
| hivemind/dht/node.py | 91.44% <0.00%> (-1.19%) ⬇️ |

@justheuristic justheuristic merged commit 09e34f8 into master Nov 18, 2021
@justheuristic justheuristic deleted the grad_scaler_fix branch November 18, 2021 16:44
2 participants