Prepare GradScaler for hivemind.Optimizer #413
Conversation
justheuristic commented on Nov 18, 2021 (edited)
- Modified hivemind.GradScaler so that it is compatible with hivemind.Optimizer, in a backwards-compatible way (see the usage sketch after this list)
- Changed TrainingStateAverager to be compatible with hivemind.GradScaler
- Made TrainingStateAverager.main_parameters and parameter_names public for use in optimizer
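For context, a minimal usage sketch of what this change enables. It assumes hivemind.GradScaler keeps the torch.cuda.amp.GradScaler interface (scale, unscale_, step, update) and that optimizer is a hivemind optimizer wrapper; construction of that wrapper (it needs a DHT and run configuration) is elided, and the model and data are toy placeholders.

# Usage sketch only: hivemind.GradScaler is assumed to follow the torch.cuda.amp.GradScaler
# interface, and `optimizer` is assumed to be a hivemind optimizer wrapper (its construction,
# which requires a DHT and run configuration, is elided here).
import torch
from hivemind import GradScaler  # hivemind's AMP-compatible scaler (hivemind/optim/grad_scaler.py)

model = torch.nn.Linear(512, 10).cuda()
optimizer = ...  # a hivemind optimizer wrapper around e.g. torch.optim.Adam(model.parameters())
scaler = GradScaler()

for _ in range(10):
    batch = torch.randn(32, 512, device="cuda")
    targets = torch.randint(0, 10, (32,), device="cuda")
    with torch.cuda.amp.autocast():
        loss = torch.nn.functional.cross_entropy(model(batch), targets)
    scaler.scale(loss).backward()  # scale the loss so fp16 gradients do not underflow
    scaler.unscale_(optimizer)     # unscale gradients before clipping or averaging
    scaler.step(optimizer)         # runs the optimizer step only if all gradients are finite
    scaler.update()                # adjust the loss scale for the next iteration
    optimizer.zero_grad()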
@@ -100,7 +100,7 @@ def __init__(
         self.offload_optimizer = offload_optimizer
         self.custom_gradients = custom_gradients

-        self._main_parameters, self._parameter_names = main_parameters, parameter_names
+        self.main_parameters, self.parameter_names = main_parameters, parameter_names
made them public because these params are needed in hivemind.Optimizer.step
... and they are no more private than, for instance, opt_keys_for_averaging
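To illustrate the point, a hypothetical sketch (not the actual hivemind.Optimizer code): an optimizer wrapper built on top of TrainingStateAverager needs to read main_parameters and parameter_names from its own step(), which is only possible if they are public.

# Hypothetical sketch, not the real hivemind.Optimizer; it only shows that a wrapper over
# TrainingStateAverager needs read access to main_parameters and parameter_names in step().
import torch


class OptimizerSketch:
    def __init__(self, state_averager):
        self.state_averager = state_averager  # a TrainingStateAverager instance

    @torch.no_grad()
    def step(self):
        # read the averager's offloaded ("main") fp32 parameters by name
        for name, param in zip(
            self.state_averager.parameter_names, self.state_averager.main_parameters
        ):
            if param.grad is None:
                continue
            # e.g. inspect per-parameter gradients before the averaging round
            print(name, param.grad.norm().item())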
@@ -378,20 +378,31 @@ def step(
             self.finished_optimizer_step.clear()
         return output

-    def _do(self, optimizer_step: bool, zero_grad: bool, averaging_round: bool, **kwargs):
+    def _do(self, optimizer_step: bool, zero_grad: bool, averaging_round: bool, grad_scaler: Optional[GradScaler], **kwargs):
Maybe rename _do to _step_inner?
Let's discuss this with @borzunov on the main PR
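For reference, a sketch of how the new Optional[GradScaler] argument could be threaded through the optimizer-step branch. This is not the actual _do body; torch.cuda.amp.GradScaler stands in for hivemind's wrapper, and passing grad_scaler=None keeps the pre-PR behavior.

# Sketch of the grad_scaler-aware branch; not the actual _do implementation, only an
# illustration of how an Optional[GradScaler] argument keeps the non-AMP path unchanged.
from typing import Optional

import torch
from torch.cuda.amp import GradScaler  # stand-in for hivemind's GradScaler wrapper


def do_step_sketch(
    optimizer: torch.optim.Optimizer,
    optimizer_step: bool,
    zero_grad: bool,
    averaging_round: bool,
    grad_scaler: Optional[GradScaler] = None,
):
    if optimizer_step:
        if grad_scaler is not None:
            grad_scaler.step(optimizer)  # skips the update if any gradient is non-finite
            grad_scaler.update()
        else:
            optimizer.step()  # plain path, identical to the behavior before this PR
    if zero_grad:
        optimizer.zero_grad()
    if averaging_round:
        ...  # run the collaborative averaging round here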
hivemind/optim/grad_scaler.py (outdated)
-    def unscale_(self, optimizer: Optimizer) -> bool:
-        assert isinstance(optimizer, DecentralizedOptimizerBase)
+    def unscale_(self, optimizer: TorchOptimizer) -> bool:
+        assert hasattr(optimizer, "opt"), "hivemind.GradScaler only supports hivemind optimizer wrappers"
What if there's a non-hivemind wrapper of TorchOptimizer? A more explicit check would use isinstance, IMO.
[done]
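For completeness, a sketch of the isinstance-based check discussed above. HivemindOptimizerWrapper is a placeholder name for whichever base class hivemind's optimizer wrappers actually share, not a real hivemind class.

# Sketch of the more explicit isinstance check; HivemindOptimizerWrapper is a placeholder,
# standing in for the common base class of hivemind's optimizer wrappers.
from torch.optim import Optimizer as TorchOptimizer


class HivemindOptimizerWrapper:
    """Placeholder base class: hivemind wrappers keep the inner torch optimizer as self.opt."""

    def __init__(self, opt: TorchOptimizer):
        self.opt = opt


def check_is_hivemind_wrapper(optimizer: TorchOptimizer) -> None:
    # explicit type check instead of duck-typing via hasattr(optimizer, "opt")
    assert isinstance(
        optimizer, HivemindOptimizerWrapper
    ), "hivemind.GradScaler only supports hivemind optimizer wrappers"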
Codecov Report
@@            Coverage Diff             @@
##           master     #413      +/-   ##
==========================================
- Coverage   84.61%   84.27%    -0.35%
==========================================
  Files          76       76
  Lines        7286     7287        +1
==========================================
- Hits         6165     6141       -24
- Misses       1121     1146       +25