
Fix offloaded optimizer with single peer #450

Merged: 8 commits from single-peer-fix into master on Jan 19, 2022

Conversation

justheuristic (Member)

The bug was originally found by @elricwan and @finger92 (seemingly independently) and reported in #447.

Here's what caused the issue:

I investigated what went wrong when training with only one trainer. Currently, hivemind.Optimizer is hard-wired to use the averaged gradients -- as in "averaged with peers".

If you are the only peer, gradients are never actually averaged, so the optimizer runs with zero gradients all the time. This change should fix the problem in that specific case: 4ffd9ca. I seemingly introduced the bug myself in #440; it only affects the GitHub version of hivemind (i.e., not the PyPI version).
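
To illustrate the symptom with plain PyTorch (a toy sketch, not hivemind internals): when the optimizer steps on an offloaded copy of the parameters whose gradient buffer was never filled, the model makes no progress even though local gradients exist.

    import torch

    # Toy illustration of the symptom (plain PyTorch, not hivemind internals).
    # With offload_optimizer=True, the optimizer steps on a separate copy of the
    # parameters; if averaged gradients are never written into that copy, it steps on zeros.
    param = torch.nn.Parameter(torch.ones(3))                # "local" model parameter
    offloaded = torch.nn.Parameter(param.detach().clone())   # offloaded copy used by the optimizer
    opt = torch.optim.SGD([offloaded], lr=0.1)

    param.grad = torch.full_like(param, 2.0)      # local gradients exist...
    offloaded.grad = torch.zeros_like(offloaded)  # ...but with one peer, nothing is averaged in

    opt.step()
    print(offloaded.data)  # tensor([1., 1., 1.]) -- no progress: the optimizer saw zero gradients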

The bug was introduced in #440 and affects setups where all three of the following are true:

  • hivemind was installed from GitHub (not PyPI)
  • there is only one training peer in the swarm
  • offload_optimizer is True
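
A setup meeting all three conditions might look roughly like this (a hedged sketch modeled on hivemind's quickstart-style API; the run id, batch sizes, and model are made-up placeholders, and offload_optimizer is the flag in question):

    import torch
    import hivemind

    dht = hivemind.DHT(start=True)   # a fresh swarm with no other peers connected
    model = torch.nn.Linear(10, 2)   # placeholder model

    opt = hivemind.Optimizer(
        dht=dht,
        run_id="demo_run",            # placeholder experiment name
        batch_size_per_step=32,       # placeholder batch sizes
        target_batch_size=256,
        params=model.parameters(),
        optimizer=lambda params: torch.optim.Adam(params),
        offload_optimizer=True,       # the affected flag
        verbose=True,
    )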

Here's the behavior after the fix is introduced:
[image: training behavior after the fix]

codecov bot commented Jan 19, 2022

Codecov Report

Merging #450 (d50dedd) into master (8aa798d) will increase coverage by 0.37%.
The diff coverage is 85.71%.

@@            Coverage Diff             @@
##           master     #450      +/-   ##
==========================================
+ Coverage   83.72%   84.10%   +0.37%     
==========================================
  Files          78       78              
  Lines        7928     7931       +3     
==========================================
+ Hits         6638     6670      +32     
+ Misses       1290     1261      -29     
Impacted Files                        Coverage Δ
hivemind/optim/optimizer.py           62.28% <85.71%> (+2.05%) ⬆️
hivemind/optim/progress_tracker.py    97.80% <0.00%> (-1.10%) ⬇️
hivemind/averaging/averager.py        87.65% <0.00%> (+0.72%) ⬆️
hivemind/utils/asyncio.py            100.00% <0.00%> (+0.86%) ⬆️
hivemind/optim/grad_averager.py       93.81% <0.00%> (+1.03%) ⬆️
hivemind/dht/node.py                  92.63% <0.00%> (+1.18%) ⬆️
hivemind/averaging/matchmaking.py     88.69% <0.00%> (+4.46%) ⬆️

@@ -618,6 +618,12 @@ def _load_averaged_gradients_into_optimizer_(self):

         self.grad_averager.notify_used_averaged_gradients()

+    def _load_local_gradients_into_optimizer(self):
+        """Fallback to using local gradients in the optimizer (instead of averaged gradients)"""
+        logger.log(self.status_loglevel, f"Proceeding with local gradients")
Member

Please add a comment that this can be optimized in the case of one peer (if we'd ever need to optimize this case).

justheuristic (Member, Author)

just did it
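
To see the fix's logic in one place, here is a hedged, self-contained sketch of the fallback pattern (an illustration only, not hivemind's actual Optimizer; the two method names mirror the diff above, everything else is assumed):

    import torch

    class ToyOffloadedOptimizer:
        """Illustrative stand-in for the relevant part of hivemind.Optimizer (assumption, not the real class)."""

        def __init__(self, local_params, lr=0.1):
            self.local_params = list(local_params)
            # Offloaded copies that the inner optimizer actually steps on
            self.offloaded_params = [torch.nn.Parameter(p.detach().clone()) for p in self.local_params]
            self.inner = torch.optim.SGD(self.offloaded_params, lr=lr)
            self.averaged_grads = None  # set only when averaging with other peers succeeds

        def _load_averaged_gradients_into_optimizer_(self):
            for p, g in zip(self.offloaded_params, self.averaged_grads):
                p.grad = g.clone()

        def _load_local_gradients_into_optimizer(self):
            # Fallback to using local gradients in the optimizer (instead of averaged gradients);
            # assumes backward() has already populated the local .grad buffers
            for p, local in zip(self.offloaded_params, self.local_params):
                p.grad = local.grad.clone()

        def step(self):
            # The gist of the fix: with a single peer there is nothing averaged,
            # so fall back to local gradients instead of stepping on zeros.
            if self.averaged_grads is not None:
                self._load_averaged_gradients_into_optimizer_()
            else:
                self._load_local_gradients_into_optimizer()
            self.inner.step()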

justheuristic merged commit a974b55 into master on Jan 19, 2022
justheuristic deleted the single-peer-fix branch on January 19, 2022 at 19:43