Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement CenteredClip in averager #379

Draft
wants to merge 2 commits into
base: master
Choose a base branch
from
Draft

Implement CenteredClip in averager #379

wants to merge 2 commits into from

Conversation

borzunov
Copy link
Member

@borzunov borzunov commented Sep 8, 2021

  • (important) Reducers should work concurrently
  • (important) Test with .isfinite(x), test that no other tensor values may corrupt CClip
  • Rename to MeanReducer, CenteredClipReducer (btw, why not *Aggregator?)
  • run_in_executor for reducers
  • nit: Rename Factory to smth else, make_reducer: Callable[...]
  • n_steps -> max_steps

@codecov
Copy link

codecov bot commented Sep 8, 2021

Codecov Report

Merging #379 (07d8f5f) into master (b84f62b) will decrease coverage by 0.30%.
The diff coverage is 53.62%.

❗ Current head 07d8f5f differs from pull request most recent head e9f3288. Consider uploading reports for the commit e9f3288 to get more accurate results

@@            Coverage Diff             @@
##           master     #379      +/-   ##
==========================================
- Coverage   83.98%   83.68%   -0.31%     
==========================================
  Files          70       71       +1     
  Lines        6383     6443      +60     
==========================================
+ Hits         5361     5392      +31     
- Misses       1022     1051      +29     
Impacted Files Coverage Δ
hivemind/averaging/accumulators.py 45.76% <45.76%> (ø)
hivemind/averaging/allreduce.py 77.56% <100.00%> (+0.14%) ⬆️
hivemind/averaging/averager.py 86.27% <100.00%> (+0.03%) ⬆️
hivemind/averaging/partition.py 98.03% <100.00%> (-0.02%) ⬇️
hivemind/averaging/matchmaking.py 83.75% <0.00%> (-0.63%) ⬇️
hivemind/dht/node.py 92.63% <0.00%> (+1.18%) ⬆️

weights: torch.Tensor,
tau: float = 1.0,
n_iters: int = 20,
stop_delta: Optional[float] = None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

preference: let's default to some reasonable delta and a very large n steps

@justheuristic
Copy link
Member

Random thoughts:

  • gotta ensure that we do not fail even if user sends nan/inf/near_inf values (both with compression and with raw values)
  • would prefer to set delta to 1e-6 and increase n_iters dramatically (e.g. to 50) to make runtime adaptive
  • let's print some warnings in case your values were more than e.g. 3 tau outside the range

@borzunov borzunov added the security Security issues or improvements label Dec 30, 2021
prev_diff = result

# We only need to update `diff` (not `result`) between iterations
diff.addmm_(-coeffs.repeat(n_peers, 1), diff)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
diff.addmm_(-coeffs.repeat(n_peers, 1), diff)
diff -= coeffs @ diff

It seems like addmm_() doesn't work correctly if the destination is equal to one of the operands.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
averaging security Security issues or improvements
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants