0.9.8: Initial libp2p support, improved DHT protection, better examples
This release contains the following improvements and bugfixes:
- Implement combining validators (#249) (@borzunov)
- Decentralized adaptive optimizers (#243) (@nevec)
- Add nltk to ALBERT example's requirements (#251) (@borzunov)
- Protect training progress and metrics with signatures and DHT schema validation (#250) (@borzunov)
- Add state checkpointing and uploading in coordinator (#241) (@leshanbog @mryab)
- Fix random freezes in averager.step, improve error handling (#254) (@justheuristic @yhn112 @borzunov @mryab)
- Fix device in Switch-MoE, overhaul Server architecture (#256) (@mryab)
- Log more stats for the user, move performance stats to examples (#257) (@yhn112)
- Implement authorization for a moderated Hivemind network (#255) (@borzunov)
- Improve error handling, remove deprecated functionality (#261) (@justheuristic @mryab)
- Log correct loss in examples/albert/run_first_peer.py (#265) (@borzunov)
- Fix NaN when compressing a tensor of zeros (#266) (@Vsevolod-pl)
- Support auxiliary participants in AllReduceProtocol (#260) (@foksly)
- Log collaboration step to Wandb, store metrics only if peer is synchronized (#267) (@borzunov @yhn112 @justheuristic)
- Add initial support for connecting via libp2p (#238) (@MaximKsh @deniskamazur @skobellev @leshanbog @borzunov @mryab @yhn112)