Releases · learning-at-home/hivemind
1.1.0: libp2p support in hivemind.moe, PowerSGD training, persistent P2P identity
Release highlights
- Starting from this release, all components of hivemind.moe use libp2p for communication. This brings the same benefits already available for averaging and the DHT (simplified NAT traversal, better performance, etc.) and marks the end of gRPC usage in hivemind. The user API is mostly the same: if you were using abstractions like RemoteMixtureOfExperts, your code should not need changes, although cross-release training is not possible.
- If you need another way to reduce the network footprint during training with hivemind.Optimizer, you can now use PowerSGD for gradient averaging. This method decreases communication costs by factorizing the gradients of the model and aggregating the factorized versions. To enable it, pass grad_averager_factory=partial(PowerSGDGradientAverager, averager_rank=RANK) when creating an instance of Optimizer, where RANK denotes the factorization rank; lower values give higher compression at the cost of reconstruction quality (see the sketch after this list).
- Similarly to hivemind-server, it is now possible to launch a dedicated DHT instance with a command-line tool. The tool, available via hivemind-dht, can be used to quickly create a lightweight peer intended mostly for connecting others to the DHT (for example, on a publicly available server) or for DHT metadata replication.
- Previously, restarting a libp2p instance required generating a new P2P identity, which resulted in a new multiaddress. Thus, it was difficult to use the same command to connect to a peer across repeated launches, which is common during debugging. Now you can store the persistent identity of a peer in a file and reuse it between launches by specifying the --identity_path argument, available both in the ALBERT example and in the CLI tools of hivemind.
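As a minimal sketch of the PowerSGD option above: the grad_averager_factory and averager_rank parameters are the ones described in this release, while the run_id, batch sizes, toy model, and exact import path are illustrative assumptions rather than a prescribed setup.

```python
# Minimal sketch: enabling PowerSGD gradient averaging in hivemind.Optimizer.
# run_id, batch sizes, the toy model, and the import path are illustrative
# assumptions; grad_averager_factory and averager_rank are from this release.
from functools import partial

import torch
import hivemind
from hivemind.optim.power_sgd_averager import PowerSGDGradientAverager  # assumed import path

dht = hivemind.DHT(start=True)      # local DHT peer for this example
model = torch.nn.Linear(512, 512)   # any torch model

optimizer = hivemind.Optimizer(
    dht=dht,
    run_id="powersgd_demo",          # assumed experiment name shared by all peers
    batch_size_per_step=32,          # assumed per-peer batch size
    target_batch_size=4096,          # assumed global batch size per optimizer step
    optimizer=partial(torch.optim.Adam, lr=1e-3),
    params=model.parameters(),
    grad_averager_factory=partial(PowerSGDGradientAverager, averager_rank=4),
)
```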
Deprecations
- The parameters quic, use_relay_hop, and use_relay_discovery of hivemind.P2P are deprecated since our update of the libp2p dependency in the p2p daemon. They will be removed in the 1.2.0 release of hivemind.
What's Changed
- Pin pytest version in requirements-dev, use file_descriptor in tests by @justheuristic in #454
- Pin isort version, bump black by @mryab in #456
- Clean compression/__init__.py by @borzunov in #460
- Do not use offload_optimizer with local_updates by default by @foksly in #462
- Add PowerSGD for compressed gradient averaging by @artek0chumak in #432
- Bump Black to 22.3.0, pin Golang version by @mryab in #466
- use_local_updates in optimizer by @justheuristic in #468
- Update p2pd to v0.3.8 (and libp2p to v0.17.0) by @borzunov in #469
- Generate new private key if identity file doesn't exist by @borzunov in #473
- Convert hivemind.server to libp2p backend by @GreenFatGuy in #470
- Implement a CLI for hivemind.DHT by @mryab in #465
- Use PeerID exclusively to address MoE experts by @justheuristic in #479
- Remove deprecated code in hivemind.optim and hivemind.averaging before the 1.1.0 release by @mryab in #480
- Fix shape validation in GradientAverager by @mryab in #481
- Change expiration time in declare_experts, fix update_period discrepancy by @justheuristic in #482
- Add identity_path option for MoE.Server runners by @GreenFatGuy in #484
- Simplify ExpertBackend interface by @justheuristic in #483
- Clean up imports, remove unused utils by @mryab in #486
- Finish renaming experts -> module_backends in ConnectionHandler by @justheuristic in #487
- Remove gRPC services and grpcio requirement by @mryab in #485
New Contributors
- @GreenFatGuy made their first contribution in #470
Full Changelog: 1.0.1...1.1.0
1.0.1: Patch release
What's Changed
- Improve user-friendliness and fix misc errors in Optimizer, Averager and P2P by @justheuristic @pr-Mais @borzunov @mrseeker @mryab in #428
- Skip gradient averaging if there are no other peers by @justheuristic @soodoshll @borzunov in #440
- Move hivemind.Server from __init__, streamline imports by @mryab in #441
- Change make_empty to make_zeros for TensorDescriptor by @mryab in #442
- Fix offloaded optimizer with single peer by @justheuristic @elricwan @borzunov in #450
- Fix "too many open files" issue by @yhn112 in #444
Full Changelog: 1.0.0...1.0.1
1.0.0: hivemind.Optimizer, improved averaging stability, better logging
What's Changed
- Fix averager speed for TCP connections by @borzunov in #373
- Fix "Too many open files" and load state freezing by @justheuristic in #371
- Prefetch while reading rpc_aggregate_part() outputs by @borzunov in #370
- Use ModeClient in libp2p DHT in case of --client_mode by @borzunov in #374
- Integrate p2pd logs and outputs into hivemind logging by @borzunov in #375
- Split compression strategies into separate classes by @justheuristic in #366
- Implement colored logs by @borzunov in #377
- Parametrize max message size for persistent connections by @deniskamazur in #376
- Make log handlers configurable, shorten entries by @borzunov in #378
- Enable log handler in benchmarks and run_server by @borzunov in #380
- Fix step_tolerance in CollaborativeOptimizer by @justheuristic in #383
- Fix pickle vulnerability by @deniskamazur in #386
- Remove arguments with default values from example instructions by @borzunov in #388
- Implement weight as part of the allreduce protocol, not matchmaking by @justheuristic in #384
- Support different AMP & buffer configurations in one experiment, fix minor bugs by @justheuristic in #389
- Fix codecov_in_develop_mode with pip>=21.2 by @justheuristic in #393
- Fix minor issues in documentation by @borzunov in #392
- Apply averager updates asynchronously by @justheuristic in #395
- Fix schema typing by @justheuristic in #396
- Backport PerformanceEMA from server_side_averaging by @justheuristic in #397
- Add an option to pre-schedule averaging by @justheuristic in #398
- Move DHT to dht/dht.py, update DHT figure by @justheuristic in #399
- [hotfix] replace StepControl.can_modify with began_allreduce by @justheuristic in #402
- Move PerformanceEMA to utils, TrainingAverager to optim, update utils by @justheuristic in #405
- Add GradientAverager with support for delayed averaging by @justheuristic in #404
- [hivemind.Optimizer] TrainingStateAverager by @justheuristic in #407
- Catch OSError in MPFuture by @artek0chumak in #409
- [hivemind.Optimizer] ProgressTracker by @justheuristic in #408
- Fix minor bugs in GradientAverager by @justheuristic in #410
- Make target group size optional by @justheuristic in #412
- Prepare GradScaler for hivemind.Optimizer by @justheuristic in #413
- Patch recursive cancel in StepControl by @justheuristic in #411
- Replace the invalid link to discord by @artek0chumak in #414
- Implement state sharing priority by @justheuristic in #415
- Implement core functionality of hivemind.Optimizer by @justheuristic in #403
- DHT Benchmark with asynchronous w/r by @MuXauJl11110 in #406
- Hotfix: load_state_from_peers with offload_optimizer by @justheuristic in #417
- Improve Optimizer docs, update quickstart to use Optimizer by @justheuristic in #416
- Quickstart: typos and references by @justheuristic in #420
- Remove trailing dots in log messages and errors by @borzunov in #419
- Do not log caller for INFO messages by @borzunov in #418
- Improve hivemind.optim.experimental and averager stability by @borzunov in #421
- Add minor tweaks learned from the NeurIPS demo run by @justheuristic in #422
- Improve All-Reduce fault-tolerance by @justheuristic in #423
- Fix All-Reduce fault-tolerance: catch Exception instead of BaseException by @justheuristic in #424
- Fix "Task was destroyed but is pending" (put items) by @justheuristic in #427
- Use hivemind.Optimizer in examples/albert by @mryab in #426
New Contributors
- @artek0chumak made their first contribution in #409
- @MuXauJl11110 made their first contribution in #406
Full Changelog: 0.10.0...1.0.0
0.10.0: libp2p-based averaging, improved P2P daemon performance
This release contains the following new features and bugfixes:
- Fix deadlocks in DecentralizedAverager and MPFuture (#331) (@borzunov @justheuristic)
- Resolve deadlock in MPFuture (#337) (@justheuristic @borzunov @yhn112)
- Convert averager to libp2p backend (#323) (@borzunov @mryab)
- Refactor naming and serialization for PeerIDs (#339) (@borzunov)
- Set default DHT num_workers = 4 (#342) (@borzunov @deniskamazur @justheuristic @mryab)
- Fix typo in dht.md (#345) (@justheuristic)
- Fix some warnings related to asyncio (#346) (@borzunov)
- Speed up P2P client creation (#343) (@deniskamazur @borzunov)
- Propagate startup errors from DHT and averager processes (#347) (@borzunov)
- Add less comparator for PeerID (#353) (@deniskamazur @borzunov)
- Fix minor asyncio issues in averager (#356) (@borzunov @justheuristic)
- Optimize unary handlers with persistent connections to P2P daemon (#328) (@deniskamazur)
- Fix import error breaking AllReduceRunner._send_error_to_peer() (#360) (@borzunov)
- Fix logger warning in P2P (#361) (@borzunov)
- Disable QUIC (#355) (@borzunov)
- Disable elasticity for averaging, add error handling (#362) (@justheuristic @mryab)
- Improve Matchmaking finalizers (#357) (@borzunov)
- Allow to specify P2P identity file (#363) (@borzunov)
- Fix loglevel for a message in _read_from_persistent_conn() (#364) (@borzunov)
0.9.10: libp2p-based DHT, per-tensor compression, better documentation
This release contains the following features and bugfixes:
- Add p2pd to package_data (#287) (@mryab)
- Add per-tensor compression, make All-Reduce faster and more flexible (#272) (@justheuristic @mponty @mryab @yhn112 @borzunov)
- Fix race condition while reserving ports in P2P (#299) (@borzunov)
- Add graceful shutdown to DHT and Averager (#301) (@justheuristic @mryab)
- Make checkpointing optional in example (#303) (@yhn112)
- Refactor MPFuture to use a single pipe/thread per process (#298) (@justheuristic @borzunov @mryab @yhn112)
- Split hivemind.client into hivemind.averaging and hivemind.moe (#304) (@mryab)
- Update readthedocs with hivemind.optim (#288) (@yhn112 @justheuristic)
- Minor fixes in examples/albert (#308) (@yhn112)
- Upload the model with push_to_hub in examples (#297) (@leshanbog @mryab @justheuristic)
- Account for multi-gpu devices in examples/albert (#309) (@justheuristic)
- Convert DHT to libp2p backend (#296) (@borzunov @skobellev)
- Simplify argument parsing, update docs in ALBERT example (#315) (@mryab @justheuristic @yhn112)
- Improve P2P handler throughput and interface (#316) (@borzunov)
- Remove shared memory from MPFuture, fix minor bugs (#317) (@justheuristic @borzunov @mryab)
- Implement protobuf-based stream handlers over libp2p backend (#318) (@borzunov)
- Refactor for v0.9.10 and fix example (#319) (@justheuristic @borzunov)
- Update quickstart tutorials and acknowledgements (#307) (@justheuristic @yhn112 @borzunov @mryab)
0.9.9: Improved libp2p installation, auxiliary peers support, logging in benchmarks
This release contains the following improvements and bugfixes:
- Add relay options to P2P (#268) (@deniskamazur)
- Add packaging to requirements (#269) (@deniskamazur)
- Disable p2pd compilation by default (#270) (@yhn112 @justheuristic)
- Measure testing coverage on pull request (#271) (@yhn112)
- Update p2pd md5 checksum (#273) (@deniskamazur)
- Use logging in benchmarks, fix libp2p-related issues (#280) (@justheuristic)
- Add BibTeX reference for the library to README (#283) (@mryab)
- Fix Codecov (#282) (@yhn112)
- Remove use of packaging module (#284) (@borzunov)
- Support auxiliary peers in CollaborativeOptimizer (#279) (@yhn112 @justheuristic @mryab)
0.9.8: Initial libp2p support, improved DHT protection, better examples
This release contains the following improvements and bugfixes:
- Implement combining validators (#249) (@borzunov)
- Decentralized adaptive optimizers (#243) (@nevec)
- Add nltk to ALBERT example's requirements (#251) (@borzunov)
- Protect training progress and metrics with signatures and DHT schema validation (#250) (@borzunov)
- Add state checkpointing and uploading in coordinator (#241) (@leshanbog @mryab)
- Fix random freezes in averager.step, improve error handling (#254) (@justheuristic @yhn112 @borzunov @mryab)
- Fix device in Switch-MoE, overhaul Server architecture (#256) (@mryab)
- Log more stats for user, move performance stats to examples (#257) (@yhn112)
- Implement authorization for a moderated Hivemind network (#255) (@borzunov)
- Improve error handling, remove deprecated functionality (#261) (@justheuristic @mryab)
- Log correct loss in examples/albert/run_first_peer.py (#265) (@borzunov)
- Fix NaN when compressing a tensor of zeros (#266) (@Vsevolod-pl)
- Support auxiliary participants in AllReduceProtocol (#260) (@foksly)
- Log collaboration step to Wandb, store metrics only if peer is synchronized (#267) (@borzunov @yhn112 @justheuristic)
- Add initial support for connecting via libp2p (#238) (@MaximKsh @deniskamazur @skobellev @leshanbog @borzunov @mryab @yhn112)
0.9.7: Improved security, Switch-like MoE, ALBERT example
This release contains the following improvements and bugfixes:
- Add RSA signature protection for DHT records (#187) (@borzunov)
- Improve Runtime exception handling (#207) (@mryab)
- Implement basic decentralized optimizers (#210) (@justheuristic, @mryab)
- Add gradient clipping support to ExpertBackend (#214) (@mryab)
- Convert SerializerBase to an abstract class (#212) (@mryab)
- Replace FeedforwardBlock with a correct implementation (#211) (@mryab)
- Disentangle DecentralizedAverager components, add averaging weights (#217) (@justheuristic @mryab)
- Add CollaborativeOptimizer, TrainingAverager (#215) (@leshanbog @nevec @mryab)
- Move compression-related code to hivemind.utils.compression (#213) (@mryab)
- Prevent DecentralizedSGD from accidentally skipping a fraction of training batches (#218) (@ploshkin)
- Add uniform compression (#202) (@mponty)
- Add gradient buffers to CollaborativeOptimizer (#220) (@justheuristic)
- Improve zero_grad behavior in CollaborativeOptimizer (#221) (@justheuristic)
- Reset gradient buffers when synchronizing with peers (#222) (@justheuristic)
- Add tool for custom user experts (#189) (@romakail @justheuristic)
- Delta gradients transmission (#225) (@Vsevolod-pl)
- Statistics averaging (#229) (@nevec)
- Ensure version-consistent result rounding in load_balance_peers (#230) (@justheuristic @mryab)
- Add Switch Transformers-like RemoteMixtureOfExperts (#228) (@mryab)
- Add example for collaborative ALBERT training (#226) (@leshanbog @yhn112 @nevec @mryab)
- Fix loss metric calculation (#240) (@yhn112)
- Add DHT schema validator (#227) (@borzunov)
- Fix server hanging in certain cases when connection is lost (#247) (@justheuristic)
- Add Dockerfile, refactor tests (#245) (@mryab)
- Fix incorrect data types/values in RemoteSwitchMixtureOfExperts (#246) (@mryab)
0.9.6: Patch release
0.9.5: Patch release
This release fixes several known bugs and security vulnerabilities:
- Copytree implementation for py37 compatibility (#162)
- Remove pickle.loads in Averager (#160)
- Support edge cases for DHT key/subkey/value (#167)
- Fix the remaining tests for py37 (#166)
- Move Averager metadata serialization out of user scope (#168)
- Handle edge cases in DecentralizedAverager (#171)
- Fix a typo in quickstart.md (#174)
- Serialize DHTID source with msgpack (#172)
- Move CLI server launch script to hivemind/hivemind_cli (#173)