[Feature Request] MoE enhancements #478

GreenFatGuy · 2022-06-04T19:29:08Z (edited by justheuristic)

The nature of this issue

During the review of #470, we compiled a list of things that were not crucial for that PR but should ideally be done. The problems are described below.
This issue is intentionally broad and collects all problems found during the PR review.

Use only `PeerID`

The idea of p2p is that a PeerID is enough to communicate with another daemon, and a Multiaddr is needed only to start a new node. Thus, we should minimize the use of Multiaddr wherever possible.
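A minimal sketch of this idea, assuming multiaddrs are handled as plain strings: a full multiaddr is parsed once when bootstrapping, and only the PeerID is kept afterwards. `peer_id_from_multiaddr` is a hypothetical helper, not part of hivemind's API; real code would use the library's own Multiaddr and PeerID types.

```python
from typing import Set


def peer_id_from_multiaddr(maddr: str) -> str:
    """Hypothetical helper: extract the base58 PeerID from a textual multiaddr,
    e.g. '/ip4/127.0.0.1/tcp/31337/p2p/QmExample' -> 'QmExample'."""
    parts = maddr.strip("/").split("/")
    if "p2p" not in parts:
        raise ValueError(f"multiaddr has no /p2p/ component: {maddr!r}")
    # Use the last /p2p/ component: in relayed addresses
    # (.../p2p/<relay>/p2p-circuit/p2p/<target>) the target peer comes last.
    index = len(parts) - 1 - parts[::-1].index("p2p")
    if index + 1 >= len(parts):
        raise ValueError(f"multiaddr ends without a PeerID: {maddr!r}")
    return parts[index + 1]


# Multiaddrs are needed only to join the network; afterwards, peers are keyed by PeerID.
initial_peers = ["/ip4/10.0.0.1/tcp/31337/p2p/QmPeerA"]
known_peers: Set[str] = {peer_id_from_multiaddr(addr) for addr in initial_peers}
```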

Move cpu-bound things inside separate executor

There are places in the code (for example, forward/backward in moe.client.expert) where CPU-bound work, such as serialization and deserialization, runs inside an async task and blocks the event loop. To improve efficiency, this work should be moved to a thread executor.
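A minimal sketch of the pattern with `asyncio`'s `loop.run_in_executor`. `serialize`, `deserialize`, `call_expert`, and `loopback` are hypothetical stand-ins (pickle replaces hivemind's tensor serialization purely for illustration), so this shows the structure of the fix, not the actual moe.client.expert code.

```python
import asyncio
import pickle
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Awaitable, Callable

# Shared pool for CPU-bound work; the size is an arbitrary illustrative choice.
_CPU_EXECUTOR = ThreadPoolExecutor(max_workers=4)


def serialize(obj: Any) -> bytes:
    # Stand-in for tensor serialization; pickle is used only for illustration.
    return pickle.dumps(obj)


def deserialize(data: bytes) -> Any:
    return pickle.loads(data)


async def call_expert(inputs: Any, send: Callable[[bytes], Awaitable[bytes]]) -> Any:
    """Hypothetical forward call: only network I/O runs on the event loop,
    while (de)serialization is offloaded to the thread pool."""
    loop = asyncio.get_running_loop()
    request = await loop.run_in_executor(_CPU_EXECUTOR, serialize, inputs)
    response = await send(request)
    return await loop.run_in_executor(_CPU_EXECUTOR, deserialize, response)


async def loopback(request: bytes) -> bytes:
    # Fake transport that echoes the request back, for demonstration only.
    return request


if __name__ == "__main__":
    print(asyncio.run(call_expert({"x": [1.0, 2.0]}, loopback)))
```

Offloading keeps the event loop free to serve other requests while a large payload is being processed; whether it also yields a parallel speedup depends on whether the serialization code releases the GIL.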

Check inputs on server side

Currently, hivemind.Server does not validate its inputs. If a user sends malformed inputs, the server may run out of memory (OOM). This should be addressed in a future PR. See #3.
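One possible shape for such a check, sketched under the assumption that each expert declares a per-argument schema (shape without the batch dimension, plus dtype) and a maximum batch size. `ExpectedInput`, `MalformedInputError`, and `validate_inputs` are hypothetical names, not existing hivemind APIs. The function only reads `.shape` and `.dtype`, so it works with torch tensors or numpy arrays.

```python
from dataclasses import dataclass
from typing import Any, Optional, Sequence, Tuple


@dataclass(frozen=True)
class ExpectedInput:
    """Hypothetical per-argument schema: shape excluding the batch dimension, plus dtype name."""
    shape: Tuple[int, ...]
    dtype: str


class MalformedInputError(ValueError):
    pass


def validate_inputs(inputs: Sequence[Any], schema: Sequence[ExpectedInput], max_batch_size: int) -> None:
    """Raise MalformedInputError unless `inputs` match `schema` and the batch size limit."""
    if len(inputs) != len(schema):
        raise MalformedInputError(f"expected {len(schema)} inputs, got {len(inputs)}")
    batch_size: Optional[int] = None
    for i, (tensor, expected) in enumerate(zip(inputs, schema)):
        shape = tuple(tensor.shape)
        if len(shape) != len(expected.shape) + 1 or shape[1:] != expected.shape:
            raise MalformedInputError(f"input {i}: expected (batch, *{expected.shape}), got {shape}")
        dtype = str(tensor.dtype).replace("torch.", "")  # "torch.float32" -> "float32"
        if dtype != expected.dtype:
            raise MalformedInputError(f"input {i}: expected dtype {expected.dtype}, got {dtype}")
        if batch_size is None:
            batch_size = shape[0]
        elif shape[0] != batch_size:
            raise MalformedInputError(f"input {i}: batch size {shape[0]} != {batch_size}")
    if batch_size is not None and batch_size > max_batch_size:
        raise MalformedInputError(f"batch size {batch_size} exceeds limit {max_batch_size}")
```

Since deserializing an oversized tensor can itself exhaust memory, a check like this would ideally run on the shape/dtype metadata of the incoming message before any buffers are allocated.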

Sending empty input causes exception

If a client sends a tensor of shape [0, ...] (an empty tensor), it is split into zero messages, so the expert uid is never sent. The server then receives uid=None and fails with a cryptic KeyError(None). We should either forbid empty inputs on the client side or ensure that zero-element tensors are serialized into a stream whose first message is empty.
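A minimal sketch of the second option: the splitter always emits at least one message, so the uid reaches the server even for an empty payload, and the server reports a missing uid explicitly. `StreamMessage`, `split_for_streaming`, and `read_uid` are hypothetical, and the sketch splits raw bytes rather than hivemind's actual serialized tensor messages.

```python
from dataclasses import dataclass
from typing import Iterator, Optional, Sequence


@dataclass
class StreamMessage:
    """Hypothetical wire message; only the first message of a stream carries the expert uid."""
    uid: Optional[str]
    chunk: bytes


def split_for_streaming(uid: str, payload: bytes, chunk_size: int) -> Iterator[StreamMessage]:
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not payload:
        # range(0, 0, chunk_size) is empty, so without this branch an empty payload
        # would produce no messages at all and the uid would never reach the server.
        yield StreamMessage(uid=uid, chunk=b"")
        return
    for offset in range(0, len(payload), chunk_size):
        yield StreamMessage(uid=uid if offset == 0 else None, chunk=payload[offset : offset + chunk_size])


def read_uid(messages: Sequence[StreamMessage]) -> str:
    """Server side: fail with an explicit error instead of a cryptic KeyError(None)."""
    if not messages or messages[0].uid is None:
        raise ValueError("stream is empty or its first message has no expert uid")
    return messages[0].uid
```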

MoE operates only with lists of tensors

The code expects inputs/outputs to be Iterable[torch.Tensor]; however, they may have a more complex structure, such as a dict with meta information.
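One common way to support nested inputs is to flatten them into the flat list of tensors the transport already handles and rebuild the structure on the other side. The sketch below assumes both sides know the structure (for example, from the expert's schema). `nested_flatten` and `nested_pack` are illustrative names and cover only dicts, lists, and tuples (namedtuples would need special handling).

```python
from typing import Any, List

_MISSING = object()


def nested_flatten(structure: Any) -> List[Any]:
    """Collect leaves (e.g. tensors) in a deterministic order; dict keys are visited sorted."""
    if isinstance(structure, dict):
        return [leaf for key in sorted(structure) for leaf in nested_flatten(structure[key])]
    if isinstance(structure, (list, tuple)):
        return [leaf for item in structure for leaf in nested_flatten(item)]
    return [structure]


def nested_pack(leaves: List[Any], structure: Any) -> Any:
    """Inverse of nested_flatten: refill the shape of `structure` with `leaves` in order."""
    iterator = iter(leaves)

    def pack(node: Any) -> Any:
        if isinstance(node, dict):
            return {key: pack(node[key]) for key in sorted(node)}
        if isinstance(node, (list, tuple)):
            return type(node)(pack(item) for item in node)
        leaf = next(iterator, _MISSING)
        if leaf is _MISSING:
            raise ValueError("not enough leaves for this structure")
        return leaf

    result = pack(structure)
    if next(iterator, _MISSING) is not _MISSING:
        raise ValueError("too many leaves for this structure")
    return result
```

For example, `{"x": [t1, t2], "meta": (t3,)}` flattens to `[t3, t1, t2]` (keys sorted), which can be sent as a plain list of tensors and packed back into the same dict on arrival.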

Test load balancing for unary handlers on python side

Load balancing is tested inside libp2p-daemon itself, and we also have some tests covering stream handlers. However, there are no tests for load balancing of unary handlers on the hivemind side.

Remove gRPC-specific Python file compilation

Since gRPC-based communication is no longer present in hivemind, we can remove the corresponding compilation commands from setup.py.

Add `--identity_path` to `run_server.py`

As in examples/albert, it would be useful to have an option that keeps the server's libp2p address fixed across restarts.
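A sketch of the command-line side of such an option. A libp2p PeerID is derived from the node's public key, so reusing a stored private key keeps the PeerID stable. How the path is forwarded is an assumption here: `p2p_kwargs` supposes that the DHT/P2P constructor accepts an `identity_path` keyword argument, and both function names are hypothetical.

```python
import argparse
from typing import Any, Dict, List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    # Illustrative subset only: a real run_server.py has many more options.
    parser = argparse.ArgumentParser(description="Run an MoE server")
    parser.add_argument(
        "--identity_path",
        type=str,
        default=None,
        help="Path to a file with the server's libp2p private key; "
        "reusing it keeps the PeerID stable across restarts",
    )
    return parser.parse_args(argv)


def p2p_kwargs(args: argparse.Namespace) -> Dict[str, Any]:
    # Assumption: the DHT/P2P constructor accepts an `identity_path` keyword argument.
    # Forward it only when set, so a fresh random identity is used otherwise.
    return {"identity_path": args.identity_path} if args.identity_path is not None else {}
```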

TODO List:

Use only PeerID where possible
Move cpu-bound things inside separate executor
Check inputs on server side
Sending empty input causes exception
MoE operates only with lists of tensors
Test load balancing for unary handlers on python side
Remove gRPC-specific Python file compilation
Add --identity_path to run_server.py
Make PeerID and ExpertData msgpack-serializable


justheuristic · 2022-06-23T18:02:53Z

Remove gRPC-specific Python file compilation

Fixed in #485.

GreenFatGuy added the enhancement, help wanted, server, and p2p labels Jun 4, 2022

mryab changed the title ~~[Feature Request] MoE enhancments~~ [Feature Request] MoE enhancements Jun 4, 2022

borzunov mentioned this issue Jun 9, 2022: Convert hivemind.Server/RemoteModuleCall/RemoteCallMany to libp2p backend #242 (Closed, 4 tasks)

GreenFatGuy mentioned this issue Jun 14, 2022: Add identity_path option for MoE.Server runners #484 (Merged)
