Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: race in NetworkRecepient #6828

Merged
merged 5 commits into from
May 25, 2022

Conversation

matklad
Copy link
Contributor

@matklad matklad commented May 18, 2022

closes #6826

We have circular depedencies between PeerManager and Client, which we
break via dyn PeerManagerAdapter. As this is a real cycle, we close it
at runtime by calling set_recipient. But today nothing prevents client
from sending message to network before that, and indeed that happens if
we are unlucky with timing.

This commit fixes the issue by making PeerManagerAdapter APIs block
until the loop is closed.


After finishing the implementation, I realized that this might not be the ideal solution. It seems that actix has the API (Actor::create) for tying such loop nicely, where you can get actor's address before creating the actor. We use it, for example, here:

let peer_manager = PeerManagerActor::create(move |ctx| {
let mut client_config = ClientConfig::test(false, 100, 200, num_validators, false, true);
client_config.archive = config.archive;
client_config.ttl_account_id_router = config.ttl_account_id_router;
let network_adapter = NetworkRecipient::default();
network_adapter.set_recipient(ctx.address().recipient());
let network_adapter = Arc::new(network_adapter);

So, in theory, we can just impl PeerManagerAdapter for Recipient<PeerManagerMessageRequest> and then rewrite all call-sites to use create and friends.

Upon closer examination, it seems that that is possible, but not trivial, as create API is not orthogonal enough.

  • we use Actor::start_in_arbiter and Actor::mock, which don't have create-like alternatives. We can open-code them, but that's adding more to the framework
  • ::create itself can't return something else, so that means we'd have to use .unwraps

So I think it's better to maybe do just what this PR is doing and just fix the current code.

Test Plan

Verify that, after applying the following "unfortunate timing" diff

diff --git a/tools/mock_node/src/setup.rs b/tools/mock_node/src/setup.rs
index 30940262b..9e7493cad 100644
--- a/tools/mock_node/src/setup.rs
+++ b/tools/mock_node/src/setup.rs
@@ -322,6 +322,7 @@ pub fn setup_mock_node(
                 !archival,
             )
         });
+    std::thread::sleep_ms(300);
     network_adapter.set_recipient(mock_network_actor.clone().recipient());
 
     // for some reason, with "test_features", start_http requires PeerManagerActor,

The following test

$ cargo test --package mock_node --lib -- setup::test::test_mock_node_basic --exact --nocapture

fails on master and passes with this PR applied

@matklad matklad force-pushed the m/fix-NetworkRecipient branch 2 times, most recently from 09ba69f to a6d3594 Compare May 23, 2022 15:39
closes near#6826

We have circular dependencies between PeerManager and Client, which we
break via `dyn PeerManagerAdapter`. As this is a real cycle, we close it
at runtime by calling `set_recipient`. But today nothing prevents client
from sending message to network before that, and indeed that happens if
we are unlucky with timing.

This commit fixes the issue by making `PeerManagerAdapter` APIs block
until the loop is closed.
@matklad matklad force-pushed the m/fix-NetworkRecipient branch from a6d3594 to 9dcc3c0 Compare May 23, 2022 16:01
@matklad matklad marked this pull request as ready for review May 23, 2022 16:08
@matklad matklad requested a review from a team as a code owner May 23, 2022 16:08
@matklad matklad requested a review from mzhangmzz May 23, 2022 16:08
@matklad matklad changed the title wip: fix race in NetworkRecepient fix: race in NetworkRecepient May 23, 2022
@near-bulldozer near-bulldozer bot merged commit 9609cbb into near:master May 25, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Fix race in PeerManagerAdaptor
2 participants