-
Notifications
You must be signed in to change notification settings - Fork 219
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: disconnected node was never ready #3312
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might be completely wrong here. But I think something else is wrong if this fixes the error.
handle_liveness_event
Is suppose to update the base_node or listing state to the ChainMetaData of other neighboring nodes. It is not suppose to stop the node broadcasting or telling other clients its own ChainMetaData.
send_chain_metadata_to_event_publisher
is sent out if it receives a new ping or pong from a neighbor. So this PR will resend that entire list it just sent again if this node restarts a ping round, but that data is only supposed to be neighbor info. In the cucumber tests in question, the node is alone and thus never receives a ping or pong message. Thats why this is now "fixes" it. But the question is why does the fact that the node cannot broadcast its own ChainMetaData if it does not receive other node's ChainMetaData
Yeah I tend to agree with @SWvheerden that something else is wrong (maybe there are no other peers to connect to in the cucumber tests?). The listening state responds to the event from the chain metadata service. So I'm going to say that it is triggering this code early in listener.rs. if peer_metadata_list.is_empty() {
debug!(
target: LOG_TARGET,
"No peer metadata to check. Continuing in listening state.",
);
if !self.is_synced {
debug!(target: LOG_TARGET, "Initial sync achieved");
self.is_synced = true;
shared.set_state_info(StateInfo::Listening(ListeningInfo::new(true)));
} This may (need to check) be breaking the base node in a subtle way where it says it is synced without first actually checking if it needs to sync first - maybe that ends up being ok because it will kick back to is_synced == false if needed, but then we may as well start the base node with is_synced == true. It is the chain metadata's job to provide a good view of the chain metadata to whoever is listening, so empty chain metadata is probably breaking that contract |
That's true. There are no other nodes. So that's why it never receives ping or pong, so the only way to trigger anything is from the
How about we put the |
I checked, liveness service will send the PingBroadcastRound every configured interval even if there are no peers in the round |
Found the actual problem. This PR fixes the problem because it now sends out an empty Vec every ping broadcast event. So it does not get stuck there. The problem is that the piece of code to |
So we agree we need to somehow trigger the
I prefer the option 1, as I encountered the problem outside of cucumber test (on my manual testing environment). |
I think option 1 is essentially saying that we decide that there is network silence after a few seconds and assume we're the only node and so are synced. That seems fine for cucumber tests, but may be a bit quick on a real base node. The way we handle network silence isn't great - feels like this responsibility should fall on chain metadata service. Basically if there are |
Description --- Disconnected node was never ready. How Has This Been Tested? --- npm test -- --name "Full block sync with small reorg" npm test -- --name "Pruned mode reorg simple" npm test -- --name "Pruned mode reorg past horizon"
Description
Disconnected node was never ready.
How Has This Been Tested?
npm test -- --name "Full block sync with small reorg"
npm test -- --name "Pruned mode reorg simple"
npm test -- --name "Pruned mode reorg past horizon"