Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: wait couple rounds for no pings to send an event #3315

Merged
merged 2 commits into from
Sep 8, 2021

Conversation

Cifko
Copy link
Contributor

@Cifko Cifko commented Sep 8, 2021

Description

Wait couple rounds of no ping before sending the event.

How Has This Been Tested?

npm test -- --name "Full block sync with small reorg"
npm test -- --name "Pruned mode reorg simple"
npm test -- --name "Pruned mode reorg past horizon"

@Cifko Cifko requested a review from sdbondi September 8, 2021 07:37
self.send_chain_metadata_to_event_publisher().await?;
// If there were no pings for awhile, we are probably alone.
self.number_of_rounds_no_pings += 1;
if self.number_of_rounds_no_pings >= 10 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need to check that there are no pings (i.e. num_peers == 0)

Copy link
Member

@sdbondi sdbondi Sep 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also suggest 3 rounds (~ 1 minute to decide on network silence) - the ping interval is currently set to 30s by default in the base node, so 10 will take too long

const NUM_ROUNDS_NETWORK_SILENCE: u16 = 3; // updated from 5
Suggested change
if self.number_of_rounds_no_pings >= 10 {
if self.number_of_rounds_no_pings >= NUM_ROUNDS_NETWORK_SILENCE {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's one of the questions I had prepared. If there are peers it should trigger the pings so the counter should be nullified.
Or should I nullify the counter when the num_peers are not zero? There are several ways to do this.
Should I nullify the number_of_rounds_no_pings once we call the send_chain_metadata_to_event_publisher or not.
i.e. should we send it everytime (since the limit is reached) or just from time to time?

Copy link
Member

@sdbondi sdbondi Sep 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah ok I see what you did, makes sense. I think that we should add a NetworkSilence event that gets emitted every time there are n rounds in a row that have no responses before the next one starts. The reason is, because if we get pings, then stop getting pings send_chain_metadata_to_event_publisher will carry on sending old metadata from previous rounds whereas NetworkSilence is more explicit. Obviously the listener will need to handle this event in the same way that it handles PeerChainMetadataReceived([]) (which will never be broadcast after this change). What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the NetworkSilence sounds really good. I've added the ChainMetadataEvent::NetworkSilence event.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also think we should reduce the NUM_ROUNDS_NETWORK_SILENCE to be around 3. That's basically 90 seconds of zero inbound or outbound comms. 10 rounds is around 10 minutes. That wont work for cucumber. But 90 seconds might not either. We might want to make this a config option and for cucumber make it like 10 seconds or so.

Copy link
Contributor Author

@Cifko Cifko Sep 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's 15 in the tests, 30 in the default_config in utils.rs (both for igor and weatherwax), Still 90 seconds is enough. I've change it to 3, and refactored it to the top as a constant.

@Cifko Cifko force-pushed the fix-timeout branch 3 times, most recently from 5123d3a to 7d566b9 Compare September 8, 2021 09:10
SWvheerden
SWvheerden previously approved these changes Sep 8, 2021
Copy link
Collaborator

@SWvheerden SWvheerden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this approach so far.

My only comment is about the number of rounds and the time delay.

stringhandler
stringhandler previously approved these changes Sep 8, 2021
@aviator-app aviator-app bot removed the mq-failed label Sep 8, 2021
@aviator-app aviator-app bot merged commit 2dcc0ea into tari-project:development Sep 8, 2021
Cifko added a commit to Cifko/tari that referenced this pull request Sep 10, 2021
)

Description
---
Wait couple rounds of `no ping` before sending the event.

How Has This Been Tested?
---
npm test -- --name "Full block sync with small reorg"
npm test -- --name "Pruned mode reorg simple"
npm test -- --name "Pruned mode reorg past horizon"
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants