Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CATCHUP] Generate Network on the fly for restart nodes #3723

Merged
merged 4 commits into from
Oct 9, 2024

Conversation

lukeiannucci
Copy link
Contributor

@lukeiannucci lukeiannucci commented Oct 2, 2024

Closes #<ISSUE_NUMBER>

This PR:

Instead of starting the network for the Restart Nodes at test startup, we are pushing it to when we create the node and replace old one.

Also, broadcast the QuorumProposalRequestSend to all nodes. Previously we would send to leader, wait, then do the DA committee. Sometimes this would take too long and the REQUEST_TIMEOUT would timeout before we received the proposal causing some nodes to never receive it, leading to flaky UT's.

This PR does not:

Key places to review:

@lukeiannucci lukeiannucci changed the title generate network for restarted nodes in spinning task [CATCHUP] Generate Network on the fly for restart nodes Oct 2, 2024
@lukeiannucci lukeiannucci marked this pull request as ready for review October 2, 2024 20:55
@lukeiannucci lukeiannucci requested a review from bfish713 as a code owner October 2, 2024 20:55
Copy link
Contributor

@ss-es ss-es left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changes look good and make sense overall, just had a few questions (maybe just for my own understanding)

@@ -481,7 +482,7 @@ where
self.late_start.insert(
node_id,
LateStartNode {
network,
network: Some(network),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit confused here -- I would've assumed skip_late == true means we want to skip initializing the nodes. Shouldn't we set network: None in that case?

(I don't actually know -- genuinely just wondering)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, this was part of the plan to eventually have all the LateStartNodes actually generate and start the network on their proper view. I have fixed this scenario I also think we can fix the other ones in a future PR. The main issue seems to be with restarting nodes and generating nodes * 2 networks before the test started.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can make a separate issue for someone to pick it up

@@ -520,7 +521,7 @@ where
self.late_start.insert(
node_id,
LateStartNode {
network,
network: Some(network),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this one makes complete sense

@@ -210,13 +220,15 @@ where
if let Some(node) = self.handles.write().await.get_mut(idx) {
tracing::error!("Node {} shutting down", idx);
node.handle.shut_down().await;
// For restarted nodes generate the network on correct view
let generated_network = (self.channel_generator)(node_id).await;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we generate the network on RestartDown rather than RestartUp?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it should be fine to do either way, the main thing is to ensure we shut down the old nodes network before we generate the new network

@lukeiannucci lukeiannucci merged commit 09dd84a into main Oct 9, 2024
1 check passed
@lukeiannucci lukeiannucci deleted the li/generate-network-restart-tests branch October 9, 2024 15:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants