What is this task and why do we need to work on it?
Nodes currently cannot rejoin the network after they fully shut down and restart. They can't properly get the configuration file from the orchestrator, identify themselves to the network, and get enough information to start participating in the network again. Our current "shutdown" tests do not fully shut down nodes; they only pause them, which isn't realistic behavior. This task adds functionality for nodes to optionally read configuration items from disk when they start up. It is needed to support more resilient testnets.
What work will need to be done to complete this task?
Nodes should write their configuration files and other necessary information to disk and optionally read from disk when they start up.
Investigate how the sequencer currently handles this. It's possible the easiest fix is sequencer-side.
In the HotShot example code, add the ability for a node to optionally read its config file from a configurable location on disk at startup.
In the HotShot example code, add the ability for a node to write its config file to a configurable location on disk at startup.
Test by running a small network of nodes and killing / restarting a range of those nodes. Ensure that the network continues to function.
Discuss integration of changes with sequencer team
Integrate into sequencer code
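The read/write steps above can be sketched roughly as follows. This is a minimal illustration, not HotShot's actual code: `NodeConfig`, its fields, the `key=value` file format, and the `fetch_from_orchestrator` callback are all hypothetical stand-ins, and a real implementation would likely use the project's existing config type and serialization format.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical subset of a node's configuration (not HotShot's real type).
#[derive(Debug, Clone, PartialEq)]
pub struct NodeConfig {
    pub node_index: u64,
    pub total_nodes: u64,
}

impl NodeConfig {
    /// Encode as `key=value` lines; an assumed stand-in for a real format.
    fn encode(&self) -> String {
        format!(
            "node_index={}\ntotal_nodes={}\n",
            self.node_index, self.total_nodes
        )
    }

    /// Decode from `key=value` lines; returns `None` if a field is missing
    /// or fails to parse.
    fn decode(text: &str) -> Option<Self> {
        let mut node_index = None;
        let mut total_nodes = None;
        for line in text.lines() {
            // Skip blank lines and lines without a `=` separator.
            if let Some((key, value)) = line.split_once('=') {
                match key.trim() {
                    "node_index" => node_index = value.trim().parse().ok(),
                    "total_nodes" => total_nodes = value.trim().parse().ok(),
                    _ => {} // ignore unknown keys
                }
            }
        }
        Some(Self {
            node_index: node_index?,
            total_nodes: total_nodes?,
        })
    }
}

/// Write the config to the given location on disk.
pub fn save_config(path: &Path, config: &NodeConfig) -> io::Result<()> {
    fs::write(path, config.encode())
}

/// Read the config back from disk, rejecting malformed files.
pub fn load_config(path: &Path) -> io::Result<NodeConfig> {
    let text = fs::read_to_string(path)?;
    NodeConfig::decode(&text)
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidData, "malformed config file"))
}

/// Startup logic: if a path is given and a file exists there, use it.
/// Otherwise ask the orchestrator (hypothetical callback) and, if a path
/// was given, persist the result for the next restart.
pub fn startup_config(
    path: Option<&Path>,
    fetch_from_orchestrator: impl FnOnce() -> NodeConfig,
) -> io::Result<NodeConfig> {
    if let Some(p) = path {
        if p.exists() {
            return load_config(p);
        }
    }
    let config = fetch_from_orchestrator();
    if let Some(p) = path {
        save_config(p, &config)?;
    }
    Ok(config)
}
```

With this shape, a node started for the first time with a path fetches its config from the orchestrator and saves it; after being killed and restarted with the same path, it reads the saved file and never contacts the orchestrator. Passing `None` keeps the current behavior.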
Are there any other details to include?
Ideally, we should add the ability to fully shut down nodes to our testing harness, but that is outside the scope of this issue.
What are the acceptance criteria to close this issue?
Sequencer tests pass
HotShot test network with 5 nodes successfully handles shutting down and restarting all 5 nodes at different times (such that only 1 is ever offline at a time). Note that this will be a manual test.
Status: I am running into an issue with the webserver locally, where it is not pulling down the most recent proposal. The config changes are done, but catchup works only intermittently.
Fixed an issue with catchup in #2192. One problem remains: rounds in which the catchup node is the leader still time out, even after it has caught up. This may be a voting issue; looking into it.