* added TemporaryDialPeer to
Great catch. Confirmed locally that peers will dial back after bootnode(s) disruption occurs.
LGTM. Thank you for correcting minor things in addition to the fixes
LGTM. I tested everything manually, and this fixes the issue.
Looks good 💯
Thank you for hunting down this bug 🙏
I've left a few small comments that need to be addressed.
Please also resolve the conflicts in the go.mod file 🙏
# Conflicts:
#	go.mod
Description
Right now, in a running network, once a bootnode loses all of its peers (server restart, network issues, or similar), it does not regain any peers unless all the nodes are restarted.
There is a goroutine that is supposed to ping a random bootnode for its list of peers, but the bootnodes still stay disconnected from the rest of the network.
This is easy to reproduce: run a 4-node cluster with 1 bootnode, keep all the nodes running, and restart only the bootnode. Once the bootnode is back online, it never rejoins the network and always has 0 peers.
Changes include
Checklist
Testing
Manual tests
Run a 4-node cluster with 1 bootnode. Restart or power cycle only the bootnode.
Within 60s of coming back online, the bootnode should start peering with all the other nodes.
Additional comments
Fixes EDGE-790