Remove syncing tower from bank on old local tower #32894
@AshwinSekar Will deleting the tower file prevent the error?
No. This sync only happens when the local tower is behind the on-chain tower, which will always be the case if your local tower is deleted.
What if we start without the local tower at all? I've restarted it, and the node has been running for 15 minutes already.
As long as your secondary is still producing and updating a local tower, there is a chance that the sync condition will be triggered, which can lead to the panic mentioned above.
Ah, I looked into the code, and it seems I misunderstood you. By "local tower," you meant the structure that stores the tower's state in the validator's runtime, not the local tower file. (I hope I didn't say something even more foolish than when I asked the previous question 🤦♀️).
I don't think it's worth rolling this back to the detriment of voting validators. Secondary validators can cherry-pick this if needed until we have a proper fix in place.
What would be the timeline for a proper fix? We cannot run different binaries for the primary and secondary because their roles switch at failover.
I'm working on a fix at the moment; I should have it in time for this week's release.
Problem
This workflow was intended to prevent accidental slashable voting after a server crash and restart. It detected whether the local tower state was behind the on-chain state and, if so, adopted the on-chain state.
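The sync condition described above can be illustrated with a minimal sketch. This is not the validator's actual code: `TowerSketch`, `local_is_behind`, and `sync_from_chain` are hypothetical names, and reducing a tower to its last voted slot is a simplifying assumption made only for illustration.

```rust
/// Hypothetical, heavily simplified stand-in for a vote tower.
/// Assumption: a tower is summarized by its last voted slot only.
#[derive(Debug, Clone, PartialEq, Eq)]
struct TowerSketch {
    /// `None` models a missing (e.g. deleted) local tower.
    last_voted_slot: Option<u64>,
}

/// Returns true when the local tower is behind the on-chain tower.
fn local_is_behind(local: &TowerSketch, on_chain: &TowerSketch) -> bool {
    match (local.last_voted_slot, on_chain.last_voted_slot) {
        // Nothing on chain to catch up to.
        (_, None) => false,
        // A missing local tower is always behind any on-chain vote.
        (None, Some(_)) => true,
        (Some(local_slot), Some(chain_slot)) => local_slot < chain_slot,
    }
}

/// Adopts the on-chain state if the local tower is behind.
/// Returns true if a sync happened.
fn sync_from_chain(local: &mut TowerSketch, on_chain: &TowerSketch) -> bool {
    if local_is_behind(local, on_chain) {
        *local = on_chain.clone();
        true
    } else {
        false
    }
}
```

Under these assumptions, a tower with `last_voted_slot: None` always triggers the sync, which matches the point made in the conversation that deleting the local tower does not avoid the sync condition.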
This was not designed with secondary non-voting validators in mind. It completely breaks secondary validators because their tower constantly diverges from their primary validator's tower, since they are free to freeze and vote on banks differently. As a result, their fork choice can be behind the primary validator's, causing errors like the following to occur:
Summary of Changes
Remove the sync until a proper fix can be designed.
Fixes #