[Epoch Sync] Make epoch sync happen before header sync on AwaitingPeers. #12563
If I understand correctly, the AwaitingPeers state is simply a marker for the starting state. We transition away from it when header sync replaces it with HeaderSync, which happens once there are enough peers to run the header sync code at all.
Before this PR, we would start with AwaitingPeers, and epoch sync would see that state, conclude that we don't have enough peers, and skip. Header sync would then take over and start syncing headers. That moves header_head away from genesis, which makes epoch sync no longer eligible. This happens quite reliably: at startup we always perform header sync before epoch sync, and since the epoch sync response most likely arrives later than the first header sync response, we end up continuing epoch sync with an incorrect header_head. The result is either an almost-correct proof application, or a stall if the epoch sync request fails.
There are a few more hardening fixes we should consider, but for now this PR fixes the root cause by no longer treating AwaitingPeers as special. It also stops treating StateSync as special, because that state cannot occur while header_head is at genesis.
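
To make the ordering problem concrete, here is a minimal, self-contained sketch of the behavior described above. It is a toy model, not the actual code: `Node`, `try_epoch_sync`, `try_header_sync`, `tick`, the `awaiting_peers_is_special` flag, and the peer-count check inside epoch sync are all hypothetical names and assumptions introduced for illustration.

```rust
// Toy model of the sync ordering described in this PR.
// All names here are hypothetical; this is not the real implementation.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SyncStatus {
    AwaitingPeers,
    EpochSync,
    HeaderSync,
}

struct Node {
    status: SyncStatus,
    header_head_height: u64,
    genesis_height: u64,
    num_peers: usize,
    min_peers: usize,
}

impl Node {
    fn new(num_peers: usize) -> Self {
        Node {
            status: SyncStatus::AwaitingPeers,
            header_head_height: 0,
            genesis_height: 0,
            num_peers,
            min_peers: 1,
        }
    }

    // Epoch sync is only eligible while header_head is still at genesis.
    fn header_head_at_genesis(&self) -> bool {
        self.header_head_height == self.genesis_height
    }

    // `awaiting_peers_is_special == true` models the behavior before this PR:
    // epoch sync skips whenever the status is AwaitingPeers.
    fn try_epoch_sync(&mut self, awaiting_peers_is_special: bool) -> bool {
        if awaiting_peers_is_special && self.status == SyncStatus::AwaitingPeers {
            return false;
        }
        // Assumption: epoch sync checks the peer count directly instead.
        if self.num_peers < self.min_peers || !self.header_head_at_genesis() {
            return false;
        }
        self.status = SyncStatus::EpochSync;
        true
    }

    // Header sync replaces the status with HeaderSync once there are enough
    // peers, and advances header_head away from genesis.
    fn try_header_sync(&mut self) {
        if self.status == SyncStatus::EpochSync || self.num_peers < self.min_peers {
            return;
        }
        self.status = SyncStatus::HeaderSync;
        self.header_head_height += 1;
    }
}

// One iteration of the sync loop: epoch sync gets the first chance to run.
fn tick(node: &mut Node, awaiting_peers_is_special: bool) {
    node.try_epoch_sync(awaiting_peers_is_special);
    node.try_header_sync();
}
```

In this model, calling `tick(&mut node, true)` on a fresh node with peers leaves it in `HeaderSync` with header_head already past genesis, so epoch sync can never start cleanly. Calling `tick(&mut node, false)` leaves it in `EpochSync` with header_head still at genesis, which is the outcome this PR aims for.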