2.4.0-rc2 Epoch sync fix (#12569)
[Epoch Sync] Make epoch sync happen before header sync on AwaitingPeers.
(#12563)

If I understand correctly, the AwaitingPeers state is simply a marker
for the starting state. We transition away from AwaitingPeers when
header sync replaces it with HeaderSync, which happens once there are
enough peers to run the header sync code at all.

So, before this PR, we would start with AwaitingPeers, and epoch sync
would see that and say "oh we don't have enough peers, so let's skip",
but then header sync would take the stage and start syncing headers.
This ruins the header_head by moving it away from genesis, making epoch
sync no longer eligible. In fact, this happens pretty reliably, because
at startup we would always perform header sync before performing epoch
sync, and since epoch sync is most likely slower than the first header
sync response, we would continue epoch sync with an incorrect
header_head (causing either an almost-correct proof application, or a
stall if the epoch sync request fails).

There are a few more hardening fixes that we should consider, but for
now this should fix the root cause, by no longer treating AwaitingPeers
as special. We also no longer treat StateSync as special, because that
state is not possible while the header_head is at genesis.
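
The ordering problem described above can be illustrated with a small,
self-contained model. This is only a sketch under stated assumptions, not
the code in chain/client/src/sync/epoch.rs: the Node struct, the step
functions, simulate, and the skip_special_states flag are all hypothetical,
the SyncStatus variants carry no payload here, and the model collapses the
request/response timing into a single tick.

```rust
// Hypothetical, simplified model of the startup sync loop. Only the
// SyncStatus variant names come from the commit; everything else is
// an assumption made for illustration.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SyncStatus {
    AwaitingPeers,
    EpochSync,
    HeaderSync,
    StateSync,
}

const GENESIS: u64 = 0;

struct Node {
    status: SyncStatus,
    header_head: u64, // height of the header head; GENESIS at startup
}

/// Epoch sync step. `skip_special_states = true` models the behaviour
/// before this commit (early return on AwaitingPeers / StateSync).
fn epoch_sync_step(node: &mut Node, skip_special_states: bool) {
    if node.header_head != GENESIS {
        return; // epoch sync is only eligible while header_head is at genesis
    }
    match node.status {
        SyncStatus::AwaitingPeers | SyncStatus::StateSync if skip_special_states => {}
        SyncStatus::EpochSync => {} // already in progress
        _ => node.status = SyncStatus::EpochSync,
    }
}

/// Header sync step: leaves AwaitingPeers once peers exist and advances
/// header_head by one per tick while in HeaderSync.
fn header_sync_step(node: &mut Node, have_peers: bool) {
    if !have_peers {
        return;
    }
    match node.status {
        SyncStatus::AwaitingPeers | SyncStatus::HeaderSync => {
            node.status = SyncStatus::HeaderSync;
            node.header_head += 1;
        }
        _ => {}
    }
}

/// Runs two ticks (epoch sync first, then header sync) with peers available
/// and returns the final status and header head height.
fn simulate(skip_special_states: bool) -> (SyncStatus, u64) {
    let mut node = Node { status: SyncStatus::AwaitingPeers, header_head: GENESIS };
    for _ in 0..2 {
        epoch_sync_step(&mut node, skip_special_states);
        header_sync_step(&mut node, true);
    }
    (node.status, node.header_head)
}
```

In this model, simulate(true) ends in HeaderSync with header_head moved off
genesis, so epoch sync never becomes eligible, while simulate(false) enters
EpochSync on the first tick and leaves header_head at genesis.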

Co-authored-by: robin-near <[email protected]>
staffik and robin-near authored Dec 6, 2024
1 parent 7e1213a commit 53063d8
Showing 1 changed file with 0 additions and 3 deletions.
3 changes: 0 additions & 3 deletions chain/client/src/sync/epoch.rs
@@ -604,9 +604,6 @@ impl EpochSync {
             return Ok(());
         }
         match status {
-            SyncStatus::AwaitingPeers | SyncStatus::StateSync(_) => {
-                return Ok(());
-            }
             SyncStatus::EpochSync(status) => {
                 if status.attempt_time + self.config.timeout_for_epoch_sync < self.clock.now_utc() {
                     tracing::warn!("Epoch sync from {} timed out; retrying", status.source_peer_id);