fix: stuck on sync (#5739)
Description
---
Fixes block sync and horizon sync so that a node no longer gets stuck
while syncing.

Motivation and Context
---
When a local node hits an error while talking to a peer during sync, it
should not retry the same peer over and over. It should remove that peer
from its list of sync peers and try another one. If that also fails, it
should go back to listening mode and get a new peer list.

Currently the node gets stuck and keeps retrying the same request
without changing anything locally to fix the problem.

```
07:57 WARN  RPC request failed: NotFound: Requested end block sync hash was not found
07:57 WARN  This sync round failed (450)
```
In this example, the node kept asking the same peer for a block hash
that the peer did not have, so it stayed stuck in sync mode until
someone intervened manually.
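
The intended behaviour can be sketched roughly as follows. This is a minimal, standalone illustration and not the code in this repository: `SyncError`, `Outcome`, `sync_round`, and the `u32` peer ids are all hypothetical names chosen for the example. It assumes that a latency failure keeps the peer in the list (and is only counted), while any other failure drops the peer so the same request is not sent to it again.

```rust
/// Hypothetical error type; only the latency-vs-other distinction matters here.
#[derive(Debug)]
enum SyncError {
    MaxLatencyExceeded,
    NotFound,
}

/// Hypothetical result of one sync round.
#[derive(Debug, PartialEq)]
enum Outcome {
    /// Sync succeeded against this peer.
    Synced { peer: u32 },
    /// No peer worked; the caller should return to listening mode
    /// and fetch a fresh peer list.
    BackToListening { latency_failures: usize },
}

/// Try each peer once. A peer that fails with anything other than a
/// latency error is removed from `peers`, so it is never asked again.
fn sync_round<F>(peers: &mut Vec<u32>, mut attempt: F) -> Outcome
where
    F: FnMut(u32) -> Result<(), SyncError>,
{
    let mut latency_failures = 0;
    let mut index = 0;
    while index < peers.len() {
        let peer = peers[index];
        match attempt(peer) {
            Ok(()) => return Outcome::Synced { peer },
            Err(SyncError::MaxLatencyExceeded) => {
                // Slow peer: count it, keep it, move on to the next one.
                latency_failures += 1;
                index += 1;
            }
            Err(_) => {
                // Any other error: drop the peer. The next peer shifts
                // into `index`, so the index is not advanced.
                peers.remove(index);
            }
        }
    }
    Outcome::BackToListening { latency_failures }
}
```

With peers `[1, 2, 3]`, where peer 1 returns `NotFound`, peer 2 is too slow, and peer 3 succeeds, the round ends with `Synced { peer: 3 }` and the list reduced to `[2, 3]`. If every peer fails with a non-latency error, the list ends up empty and the caller is told to go back to listening.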

How Has This Been Tested?
---
Tested manually.

---------

Co-authored-by: Hansie Odendaal <[email protected]>
SWvheerden and hansieodendaal authored Sep 5, 2023
1 parent eb74bbb commit 33b37a8
Showing 2 changed files with 7 additions and 9 deletions.
9 changes: 5 additions & 4 deletions base_layer/core/src/base_node/sync/block_sync/synchronizer.rs
@@ -187,10 +187,11 @@ impl<'a, B: BlockchainBackend + 'static> BlockSynchronizer<'a, B> {
             self.peer_ban_manager
                 .ban_peer_if_required(node_id, &Some(reason.clone()))
                 .await;
-
-            if reason.ban_duration > self.config.short_ban_period {
-                self.remove_sync_peer(node_id);
-            }
         }
+        if let BlockSyncError::MaxLatencyExceeded { .. } = err {
+            latency_counter += 1;
+        } else {
+            self.remove_sync_peer(node_id);
+        }

         if let BlockSyncError::MaxLatencyExceeded { .. } = err {
@@ -199,14 +199,11 @@ impl<'a, B: BlockchainBackend + 'static> HorizonStateSynchronization<'a, B> {
             self.peer_ban_manager
                 .ban_peer_if_required(node_id, &Some(reason.clone()))
                 .await;
-
-            if reason.ban_duration > self.config.short_ban_period {
-                self.remove_sync_peer(node_id);
-            }
         }
-
         if let HorizonSyncError::MaxLatencyExceeded { .. } = err {
             latency_counter += 1;
+        } else {
+            self.remove_sync_peer(node_id);
         }
     },
 }