[base node] Header sync gets stuck on banned peer #5030
Ah ok, so it's not that we are still connected to the banned peer, but that the sync peer set provided by the listening state/chain metadata service is not clearing the banned peer from the list. It should probably clear all peer metadata when transitioning to header sync in every case, so that listening has to do another round of metadata ping-pongs and refreshes the list (just a suggestion to keep things simple, may not be the best way). |
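As a rough illustration of that suggestion (the type and method names below are hypothetical stand-ins, not the actual Tari chain metadata service API), clearing the collected metadata on the transition could look something like this:

```rust
use std::collections::HashMap;

// Hypothetical stand-ins for the real Tari types, purely for illustration.
type NodeId = String;
struct PeerChainMetadata {
    height: u64,
}

struct ChainMetadataService {
    peer_chain_metadata: HashMap<NodeId, PeerChainMetadata>,
}

impl ChainMetadataService {
    // Called when the state machine leaves Listening for HeaderSync:
    // drop everything collected so far, so Listening has to rebuild the list
    // from fresh ping/pong rounds, which naturally excludes banned peers.
    fn on_transition_to_header_sync(&mut self) {
        self.peer_chain_metadata.clear();
    }
}

fn main() {
    let mut service = ChainMetadataService {
        peer_chain_metadata: HashMap::from([(
            "cdee5a37e4478755e532a32d09".to_string(),
            PeerChainMetadata { height: 5282 },
        )]),
    };
    service.on_transition_to_header_sync();
    assert!(service.peer_chain_metadata.is_empty());
}
```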
An interesting case here: I have a base node 99999 that is advertising a higher chain (old genesis), but keeps timing out on RPC connections. This isn't a bannable offense, so it continues to try. An RPC timeout can occur innocently or maliciously. Perhaps banning after it happens a few times is best, though that does mean keeping track of attempts. |
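One way to keep track of attempts (a minimal sketch with invented names and an assumed threshold, not the actual sync code) is to count consecutive RPC timeouts per peer and only ban once a limit is exceeded:

```rust
use std::collections::HashMap;

type NodeId = String;

// Assumed threshold for illustration; the real value would come from config.
const MAX_RPC_TIMEOUTS: u32 = 3;

#[derive(Default)]
struct RpcTimeoutTracker {
    timeouts: HashMap<NodeId, u32>,
}

impl RpcTimeoutTracker {
    // Record a timeout and return true if the peer should now be banned.
    fn record_timeout(&mut self, peer: &NodeId) -> bool {
        let count = self.timeouts.entry(peer.clone()).or_insert(0);
        *count += 1;
        *count >= MAX_RPC_TIMEOUTS
    }

    // A successful RPC call clears the strike count, since an innocent
    // timeout should not accumulate forever.
    fn record_success(&mut self, peer: &NodeId) {
        self.timeouts.remove(peer);
    }
}

fn main() {
    let mut tracker = RpcTimeoutTracker::default();
    let peer = "99999".to_string();
    assert!(!tracker.record_timeout(&peer));
    assert!(!tracker.record_timeout(&peer));
    assert!(tracker.record_timeout(&peer)); // third strike: ban
    tracker.record_success(&peer); // recovers if the peer behaves again
}
```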
Yeah, I am very much in favor of having 3 ban levels. |
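For what it's worth, three ban levels could be as simple as an enum mapping offence severity to a ban duration (the tiers and durations below are invented for illustration, not values from the codebase):

```rust
use std::time::Duration;

// Illustrative ban tiers; the real severities and durations would live in config.
enum BanLevel {
    Short,     // transient issues, e.g. repeated RPC timeouts
    Long,      // suspicious but possibly honest, e.g. no chain split found
    Permanent, // provably malicious, e.g. invalid metadata signature
}

fn ban_duration(level: BanLevel) -> Option<Duration> {
    match level {
        BanLevel::Short => Some(Duration::from_secs(30 * 60)),
        BanLevel::Long => Some(Duration::from_secs(24 * 60 * 60)),
        BanLevel::Permanent => None, // never expires
    }
}

fn main() {
    assert_eq!(ban_duration(BanLevel::Short), Some(Duration::from_secs(1800)));
    assert!(ban_duration(BanLevel::Permanent).is_none());
}
```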
Better story line:
2022-12-09 04:05:48.682740200 [c::bn::chain_state_sync_service] TRACE Received ping from neighbouring node 'cdee5a37e4478755e532a32d09'.
2022-12-09 04:05:48.682760200 [c::bn::chain_state_sync_service] DEBUG Received chain metadata from NodeId 'cdee5a37e4478755e532a32d09' #5281, Acc_diff 67,363,806,366,009,233,370,378,187
2022-12-09 04:05:48.725431200 [c::bn::header_sync] DEBUG Dialing cdee5a37e4478755e532a32d09 sync peer
2022-12-09 04:05:48.725503900 [c::bn::header_sync] INFO Successfully dialed sync peer cdee5a37e4478755e532a32d09 in 72.70µs
2022-12-09 04:05:48.725512100 [c::bn::header_sync] DEBUG Attempting to synchronize headers with `cdee5a37e4478755e532a32d09`
2022-12-09 04:05:51.469416000 [c::bn::chain_state_sync_service] TRACE Received ping from neighbouring node 'cdee5a37e4478755e532a32d09'.
2022-12-09 04:05:51.469434200 [c::bn::chain_state_sync_service] DEBUG Received chain metadata from NodeId 'cdee5a37e4478755e532a32d09' #5281, Acc_diff 67,363,806,366,009,233,370,378,187
2022-12-09 04:06:38.050929400 [c::bn::header_sync] DEBUG Header sync failed: Peer cdee5a37e4478755e532a32d09 exceeded maximum permitted sync latency. latency: 49.33ss, max: 30.00ss
2022-12-09 04:06:38.050952100 [c::bn::base_node] TRACE Base Node event in State [Synchronizing block headers]: Header Synchronization Failed
2022-12-09 04:06:38.050964800 [c::bn::state_machine_service::states::waiting] INFO The base node has started a WAITING state for 30 seconds
2022-12-09 04:06:38.052599200 [c::bn::chain_state_sync_service] TRACE Received pong from neighbouring node 'cdee5a37e4478755e532a32d09'.
2022-12-09 04:06:38.052611600 [c::bn::chain_state_sync_service] DEBUG Received chain metadata from NodeId 'cdee5a37e4478755e532a32d09' #5282, Acc_diff 67,364,252,593,290,559,176,378,768
2022-12-09 04:07:08.069426000 [c::bn::header_sync] DEBUG Dialing cdee5a37e4478755e532a32d09 sync peer
2022-12-09 04:07:08.069510200 [c::bn::header_sync] INFO Successfully dialed sync peer cdee5a37e4478755e532a32d09 in 84.00µs
2022-12-09 04:07:08.069519000 [c::bn::header_sync] DEBUG Attempting to synchronize headers with `cdee5a37e4478755e532a32d09`
2022-12-09 04:07:14.802560400 [c::bn::header_sync] DEBUG Sync peer latency is 6.73s
2022-12-09 04:07:14.802577100 [tari_core::base_node::sync::header_sync::synchronizer] INFO attempt_sync; sync_peer=SyncPeer { peer_metadata: PeerChainMetadata { node_id: NodeId(cdee5a37e4478755e532a32d09), chain_metadata: ChainMetadata { height_of_longest_chain: 5282, best_block: FixedHash([25, 102, 228, 155, 225, 135, 50, 65, 179, 51, 40, 18, 107, 24, 37, 199, 105, 177, 193, 203, 100, 122, 181, 128, 165, 164, 162, 83, 23, 255, 190, 189]), pruning_horizon: 0, pruned_height: 0, accumulated_difficulty: 67364252593290559176378768, timestamp: 1670551358 }, latency: Some(6.7328396s) }, avg_latency: RollingAverageTime { samples: RollingVec([]) } } max_latency=30s
2022-12-09 04:07:14.802595900 [c::bn::header_sync] DEBUG Initiating header sync with peer `cdee5a37e4478755e532a32d09` (sync latency = 6732ms)
2022-12-09 04:07:14.802931200 [c::bn::header_sync] DEBUG Determining if chain splits between 0 and 500 headers back from the tip (peer: `cdee5a37e4478755e532a32d09`, 54 hashes sent)
2022-12-09 04:07:23.774659000 [tari_core::base_node::sync::header_sync::synchronizer] ERROR error=Unable to find chain split from peer `cdee5a37e4478755e532a32d09`
2022-12-09 04:07:23.774677700 [c::bn::header_sync] WARN Chain split not found for peer cdee5a37e4478755e532a32d09.
2022-12-09 04:07:23.774686000 [c::bn::header_sync] WARN Banned sync peer because Peer could not find the location of a chain split
2022-12-09 04:07:23.774703600 [c::bn::header_sync] ERROR Header sync failed with all peers. Error: Sync failed for all peers
2022-12-09 04:07:23.774713900 [c::bn::header_sync] WARN Sync failed for all peers. Continuing...
2022-12-09 04:07:23.774729100 [c::bn::base_node] TRACE Base Node event in State [Synchronizing block headers]: Continuing
2022-12-09 04:07:23.774741300 [c::bn::state_machine_service::states::listening] INFO Listening for chain metadata updates
2022-12-09 04:07:23.797852600 [c::bn::state_machine_service::states::listening] INFO Our local blockchain accumulated difficulty is a little behind that of the network. We're at block #53 with an accumulated difficulty of 5,798,411,185,375,912,684, and the network chain tip is at #5282 with an accumulated difficulty of 67,364,252,593,290,559,176,378,768
2022-12-09 04:07:23.797875000 [c::bn::state_machine_service::states::listening] DEBUG Lagging (local height = 53, network height = 5282)
2022-12-09 04:07:23.797889000 [c::bn::base_node] TRACE Base Node event in State [Listening]: Fallen behind main chain - Lagging behind 1 peers (#5282, Difficulty: 67364252593290559176378768)
2022-12-09 04:07:23.797902100 [c::bn::header_sync] DEBUG Starting header sync.
2022-12-09 04:07:23.797908800 [c::bn::header_sync] INFO Synchronizing headers (1 candidate peers selected)
2022-12-09 04:07:23.797915700 [c::bn::header_sync] INFO Attempting to sync headers (1 sync peers)
2022-12-09 04:07:23.797923300 [c::bn::header_sync] DEBUG Dialing cdee5a37e4478755e532a32d09 sync peer
2022-12-09 04:07:23.801827000 [c::bn::chain_state_sync_service] DEBUG Removing disconnected/banned peer `cdee5a37e4478755e532a32d09` from chain metadata list
2022-12-09 04:07:23.802097900 [c::bn::header_sync] DEBUG Header sync failed: Connectivity Error: ConnectionFailed: Peer is banned, denying connection
2022-12-09 04:07:23.802113900 [c::bn::base_node] TRACE Base Node event in State [Synchronizing block headers]: Header Synchronization Failed
2022-12-09 04:07:23.802123900 [c::bn::state_machine_service::states::waiting] INFO The base node has started a WAITING state for 30 seconds
2022-12-09 04:41:26.365429300 [c::bn::comms_interface::inbound_handler] DEBUG Block with hash `4d0a304eed563a82389dda2b27318d138fda3c131a69b218859a726d64d9d6de` is unknown. Constructing block from known mempool transactions / requesting missing transactions from peer 'cdee5a37e4478755e532a32d09'.
2022-12-09 04:41:26.365450500 [c::bn::comms_interface::inbound_handler] INFO Block #5287 (4d0a304eed563a82389dda2b27318d138fda3c131a69b218859a726d64d9d6de) received from remote peer: cdee5a37e4478755e532a32d09
2022-12-09 04:41:26.365457200 [c::bn::comms_interface::inbound_handler] DEBUG Incoming block: ----------------- Block -----------------
2022-12-09 04:41:26.366190300 [c::bn::chain_state_sync_service] TRACE Received ping from neighbouring node 'cdee5a37e4478755e532a32d09'.
2022-12-09 04:41:26.366216300 [c::bn::chain_state_sync_service] DEBUG Received chain metadata from NodeId 'cdee5a37e4478755e532a32d09' #5287, Acc_diff 67,379,859,023,864,670,431,623,680
2022-12-09 04:41:26.366229100 [c::bn::chain_state_sync_service] TRACE Received pong from neighbouring node 'cdee5a37e4478755e532a32d09'.
2022-12-09 04:41:26.366238300 [c::bn::chain_state_sync_service] DEBUG Received chain metadata from NodeId 'cdee5a37e4478755e532a32d09' #5287, Acc_diff 67,379,859,023,864,670,431,623,680
2022-12-09 04:41:26.368218800 [c::tx::aggregated_body] TRACE Checking sender signatures
2022-12-09 04:41:26.368511900 [dht::network_discovery::peer_validator] DEBUG Peer `92befafc5bdfc9f9e4ecf7d7e4` already exists or is up to date and will not be updated
2022-12-09 04:41:26.368536000 [c::val::helpers] WARN Validation failed on block:4d0a304eed563a82389dda2b27318d138fda3c131a69b218859a726d64d9d6de:InvalidSignatureError("Metadata signature not valid!")
2022-12-09 04:41:26.368560500 [c::cs::database] WARN Block #5287 (4d0a304eed563a82389dda2b27318d138fda3c131a69b218859a726d64d9d6de) failed validation - The transaction is invalid: Signature is invalid: Metadata signature not valid!
2022-12-09 04:41:26.368624400 [c::bn::comms_interface::inbound_handler] WARN Peer cdee5a37e4478755e532a32d09 sent an invalid block: Validation error: The transaction is invalid: Signature is invalid: Metadata signature not valid!
2022-12-09 04:41:26.368669400 [dht::network_discovery::peer_validator] TRACE Adding peer `111111f21206d10c5063ef1501`
2022-12-09 04:41:26.368694100 [c::bn::base_node_service::service] ERROR Failed to handle incoming block message: Comms interface error: `Chain storage error: Validation error: The transaction is invalid: Signature is invalid: Metadata signature not valid!`
2022-12-09 04:41:26.368716200 [c::bn::chain_state_sync_service] TRACE Received ping from neighbouring node 'cdee5a37e4478755e532a32d09'.
2022-12-09 04:41:26.368737300 [c::bn::chain_state_sync_service] DEBUG Received chain metadata from NodeId 'cdee5a37e4478755e532a32d09' #5287, Acc_diff 67,379,859,023,864,670,431,623,680
2022-12-09 04:41:26.373708100 [c::bn::chain_state_sync_service] DEBUG Removing disconnected/banned peer `cdee5a37e4478755e532a32d09` from chain metadata list
2022-12-09 04:41:26.376451800 [dht::network_discovery::peer_validator] DEBUG Peer `b0f4af30a3998bc362772a8eda` already exists or is up to date and will not be updated
2022-12-09 04:41:26.376543600 [c::bn::chain_state_sync_service] TRACE Received ping from neighbouring node 'cdee5a37e4478755e532a32d09'.
2022-12-09 04:41:26.376561600 [c::bn::chain_state_sync_service] DEBUG Received chain metadata from NodeId 'cdee5a37e4478755e532a32d09' #5287, Acc_diff 67,379,859,023,864,670,431,623,680
2022-12-09 04:41:26.395014000 [c::bn::header_sync] DEBUG Dialing cdee5a37e4478755e532a32d09 sync peer
2022-12-09 04:41:56.424062900 [c::bn::header_sync] DEBUG Starting header sync.
2022-12-09 04:41:56.424070100 [c::bn::header_sync] INFO Synchronizing headers (1 candidate peers selected)
2022-12-09 04:41:56.424076700 [c::bn::header_sync] INFO Attempting to sync headers (1 sync peers)
2022-12-09 04:41:56.424085400 [c::bn::header_sync] DEBUG Dialing cdee5a37e4478755e532a32d09 sync peer
2022-12-09 04:41:56.424157600 [c::bn::header_sync] DEBUG Header sync failed: Connectivity Error: ConnectionFailed: Peer is banned, denying connection
2022-12-09 04:42:26.460843400 [c::bn::header_sync] DEBUG Starting header sync.
2022-12-09 04:42:26.460849700 [c::bn::header_sync] INFO Synchronizing headers (1 candidate peers selected)
2022-12-09 04:42:26.460856000 [c::bn::header_sync] INFO Attempting to sync headers (1 sync peers)
2022-12-09 04:42:26.460864400 [c::bn::header_sync] DEBUG Dialing cdee5a37e4478755e532a32d09 sync peer
2022-12-09 04:42:26.460935300 [c::bn::header_sync] DEBUG Header sync failed: Connectivity Error: ConnectionFailed: Peer is banned, denying connection
2022-12-09 04:42:26.460946900 [c::bn::base_node] TRACE Base Node event in State [Synchronizing block headers]: Header Synchronization Failed
etc. |
Unzip each file core.7.zip.001.zip |
Please provide the comms logs if you can (in zip or tar.gz please :)) |
Here you are @sdbondi |
So the core issue here looks like the BN bans the peer correctly, then removes the peer from the peer_chain_metadata list correctly. This is in a sense related to #5037. So the root cause here is that the peer got banned, but its messages were still propagated upstream to the domain layer after the banned event was sent to the domain layer. |
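A defensive option (sketched here with hypothetical names; the real inbound pipeline is more involved) is to re-check the ban status when a queued message is finally handled, so anything that was already in flight when the ban landed gets dropped instead of propagated to the domain layer:

```rust
use std::collections::HashSet;

type NodeId = String;

struct InboundMessage {
    source: NodeId,
    payload: Vec<u8>,
}

struct InboundHandler {
    banned_peers: HashSet<NodeId>,
}

impl InboundHandler {
    fn handle(&self, msg: InboundMessage) {
        // The message may have been queued before the ban event arrived,
        // so check the current ban state at handling time, not enqueue time.
        if self.banned_peers.contains(&msg.source) {
            // Drop silently instead of propagating upstream.
            return;
        }
        // ... normal processing (update chain metadata, etc.)
        let _ = msg.payload;
    }
}

fn main() {
    let handler = InboundHandler {
        banned_peers: HashSet::from(["cdee5a37e4478755e532a32d09".to_string()]),
    };
    // This message is silently dropped because the peer is banned.
    handler.handle(InboundMessage {
        source: "cdee5a37e4478755e532a32d09".to_string(),
        payload: vec![],
    });
}
```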
Thanks for the logs. What I've observed is that the peer is banned twice: the first time for 30 minutes (between 04:07:23.802123900 and 2022-12-09 04:41:26.365429300), then again permanently (bad metadata sig). It's clear in the logs that the pong is received before the ban; the node is banned (the chain metadata service clears the peer's metadata), but the message is already in the domain pipeline, so another one is received. The chain metadata service then keeps the peer's chain metadata and sends it to the listening state every time because it is never cleared. TL;DR classic race condition. Order of events:
- peer is banned for 30 minutes
- peer ban expires
- we receive a ping from the peer
- at almost the same time the peer is banned
- chain metadata service clears the peer from peer_chain_metadata
- the ping/pong is received (already in the pipeline from before the ban)
- the chain metadata is added to the vec, and is never cleared (because the peer is not banned again)
- the peer stays in the list despite being banned and NOT connected, so the header sync continues to try to connect to it |
I've made a PR that removes the Vec from the chain metadata service. |
…5039)

Description
---
- Removes "caching" of peer metadata
- Sends only one peer chain metadata at a time to the listening state
- Increases the chain metadata event channel to 20 (so that the last 20 chain metadata received can be read by listening)

Motivation and Context
---
Fixes #5037 and #5030 (needs to be confirmed since this case is not easy to reproduce).

What I've observed is that the peer is banned twice. The first time, for 30 minutes (between 04:07:23.802123900 and 2022-12-09 04:41:26.365429300), then again permanently (bad metadata sig). It's clear in the logs that the pong is received before the ban; the node is banned (the chain metadata service clears the peer's metadata), but the message is already in the domain pipeline, so another one is received. The chain metadata service then keeps the peer's chain metadata and sends it to the listening state every time because it is never cleared. TL;DR classic race condition. Order of events:
- peer is banned for 30 minutes
- peer ban expires
- we receive a ping from the peer
- at almost the same time the peer is banned (see logs in [this comment](#5030 (comment)))
- chain metadata service clears the peer from peer_chain_metadata
- the ping/pong is received (already in the pipeline from before the ban)
- the chain metadata is added to the vec, and is never cleared (because the peer is not banned again)
- the peer stays in the list despite being banned and NOT connected, so the header sync continues to try to connect to it

How Has This Been Tested?
---
- Manually: rewind-blockchain and sync, enters sync mode timeously
- Removed some tests that test removed functions
- Added a new unit test for determine_sync_state
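Roughly, the shape of that change (a sketch using tokio's broadcast channel with invented type names; see the PR itself for the real code) is to forward each peer's chain metadata as its own event instead of accumulating a cached Vec:

```rust
use tokio::sync::broadcast;

// Hypothetical event type; the real one carries a full PeerChainMetadata.
#[derive(Clone, Debug)]
struct ChainMetadataEvent {
    node_id: String,
    height: u64,
}

#[tokio::main]
async fn main() {
    // Capacity 20, so the listening state can read up to the last 20 events
    // without the chain metadata service keeping any cached Vec of its own.
    let (tx, mut rx) = broadcast::channel::<ChainMetadataEvent>(20);

    // Each ping/pong produces exactly one event for that peer.
    tx.send(ChainMetadataEvent {
        node_id: "cdee5a37e4478755e532a32d09".to_string(),
        height: 5287,
    })
    .unwrap();

    // The listening state decides per event whether it is lagging; nothing
    // about a banned peer lingers once its events stop arriving.
    if let Ok(event) = rx.recv().await {
        println!("metadata from {} at height {}", event.node_id, event.height);
    }
}
```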
Mmm, it should not be a 30-minute ban; that's a bug as well.
Wait, that's way too short:
|
This base node gets stuck trying to sync headers from banned peer cdee5a37e4478755e532a32d09.
The ban incident:
Consecutive header sync retries to cdee5a37e4478755e532a32d09