
Make polkadot behave correctly if node changes peer-id at restart #3673

Closed
alexggh opened this issue Mar 13, 2024 · 13 comments

@alexggh
Contributor

alexggh commented Mar 13, 2024

During the investigation for #3314 we discovered that a number of validators change their PeerId at every restart, which does not work properly right now.

Workarounds

  1. Don't change the PeerId at every restart: the peer-id is saved on disk next to the database in the network folder, so you can simply reuse it by always providing the --base-path argument and making sure the network folder is present where you restart the node.
  2. If you have already restarted, a key rotation would fix the issue.

[WIP] Things that need fixing

In general, all the places where we assume there is just one PeerId per AuthorityId are breaking; from a quick scan by me and @bkchr:

With a Kusama-sized validator set, nodes won't be able to connect to nodes that aren't in the reserved set.

in_peers: super::MIN_GOSSIP_PEERS as u32 / 2 - 1,
out_peers: super::MIN_GOSSIP_PEERS as u32 / 2 - 1,

Some places where the assumption that there is just one PeerId per authority exists:

.find_map(|addr| parse_addr(addr).ok().map(|(p, _)| p));

"Ignoring new peer with duplicate authority ID as a bearer of that identity"

Another thing I observed while testing on Versi that is probably off: ConnectToResolvedValidators gets called sometimes with the new address and sometimes with the old address.

@bkchr
Member

bkchr commented Mar 13, 2024

The first link is not really a problem, just more of an explanation of why these people could not connect to all the other validators.

```rust
async fn issue_connection_request<Sender>(
    &mut self,
    sender: &mut Sender,
    authorities: Vec<AuthorityDiscoveryId>,
) where
    Sender: overseer::GossipSupportSenderTrait,
{
    let num = authorities.len();
    let mut validator_addrs = Vec::with_capacity(authorities.len());
    let mut failures = 0;
    let mut resolved = HashMap::with_capacity(authorities.len());
    for authority in authorities {
        if let Some(addrs) =
            self.authority_discovery.get_addresses_by_authority_id(authority.clone()).await
        {
            validator_addrs.push(addrs.clone());
            resolved.insert(authority, addrs);
        } else {
            failures += 1;
            gum::debug!(
                target: LOG_TARGET,
                "Couldn't resolve addresses of authority: {:?}",
                authority
            );
        }
    }
```

This function needs to be called multiple times per session. Currently we call it only once, and if a validator address was not known at that point, we will never resolve it in this session.
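A minimal sketch of what retrying within a session could look like, assuming a tokio timer; `AuthorityId`, `resolve_addresses` and the retry interval below are illustrative stand-ins, not the actual gossip-support or authority-discovery API:

```rust
use std::collections::HashSet;
use std::time::Duration;

type AuthorityId = String; // stand-in for AuthorityDiscoveryId

// Stand-in for a call into authority-discovery.
async fn resolve_addresses(_authority: &AuthorityId) -> Option<Vec<String>> {
    None
}

// Keep re-issuing the resolution for authorities we could not resolve yet,
// instead of giving up after a single attempt at the start of the session.
async fn keep_resolving(mut unresolved: HashSet<AuthorityId>) {
    let mut interval = tokio::time::interval(Duration::from_secs(10 * 60));
    while !unresolved.is_empty() {
        interval.tick().await;
        let mut done = Vec::new();
        for authority in &unresolved {
            if resolve_addresses(authority).await.is_some() {
                done.push(authority.clone());
            }
        }
        for authority in done {
            unresolved.remove(&authority);
        }
    }
}
```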

@alexggh
Contributor Author

alexggh commented Mar 13, 2024

The first link is not really a problem

Messed up the links, fixed it in the description.

@alexggh alexggh self-assigned this Mar 19, 2024
@alexggh alexggh moved this from Backlog to In Progress in parachains team board Mar 19, 2024
@alexggh
Contributor Author

alexggh commented Mar 20, 2024

Looked a bit closer at the Kademlia DHT, and the biggest problem for this use-case seems to be that both records will continue living in the DHT on different nodes, so on any given node any of the following might be true:

  1. When querying the DHT a node discovers only the old record.
  2. When querying the DHT a node discovers only the new record.
  3. When querying the DHT a node discovers both the new record and the old record, but because they are sequentially advertised to authority-discovery only the last one will be saved. We can probably mitigate this by modifying authority-discovery to save both of them and expire the old record at a later date.

Both 1) and 3) will happen until the old record's TTL is met, which currently is 36 hours; after that the new record will be the only one in the network.

Why does this happen?

The way the Kademlia DHT works, the PeerId is used as a key for determining on which nodes records need to be replicated, and the receiving nodes re-replicate as well every 60 minutes. The replication factor is 20.
So, both the old records and the new records will continue living in the network on different nodes until the old one expires.

What can we do?

At least as far as I can tell, this behaviour is entrenched in the Kademlia DHT protocol. We can probably reduce the window the old entry lives for by playing with record_ttl (36h), record_replication_interval (1), record_publication_interval (24h), but there will still be a window where this happens.
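For reference, a minimal sketch of where these knobs live in rust-libp2p, assuming a libp2p version where the type is still called `KademliaConfig` (newer releases rename it to `kad::Config`). The values are only illustrative of the intervals discussed above, not what Substrate actually configures:

```rust
use std::time::Duration;
use libp2p::kad::KademliaConfig;

// Illustrative values only: 36h TTL, hourly re-replication, daily re-publication.
fn tuned_kad_config() -> KademliaConfig {
    let mut config = KademliaConfig::default();
    // How long any node keeps a record before it expires.
    config.set_record_ttl(Some(Duration::from_secs(36 * 60 * 60)));
    // How often a node holding a record re-replicates it to the closest peers.
    config.set_replication_interval(Some(Duration::from_secs(60 * 60)));
    // How often the original publisher re-publishes its own record.
    config.set_publication_interval(Some(Duration::from_secs(24 * 60 * 60)));
    config
}
```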

So, I'm inclined to think that changing the network key (PeerId) while a node is in the active set should not be considered a safe operation to perform, hence what I'm proposing is:

  • Modify the behaviour of the polkadot executable to not auto-generate the network key (PeerId) if none is provided and exit with an error (kudos to @s0me0ne-unkn0wn for giving me the idea).
  • Add an explicit flag for auto-generating the network key at startup, with proper documentation that this is not a safe operation to do while the node is in the active set and what will happen.

@bkchr @dmitry-markin @eskimor Thoughts?

@bkchr
Member

bkchr commented Mar 20, 2024

Here are my notes from last week that I wanted to put into an issue actually:

I think one problem exists in authority discovery:

```rust
let old_addresses = self.authority_id_to_addresses.insert(authority_id.clone(), addresses);
let old_peer_ids = addresses_to_peer_ids(&old_addresses.unwrap_or_default());

// Add the new peer ids
peer_ids.difference(&old_peer_ids).for_each(|new_peer_id| {
    self.peer_id_to_authority_ids
        .entry(*new_peer_id)
        .or_default()
        .insert(authority_id.clone());
});
```
and
```rust
.flatten()
.take(MAX_ADDRESSES_PER_AUTHORITY)
```
If the validator exists with multiple PeerIds in the DHT (because of the restart), we may get multiple DHT events with different PeerIds. We should keep the different PeerIds around, maybe storing when we last received them. We also should not cap the maximum number of addresses based solely on the addresses; we should keep addresses per peer_id and then have a maximum number of peer_ids and addresses.
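A rough sketch of what "addresses per peer_id plus a cap on peer ids" could look like; the names, limits and eviction rule are purely illustrative and not the actual authority-discovery types:

```rust
use std::collections::HashMap;
use std::time::Instant;

const MAX_PEER_IDS_PER_AUTHORITY: usize = 4;
const MAX_ADDRESSES_PER_PEER_ID: usize = 10;

type PeerId = String;    // stand-in for libp2p::PeerId
type Multiaddr = String; // stand-in for libp2p::Multiaddr

/// Addresses known for a single authority, grouped by the peer id the record
/// belongs to, together with the time the record was last seen.
#[derive(Default)]
struct AuthorityAddresses {
    by_peer_id: HashMap<PeerId, (Vec<Multiaddr>, Instant)>,
}

impl AuthorityAddresses {
    fn insert(&mut self, peer_id: PeerId, mut addresses: Vec<Multiaddr>) {
        addresses.truncate(MAX_ADDRESSES_PER_PEER_ID);
        self.by_peer_id.insert(peer_id, (addresses, Instant::now()));

        // Keep the map bounded: evict the peer id we have seen least recently.
        while self.by_peer_id.len() > MAX_PEER_IDS_PER_AUTHORITY {
            let oldest = self
                .by_peer_id
                .iter()
                .min_by_key(|(_, (_, last_seen))| *last_seen)
                .map(|(peer_id, _)| peer_id.clone());
            match oldest {
                Some(peer_id) => {
                    self.by_peer_id.remove(&peer_id);
                }
                None => break,
            }
        }
    }
}
```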

```rust
// Let's directly finish the query, as we are only interested in a
// quorum of 1.
if let Some(kad) = self.kademlia.as_mut() {
    if let Some(mut query) = kad.query_mut(&id) {
        query.finish();
    }
}
```
Another issue is probably this. Here we should not end directly with a quorum of 1; instead we should probably fetch all the values.

This assumes the same as you are writing. However, I would like to try to optimize the current implementation given the points above. We can also go further and extend the record with a timestamp to find out which one is newer. We also don't verify the data in the DHT storage. The verification is only done in authority discovery. This means that the DHT could still contain invalid values.

  • Modify the behaviour of the polkadot executable to not auto-generate the network key (PeerId) if none is provided and exit with an error

Could be done as well, but should only be done for validators. Generally we probably need to improve docs around this stuff. There also exists generate-node-key to generate a key. Maybe we should just require that --node-key is passed when --validator is set.

@alexggh
Contributor Author

alexggh commented Mar 20, 2024

Another issue is this properly. Here we should not end directly with quorum of 1. Instead we should probably fetch all the values.

Ok, I see what you mean. So basically, if we let the query continue, we could probably end up discovering both records more often than not and have them both stored in authority-discovery. Let me check how that behaves.

alexggh added a commit that referenced this issue Mar 21, 2024
…e robust

In the case when nodes don't persist their node-key or they want to
generate a new one while being in the active set, things go wrong
because both the old addresses and the new ones will still be present in
the DHT, and because of the distributed nature of the DHT both will survive
in the network until the old ones expire, which takes 36 hours.
Nodes in the network will randomly resolve the authority-id to the old
address or the new one.

More details in: #3673

This PR proposes we mitigate this problem by:

1. Let the query for a DHT key retrieve all the results, which is usually
   bounded by the replication factor (20); currently we interrupt
   the query on the first result.
2. Modify the authority-discovery service to keep all the discovered
   addresses around for 24h since an address was last seen.
3. Plumb through other subsystems where the assumption was that an
   authority-id will resolve only to one PeerId. Currently,
   authority-discovery keeps just the last record it received from the DHT
   and queries the DHT every 10 minutes. But they could always receive
   only the old address, only the new address or a flip-flop between
   them depending on what node wins the race to provide the record.
4. Update gossip-support to try to resolve authorities more often than
   once per session.

This would give us a lot more chances for the nodes in the network to
discover not only the old address of the node but also the new one,
and should improve the time it takes for a node to be properly connected
in the network. The behaviour won't be deterministic because there is no
guarantee that all nodes will see the new record at least once, since
they could query only nodes that have the old one.

Signed-off-by: Alexandru Gheorghe <[email protected]>
@alexggh
Contributor Author

alexggh commented Mar 21, 2024

Posted a draft PR with all the plumbing that would need to happen to make this a bit better than it currently is: #3786. Let me know what you think; don't be nitpicky, it is in a very rough form :D.

@dmitry-markin
Contributor

dmitry-markin commented Mar 25, 2024

So, I'm inclined to think that changing the network key (PeerId) while a node is in the active set should not be considered a safe operation to perform, hence what I'm proposing is:

I don't understand why people regenerate the PeerIds of validators in the first place. Obviously, a validator should have the actual database, so the contents of --base-path should be preserved across restarts. Why do they remove the network key file then?

Generally I'm for not regenerating the PeerIds of validators without a sound reason (is there one?).

Modify the behaviour of the polkadot executable to not auto-generate the network key (PeerId) if none is provided and exit with an error (kudos to @s0me0ne-unkn0wn for giving me the idea)

Maybe make this the default for validators only? This would change the interface and force operators to update their procedures, but we can print a nice error message if somebody starts a validator without a network key file present.

Add an explicit flag for auto-generating the network key at startup with proper documentation that this is not a safe operation to do while the node is in the active set and what will happen.

IMO, this should be an explicit cli command like generate-node-key that has to be invoked separately, otherwise the operators would just plug this --auto-generate-node-key option and keep everything as before.

@lexnv
Contributor

lexnv commented Mar 25, 2024

Likewise, we could fix this in the CLI and require validators to use --node-key with some improved documentation.

We can also go further and extend the record with a timestamp to find out which one is newer. We also don't verify the data in the DHT storage.

I think this sounds like a reasonable solution to me. We'd need to extend the DHT::ValueFound event with the kademlia record expiry time:

```rust
/// The DHT yielded results for the record request.
///
/// Returning the result grouped in (key, value) pairs as well as the request duration.
ValueFound(Vec<(RecordKey, Vec<u8>)>, Duration),
```

Then, in the authority-discovery, we could store only the most recent records, since we know that older records might contain stale addresses. While at it, we could also have a look at the MAX_ADDRESSES_PER_AUTHORITY and adjust it as Basti suggested 🙏

@alexggh
Contributor Author

alexggh commented Mar 25, 2024

I think this sounds like a reasonable solution to me. We'd need to extend the DHT::ValueFound event with the kademlia record expiry time:

I've been down that road; it won't work because the expiry is actually a function of the distance from the key to the node answering the request: https://github.com/libp2p/rust-libp2p/blob/master/protocols/kad/src/behaviour.rs#L1816. The expiry doesn't tell you how old the record is, because nodes farther from the key report a closer expiry, since they should not store the record for too long.
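Paraphrasing the linked rust-libp2p logic (this is a sketch, not a verbatim copy of the library code): the configured TTL is roughly halved for every position a storing node is beyond the `k` closest peers to the key, so a short expiry mostly reflects distance, not record age:

```rust
use std::time::Duration;

// Nodes beyond the k closest peers to the key store the record with a TTL
// halved per extra position, so the expiry they report is not a proxy for
// the record's age.
fn effective_ttl(configured_ttl: Duration, positions_beyond_k_closest: u32) -> Duration {
    Duration::from_secs(
        configured_ttl
            .as_secs()
            .checked_shr(positions_beyond_k_closest)
            .unwrap_or(0),
    )
}

fn main() {
    let ttl = Duration::from_secs(36 * 60 * 60);
    // One of the k closest nodes keeps the record for the full 36 hours...
    assert_eq!(effective_ttl(ttl, 0), ttl);
    // ...while a node two positions further away keeps it for only ~9 hours.
    assert_eq!(effective_ttl(ttl, 2).as_secs(), (36 * 60 * 60) >> 2);
}
```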

@alexggh
Contributor Author

alexggh commented Mar 25, 2024

I don't understand why people regenerate the PeerIds of validators in the first place. Obviously, a validator should have the actual database, so the contents of --base-path should be preserved across restarts. Why do they remove the network key file then?

In this case a large validator set (>50) were individually mounting the keystore and db in their pod, so they missed persisting the network folder:

.local/share/polkadot/chains/ksmcc3/db
.local/share/polkadot/chains/ksmcc3/keystore

@alexggh
Contributor Author

alexggh commented Mar 25, 2024

IMO, this should be an explicit cli command like generate-node-key that has to be invoked separately, otherwise the operators would just plug this --auto-generate-node-key option and keep everything as before.

I was thinking of naming it more like --i-know-i-m-getting-0-rewards-but-auto-generate-node-key :D

@alexggh
Contributor Author

alexggh commented Mar 27, 2024

PR to not allow auto-generation of the network key unless it is explicitly requested: #3852.

github-merge-queue bot pushed a commit that referenced this issue Apr 15, 2024
As discovered during the investigation of
#3314 and
#3673, there are active
validators which might accidentally change their network key during a
restart. That's not a safe operation when you are in the active set
because of the distributed nature of the DHT: the old records would still
exist in the network until they expire (36h). So unless they have a good
reason, validators should avoid changing their key when they restart
their nodes.

There is a parallel effort to improve this situation
(#3786), but those changes
are way more intrusive and will need more rigorous testing. Additionally,
they will reduce the time to less than 36h, but the propagation won't be
instant anyway, so not changing your network key during a restart should be
the safest way to run your node, unless you have a really good reason to
change it.

## Proposal
1. Do not auto-generate the network key if the key file does not exist
in the provided path. Nodes where the key file does not exist will get
the following error:
```
Error: 
   0: Starting an authorithy without network key in /home/alexggh/.local/share/polkadot/chains/ksmcc3/network/secret_ed25519.
      
       This is not a safe operation because the old identity still lives in the dht for 36 hours.
      
       Because of it your node might suffer from not being properly connected to other nodes for validation purposes.
      
       If it is the first time running your node you could use one of the following methods.
      
       1. Pass --unsafe-force-node-key-generation and make sure you remove it for subsequent node restarts
      
       2. Separetly generate the key with: polkadot key generate-node-key --file <YOUR_PATH_TO_NODE_KEY>
```

2. Add an explicit parameter, `--unsafe-force-node-key-generation`, for
nodes that do want to change their network key despite the warnings or
that run the node for the first time.

3. For `polkadot key generate-node-key`, add two new mutually exclusive
parameters, `base_path` and `default_base_path`, to help with generating
the key in the same path the main polkadot command would expect it.
 
4. Modify the installation scripts to auto-generate a key in the default
path if one is not already present there; this should help make
the executable work out of the box after an installation.

## Notes

Nodes that do not already have the key persisted will fail to start
after this change; however, I consider that better than the current
situation where they start but silently hide the fact that they might not be
properly connected to their peers.

## TODO
- [x] Make sure only nodes that are authorities on production chains
will be affected by this restriction.
- [x] Proper PRDOC, to make sure node operators are aware this is
coming.

---------

Signed-off-by: Alexandru Gheorghe <[email protected]>
Co-authored-by: Dmitry Markin <[email protected]>
Co-authored-by: s0me0ne-unkn0wn <[email protected]>
Co-authored-by: Bastian Köcher <[email protected]>
github-merge-queue bot pushed a commit that referenced this issue Jul 25, 2024
…e robust (#3786)

In the case when nodes don't persist their node-key or they want to
generate a new one while being in the active set, things go wrong
because both the old addresses and the new ones will still be present in
the DHT, and because of the distributed nature of the DHT both will survive
in the network until the old ones expire, which takes 36 hours. Nodes in
the network will randomly resolve the authority-id to the old address
or the new one.

More details in: #3673

This PR proposes we mitigate this problem by:

1. Let the query for a DHT key retrieve more than one result (4), which
is also bounded by the replication factor (20); currently we
interrupt the query on the first result.
~2. Modify the authority-discovery service to keep all the discovered
addresses around for 24h since an address was last seen.~
~3. Plumb through other subsystems where the assumption was that an
authority-id will resolve only to one PeerId. Currently,
authority-discovery keeps just the last record it received from the DHT and
queries the DHT every 10 minutes. But they could always receive only the
old address, only the new address or a flip-flop between them depending
on what node wins the race to provide the record.~

2. Extend the `SignedAuthorityRecord` with a signed creation_time.
3. Modify authority discovery to keep track of nodes that sent us an old
record, and once we are made aware of a new record, update the nodes we
know about with the new record.
4. Update gossip-support to try to resolve authorities more often than
once per session.

~This would give us a lot more chances for the nodes in the network to
discover not only the old address of the node but also the new one,
and should improve the time it takes for a node to be properly connected
in the network. The behaviour won't be deterministic because there is no
guarantee that all nodes will see the new record at least once, since
they could query only nodes that have the old one.~


## TODO
- [x] Add unit tests for the new paths.
- [x] Make sure the implementation is backwards compatible.
- [x] Evaluate if there are any bad consequences of letting the query
continue rather than finishing it at the first record found.
- [x] Bake the new changes in Versi.

---------

Signed-off-by: Alexandru Gheorghe <[email protected]>
Co-authored-by: Dmitry Markin <[email protected]>
Co-authored-by: Alexandru Vasile <[email protected]>
@alexggh
Contributor Author

alexggh commented Jul 29, 2024

Problem mitigated with: 6720279

@alexggh alexggh closed this as completed Jul 29, 2024
@github-project-automation github-project-automation bot moved this from In Progress to Completed in parachains team board Jul 29, 2024
TarekkMA pushed a commit to moonbeam-foundation/polkadot-sdk that referenced this issue Aug 2, 2024