Pause Kademlia if too many connections #4828
Conversation
client/network/src/service.rs
Outdated
```diff
@@ -232,6 +232,7 @@ impl<B: BlockT + 'static, S: NetworkSpecialization<B>, H: ExHashT> NetworkWorker
 		TransportConfig::MemoryOnly => false,
 		TransportConfig::Normal { allow_private_ipv4, .. } => allow_private_ipv4,
 	},
+	u64::from(params.network_config.out_peers) * 2,
```
I think this should be a fixed number of additional discovery connections, rather than a x2 multiplier. Once we reach the target number of peers, it is fine for discovery to be less aggressive.
I changed it to `out_peers + 15`. The `15` is a bit magic, but I think this option is a bit too specific to warrant a field in the configuration.
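For concreteness, here is a minimal sketch of the threshold being discussed, with a hypothetical stand-in for Substrate's `NetworkConfiguration` (the real struct lives in `client/network` and has many more fields):

```rust
/// Hypothetical stand-in for the relevant part of Substrate's network
/// configuration; the real struct has many more fields.
struct NetworkConfiguration {
    out_peers: u32,
}

/// Connection count above which discovery pauses. The `+ 15` headroom is the
/// "magic" constant from this thread: once the outgoing peer-set target is
/// met, only a fixed number of extra connections is left for discovery.
fn discovery_only_if_under_num(config: &NetworkConfiguration) -> u64 {
    u64::from(config.out_peers) + 15
}

fn main() {
    let config = NetworkConfiguration { out_peers: 25 };
    assert_eq!(discovery_only_if_under_num(&config), 40);
}
```

Keeping the headroom additive rather than multiplicative means discovery gets the same fixed budget of extra connections regardless of how large `out_peers` is configured, which matches the request above.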
> Substrate currently opens a lot of TCP connections.

What is "a lot" in terms of the number of established connections relative to the number of (online) peers? I assume you mean a lot of connection *attempts*, i.e. not actually established connections?

Connection attempts can be plenty, especially in "small" (on the Kademlia scale) networks where the buckets are largely filled with disconnected nodes, because there are few stable, highly available nodes in the network. A disconnected node is not removed from a bucket unless a connected node takes its place, for the usual reasons of avoiding easy vulnerability to bucket flushing as a result of (temporary, but possibly prolonged and maliciously triggered) network connectivity issues.

So connection attempts to many disconnected nodes can happen repeatedly and indefinitely, at least every 60 seconds with this discovery mechanism. It may indeed be sensible to keep the discovery query frequency roughly inversely proportional to the number of already connected peers. A hard threshold as done here can be a start, I guess.
Devops reported ~1500 ESTABLISHED TCP connections on the validator nodes. Regardless, there should be a configurable limit on both:
Co-Authored-By: Toralf Wittner <[email protected]>
Changes LGTM, but I'm not familiar with this code.
```diff
@@ -406,6 +418,10 @@ where
 			NetworkBehaviourAction::GenerateEvent(event) => {
 				match event {
 					MdnsEvent::Discovered(list) => {
+						if self.num_connections >= self.discovery_only_if_under_num {
```
Not familiar with this code, so this might be a stupid question: if we already discovered the node, why not keep it? Kademlia and the sub-protocol account for the peer limits separately, right?
I guess it doesn't really matter for mDNS.
I applied the change to mDNS as well because the variable name is about stopping discovery altogether, and not just Kademlia.
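For readers unfamiliar with the code, here is a self-contained sketch of the gating pattern from the diff above, using simplified stand-ins for the Substrate/libp2p types (`DiscoveryBehaviour` here is not the real struct):

```rust
/// Simplified stand-in for the discovery behaviour in the diff above.
struct DiscoveryBehaviour {
    num_connections: u64,
    discovery_only_if_under_num: u64,
    known_peers: Vec<String>,
}

impl DiscoveryBehaviour {
    /// Mirrors the check in the diff: once the connection count reaches the
    /// threshold, mDNS-discovered addresses are dropped instead of remembered.
    fn on_mdns_discovered(&mut self, list: Vec<String>) {
        if self.num_connections >= self.discovery_only_if_under_num {
            return;
        }
        self.known_peers.extend(list);
    }
}

fn main() {
    let mut behaviour = DiscoveryBehaviour {
        num_connections: 40,
        discovery_only_if_under_num: 40,
        known_peers: Vec::new(),
    };
    behaviour.on_mdns_discovered(vec!["/ip4/192.168.1.2/tcp/30333".into()]);
    assert!(behaviour.known_peers.is_empty()); // over the limit: dropped
}
```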
Substrate currently opens a lot of TCP connections.
I'm going to properly investigate why tomorrow, but here's a small PR that makes sense to me: we should stop the discovery process if we have a lot of existing connections.
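A hedged sketch of that idea on the Kademlia side, reduced to the one counter this PR adds plus a timer (the types and the `kademlia_random_walk` helper are placeholders, not Substrate's actual API):

```rust
use std::time::{Duration, Instant};

/// Hypothetical reduction of the discovery behaviour: a connection counter,
/// the threshold this PR introduces, and a periodic query timer.
struct Discovery {
    num_connections: u64,
    discovery_only_if_under_num: u64,
    next_query: Instant,
}

impl Discovery {
    /// Called from the behaviour's poll loop. The timer keeps firing, but the
    /// random walk itself is skipped while too many connections are open.
    fn poll(&mut self) {
        if Instant::now() >= self.next_query {
            self.next_query = Instant::now() + Duration::from_secs(60);
            if self.num_connections < self.discovery_only_if_under_num {
                kademlia_random_walk();
            }
        }
    }
}

/// Placeholder: in Substrate this would issue a Kademlia `get_closest_peers`
/// query for a random peer ID.
fn kademlia_random_walk() {
    println!("starting Kademlia random walk");
}

fn main() {
    let mut discovery = Discovery {
        num_connections: 10,
        discovery_only_if_under_num: 40,
        next_query: Instant::now(),
    };
    discovery.poll();
}
```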