Pause Kademlia if too many connections #4828
Conversation
client/network/src/service.rs
Outdated
```diff
@@ -232,6 +232,7 @@ impl<B: BlockT + 'static, S: NetworkSpecialization<B>, H: ExHashT> NetworkWorker
 		TransportConfig::MemoryOnly => false,
 		TransportConfig::Normal { allow_private_ipv4, .. } => allow_private_ipv4,
 	},
+	u64::from(params.network_config.out_peers) * 2,
```
I think this should be a fixed number of additional discovery connections, rather than a x2 multiplier. Once we reach the target number of peers, it is fine for discovery to be less aggressive.
I changed it to `out_peers + 15`. The `15` is a bit magic, but I think this option is a bit too specific to warrant a field in the configuration.
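For concreteness, here is a minimal sketch of the threshold being discussed, with a hypothetical stand-in for Substrate's `NetworkConfiguration` (the real struct lives in `client/network` and has many more fields):

```rust
/// Hypothetical stand-in for the relevant part of Substrate's network
/// configuration; the real struct has many more fields.
struct NetworkConfiguration {
    out_peers: u32,
}

/// Connection count above which discovery pauses. The `+ 15` headroom is the
/// "magic" constant from this thread: once the outgoing peer-set target is
/// met, only a fixed number of extra connections is left for discovery.
fn discovery_only_if_under_num(config: &NetworkConfiguration) -> u64 {
    u64::from(config.out_peers) + 15
}

fn main() {
    let config = NetworkConfiguration { out_peers: 25 };
    assert_eq!(discovery_only_if_under_num(&config), 40);
}
```

Keeping the headroom additive rather than multiplicative means discovery gets the same fixed budget of extra connections regardless of how large `out_peers` is configured, which matches the request above.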
> Substrate currently opens a lot of TCP connections.

What is "a lot" in terms of the number of established connections relative to the number of (online) peers? I assume you mean a lot of connection *attempts*, i.e. not actually established connections?

Connection attempts can be plenty, especially in "small" (on the Kademlia scale) networks where the buckets are largely filled with disconnected nodes, because there are few stable, highly available nodes in the network. A disconnected node is not removed from a bucket unless a connected node takes its place, for the usual reasons of avoiding easy vulnerability to bucket flushing as a result of (temporary, but possibly prolonged and maliciously triggered) network connectivity issues.

So connection attempts to many disconnected nodes can happen repeatedly and indefinitely, at least every 60 seconds with this discovery mechanism. It may indeed be sensible to keep the discovery query frequency roughly inversely proportional to the number of already connected peers. A hard threshold as done here can be a start, I guess.
Devops reported ~1500 ESTABLISHED TCP connections on the validator nodes. Regardless, there should be a configurable limit on both:
Co-Authored-By: Toralf Wittner <[email protected]>
Changes LGTM, but I'm not familiar with this code.
```diff
@@ -406,6 +418,10 @@ where
 			NetworkBehaviourAction::GenerateEvent(event) => {
 				match event {
 					MdnsEvent::Discovered(list) => {
+						if self.num_connections >= self.discovery_only_if_under_num {
```
Not familiar with this code, so this might be a stupid question: if we already discovered the node, why not keep it? Kademlia and the sub-protocol account for the peer limits separately, right?
I guess it doesn't really matter for mDNS.
I applied the change to mDNS as well because the variable name is about stopping discovery altogether, and not just Kademlia.
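For readers unfamiliar with the code, here is a self-contained sketch of the gating pattern from the diff above, using simplified stand-ins for the Substrate/libp2p types (`DiscoveryBehaviour` here is not the real struct):

```rust
/// Simplified stand-in for the discovery behaviour in the diff above.
struct DiscoveryBehaviour {
    num_connections: u64,
    discovery_only_if_under_num: u64,
    known_peers: Vec<String>,
}

impl DiscoveryBehaviour {
    /// Mirrors the check in the diff: once the connection count reaches the
    /// threshold, mDNS-discovered addresses are dropped instead of remembered.
    fn on_mdns_discovered(&mut self, list: Vec<String>) {
        if self.num_connections >= self.discovery_only_if_under_num {
            return;
        }
        self.known_peers.extend(list);
    }
}

fn main() {
    let mut behaviour = DiscoveryBehaviour {
        num_connections: 40,
        discovery_only_if_under_num: 40,
        known_peers: Vec::new(),
    };
    behaviour.on_mdns_discovered(vec!["/ip4/192.168.1.2/tcp/30333".into()]);
    assert!(behaviour.known_peers.is_empty()); // over the limit: dropped
}
```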
Substrate currently opens a lot of TCP connections.
I'm going to properly investigate why tomorrow, but here's a small PR that makes sense to me: we should stop the discovery process if we have a lot of existing connections.
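A hedged sketch of that idea on the Kademlia side, reduced to the one counter this PR adds plus a timer (the types and the `kademlia_random_walk` helper are placeholders, not Substrate's actual API):

```rust
use std::time::{Duration, Instant};

/// Hypothetical reduction of the discovery behaviour: a connection counter,
/// the threshold this PR introduces, and a periodic query timer.
struct Discovery {
    num_connections: u64,
    discovery_only_if_under_num: u64,
    next_query: Instant,
}

impl Discovery {
    /// Called from the behaviour's poll loop. The timer keeps firing, but the
    /// random walk itself is skipped while too many connections are open.
    fn poll(&mut self) {
        if Instant::now() >= self.next_query {
            self.next_query = Instant::now() + Duration::from_secs(60);
            if self.num_connections < self.discovery_only_if_under_num {
                kademlia_random_walk();
            }
        }
    }
}

/// Placeholder: in Substrate this would issue a Kademlia `get_closest_peers`
/// query for a random peer ID.
fn kademlia_random_walk() {
    println!("starting Kademlia random walk");
}

fn main() {
    let mut discovery = Discovery {
        num_connections: 10,
        discovery_only_if_under_num: 40,
        next_query: Instant::now(),
    };
    discovery.poll();
}
```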