Skip to content
This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

Pause Kademlia if too many connections #4828

Merged
merged 4 commits into from
Feb 11, 2020
Merged

Conversation

tomaka
Copy link
Contributor

@tomaka tomaka commented Feb 4, 2020

Substrate currently opens a lot of TCP connections.
I'm going to properly investigate why tomorrow, but here's a small PR that makes sense to me: we should stop the discovery process if we have a lot of existing connections.

@tomaka tomaka added A0-please_review Pull request needs code review. B0-silent Changes should not be mentioned in any release notes labels Feb 4, 2020
@tomaka tomaka requested a review from romanb February 5, 2020 08:41
@@ -232,6 +232,7 @@ impl<B: BlockT + 'static, S: NetworkSpecialization<B>, H: ExHashT> NetworkWorker
TransportConfig::MemoryOnly => false,
TransportConfig::Normal { allow_private_ipv4, .. } => allow_private_ipv4,
},
u64::from(params.network_config.out_peers) * 2,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be a a fixed number of additional discovery connections, rather than a x2. Once we reach the target number of peers, it is fine for discovery to be less aggressive.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed it for out_peers + 15. The 15 is a bit magic, but I think this option is a bit too specific to warrant a field in the configuration.

Copy link
Contributor

@romanb romanb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Substrate currently opens a lot of TCP connections.

What is "a lot" in terms of number of established connections to number of (online) peers? I assume you mean a lot of connection attempts, i.e not actually established connections? Connection attempts can be plenty, especially in "small" (in Kademlia scale) networks where the buckets are largely filled with disconnected nodes because there are few stable, highly available nodes in the network. A disconnected node is not removed from a bucket unless a connected node takes its place, for the usual reasons of avoiding easy vulnerability to bucket flushing as a result of (temporary, but maybe prolonged and maliciously triggered) network connectivity issues. So connection attempts to many disconnected nodes can happen repeatedly and indefinitely, at least every 60 seconds with this discovery mechanism and it may indeed be sensible to keep the discovery query frequency in a way inversely proportional to the number of already connected peers. A hard threshold as done here can be a start, I guess.

@arkpar
Copy link
Member

arkpar commented Feb 5, 2020

Substrate currently opens a lot of TCP connections.

What is "a lot" in terms of number of established connections to number of (online) peers?

Devops reported ~1500 ESTABLISHED TCP connections on the validator nodes.

Regardless, there should be a configurable limit on both:

  1. Maximum number of TCP connections the node can make (no matter if these are for discovery or anything else)
  2. The rate of new connections per seconds.

Copy link
Member

@arkpar arkpar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

@andresilva andresilva left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changes lgtm but not familiar with this code.

@@ -406,6 +418,10 @@ where
NetworkBehaviourAction::GenerateEvent(event) => {
match event {
MdnsEvent::Discovered(list) => {
if self.num_connections >= self.discovery_only_if_under_num {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not familiar with this code so might be a stupid question. If we already discovered the node why not keep it? Kademlia and sub protocol account for the peer limits separately right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it doesn't really matter for mDNS.
I applied the change to mDNS as well because the variable name is about stopping discovery altogether, and not just Kademlia.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
A0-please_review Pull request needs code review. B0-silent Changes should not be mentioned in any release notes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants