
client/cli/src/config: Warn on low file descriptor limit #6956

Merged · 3 commits · Aug 31, 2020
Changes from 1 commit
23 changes: 18 additions & 5 deletions client/cli/src/config.rs
@@ -24,6 +24,7 @@ use crate::{
init_logger, DatabaseParams, ImportParams, KeystoreParams, NetworkParams, NodeKeyParams,
OffchainWorkerParams, PruningParams, SharedParams, SubstrateCli,
};
use log::warn;
use names::{Generator, Name};
use sc_client_api::execution_extensions::ExecutionStrategies;
use sc_service::config::{
@@ -38,9 +39,12 @@ use std::path::PathBuf;
/// The maximum number of characters for a node name.
pub(crate) const NODE_NAME_MAX_LENGTH: usize = 64;

/// default sub directory to store network config
/// Default sub directory to store network config.
pub(crate) const DEFAULT_NETWORK_CONFIG_PATH: &'static str = "network";

/// The recommended open file descriptor limit to be configured for the process.
const RECOMMENDED_OPEN_FILE_DESCRIPTOR_LIMIT: u64 = 10_000;

/// Default configuration values used by Substrate
///
/// These values will be used by [`CliConfiguritation`] to set
@@ -531,17 +535,26 @@ pub trait CliConfiguration<DCV: DefaultConfigurationValues = ()>: Sized {
///
/// This method:
///
/// 1. Set the panic handler
/// 2. Raise the FD limit
/// 3. Initialize the logger
/// 1. Sets the panic handler
/// 2. Initializes the logger
/// 3. Raises the FD limit
fn init<C: SubstrateCli>(&self) -> Result<()> {
let logger_pattern = self.log_filters()?;

sp_panic_handler::set(&C::support_url(), &C::impl_version());

fdlimit::raise_fd_limit();
init_logger(&logger_pattern);

if let Some(new_limit) = fdlimit::raise_fd_limit() {
if new_limit < RECOMMENDED_OPEN_FILE_DESCRIPTOR_LIMIT {
warn!(
Member

We should prevent starting the node at all when the file descriptor limit is below a certain value. Maybe below 5k?

The warning will otherwise not be seen by anyone.
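
For illustration, a minimal sketch of the stricter behaviour suggested here: refuse to start below a hard minimum while still warning below the recommended value. This is not code from this PR; the 5_000 minimum is only taken from the comment above, the error type is a plain `String` to keep the sketch self-contained, and it assumes the `fdlimit` version used by this PR, where `raise_fd_limit()` returns `Option<u64>`.

```rust
use log::warn;

// Assumed thresholds: 5_000 from the suggestion above, 10_000 from this PR.
const MINIMUM_OPEN_FILE_DESCRIPTOR_LIMIT: u64 = 5_000;
const RECOMMENDED_OPEN_FILE_DESCRIPTOR_LIMIT: u64 = 10_000;

/// Raise the FD limit, refuse to continue below the minimum,
/// and warn when the resulting limit is below the recommended value.
fn check_fd_limit() -> Result<(), String> {
    if let Some(new_limit) = fdlimit::raise_fd_limit() {
        if new_limit < MINIMUM_OPEN_FILE_DESCRIPTOR_LIMIT {
            return Err(format!(
                "Open file descriptor limit too low: {} (required minimum: {}).",
                new_limit, MINIMUM_OPEN_FILE_DESCRIPTOR_LIMIT,
            ));
        }
        if new_limit < RECOMMENDED_OPEN_FILE_DESCRIPTOR_LIMIT {
            warn!(
                "Low open file descriptor limit. Current value: {}, recommended value: {}.",
                new_limit, RECOMMENDED_OPEN_FILE_DESCRIPTOR_LIMIT,
            );
        }
    }
    Ok(())
}
```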

Member

1024 descriptors should really be enough for all our purposes under default settings for most users. Blocking startup on this does not seem reasonable. The number of file descriptors required really depends on the settings. If the user wishes to serve thousands of peers and RPC clients at the same time, they might need an even higher number.
I'd put 1024 for the warning limit. If Substrate or Polkadot currently requires more with default settings, that is a bug that should be fixed not with a warning but by reducing the number of open files and sockets.

Contributor @tomaka · Aug 26, 2020

> 1024 descriptors should really be enough for all our purposes under default settings for most users.

In the current plan for the Polkadot networking, all validators would be permanently connected to all other validators. So if we have 1000 validators on the network, each validator would maintain 999 TCP connections all the time. 1024 doesn't seem nearly enough.

> If Substrate or Polkadot currently requires more with default settings, that is a bug

How is it a bug to have more than 1024 descriptors? It's akin to saying "if we use more than 2 GiB of memory, that is a bug". These limits are arbitrary. The Linux kernel is capable of serving waaaay more than 1024 sockets, and I don't see what limiting ourselves to 1024 brings other than making our life more difficult.
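
(Added for scale, not part of the thread: in a full mesh of n validators each node holds n − 1 connections, so with n = 1000 that is 999 sockets per node for validator links alone, on top of database handles, RPC clients, and regular full-node peers.)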

Member

As I was saying, under default settings we should not be using that many. The default peer limit is 50. The database requires a constant number of handles, about 100. RPC connections are limited to 500 IIRC. If we are still using more than 1024 handles, I'd like to know where and for which purposes. Any unaccounted leaking of file handles or sockets I'd consider to be a bug.
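
(Worked out from the numbers in this comment, not from the code: 50 peer connections + roughly 100 database handles + 500 RPC connections comes to about 650 descriptors, which fits under 1024 but leaves little headroom for transient sockets, log files, and the like.)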

Member

> So if we have 1000 validators on the network, each validator would maintain 999 TCP connections all the time. 1024 doesn't seem nearly enough.

I was clearly saying "under default settings for most users". Validators are not most users, and even for them we won't be maintaining 999 connections. This is simply not scalable. We would need to select a subset and invent some kind of routing when we get to such numbers.

As a simple user who only wishes to send a transaction, I don't expect the client to open 1000 connections and run out of file handles. If it does that, I'd consider it to be a bug.

Contributor

> This is simply not scalable. We would need to select a subset and invent some kind of routing when we get to such numbers.

We (by "we", I mean me but mostly the Web3 foundation) have known about that problem for probably two years now, but nobody ever came up with a concrete idea.

Contributor Author

> Validators are not most users and even for them we won't be maintaining 999 connections. This is simply not scalable.

@arkpar @tomaka I am having difficulty understanding why 1_000 connections per validator is in itself problematic. Can you expand on what you base the assumption on that 1_000 TCP connections per validator are not scalable? My question refers to the TCP connections themselves; I am not suggesting to send every message to everyone.

Member

I was referring to the validator messaging protocol, which currently sends every message to every validator. If we switch that to routing at some point, there won't be any need to keep connections that are not used.

"Low open file descriptor limit configured for the process. \
Current value: {:?}, recommended value: {:?}.",
new_limit, RECOMMENDED_OPEN_FILE_DESCRIPTOR_LIMIT,
);
}
}

Ok(())
}
}