Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

**BROKEN** Could not subdaemon channel: Too many open files #4868

Closed
whitslack opened this issue Oct 17, 2021 · 9 comments · Fixed by #4872
Closed

**BROKEN** Could not subdaemon channel: Too many open files #4868

whitslack opened this issue Oct 17, 2021 · 9 comments · Fixed by #4872

Comments

@whitslack
Copy link
Collaborator

Issue and Steps to Reproduce

My node was exhibiting some erroneous behavior due to failing to launch subdaemons due to hitting the open file descriptor resource limit. I tried raising the limit using ulimit as suggested by @cdecker in #4534 (comment), but that had no effect. (The lightningd process still had both soft and hard limits of 1024 open files.) Come to find out, these limits are actually hardcoded in lightning/lightning.c.

Please implement a configuration option to override this insufficient limit.

“640KB ought to be enough for anybody.” —Christian Decker

Thanks to @TheBlueMatt for bringing the failure to my attention.

getinfo output

This is C-Lightning 0.10.1.

@ZmnSCPxj
Copy link
Contributor

The "correct" way to fix this would be to change our closeing loop by instead maintaining a set of fds we should close on fork. I conjecture that all fds that lightningd "should" close after fork are passed to the ccan/io module, so maybe hooking there somehow would work? @rustyrussell ?

We could use the close_range thing you proposed in the other thread, but it would be much nicer if something similar also existed in MacOS and/or FreeBSD.

@ZmnSCPxj
Copy link
Contributor

Hmm! https://stackoverflow.com/a/918469

@ZmnSCPxj
Copy link
Contributor

So let me propose an internal library for closing fds without requiring (hopefully) some kind of large loop except for really old kernels.

/* Call at program init.
 * This is needed in order to open an `fd` "early" to
 * a directory of open `fd`s:
 * see #5 in https://stackoverflow.com/questions/899038/getting-the-highest-allocated-file-descriptor
 * On systems where that is not necessary, this is a no-op.
 */
void closeallexcept_init(void);

/* Call after `fork`, in the child process.
 * Closes all file descriptors except for those listed.
 * The given set of `fd`s **must** be sorted in order
 * from lowest to highest.
 */
void closeallexcept(int numfds, int fds_not_to_close[]);

@whitslack
Copy link
Collaborator Author

@ZmnSCPxj: Is there any reason you're not using O_CLOEXEC (and related) to avoid the need to manually close file descriptors before exec'ing? (One such reason might be that you're calling library code that holds open file descriptors privately and whose code you cannot modify.)

@ZmnSCPxj
Copy link
Contributor

Not all APIs expose O_CLOEXEC, possibly? Though there is fcntl F_SETFD for that... dunno really. I think we do not use libraries that keep fds around, but maybe we are just hedging against some future where we do use one?

And we do keep open some fds that talk to hsmd which is a privileged domain, so it might be safer to just close everything except what we explicitly want to pass to a subdaemon.

@ZmnSCPxj
Copy link
Contributor

Whoa whoa whoa!

So I was looking at our big nasty close loop and saw this:

lightning/lightningd/subd.c

Lines 169 to 173 in 09459a9

if (dev_disconnect_fd != -1) {
if (!move_fd(dev_disconnect_fd, 101))
goto child_errno_fail;
dev_disconnect_fd = 101;
}

@rustyrussell we hardcode a 101 here --- there is a small but distinct probability that an fd we want to pass in actually is 101. I presume dev_disconnect_fd usage is so rare, and "only in development" anyway, that this is considered acceptable, but still, maybe we can do something like the below?

/* Dup any extra fds first */
while ((fd = va_arg(*ap, int *)) != NULL) {
        int actual_fd = *fd;
        /* If this were stdin, we moved it above! */
        if (actual_fd == STDIN_FILENO)
               actual_fd = stdin_is_now;
       /* Move dev_disconnect_fd if it would be overwritten. */
       if (fdnum == dev_disconnect_fd) {
               int new_fd = dup(dev_disconnect_fd);
               if (new_fd < 0)
                       goto child_errno_fail;
               close(dev_disconnect_fd);
               dev_disconnect_fd = new_fd;
       }
       if (!move_fd(actual_fd, fdnum))
               goto child_errno_fail;
       fdnum++;
}

/* Move dev_disconnect_fd as well.  */
if (dev_disconnect_fd != -1) {
        if (dev_disconnect_fd != fdnum) {
                if (!move_fd(dev_disconnect_fd, fdnum))
                        goto child_errno_fail;
                dev_disconnect_fd = fdnum;
        }
        fdnum++;
}

Then the close loop afterward would be equivalent to a call to closefrom(fdnum).

Actually come to think of it there is also the chance that stdin_is_now could potentially be overwritten by the above remapping loop, so maybe it should be guarded as well?

So let me propose an alternate internal library for closefrom:

/* closefrom_compat_init
 *
 * Call at the start of a program before allocating any `fd`s.
 */
void closefrom_compat_init(void);

/* closefrom_compat_getfd
 *
 * The closefrom emulation may require an FD to be kept open.
 * You may query this fd, for example if you are remapping FDs
 * to specific numbers by dup2, you may want to check if
 * this facility is using that specific number.
 * @return -1 if the emulation requires no FD to be kept open,
 * or the FD that needs to be kept open otherwise.
 */
int closefrom_compat_getfd(void);

/* closefrom_compat_remapfd
 *
 * The FD that needs to be kept open may occupy a number
 * that is being remapped to via dup2.
 * This call changes the FD number to a different one that
 * is not currently in use.
 *
 * Precondition: closefrom_compat_getfd() should not
 * have returned -1.
 * Postcondition: closefrom_compat_getfd() will return a
 * different value. 
 */
void closefrom_compat_remapfd(void);

/* closefrom_compat
 *
 * Emulates the closefrom syscall as efficiently as possible
 * on the system.
 * 
 * @fromfd - The first fd to close, all other fds afterwards
 * are also closed.
 */
void closefrom_compat(int fromfd);

@ZmnSCPxj
Copy link
Contributor

Another issue is this bit:

/* Make (fairly!) sure all other fds are closed. */
int max = sysconf(_SC_OPEN_MAX);
for (int i = 3; i < max; i++)
if (i != execfail[1])
close(i);

This should also be adapted to use the closefrom emulation instead of that loop. However, that does imply that a /proc/self/fd/*-based approach is difficult to use, hmm. It also means that the closefrom emulation has to be part of ccan, not just lightningd (so we need to PR ccan first and then PR here).

Maybe make closefrom_compat_init optional, and fall back to iterating over sysconf(_SC_OPEN_MAX) (which can be as high as INT_MAX!) if closefrom_compat_init was not called. WDYT @rustyrussell ?

@ZmnSCPxj
Copy link
Contributor

Hmm actually we do not need closefrom_compat_init. I was thinking we needed it since if we try to open /proc/self/fd/ it may fail due to too many open fds, so we should open it at the start of the program and reserve our slot early.

But closefrom is going to close practically every fd anyway! So what we could do would be to try opening /proc/self/fd/ at closefrom_compat time, and if that fails due to EMFILE or ENFILE, do close(fromfd++), which should free up an fd for /proc/self/fd.

So we really only need a closefrom emulation, and a query closefrom_may_be_slow, and lightningd can call closefrom_may_be_slow and if that returns true only then limit the number of open file descriptors.

@ZmnSCPxj
Copy link
Contributor

Sent patches for closefrom to [email protected], heads up @rustyrussell !

ZmnSCPxj added a commit to ZmnSCPxj/lightning that referenced this issue Oct 19, 2021
ZmnSCPxj added a commit to ZmnSCPxj/lightning that referenced this issue Oct 19, 2021
ZmnSCPxj added a commit to ZmnSCPxj/lightning that referenced this issue Oct 19, 2021
ZmnSCPxj added a commit to ZmnSCPxj/lightning that referenced this issue Oct 19, 2021
ZmnSCPxj added a commit to ZmnSCPxj/lightning that referenced this issue Oct 19, 2021
Fixes: ElementsProject#4868

ChangeLog-Fixed: We now no longer self-limit the number of file descriptors (which limits the number of channels) in sufficiently modern systems, or where we can access `/proc`.  We still self-limit on old systems where we cannot find the list of open files on `/proc`, so if you need > 1000 channels, upgrade or mount `/proc`.
ZmnSCPxj added a commit to ZmnSCPxj/lightning that referenced this issue Oct 21, 2021
ZmnSCPxj added a commit to ZmnSCPxj/lightning that referenced this issue Oct 21, 2021
Fixes: ElementsProject#4868

ChangeLog-Fixed: We now no longer self-limit the number of file descriptors (which limits the number of channels) in sufficiently modern systems, or where we can access `/proc` or `/dev/fd`.  We still self-limit on old systems where we cannot find the list of open files on `/proc` or `/dev/fd`, so if you need > ~4000 channels, upgrade or mount `/proc`.
ccan-maintainers pushed a commit to rustyrussell/ccan that referenced this issue Oct 21, 2021
rustyrussell pushed a commit to ZmnSCPxj/lightning that referenced this issue Oct 21, 2021
Fixes: ElementsProject#4868

ChangeLog-Fixed: We now no longer self-limit the number of file descriptors (which limits the number of channels) in sufficiently modern systems, or where we can access `/proc` or `/dev/fd`.  We still self-limit on old systems where we cannot find the list of open files on `/proc` or `/dev/fd`, so if you need > ~4000 channels, upgrade or mount `/proc`.
cdecker pushed a commit that referenced this issue Oct 22, 2021
Fixes: #4868

ChangeLog-Fixed: We now no longer self-limit the number of file descriptors (which limits the number of channels) in sufficiently modern systems, or where we can access `/proc` or `/dev/fd`.  We still self-limit on old systems where we cannot find the list of open files on `/proc` or `/dev/fd`, so if you need > ~4000 channels, upgrade or mount `/proc`.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants