Use closefrom and remove our fds max limit. #4872
I think increasing to 4k in the worst case is also the Right Thing, which means this won't make things worse. I say move it out of draft (with the simple fixes, preferably) and we use the -rcs for further testing...
Removed from 0.10.2 milestone for now: the CI FreeBSD smoke test shows this line: https://github.com/ElementsProject/lightning/pull/4872/checks?check_run_id=3935023800#step:3:4379
FreeBSD should support |
Right right, Also did the |
Configurator test on FreeBSD now works! https://github.com/ElementsProject/lightning/pull/4872/checks?check_run_id=3958492505#step:3:4385 Will bring out of draft.
Signed-off-by: ZmnSCPxj jxPCSnmZ <[email protected]>
Signed-off-by: Rusty Russell <[email protected]>
This also inadvertently fixes a latent bug: before this patch, in the `subd` function in `lightningd/subd.c`, we would close `execfail[1]` *before* doing an `exec`. We use an EOF on `execfail[1]` as a signal that `exec` succeeded (the fd is marked CLOEXEC), and otherwise use it to pump `errno` to the parent. The intent is that this fd should be kept open until `exec`, at which point CLOEXEC triggers, closing that fd and sending the EOF, *or* if `exec` fails we can send the `errno` to the parent process via that pipe-end. However, in the previous version, we end up closing that fd *before* reaching `exec`, either in the loop which `dup2`s passed-in fds (by overwriting `execfail[1]` with a `dup2`) or in the "close everything" loop, which does not guard against `execfail[1]`, only `dev_disconnect_fd`.
Fixes: ElementsProject#4868

ChangeLog-Fixed: We now no longer self-limit the number of file descriptors (which limits the number of channels) in sufficiently modern systems, or where we can access `/proc` or `/dev/fd`. We still self-limit on old systems where we cannot find the list of open files on `/proc` or `/dev/fd`, so if you need > ~4000 channels, upgrade or mount `/proc`.
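The `execfail` mechanism described in the commit message can be illustrated in isolation. The following is only a minimal sketch, not the code in `lightningd/subd.c`: the helper name `spawn_and_report` and its signature are hypothetical. It marks the write end of a pipe CLOEXEC, keeps it open in the child right up to `exec`, and lets the parent tell success (EOF) from failure (an `errno` value arrives on the pipe).

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper, for illustration only.
 * Forks and execs argv[0]. Returns 0 if exec succeeded (and stores the
 * child pid in *childpid), otherwise an errno value describing why the
 * child could not be started. */
static int spawn_and_report(char *const argv[], pid_t *childpid)
{
	int execfail[2];
	int child_errno = 0, saved;
	ssize_t r;
	pid_t pid;

	assert(argv != NULL && argv[0] != NULL && childpid != NULL);

	if (pipe(execfail) != 0)
		return errno;
	/* A successful exec closes the write end for us: that is the EOF. */
	if (fcntl(execfail[1], F_SETFD, FD_CLOEXEC) != 0)
		goto fail;

	pid = fork();
	if (pid < 0)
		goto fail;

	if (pid == 0) {
		/* Child: execfail[1] must stay open until exec itself. */
		close(execfail[0]);
		execv(argv[0], argv);
		/* Only reached if exec failed: pump errno to the parent. */
		child_errno = errno;
		if (write(execfail[1], &child_errno, sizeof(child_errno)) < 0)
			_exit(126);
		_exit(127);
	}

	/* Parent: drop our copy of the write end or we never see EOF. */
	close(execfail[1]);
	do {
		r = read(execfail[0], &child_errno, sizeof(child_errno));
	} while (r < 0 && errno == EINTR);
	saved = errno;
	close(execfail[0]);

	if (r == 0) {
		/* EOF: CLOEXEC fired, so exec succeeded. */
		*childpid = pid;
		return 0;
	}

	/* exec failed (or the read itself failed): reap the child. */
	while (waitpid(pid, NULL, 0) < 0 && errno == EINTR)
		;
	if (r == (ssize_t)sizeof(child_errno))
		return child_errno;
	return r < 0 ? saved : EIO;

fail:
	saved = errno;
	close(execfail[0]);
	close(execfail[1]);
	return saved;
}
```

If any cleanup loop in the child closed `execfail[1]` (or overwrote it with `dup2`) before `execv`, the parent would see EOF even when `exec` later failed, which is the latent bug the commit message describes.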
Included in upstream ccan/; trivial rebase.
This PR currently still fails an edge case (which the existing version would also fail at):
I think the PR can be pushed as-is, since it does no worse than the existing code (and maybe we have some protection already against the above...? Do we pass more than one
-- Edit: This is the actual correct solution
|
CI failure seems unrelated, it is pointing at I have code now to implement the fix to the really edgy edge case in the previous comment, but am wary of adding it currently, because I think we will not hit that edge case in practice anyway (because if so |
yep, I think there is any method to restart only one check |
No, you have to restart the entire thing :( known GH actions limitation it seems... |
ACK 9d5fbc1 |
If anyone is interested and has a MacOS, here is some code I would like to have compiled and executed:

    #include <errno.h>
    #include <libproc.h>
    #include <stdbool.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/types.h>
    #include <unistd.h>

    static
    void closefrom(int fromfd)
    {
        int saved_errno = errno;
        int maxfd;
        /* #ifdef HAVE_PROC_PIDINFO */
        int orig_sz, sz, i, num;
        pid_t pid = getpid();
        struct proc_fdinfo *infos;

        /* With a NULL buffer this returns the buffer size needed. */
        orig_sz = proc_pidinfo(pid, PROC_PIDLISTFDS, 0, NULL, 0);
        { /* debug */
            if (orig_sz >= 0)
                printf("proc_pidinfo(NULL, 0) ok\n");
        } /* debug */
        if (orig_sz == 0)
            return;
        else if ((orig_sz > 0) && ((infos = malloc(orig_sz)) != NULL)) {
            sz = proc_pidinfo(pid, PROC_PIDLISTFDS, 0,
                              infos, orig_sz);
            if (sz >= 0 && sz <= orig_sz) {
                { /* debug */
                    printf("proc_pidinfo(infos, orig_sz) ok\n");
                } /* debug */
                /* sz is in bytes; iterate over entries, not bytes. */
                num = sz / sizeof(struct proc_fdinfo);
                for (i = 0; i < num; ++i) {
                    if (infos[i].proc_fd >= fromfd)
                        close(infos[i].proc_fd);
                }
                free(infos);
                goto quit;
            } else
                free(infos);
        }
        /* #endif // HAVE_PROC_PIDINFO */

        { /* debug */
            if (orig_sz >= 0)
                printf("fell back oh no!\n");
        } /* debug */

        /* Fallback: close every possible fd up to the limit. */
        maxfd = sysconf(_SC_OPEN_MAX);
        for (; fromfd < maxfd; ++fromfd)
            close(fromfd);

    quit:
        errno = saved_errno;
    }

    static
    bool can_proc_pidinfo(void)
    {
        /* #ifdef HAVE_PROC_PIDINFO */
        int sz = proc_pidinfo(getpid(), PROC_PIDLISTFDS, 0, NULL, 0);
        if (sz >= 0)
            return true;
        /* #endif // HAVE_PROC_PIDINFO */
        return false;
    }

    static
    bool closefrom_may_be_slow(void)
    {
        if (can_proc_pidinfo())
            return false;
        else
            return true;
    }

    int main(void)
    {
        int fds[2]; /* pipe() needs room for both ends */
        ssize_t wres;
        char buf = '\0';

        printf("closefrom_may_be_slow: %s\n",
               closefrom_may_be_slow() ? "true" : "false");

        if (pipe(fds) < 0) {
            perror("pipe");
            return 1;
        }

        closefrom(STDERR_FILENO + 1);

        /* Writing to the write end should fail. */
        do {
            wres = write(fds[1], &buf, 1);
        } while ((wres < 0) && (errno == EINTR));
        if ((wres < 0) && (errno == EBADF)) {
            printf("successfully closed by closefrom!\n");
            return 0;
        } else if (wres < 0) {
            perror("write");
            return 1;
        } else {
            printf("unexpected success on supposedly-closed socket!\n");
            return 1;
        }
    }

On MacOS we use Unfortunately, allocating off the stack is difficult. Variable-length arrays are in C99, but |
@ZmnSCPxj: If you don't want to use |
Certainly a possibility if we consider the GC-with-contiguous-address-space thing, but that is marginal; |
|
@NicolasDorier thanks! |
Fixes #4868
This also inadvertently fixes a latent bug: before this patch, in the `subd` function in `lightningd/subd.c`, we would close `execfail[1]` *before* doing an `exec`.

We use an EOF on `execfail[1]` as a signal that `exec` succeeded (the fd is marked CLOEXEC), and otherwise use it to pump `errno` to the parent. The intent is that this fd should be kept open until `exec`, at which point CLOEXEC triggers, closing that fd and sending the EOF, *or* if `exec` fails we can send the `errno` to the parent process via that pipe-end.

However, in the previous version, we end up closing that fd *before* reaching `exec`, either in the loop which `dup2`s passed-in fds (by overwriting `execfail[1]` with a `dup2`) or in the "close everything" loop, which does not guard against `execfail[1]`, only `dev_disconnect_fd`.

Ping @whitslack.

Also ping @NicolasDorier and @cdecker --- on Linux < 5.9, this new code uses `/proc/$$/fd` to see the open file descriptors (and avoid looping all the way to 1048576), but I do not know if `/proc` is limited or inaccessible within Docker, and the original #2977 was triggered by Docker on root. Cursory search suggests it should be accessible but I would prefer expert opinion.

We can argue that we can just "kick the can" and bump up the max limit to 4096, but at some point @whitslack is going to have 10,000 channels all by himself, and besides `closefrom` is "the future" and should be implementable on more systems at some point, maybe. More to the point: if we have N channels, the limit has to be at least N, and we need to launch N subdaemons (one for each channel), and iterating from 0 to N takes N steps, which we do once for each of the N subdaemons, meaning kicking the can and just raising the limit is O( n^2 ).

Created as draft for now; in particular the code is only minimally tested on Linux 5.11 (and I tried disabling `HAVE_NR_CLOSE_RANGE` to test the `/proc` scanning, which should work on a lot of systems that are not running `lightningd` in a `chroot` jail). It should be tested on more systems and container types, and while I can try to figure out how to get the tests running on FreeBSD in a VM, I do not know if I can legally run a MacOS VM if I refuse to purchase anything from Apple.