Process.WaitForExit() gets slower the larger your open file limit (ulimit -n) is on Linux #6555
Comments
This might be caused by the need of the child process to close the file descriptors on fork, to prevent the child from accessing the parent's descriptors. What we could do is set close-on-exec flag on any file descriptor we open, to avoid paying this price at fork/exec time. |
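To illustrate why the cost described above scales with `ulimit -n`, here is a minimal sketch of a child-side "close everything above stdio" loop. It is not the actual Mono code; `close_fd_range` and `close_all_above_stdio` are hypothetical names, and the only assumption is that the loop walks from the descriptor limit down to 3.

```c
#include <sys/resource.h>
#include <unistd.h>

/* Close every descriptor number in [lowest, limit), highest first.
 * Returns the number of close() calls issued. That count depends only
 * on the limit, not on how many descriptors are actually open. */
long close_fd_range(long lowest, long limit)
{
    long calls = 0;
    for (long fd = limit - 1; fd >= lowest; fd--) {
        close((int) fd);    /* EBADF for unused numbers is ignored */
        calls++;
    }
    return calls;
}

/* Hypothetical helper meant to run in the child between fork() and exec().
 * Returns the number of close() calls, or -1 if no finite limit is known. */
long close_all_above_stdio(void)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0 || rl.rlim_cur == RLIM_INFINITY)
        return -1;
    return close_fd_range(3, (long) rl.rlim_cur);
}
```

With a limit of 1048576 this issues about a million system calls per spawned process, almost all of them on unused descriptor numbers.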
relevant code that Miguel mentioned: mono/mono/metadata/w32process-unix.c Lines 1987 to 1991 in 5839d7b
|
I should point out that the |
I would definitely prefer the close-on-exec flag over what we have today. Let me work on a PR for that. |
The problem I am running into is |
how about adding |
@lewurm that would work. The slight issue is |
This area is messed up but better on Windows -- the flag/default for handle inheritance is reversed. So you tend not to inherit many builds. However, if you do mark a handle for inheritance, it has two other problems, actually likely also on Posix. The problem is two part. First, imagine a multi-threaded program that runs child processes and sometimes wants some inherited handles. A real example is Visual Studio running git and wanting to redirect stdout/stderr. Prior to Vista, any handle marked inheritable would be inherited by all child processes receiving any inheritable handles. You couldn't direct specific handles to specific processes. Vista extended the CreateProcess API to address this, but inadequately. While you can direct specific handles to specific processes, they still have to be marked inheritable, and they will still be inherited through any CreateProcess using the old API. The fix is obvious in retrospect. The extended CreateProcess API should not require the handles to be marked inheritable. It is implied by passing them in the new parameters. This does cause real problems. An awkward workaround that worked in my code, but did not work with Visual Studio + git, is that you can also specify the parent from which to inherit. So you create a dummy parent, duplicate your handles into it, inheritable, and then use it as the parent. For Visual Studio + git they modified git to work around it, i.e. to take files on the command line to redirect into, instead of using stdout/stderr. The problem is the kernel APIs assume either usermode is single threaded, or that within a single process, there is some sort of oversight over all the CreateProcess calls, like they can all share a lock. But large programs don't work that way. |
On Linux, we can use https://www.freebsd.org/cgi/man.cgi?query=open&manpath=SuSE+Linux/i386+11.3 It was designed to avoid a race condition in multi-threaded programs that do https://mail.gnome.org/archives/gtk-devel-list/2015-March/msg00038.html It seems that it might not be worth fixing this issue. Almost every program on a system with those ulimits will run into this problem, closing all the descriptors is a common Unix idiom, and you can see these |
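The race mentioned above is between `open()` and a later `fcntl()` call: another thread can fork and exec in between, and the child inherits the descriptor. A minimal sketch of the atomic alternative follows. It assumes the flag under discussion is `O_CLOEXEC` (the comment text is truncated, so this is inferred from the linked open man page); `open_cloexec` and `has_cloexec` are hypothetical names.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for O_CLOEXEC under strict compilers */
#endif
#include <fcntl.h>
#include <unistd.h>

/* Open read-only with close-on-exec set in the same system call,
 * so there is no window in which a forked child could inherit it.
 * Returns the descriptor, or -1 on error. */
int open_cloexec(const char *path)
{
    return open(path, O_RDONLY | O_CLOEXEC);
}

/* Returns 1 if fd has FD_CLOEXEC set, 0 if not, -1 on error. */
int has_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);
    if (flags == -1)
        return -1;
    return (flags & FD_CLOEXEC) ? 1 : 0;
}
```

This only helps for descriptors the runtime opens itself; it says nothing about descriptors created by other libraries in the same process.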
I understand the open+fcntl problem, but what is wrong with open(cloexec?) And, how about using posix_spawn? It looks promising on Macosx at least, you can list specifically what fd to inherit -- like Windows tried but messed up. |
open(cloexec) has the problem mentioned on my previous post - and additionally, this assumes we get every handle to work properly, any mistakes here would be a problem for our users. As for As for Linux, it is implemented as a userlevel method, not a system call, so the user-level capability will just loop in the same way ours does. Does not seem worth the confusion |
I understand the need to get them all. I thought someone proposed we ban open and instead mono_open or such, that gets it right. Perhaps that is still fragile -- how to ban open? pragma poison in glib.h maybe? If you do get them all, I still don't understand the problem, with open(cloexec). As well though, maybe we should just use clone? And don't give it the flag to inherit any files? And then if we need to inherit some, dup them right after? Do we even need to inherit any? OSX really defaults to 256 file limit, wow, weird to have 8-bit limits on 64-bit systems. $ cat 1.c && gcc 1.c && ./a.out int main() There might still be a problem of other code in the process calling open or fork, but mono-only scenarios would work? So I have to look at what is fork vs. clone, easy to replace or not. |
Hm I don't see if there is a way to dup an fd into a process other than via fork. :( Win32 DuplicateHandle -- lots of cross process operations are in Win32 (VirtualAllocEx, CreateRemoteThread, VirtualQueryEx, ReadProcessMemory), but seemingly almost none in Posix. :( |
Rust appears to have taken the cloexec in every open approach. |
It is not just open, it is on every file descriptor opened, both via managed and unmanaged code. The Rust approach has both the problem mentioned above by the GNOME guys, and one that is even worse and unaccounted for. Even if we were to clean up all of Mono, this would not cover any file descriptor opened by libraries (for example, Gtk+, Cocoa, and so on). So we would remove some scenarios, not all of them, and correct execution would still require us to close everything on fork. |
Ah, everyone has to pass the flag for every open/dup/etc. Is clone w/o any file inheritance too draconian, need stdin/out/err? I think so. It looks like Apple almost got it right but not quite. |
This is reproducible on AIX, and if you have your ulimits to unlimited, it starts at Stupid idea for a workaround when procfs is available and CLOEXEC variants are off the table: enumerate |
Looks like CoreCLR addresses this mostly, fcntl(fd, F_SETFD, FD_CLOEXEC) after every open. |
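For reference, a minimal sketch of the "set the flag after every open" pattern described above. This is not CoreCLR's code; `set_cloexec` and `open_then_set_cloexec` are hypothetical names. Only `fcntl` with `F_GETFD`/`F_SETFD` and `FD_CLOEXEC` are real POSIX interfaces.

```c
#include <fcntl.h>
#include <unistd.h>

/* Mark an already-open descriptor close-on-exec, preserving any other
 * descriptor flags. Returns 0 on success, -1 on error. */
int set_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFD, flags | FD_CLOEXEC) == -1 ? -1 : 0;
}

/* Two-step variant: between open() and set_cloexec() another thread
 * may fork and exec, so the child can still inherit the descriptor. */
int open_then_set_cloexec(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;
    if (set_cloexec(fd) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}
```

The two-step form works on systems without an atomic flag on `open`, at the price of the window noted in the comment.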
On systems with a large file descriptor limit, Mono takes a very long time to spawn a new process; with informal testing on the AIX CI builder (with a POWER7), it took ~30 minutes. This is obviously very undesirable, but our hand is forced: libraries the user might be calling could be creating non-CLOEXEC files we're unaware of. As such, we started from the FD limit and worked our way down until we hit stdio. Using APIs such as posix_spawn isn't desirable as we do some fiddling with the child before we exec; and even then, closing FDs with posix_spawn relies on non-standard file actions like Solaris' addclosefrom, not present on many systems. (All of this is unnecessary on Windows, of course.) This presents an alternative way (currently only implemented on AIX, but with notes on how to do it for other platforms) to close the child's loose FDs before exec: try to get the highest-numbered FD in use, then work our way down. In the event we can't, we simply fall back to the old logic. See mono#6555 for a discussion and the initial problem being mitigated.
…10761)
* Mitigation for spawn FD closing taking forever with a big FD limit
* Use another strategy of closing only known to be open handles ...on FreeBSD and AIX. Use the kinfo_getfiles library call on FreeBSD and only close what's safe to close. On AIX, use the third and fourth arguments to getprocs to check what's open. However, the array to get all the handles takes 1 MB, so allocate it on the heap, like what kinfo_getfiles does. We don't need to free these as the child will exec or exit, which blows it all away. The previous strategy from the previous commit is still used and, on AIX, enhanced.
* Add Linux strategy to fork FD close. Tested on WSL, shows benefits with big FD ulimit.
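The idea of closing only descriptors known to be open can be sketched for Linux as below. This is an assumption about what such a strategy could look like, not the code from the commit: it assumes procfs is mounted and lists `/proc/self/fd`, and `close_open_fds_from` is a hypothetical name. On failure it returns -1 so that a caller can fall back to the plain loop over the whole limit.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for dirfd() under strict compilers */
#endif
#include <dirent.h>
#include <stdlib.h>
#include <unistd.h>

/* Close every open descriptor >= lowest by listing /proc/self/fd.
 * Returns the number of descriptors closed, or -1 if the listing is
 * unavailable (for example, procfs is not mounted). */
int close_open_fds_from(int lowest)
{
    DIR *dir = opendir("/proc/self/fd");
    if (dir == NULL)
        return -1;

    int self = dirfd(dir);          /* descriptor backing the listing itself */
    int closed = 0;
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        char *end;
        long fd = strtol(entry->d_name, &end, 10);
        if (end == entry->d_name || *end != '\0')
            continue;               /* not a number: skips "." and ".." */
        if (fd < lowest || fd == self)
            continue;               /* keep stdio and the listing open */
        if (close((int) fd) == 0)
            closed++;
    }
    closedir(dir);                  /* releases the listing descriptor */
    return closed;
}
```

The work is proportional to the number of open descriptors rather than to the limit. Note that `opendir` allocates memory, which is a consideration when calling it in the child of a multi-threaded process.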
Fixed by 0d29dfb. |
…ono/mono#10761) Commit migrated from mono/mono@0d29dfb
Steps to Reproduce
csc test.cs
$ (ulimit -n 1000; time mono test.exe)
$ (ulimit -n 100000; time mono test.exe)
$ (ulimit -n 1000000; time mono test.exe)
Current Behavior
It gets slower the higher your ulimit -n is.
Note: you may need to change your system settings (/etc/security/limits.conf) to allow higher limits.
Expected Behavior
Not getting slower.
On which platforms did you notice this
[ ] macOS
[ X ] Linux, Ubuntu 14.04/16.04
[ ] Windows
Version Used: master, 5.8.088 and 4.2.1 (so either a very old bug or something outside our control)
This was the root cause behind an issue (#6537) we had on Jenkins because the Azure Linux VM builders had ulimit -n set to 1048576.