File descriptors get passed down to ‘system’ child processes #12576
Comments
In http://gtoolkit.com we've added Perhaps this should be added to Pharo :-).
|
Ah yes, would be good to add that to Pharo: I see the implementation is in ‘GToolkit-Utility-File’, as an extension on Socket which delegates to a corresponding method on SqSocket. As for the alternative I had in mind of explicitly mapping the file descriptors, I’ve noted there’s a method |
I would strongly support integrating this in base Pharo, and in fact making it the default for new FDs (file and socket both), requiring a I see in the code that GT is using a custom primitive for
And the code will then work on most platforms without the plugin. This is taking advantage of undefined behavior, so the usual caveats about nasal demons apply, but it's a workable stopgap solution. Ultimately, given the architecture of the Pharo file/socket plugins, I imagine the solution would be a primitive as part of those plugins that performs an I could possibly even write those primitives (I've never written C, but I can read it well enough, and most of it would be boilerplate that can be copied from existing primitives), but I'd definitely want someone to look them over, and I can't actually compile the VM myself to test them (long story).
|
Thanks Daniel. I tagged it so that Pablo can have a look. Now if people would play the game and contribute back to Pharo by sending PRs this would help us. But we will do it by ourselves. |
Thanks @Rinzwind for the extension link :). |
Like I said, I'd be open to collaborating on this—certainly I could fill in the image side of things, referring to a primitive that doesn't yet exist but has an obvious implementation, if someone with VM experience can do that half. I could probably struggle through the VM side, too (I see I can download artifacts from a CI build, so the fact that I can't build my own VM wouldn't have to stop me entirely...), but if someone with more VM experience has time, I'd rather work on other things in the image... |
I know. My remark was more general. |
My recommendation would be to stay close to POSIX on these things. It is normal that all fds are available in a child process. This is needed because a lot of things rely on processes sharing stdin/stdout/stderr, among other things. Putting a half-baked idea in Pharo might not be a good idea. @daniels220 If it is about the race condition with FD_CLOEXEC, there are O_CLOEXEC and SOCK_CLOEXEC, which set the flag at creation time precisely to remove that problem. So if there is an FFI problem with setting these flags, we should fix that. I'm therefore against changing a default behaviour to something new that just opens up a lot of other problems. Parent processes can explicitly set FD_CLOEXEC before forking, or child processes can clear it between fork() and exec(). There is no such thing as a better default, especially not if we depart from POSIX. |
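To make the creation-time flag concrete, here is a minimal sketch in Python rather than Pharo. It contrasts setting close-on-exec atomically through `O_CLOEXEC` with the two-step `fcntl` route, which leaves a window in which another thread could fork. The helper names (`has_cloexec`, `open_atomic`, `open_then_flag`) are illustrative and not part of any Pharo or POSIX API. Python itself marks new descriptors close-on-exec by default, so the two-step variant first clears the flag to mimic a plain POSIX `open()`.

```python
import fcntl
import os


def has_cloexec(fd: int) -> bool:
    """Return True if FD_CLOEXEC is currently set on the descriptor."""
    return bool(fcntl.fcntl(fd, fcntl.F_GETFD) & fcntl.FD_CLOEXEC)


def open_atomic(path: str) -> int:
    """Open with O_CLOEXEC: the flag is set by open(2) itself, so there is
    no moment at which a concurrent fork+exec could inherit the descriptor."""
    return os.open(path, os.O_RDONLY | os.O_CLOEXEC)


def open_then_flag(path: str) -> int:
    """Two-step variant: open first, set FD_CLOEXEC afterwards."""
    fd = os.open(path, os.O_RDONLY)
    # Assumption for illustration: clear the flag Python set for us, so the
    # descriptor looks like one returned by a plain POSIX open().
    os.set_inheritable(fd, True)
    # A fork+exec in another thread at this point would inherit fd.
    flags = fcntl.fcntl(fd, fcntl.F_GETFD)
    fcntl.fcntl(fd, fcntl.F_SETFD, flags | fcntl.FD_CLOEXEC)
    return fd
```

Both functions end with the flag set; the difference is only whether a window exists before it takes effect.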
Thanks Norbert! |
Norbert, you have a point...I know I can perhaps be overeager in my opinions when I don't have experience of the full context (in this case, I haven't written C or similar, or dealt directly with POSIX APIs). Certainly it's fair to do some more research...but when I did a quick Google of "is it best practice to always use O_CLOEXEC", the top few results were mostly "yes," either "...unless you specifically know you need to share that FD," or "and if you do need to share one, clear FD_CLOEXEC immediately before fork and re-set it afterward, to avoid accidentally sharing it with a different child process". So this seems like a case where POSIX is held back by the need for backwards-compatibility, creating a footgun for new C programmers for the rest of time, while for Pharo, I think the number of programs that actually rely on sharing FDs (other than stdio) with child processes is...probably either zero or one, honestly. I would note also that you can't actually retrieve the FD of an open file without private-struct hijinks, so how would you even tell the child process what to use? I imagine it can examine the FD table itself, but that seems weird and error-prone. Another way of looking at it: Python, Ruby, Java, and even Rust all use O_CLOEXEC on all files, with no way I saw at a glance to not do this. They expose One definite point of clarification—I forgot to mention stdio—those I would agree should remain as they are even if the default for files and sockets were to change. The existing OSSubprocess library already has the ability to control if/how they are inherited in a thread-safe way, in any case. Regarding optional use of
The subsequent
So if we were to...okay, let's keep things simple and say we just add an argument specifically for this and make the primitive Perhaps a more builder-styled API would be better—add a property to And none of this does anything for applications that make use of library code that opens its own files or sockets, without |
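As a point of comparison for the API discussion, here is a rough Python sketch of the opt-in model: descriptors are non-inheritable by default, and sharing with a particular child is requested explicitly through the `pass_fds` argument of `subprocess`. The helpers `open_high_fd` and `child_sees_fd` are hypothetical names used only for this illustration. The descriptor is duplicated to a number of at least 100 so that the probe cannot collide with descriptors the child interpreter opens for itself.

```python
import fcntl
import os
import subprocess
import sys


def open_high_fd(path: str) -> int:
    """Open `path` read-only on a descriptor numbered >= 100, close-on-exec."""
    low = os.open(path, os.O_RDONLY)
    high = fcntl.fcntl(low, fcntl.F_DUPFD, 100)  # lowest free fd >= 100
    os.close(low)
    os.set_inheritable(high, False)              # F_DUPFD leaves the flag clear
    return high


def child_sees_fd(fd: int, share: bool) -> bool:
    """Spawn a child interpreter and report whether `fd` is open inside it."""
    probe = "import os, sys; os.fstat(int(sys.argv[1]))"
    result = subprocess.run(
        [sys.executable, "-c", probe, str(fd)],
        pass_fds=(fd,) if share else (),         # explicit, per-child opt-in
        stderr=subprocess.DEVNULL,
    )
    # os.fstat raises OSError in the child if the descriptor is closed there,
    # which makes the child exit with a non-zero status.
    return result.returncode == 0
```

A child that needs the descriptor but was not given it fails on first use, which is the early-failure behaviour argued for elsewhere in this thread.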
Thanks for the discussion! It is important to have multiple views. I like the fact that other languages are doing this. |
Clearly nothing is going to happen with this for Pharo 12...could someone move it to the Pharo 13 milestone and maybe we can revisit in a couple months? |
Done |
A customer of ours ran into this issue, and it took a while to trace down. A socket to a GemStone server was not being closed because one of Pharo's grandchild processes had inherited it, so the GemStone logout hung. After reading the discussion above, here's what we'd like to see:
Rationale for default close-on-exec to true: Norbert's comments about staying close to POSIX are good. Any deviations should be carefully considered and well-commented. However, it is good to remember (as noted above) that parts of POSIX are generally recognized as mistakes, and only there for backward compatibility. In this particular case, the most compelling factor that I see is fault detection. If you need a file descriptor to be inherited and it is not, your code will immediately fail. OTOH, if the fd is inherited and should not be, things work until several years later someone spends some hours tracking down an obscure failure. I include disk files as well as sockets in this request because I noticed when debugging this problem that Pharo's grandchild process had open fds on not only Pharo's sources and changes files, but those of the pharo-launcher. That could lead to subtle hard-to-find problems. I look forward to seeing further thoughts any of you have, and seeing something get done in this area for Pharo 13. |
@martinmcclure, agreed on all points—that sounds like an excellent plan. And your point about fault-detection is an additional compelling argument, I think—I am always in favor of systems that either work correctly, or fail early and loudly. It's the bugs you find out about months down the line that are really costly... And ouch, good observation about Pharo launcher! I can see that causing real problems in the wild today, in fact—e.g. if you launch an image, quit Pharo Launcher, and then go to update Pharo Launcher and can't because the image unknowingly has its files open. And clearly this was not the intent of the original programmers, which I think is a good indication that O_CLOEXEC-by-default is the less-surprising behavior for the majority of programmers, especially those used to higher-level-than-C languages (which, as previously mentioned, universally set O_CLOEXEC by default). |
Regarding the comments about other languages setting |
Child processes created through #system: or #resultOfCommand: keep open copies of the file descriptors of the Pharo VM process. In the following example, this poses a problem: the second send of #bindTo:port: signals a SocketError (“Address already in use”) because the ‘sleep’ process still has the socket open. The third send does not signal an error because by then the ‘sleep’ process has exited.
The ‘sleep’ process’s open file descriptors can be listed through ‘lsof’ as done in the following example. The file descriptors include the one for the socket, as well as ones for the Pharo sources and changes files (from #sourcesFileStream and #changesFileStream):
There is a flag (FD_CLOEXEC) that can be set on a file descriptor (through fcntl) such that it will not remain open in the ‘sleep’ process. In Zinc issue #110, I gave a snippet that sets the flag on the Zinc server socket. Perhaps that should be generalized to a method for setting the flag on any file descriptor. Alternatively, there should be a variant of #system: that takes an explicit file descriptor mapping for the child process, closing any file descriptor in the child process that is not mapped, something like the following: