Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

File descriptors get passed down to ‘system’ child processes #12576

Open
Rinzwind opened this issue Feb 6, 2023 · 16 comments
Open

File descriptors get passed down to ‘system’ child processes #12576

Rinzwind opened this issue Feb 6, 2023 · 16 comments
Assignees
Labels
Version: Pharo 13 Issue to fix in Pharo 13

Comments

@Rinzwind
Copy link
Contributor

Rinzwind commented Feb 6, 2023

Child processes created through #system: or #resultOfCommand: keep open copies of the file descriptors of the Pharo VM process. In the following example, this poses a problem: the second send of #bindTo:port: signals a SocketError (“Address already in use”) because the ‘sleep’ process still has the socket open. The third send does not signal an error because by then the ‘sleep’ process has exited.

socket1 := Socket newTCP.
socket1 bindTo: NetNameResolver localHostAddress port: 1701.
LibC system: 'sleep 2 &'.
socket1 close.
socket2 := Socket newTCP.
error := [ socket2 bindTo: NetNameResolver localHostAddress port: 1701 ] on: Error do: #value.
(Delay forSeconds: 3) wait.
socket2 bindTo: NetNameResolver localHostAddress port: 1701.
socket2 close.

The ‘sleep’ process’s open file descriptors can be listed through ‘lsof’ as done in the following example. The file descriptors include the one for the socket, as well as ones for the Pharo sources and changes files (from #sourcesFileStream and #changesFileStream):

socket1 := Socket newTCP.
socket1 bindTo: NetNameResolver localHostAddress port: 1701.
pid := LibC resultOfCommand: 'sleep 2 >/dev/null & echo $!'.
socket1 close.
lsofOut := LibC resultOfCommand: 'lsof -P -p ' , pid.

There is a flag (FD_CLOEXEC) that can be set on a file descriptor (through fcntl) such that it will not remain open in the ‘sleep’ process. In Zinc issue #110, I gave a snippet that sets the flag on the Zinc server socket. Perhaps that should be generalized to a method for setting the flag on any file descriptor. Alternatively, there should be a variant of #system: that takes an explicit file descriptor mapping for the child process, closing any file descriptor in the child process that is not mapped, something like the following:

outputStream := '/tmp/output' asFileReference writeStream.
otherStream := '/tmp/other' asFileReference readStream.
LibC system: command fileDescriptors: {
	"FD 0, a copy of file descriptor 0 of the Pharo VM process:"
	Stdio stdin.
	"FD 1 and 2, a copy of outputStream's file descriptor:"
	outputStream.
	outputStream.
	"FD 3, not opened:"
	nil.
	"FD 4, a copy of otherStream’s file descriptor:"
	otherStream.
	"FD 5 and above, not opened:"
	} 
@akgrant43
Copy link
Collaborator

akgrant43 commented Feb 6, 2023

In http://gtoolkit.com we've added Socket>>setCloseOnExec: which does just what you suggest above.

Perhaps this should be added to Pharo :-).

gtoolkit.com
Glamorous Toolkit is the moldable development environment.

@Rinzwind
Copy link
Contributor Author

Rinzwind commented Feb 6, 2023

Ah yes, would be good to add that to Pharo: I see the implementation is in ‘GToolkit-Utility-File’, as an extension on Socket which delegates to a corresponding method on SqSocket.

As for the alternative I had in mind of explicitly mapping the file descriptors, I’ve noted there’s a method #closeAllButStandardFileStreams in OSSubprocess though it’s not actually implemented. OSSubprocess makes use of posix_spawn; the functions for manipulating the file_actions unfortunately don’t seem to provide a way to close a range of file descriptors. The rationale for close in POSIX.1-2017 also says “The standard developers rejected a proposal to add closefrom() to the standard […] it is not possible to standardize an interface that closes arbitrary file descriptors above a certain value […].”

@daniels220
Copy link
Contributor

I would strongly support integrating this in base Pharo, and in fact making it the default for new FDs (file and socket both), requiring a setCloseOnExec: false1 to get the old behavior. I suggest this because there is a potential race condition when one Process opens a file/socket, then sets FD_CLOEXEC, while another forks a child process. Presumably if the one doing the fork wants the FD inherited, there would be no race condition with them clearing FD_CLOEXEC before forking2. So rather than add an extra argument to a huge number of methods—consider all the implementors of [binary]writeStream[Encoded:][do:]!—it seems better to apply a safe default, then have a single point of entry to change it back.

I see in the code that GT is using a custom primitive for fcntl(), noting that FFI doesn't support varargs functions on ARM64 (in fact, if my research is correct, it's specifically ARM64/macOS that is different). For anyone wishing to use this code in their own application without loading, presumably, a binary plugin, it appears that on x86_64, a varargs function is called just like a fixed-args function declared with that exact pattern of args—that is, calling fcntl(my_fd,F_SETFD,existing_flags|FD_CLOEXEC) works exactly as if fcntl() was declared as int fcntl(int fd, int cmd, int arg), even though in fact the declaration is int fcntl(int fd, int cmd, ... /* arg */ ). So we can declare the FFI method this way, bypassing the lack of explicit varargs support:

fcntl: fileDescriptor _: cmd _: arg
	"Perform the specified fcntl command."

	^ self ffiCall: #( int fcntl #( int fileDescriptor , int cmd , int arg ) )

And the code will then work on most platforms without the plugin. This is taking advantage of undefined behavior, so the usual caveats about nasal demons apply, but it's a workable stopgap solution.

Ultimately, given the architecture of the Pharo file/socket plugins, I imagine the solution would be a primitive as part of those plugins that performs an fcntl() on the underlying FD without requiring the image code to traverse the private structs involved in a file handle—which would be good, as that is an obvious hack in its own right.

I could possibly even write those primitives—I've never written C, but I can read it well enough, and most of it would be boilerplate that can be copied from existing primitives—but I'd definitely want someone to look them over, and I can't actually compile the VM myself to test them (long story).

Footnotes

  1. Though I might name it just #closeOnExec[:] or #isCloseOnExec[:] rather than a get/set pair.

  2. Potentially there could be a race if multiple Processes want to fork a child OS-process, each of which inherits only some particular FD, but that would be the case in C code as well as the flag is effectively "global", it lives on the FD rather than being part of the execve() call, and the solution is the same—the "clear FD_CLOEXEC, fork child, re-set FD_CLOEXEC" sequence needs to be a mutual critical section, guarded by a Semaphore or similar.

@Ducasse Ducasse added this to the 12.0.0 milestone Dec 28, 2023
@Ducasse
Copy link
Member

Ducasse commented Dec 28, 2023

Thanks Daniel. I tagged it so that Pablo can have a look. Now if people would play the game and contribute back to Pharo by sending PRs this would help us. But we will do it by ourselves.
I imagine that we will discuss this is in January after holidays.

@Ducasse
Copy link
Member

Ducasse commented Dec 28, 2023

Thanks @Rinzwind for the extension link :).

@daniels220
Copy link
Contributor

Like I said, I'd be open to collaborating on this—certainly I could fill in the image side of things, referring to a primitive that doesn't yet exist but has an obvious implementation, if someone with VM experience can do that half. I could probably struggle through the VM side, too (I see I can download artifacts from a CI build, so the fact that I can't build my own VM wouldn't have to stop me entirely...), but if someone with more VM experience has time, I'd rather work on other things in the image...

@Ducasse
Copy link
Member

Ducasse commented Dec 29, 2023

I know. My remark was more general.
Our VM guys will have a look.

@noha
Copy link
Member

noha commented Dec 29, 2023

My recommendation would be to stay close to POSIX on these things. It is normal that all fds are available in a child process. This is needed because a lot of things rely on processes share stdin/stdout/stderr and for other things. Putting a half baked idea in pharo might not be a good idea.

@daniels220 If it is about the race condition with FD_CLOEXEC there exist O_CLOEXEC and SOCK_CLOEXEC on creation time to exactly remove that problem. So if there is an FFI problem with setting these flags we should fix this

So I'm against changing a default behaviour to somehting new that just opens a lot of other problems. Parent processes can explicitly set FD_CLOEXEC before forking or child processes can clear them after the exec(). There is no such thing as a better default. Especially not if we depart from POSIX

@Ducasse
Copy link
Member

Ducasse commented Dec 29, 2023

Thanks Norbert!

@daniels220
Copy link
Contributor

daniels220 commented Dec 29, 2023

Norbert, you have a point...I know I can perhaps be overeager in my opinions when I don't have experience of the full context (in this case, I haven't written C or similar, or dealt directly with POSIX APIs). Certainly it's fair to do some more research...but when I did a quick Google of "is it best practice to always use O_CLOEXEC", the top few results were mostly "yes," either "...unless you specifically know you need to share that FD," or "and if you do need to share one, clear FD_CLOEXEC immediately before fork and re-set it afterward, to avoid accidentally sharing it with a different child process". So this seems like a case where POSIX is held back by the need for backwards-compatibility, creating a footgun for new C programmers for the rest of time, while for Pharo, I think the number of programs that actually rely on sharing FDs (other than stdio) with child processes is...probably either zero or one, honestly. I would note also that you can't actually retrieve the FD of an open file without private-struct hijinks, so how would you even tell the child process what to use? I imagine it can examine the FD table itself, but that seems weird and error-prone.

Another way of looking at it: Python, Ruby, Java, and even Rust all use O_CLOEXEC on all files, with no way I saw at a glance to not do this. They expose fcntl(), allowing you to clear the flag later, but don't bother wrapping it in any high-level API. So that's pretty strong evidence too that the programming community at large considers it a "should have been the default all along" sort of thing.

One definite point of clarification—I forgot to mention stdio—those I would agree should remain as they are even if the default for files and sockets were to change. The existing OSSubprocess library already has the ability to control if/how they are inherited in a thread-safe way, in any case.

Regarding optional use of O_CLOEXEC and SOCK_CLOEXEC—the problem isn't just adding support to the FilePlugin and SocketPlugin primitives, it's with how much surface area in the in-image API would be affected. If I execute 'foo.txt' asFileReference writeStream, at one point the stack looks like this:

FileHandle(FileSystemHandle)>>setReference:writable:
FileHandle class(FileSystemHandle class)>>on:writable:
FileHandle class(FileSystemHandle class)>>open:writable:
FileSystem>>open:writable:
FileSystem>>binaryWriteStreamOn:
FileReference>>binaryWriteStream
FileReference(AbstractFileReference)>>writeStreamEncoded:
FileReference>>writeStream
UndefinedObject>>DoIt

The subsequent #open in FileSystem>>open:writable: is a no-op with a TODO flag, and the actual opening happens with the stack looking like:

File class>>open:writable: (primitive)
[] in File>>basicOpenForWrite:
File class>>retryWithGC:until:forFileNamed:
File>>basicOpenForWrite:
File>>openForWrite:
File>>openForWrite
File>>writeStream
FileHandle>>binaryWriteStream
FileSystem>>binaryWriteStreamOn:
...as above...

So if we were to...okay, let's keep things simple and say we just add an argument specifically for this and make the primitive open:writable:closeOnExec:. (This feels dirty to me, but there don't seem to be many other similar flags, so maybe it's not so unreasonable?) Now we have 13 other methods that need a variant with a parameter in order to pass the argument down the chain. And there are at least 2 more methods on the "write" side, and something like 8 more on the read side, plus oddballs like appendStream, that aren't part of this call chain but would need the same treatment. And what does the high-level API even look like? #writeStreamWithCloseOnExec? Not exactly pretty...

Perhaps a more builder-styled API would be better—add a property to FileReference so that it looks like ('foo.txt' asFileReference) useCloseOnExec; writeStream. Then we could continue the pattern and add a similar property to FileReference and File. This is probably less added code overall, but it's semantically questionable—is it really a property of the file reference that "oh, hey, if we open this file we should do so with O_CLOEXEC"? What happens when you combine relative paths—does the new FileReference inherit the flag from the one it was constructed from? Etc. etc. So it's a tradeoff and not clearly better.

And none of this does anything for applications that make use of library code that opens its own files or sockets, without O_CLOEXEC, and elsewhere in the application happens to fork child processes. So add on top of that needing to thread the argument through and/or add option properties to all the various XML, JSON, CSV, etc. parsing libraries that have their own wrappers to open a file and transparently translate the markup language.

@Ducasse
Copy link
Member

Ducasse commented Dec 30, 2023

Thanks for the discussion! This is important to have multiple views. I like the fact that other languages are doing this.

@daniels220
Copy link
Contributor

Clearly nothing is going to happen with this for Pharo 12...could someone move it to the Pharo 13 milestone and maybe we can revisit in a couple months?

@jecisc jecisc removed this from the 12.0.0 milestone Mar 25, 2024
@jecisc jecisc added the Version: Pharo 13 Issue to fix in Pharo 13 label Mar 25, 2024
@jecisc
Copy link
Member

jecisc commented Mar 25, 2024

Done

@martinmcclure
Copy link
Contributor

A customer of ours ran into this issue, and it took a while to trace down. A socket to a GemStone server was not being closed because one of Pharo's grandchild processes had inherited it, so the GemStone logout hung.

After reading the discussion above, here's what we'd like to see:

  1. Provide accessors file descriptors / Windows handles so advanced users can extend functionality through FFI calls. This is, of course, advanced usage that voids your warranty, but sometimes you need it.

  2. Provide access to and the ability to change Unix close-on-exec / Windows inheritance of sockets and files.

  3. For all fds other than stdin, stdout, stderr, default to close-on-exec on both files and sockets when created, but provide the option of overriding that default (see rationale below).

Rationale for default close-on-exec to true:

Norbert's comments about staying close to POSIX are good. Any deviations should be carefully considered and well-commented. However, it is good to remember (as noted above) that parts of POSIX are generally recognized as mistakes, and only there for backward compatibility.

In this particular case, the most compelling factor that I see is fault detection. If you need a file descriptor to be inherited and it is not, your code will immediately fail. OTOH, if the fd is inherited and should not be, things work until several years later someone spends some hours tracking down an obscure failure.

I include disk files as well as sockets in this request because I noticed when debugging this problem that Pharo's grandchild process had open fds on not only Pharo's sources and changes files, but those of the pharo-launcher. That could lead to subtle hard-to-find problems.

I look forward to seeing further thoughts any of you have, and seeing something get done in this area for Pharo 13.

@daniels220
Copy link
Contributor

@martinmcclure, agreed on all points—that sounds like an excellent plan. And your point about fault-detection is an additional compelling argument, I think—I am always in favor of systems that either work correctly, or fail early and loudly. It's the bugs you find out about months down the line that are really costly...

And ouch, good observation about Pharo launcher! I can see that causing real problems in the wild today, in fact—e.g. if you launch an image, quit Pharo Launcher, and then go to update Pharo Launcher and can't because the image unknowingly has its files open. And clearly this was not the intent of the original programmers, which I think is a good indication that O_CLOEXEC-by-default is the less-surprising behavior for the majority of programmers, especially those used to higher-level-than-C languages (which, as previously mentioned, universally set O_CLOEXEC by default).

@Rinzwind
Copy link
Contributor Author

Regarding the comments about other languages setting FD_CLOEXEC by default, for Rust specifically, this looks to have been implemented in Rust pull request #24034 to handle Rust issue #12148, with follow-ups to use O_CLOEXEC and SOCK_CLOEXEC where possible listed in Rust issue #24237. Linked to the first issue is Racket pull request #4860 which made a similar change last year though seemingly more as an optimization: “The root of the problem is that a fork+exec on Unix keeps all file descriptors from the original process open in the new process. Racket has long used a traditional correction of closing all file descriptors in a forked process, specifically by trying close on every file descriptor up to the maximum value. That worked fine when the file-descriptor limit was something like 256 or 1024. These days, it can be 1M, and 1M erroring close system calls can take a while. […] A modern alternative is to set the FD_CLOEXEC flag on descriptors so that they are automatically closed on the exec step, and leave FD_CLOEXEC off on the file descriptors that you intend to be comunicated to the subprocess. […] The change here adapts Racket at the rktio level to set FD_CLOEXEC on all created file descriptors, […] Foreign libraries are expected to play along as well as they can, and libraries like glib, sqlite3, and Gtk+ do seem to work that way (i.e., they set FD_CLOEXEC, and they tend to use the atomic interface where available).”

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Version: Pharo 13 Issue to fix in Pharo 13
Projects
None yet
Development

No branches or pull requests

8 participants