-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reinitialize IO mutexes after fork #12886
Conversation
Not entirely related, but I also noticed that in |
Looks like thread-related tests are failing in bytecode mode with this error:
Is this a problem with the cygwin CI build? |
This has already been fixed in 4.14: ea90edc . There is no consensus that reinitializing the I/O mutexes in the child process is better than doing nothing, but this can be rediscussed. It's mentioned as a "remaining issue for OCaml 5" in #12399. For OCaml 4, the behavior changed in #12646 (commit ea90edc) to fix #12636. |
This PR was closed by mistake. Reopening and adding more context to my previous comment. |
I see, thanks. Is this patch the proper way to port reinitialization to the 5 runtime, then? I agree reinitializing the mutexes is not obviously better than doing nothing, but if we don't, I don't think it's ever feasible to fork with threads (in which case it shouldn't be supported at all). |
I tend to think so. For OCaml 4, an option we didn't investigate was to forget the old mutexes (but not destroy nor free them!) and reallocate new mutexes on demand, at the cost of leaking memory. In OCaml 5, I/O channel descriptors contain a mutex, not a pointer to a heap-allocated mutex, so this option is not available. Reinitializing a mutex is technically undefined behavior, but looks like the safest option to me. This said, this part of the OCaml code was rewritten by @Engil, then much amended by @gadmm, so both of them should chime in now. |
Before I reply, has it been discussed to change fork such that it starts by acquiring all the channel mutexes? I have a vague recollection of a discussion, but I did not find it and do not remember the issue if any. |
I thought about it briefly, but it won't work: a channel mutex can be held forever (e.g. while reading from a socket or pipe where the other end is not providing data), so this would prevent |
Ideally, programs that are wrong should fail as early as possible. Due to the "best-effort" and anti-POSIX aspect of the implementation, though, it is not clear how to reach a clear specification while keeping the feature useful. The best way to look at it is from a backwards-compatibility perspective. One should make sure to preserve programs that are correct a posteriori (e.g. due to relying on underspecified behaviour, due to machine- and OS-specific reasons...). From this point of view, a program that leaves some channel in an inconsistent state, but that never observes this inconsistent state (for whichever programming reason), is correct. Since no-one seems to have the time nor motivation to do an in-depth study of programs in the wild, it is far much simpler to preserve the behaviour from OCaml 4 in a litteral sense. So I think this PR should be reviewed with an eye towards acceptance. In the longer term, a way to better specify the behaviour, whilst preserving most of the correct programs, could be to try to acquire all channel locks before forking (using
It does not seem to be a problem to me. Since the signal handlers/finalisers would run on a different thread, correctly they would block if they tried to access the same channel. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR gives programs a chance to crash (or not) rather than deadlock when observing a channel in an inconsistent state, following a fork while the channel is locked by another thread.
I recommend to accept it because this is closer to OCaml 4, and one does not really know what programs in the wild do (nor has the time to look into it, probably). Reinitializing the mutex directly differs a bit from OCaml 4, but is already used as a method for resetting the domain lock just above.
Still, this is not great and meant as a short-term fix. A possible better fix using try_lock
on all channels is described above.
However, it does not seem useful nor necessary to reinitialize the all_opened_channels
mutex, see below.
otherlibs/systhreads/st_stubs.c
Outdated
/* Reinitialize IO mutexes, in case the fork happened while another thread | ||
had locked the channel. If so, we're likely in an inconsistent state, | ||
but we may be able to proceed anyway. */ | ||
caml_plat_mutex_init(&caml_all_opened_channels_mutex); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
caml_all_opened_channels_mutex
was introduced in OCaml 5 for the case of concurrent access by two domains. For single-domain programs, caml_all_opened_channels
is protected by the domain lock. Since fork
is only for single-domain programs, I think caml_all_opened_channels_mutex
is always in an unlocked state and its data is valid. Thus, maybe there is no need to reinitialise it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, so in a single domain program, caml_all_opened_channels_mutex
can't be locked at the fork because it's never locked during a blocking section. Is that right?
(Deleted this change)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, or I would see it as a bug with OCaml 5 (there is no such mutex in OCaml 4).
This looks ready to merge (at least as a temporary fix), other than the failing Windows build. I'm not sure what's going on there, can someone take a look? |
My guess would be a missing |
Yep, that was it |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am approving based on @gadmm's review and my understanding of the generally positive conversation on this best-effort change. Thanks!
I think the Changes entry for this was mistakingly put in the 5.2 section (in |
Actually I think that this is a bugfix (as it, it un-crashes programs in the wild), so I will cherry-pick into 5.2 instead of moving the Changes entry to trunk. |
Reinitialize IO mutexes after fork (cherry picked from commit 5ef295c)
Cherry-picked in 5.2 as 800ba1f (nice hash). |
I have a question about this change. /* default handler for unix_fork, will be called by unix_fork. */
static void caml_atfork_default(void)
{
caml_reset_domain_lock();
caml_acquire_domain_lock();
/* FIXME: For best portability, the IO channel locks should be
reinitialised as well. (See comment in
caml_reset_domain_lock.) */
} The FIXME is suggesting the same thing that was added by the present PR in otherlibs/systhreads/st_stubs.c. If I understand correctly, this makes the comment slightly inconsistent, but not wrong:
My current understanding is that the IO-channel logic should move from What do you think? |
This said, the purpose and use of |
I looked at (The ocamltest codebase also calls |
My use case for this PR was also daemonizing a child process (which worked in 4 but failed in 5), but this change was sufficient because the fork occurred in an OCaml library that used systhreads. I think the fork in ocamlfuse is currently not supported, since it's either forking with multiple domains or using the runtime from a C thread without linking systhreads. (i.e. what @xavierleroy said) |
(Upstreaming this)
In OCaml 4, systhreads would re-initialize IO channel mutexes while resetting after a fork.
This PR makes the 5 runtime do so as well, fixing the relevant FIXME.
When resetting the mutexes matters, we're likely in an inconsistent state (i.e. another thread locked the mutex and then disappeared), but as the comment in
caml_thread_reinitialize
says, support for forking with threads is only best-effort.