_os_unfair_lock_corruption_abort after fork on MacOS #660
Comments
Can you get a Could also try with |
No, it does not fail with Interestingly, setting Similarly, I've tried with more domains, fewer domains, more procs per domain, all kinds of combinations, etc. I can't get it to crash under
On current
|
It would be good to find a way of making it fail more often. e.g. the example runs a shell; what happens if you call I tried running it on Linux with eio_posix and tsan and that didn't report anything. Another way to make it fail sooner might be to replace the use of the high-level |
Here's my version using Eio_posix.Low_level.Process; it fails with
ASAN fails almost immediately with
|
So the child process is dying with SIGILL and your It looks like it must be either the forked launcher (e.g. What happens if you remove the pipe stuff ( |
(I edited the previous message with ASAN information, make sure to refresh the page) Yeah, child process dies with SIGILL and failwith is just used to display the error. I do not get a coredump. It fails the same way when I use |
Maybe coredumps need to be enabled: https://stackoverflow.com/a/2080938/50926 (not sure if OCaml is expected to work with ASAN - does it work with other programs?) |
I've had ASAN successfully detect a race condition in C++ code invoked from OCaml before. I totally forgot about
So yeah, this line. |
It's difficult to find information on it where the Also this: https://hacks.mozilla.org/2022/10/improving-firefox-responsiveness-on-macos/
However: eio/lib_eio/unix/fork_action.c Lines 1 to 5 in 5e014fc
EDIT: This OCaml 4.14 issue contains information regarding the pitfalls of mixing |
@talex5 this PR (less than 30 minutes old!) may be related: ocaml/ocaml#12886 Quoting myself:
The PR mentions thread switches causing issues, and that extra load of |
Yeah, this sounds related. I figured out that code was missing while debugging a forked process that aborted after failing to lock a locked pthreads mutex. The macos equivalent appears to be |
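To illustrate the failure mode described above, here is a minimal C sketch (not the OCaml runtime's or Eio's code) in which a helper thread holds a pthreads mutex at the moment of fork. Only the forking thread is copied into the child, so the mutex stays locked there with nobody left to release it. All names are hypothetical, and calling pthread functions in the child of a multi-threaded process is not guaranteed safe by POSIX; it is done here only to observe the inherited state.

```c
#include <errno.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER; /* inherited by the child */
static pthread_mutex_t sync_lock = PTHREAD_MUTEX_INITIALIZER; /* parent-side handshake */
static pthread_cond_t sync_cond = PTHREAD_COND_INITIALIZER;
static int held = 0, release_now = 0;

/* Helper thread: take demo_lock, announce it, hold it until told to release. */
static void *holder(void *arg) {
    (void)arg;
    pthread_mutex_lock(&demo_lock);
    pthread_mutex_lock(&sync_lock);
    held = 1;
    pthread_cond_broadcast(&sync_cond);
    while (!release_now)
        pthread_cond_wait(&sync_cond, &sync_lock);
    pthread_mutex_unlock(&sync_lock);
    pthread_mutex_unlock(&demo_lock);
    return NULL;
}

/* Returns 1 if the forked child found demo_lock busy, 0 if it could
 * take it, and -1 on error. */
int child_sees_lock_held(void) {
    pthread_t t;
    held = 0;
    release_now = 0;
    if (pthread_create(&t, NULL, holder, NULL) != 0)
        return -1;

    /* Wait until the helper thread really owns demo_lock. */
    pthread_mutex_lock(&sync_lock);
    while (!held)
        pthread_cond_wait(&sync_cond, &sync_lock);
    pthread_mutex_unlock(&sync_lock);

    pid_t pid = fork();
    if (pid == 0) {
        /* The holder thread does not exist in the child, but the
         * locked state of demo_lock was copied along with memory. */
        _exit(pthread_mutex_trylock(&demo_lock) == EBUSY ? 1 : 0);
    }

    /* Parent: let the helper release the lock and finish. */
    pthread_mutex_lock(&sync_lock);
    release_now = 1;
    pthread_cond_broadcast(&sync_cond);
    pthread_mutex_unlock(&sync_lock);
    pthread_join(t, NULL);

    if (pid < 0)
        return -1;
    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

A blocking pthread_mutex_lock in the child would hang or abort instead of returning EBUSY, which is why runtimes that fork from multi-threaded processes need to reset or reinitialize such locks in the child.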
@TheNumbat I just tried your proposed fix (ocaml/ocaml#12886) and unfortunately it doesn't fix this issue, so maybe they are not the same issue after all? |
Does that only apply to https://stackoverflow.com/questions/58076064/forkexec-process-with-threads-broken-on-macos-children-process-stuck-in-atf says "So far it seems that fork in multi-threaded environment is just broken on macOS." http://sealiesoftware.com/blog/archive/2017/6/5/Objective-C_and_fork_in_macOS_1013.html suggests that |
Every solution to this problem seems to mention the need to use |
I cannot explain it with certainty, but the problem is gone. I've tried to replicate it on the Eio commit where I first encountered the issue. It's not happening anymore, whether I use the OCaml trunk branch or any of the numerous open OCaml PRs addressing various systhreads issues. I've installed a large OS update recently (same major version though) and I'm going to tentatively assume it contained a fix or mitigation in some way. I'll reopen this issue if it reoccurs. |
On MacOS,
dune exec stress/stress_proc.exe
fails with
It usually completes a few rounds successfully before failing. It sometimes fails with just a single exception instead of 2+ like the example above.
It completes a lot more rounds (and sometimes even reaches 100 rounds successfully) on the
pool-systhreads
branch.