MacOS: posix_spawnp: invalid argument (Bad file descriptor) #251
There is some more info about the bug here: commercialhaskell/stack#5763 (comment) |
Thanks for opening this! Unfortunately, the
I suspect the |
Not sure what you mean by that since
Dtruss logs for the following command:
Log link: https://gist.github.com/psibi/dc56d065590253f033e1de6a0a820c4d |
@psibi, you may need to either:
It looks like this failed for yet another reason:
Are you sure |
@bgamari I did these things:
I cannot allow Darwin to execute
Log link: https://gist.github.com/psibi/8c65361bab44fda64523e20d45790dc0 |
Unfortunately it appears that |
@bgamari It's straightforward to reproduce on macOS with the Stack codebase. You can follow these steps:
Once you have the executable ready, executing something like this will reproduce the bug:
The above command will run the integration test of |
Specifically how should I go about building it such that it links against my |
@bgamari Sorry, I should have been more clear. These are the steps:
Let me know if you are stuck or need any more help. :-) (I don't have a personal Mac; I did all my testing on an Amazon EC2 Mac instance. I can create a new instance and help you further if any steps are unclear.) |
Thanks @psibi! Very helpful. |
@psibi I can't build with your instructions. Can you please help?
|
Solved by adding |
Seems that the error is due to using |
@mpickering You mean in the Stack codebase? The same codebase works on both Linux and Windows (and the previous process version used to work on Mac too). |
Also, just to add context: I currently have a workaround for this on Mac: commercialhaskell/stack#5763 (comment). Ideally, though, it would be good to avoid that workaround. |
What I suggested is a workaround and highlights the cause of the issue. Your workaround also works for the same reason, because |
In particular, I can't reproduce this directly, but I think the error has something to do with nested posix_spawnp, i.e. one Haskell executable calling another Haskell executable, which calls another executable. |
Here's at least one test which does something different on Linux/Mac but I'm not sure it's the same issue. https://gist.github.com/bd27f3db19ddbad480c8eae212058103 If you compile both of these files then run |
I tried reproducing this on a C level; I'm not getting quite the same errors as reported here so it's probably something slightly different, but the program below (compile with spawn.c
|
Thanks for narrowing this down, @mpickering. The problem indeed appears to be that
Which suggests that the |
The semantics suggested above are quite unfortunate since We can, however, work around this by using the fact that |
Previously, to spawn a process with a closed standard handle, we would use `posix_spawn_file_actions_addclose`. However, it turns out that POSIX specifies that `posix_spawnp()` may fail if `addclose()` is used on an fd that is already closed. While glibc and musl appear to ignore this aspect of the specification, Darwin does follow it, leading to haskell#251. This behavior is rather unfortunate, as `posix_spawn_file_actions_addclose` is a convenient way to close a handle in a subprocess in a race-free manner (e.g. unlike `O_CLOEXEC`, which is global). To avoid haskell#251 we must first use `posix_spawn_file_actions_addopen` on the fd to be closed (e.g. opening `/dev/null`) to ensure that it is valid, which has the side effect of closing the inherited fd. We can then safely use `posix_spawn_file_actions_addclose` to close the fd. Fixes haskell#251.
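The open-then-close sequence described in this commit message can be sketched in C roughly as follows. This is only an illustration of the technique, not the actual code from the process library: the helper names `spawn_with_fd_closed` and `wait_exit_status` are hypothetical, and the choice of `O_RDONLY` for the placeholder `/dev/null` open is an assumption.

```c
#include <fcntl.h>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Hypothetical helper: spawn argv[0] (looked up on PATH) with `fd` closed
 * in the child, whether or not `fd` is open in the parent.
 * Returns 0 and stores the child's pid on success, or an error number. */
int spawn_with_fd_closed(int fd, char *const argv[], pid_t *pid_out)
{
    posix_spawn_file_actions_t actions;
    int err = posix_spawn_file_actions_init(&actions);
    if (err != 0)
        return err;

    /* Step 1: open /dev/null onto fd. This replaces whatever the child
     * inherited there and guarantees that fd is valid in the child. */
    err = posix_spawn_file_actions_addopen(&actions, fd, "/dev/null",
                                           O_RDONLY, 0);

    /* Step 2: close fd. It is now known to be open, so the close action
     * cannot fail with EBADF on platforms that check for it. */
    if (err == 0)
        err = posix_spawn_file_actions_addclose(&actions, fd);

    if (err == 0)
        err = posix_spawnp(pid_out, argv[0], &actions, NULL, argv, environ);

    posix_spawn_file_actions_destroy(&actions);
    return err;
}

/* Hypothetical helper: wait for the child and return its exit code,
 * or -1 if waiting failed or the child did not exit normally. */
int wait_exit_status(pid_t pid)
{
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling `spawn_with_fd_closed(0, argv, &pid)` with `argv` set to `{"true", NULL}` should start the child with stdin closed, regardless of whether fd 0 was open in the parent at the time of the call.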
@mpickering has confirmed that #257 fixes the Stack reproducer. |
Thanks everyone for fixing this! |
It turns out that this test is subtly broken. In particular, the test will fail if any file is opened in the subprocess before the child is run since the closed fd 0 may be reused for the new file. This tends to happen in the threaded RTS due to the event manager's control pipe (see GHC #22395). Unfortunately, it's not really clear how else haskell#251 can reliably be tested.
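The fd reuse mentioned here follows from the POSIX rule that `open()` returns the lowest-numbered descriptor not currently open in the process. A minimal sketch of that rule is below; the function name `lowest_fd_is_reused` is hypothetical, and it uses an arbitrary descriptor rather than fd 0 so that it does not depend on the caller's standard handles.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical demonstration: open a file, close it, and open again.
 * In a single-threaded process the second open() gets the same
 * descriptor number back, because open() always picks the lowest free
 * slot. Returns 1 if the number was reused, 0 if not, -1 on error. */
int lowest_fd_is_reused(void)
{
    int first = open("/dev/null", O_RDONLY);
    if (first < 0)
        return -1;
    close(first);                 /* the slot `first` is now free */

    int second = open("/dev/null", O_RDONLY);
    if (second < 0)
        return -1;
    close(second);

    return first == second;
}
```

The same mechanism is what makes the test fragile: if fd 0 is closed and anything in the process opens a file or pipe before the check runs, that new descriptor lands on fd 0.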
Using the process shipped with GHC 9.2.3 (and even the latest master), I have been able to reproduce a bug while running integration tests for Stack. A few points about it:
To reproduce it on a Mac, use this branch:
A quick way to reproduce it is to run something like this: `stack-integration-test -m 111-custom-snapshot`. Note that the above branch is sprinkled with lots of logs, which were added to help with debugging. Let me know if you want me to remove them. I have also sprinkled some logs on my process fork, and they indicate that the bug happens when creating the `ghc-pkg` process. The logs also indicate that I get an exit status code of 9.
I invoked dtruss using the following command:
These are the dtruss logs for the above command: https://gist.github.com/psibi/0c3c89dd2b90012d7d9f3a64ceffb73a