startProcess: dup2: invalid argument (Bad file descriptor) on macOS #247

Closed
mpilgrem opened this issue May 28, 2022 · 10 comments

Comments

@mpilgrem

I am not sure if this is a process-1.6.14.0 issue, but I think it may well be, because:

  1. as I explain further below, all I can see that has changed is a move from process-1.6.9.0 (supplied in resolver lts-17.15) to process-1.6.14.0;
  2. if, however, I move from process-1.6.9.0 to process-1.6.12.0, the identified problems do not occur; and
  3. I noted posix_spawn: Don't attempt to dup2 identical fds #214 and wondered if it was somehow related, given the dup2 and macOS connections. May I call on the expertise of @bgamari ?

By way of background, I am trying (in commercialhaskell/stack#5736) to get the existing stack project to (a) build using GHC 9.0.2 rather than with lts-17.15 (GHC 8.10.4) and (b) pass the CI on commercialhaskell/stack. I am using lts-19.7 with an extra dependency on process-1.6.14.0 (because of the macOS bugs that it fixed). (If I use lts-19.7 with an extra dependency on process-1.6.12.0 there are no similar problems.)

It (stack, built that way), and the CI, works fine on Windows.

It (stack) also seems to work fine on macOS; my test is 'Can I build stack with stack?'.

However, on macos-latest (macOS 11.6.6) the CI fails, specifically during the run of the testing executable stack-integration-test. The testing also fails in the same way on a local macOS 10.15.5 machine. The error messages are (reformatted for clarity):

[debug] Run process within <temp_dir>: /Users/runner/.stack/programs/x86_64-osx/ghc-9.0.2/bin/ghc-9.0.2
  -rtsopts -threaded -clear-package-db -global-package-db -hide-all-packages
  -package base
  -main-is StackSetupShim.mainOverride -package Cabal-3.4.1.0 
  <temp_home_dir>/.stack/setup-exe-src/setup-mPHDZzAJ.hs 
  <temp_home_dir>/.stack/setup-exe-src/setup-shim-mPHDZzAJ.hs
  -o <temp_home_dir>/.stack/setup-exe-cache/x86_64-osx/tmp-Cabal-simple_mPHDZzAJ_3.4.1.0_ghc-9.0.2

[error] /Users/runner/.stack/programs/x86_64-osx/ghc-9.0.2/bin/ghc-9.0.2: startProcess: dup2: invalid argument (Bad file descriptor)

I am pretty sure the error is arising in a RIO.Process.runProcess_ in Stack.Build.Execute.getSetupExe. The relevant extract being the else ... branch below.

    exists <- liftIO $ D.doesFileExist $ toFilePath exePath

    if exists
        then return $ Just exePath
        else do
            tmpExePath <- fmap (setupDir </>) $ parseRelFile $ "tmp-" ++ exeNameS
            tmpOutputPath <- fmap (setupDir </>) $ parseRelFile $ "tmp-" ++ outputNameS
            ensureDir setupDir
            let args = buildSetupArgs ++
                    [ "-package"
                    , "Cabal-" ++ cabalVersionString
                    , toFilePath setupHs
                    , toFilePath setupShimHs
                    , "-o"
                    , toFilePath tmpOutputPath
                    ]
            compilerPath <- getCompilerPath
            withWorkingDir (toFilePath tmpdir) (proc (toFilePath compilerPath) args $ \pc0 -> do
              let pc = setStdout (useHandleOpen stderr) pc0
              runProcess_ pc)
                `catch` \ece ->
                    throwM $ SetupHsBuildFailure (eceExitCode ece) Nothing compilerPath args Nothing []
            renameFile tmpExePath exePath
            return $ Just exePath

None of the above source code has been changed in moving from GHC 8.10.4 to GHC 9.0.2, so I am at a loss to understand why errors are now being thrown on macOS. I don't know the position on other Unix-like operating systems - I don't have access to a Linux machine and the Linux CI fails for other reasons (something to do with 'alpine' and 'docker' images).
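
A minimal sketch of what a standalone reproducer might look like, assuming the relevant ingredients are a working directory for the child process (as set by withWorkingDir) and the child's stdout redirected to the parent's stderr (as with setStdout (useHandleOpen stderr)). It uses System.Process directly rather than RIO.Process; the name runRepro, the command echo and the directory "/" are placeholders chosen for illustration, and whether this triggers the error on an affected version of process has not been verified.

```haskell
import System.Exit (ExitCode (..))
import System.IO (stderr)
import System.Process
  ( CreateProcess (..)
  , StdStream (..)
  , createProcess
  , proc
  , waitForProcess
  )

-- Spawn a child with an explicit working directory and with its stdout
-- sent to the parent's stderr, then wait for it to finish.
-- ASSUMPTION: "echo" and "/" are placeholders, not taken from Stack.
runRepro :: IO ExitCode
runRepro = do
  let cp =
        (proc "echo" ["hello"])
          { cwd = Just "/"              -- mirrors withWorkingDir
          , std_out = UseHandle stderr  -- mirrors setStdout (useHandleOpen stderr)
          }
  (_, _, _, ph) <- createProcess cp
  waitForProcess ph

main :: IO ()
main = runRepro >>= print
```

If the child cannot be started, createProcess throws an IOException instead of returning an exit code, so a failure of the kind reported above would surface before waitForProcess is reached.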

My own knowledge is very limited - I am a Windows user with access also to a macOS machine - and a little knowledge is a dangerous thing, but I am assuming that the error message I am seeing ultimately comes from child_failed(pipe, "dup2") in cbits/posix/fork_exec.c setup_std_handle_fork. Tracing back from that line in the source code, I reach do_spawn_fork and then do_spawn in runProcess.c. If I understand correctly, in do_spawn, do_spawn_posix must have returned -2 and it has then tried do_spawn_fork. Tracing further back, I then reach runInteractiveProcess in runProcess.c and, finally, System.Process.Posix.c_runInteractiveProcess. That is the boundary of my own understanding.
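
One way to check which step is being reported, without reading the C sources, is to catch the exception and print its fields. The sketch below assumes the failure reaches Haskell as an ordinary IOException (the message format "startProcess: dup2: invalid argument" suggests a location followed by an error type). The helper describeSpawnFailure and the path /nonexistent/hypothetical-binary are made up for illustration; the latter only forces a spawn failure of some kind and does not reproduce the dup2 error itself.

```haskell
import Control.Exception (IOException, try)
import System.IO.Error (ioeGetErrorString, ioeGetFileName, ioeGetLocation)
import System.Process (createProcess, proc, waitForProcess)

-- Render the parts of an IOException that identify where a spawn failed.
-- For non-user errors, ioeGetErrorString yields the error type as text.
describeSpawnFailure :: IOException -> String
describeSpawnFailure e =
  unlines
    [ "location: " ++ ioeGetLocation e
    , "file:     " ++ show (ioeGetFileName e)
    , "error:    " ++ ioeGetErrorString e
    ]

main :: IO ()
main = do
  -- ASSUMPTION: this path does not exist, so starting the process fails.
  r <- try (createProcess (proc "/nonexistent/hypothetical-binary" []))
  case r of
    Left e -> putStr (describeSpawnFailure e)
    Right (_, _, _, ph) -> waitForProcess ph >>= print
```

The location field is where a name such as dup2 or posix_spawnp would be expected to appear, which can help tell the two failure modes discussed in this thread apart.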

@snoyberg
Collaborator

@bgamari do you have any thoughts on what could have changed in process that could trigger this kind of behavior?

@psibi
Contributor

psibi commented Jun 14, 2022

Adding more information:

  • The above bug is reproducible in a macOS environment (the same code compiles and works fine on both Linux and Windows).
  • The bug seems to be caused by the function withWorkingDir. If I remove it, the error disappears.
  • There is another bug related to process which happens on macOS only. These are the error logs:
Running: /Users/ec2-user/.local/bin/stack script.hs
/Users/ec2-user/.stack/programs/x86_64-osx/ghc-9.2.3/bin/ghc-pkg-9.2.3: startProcess: posix_spawnp: does not exist (No such file or directory)
Main.hs: Exited with exit code: ExitFailure 1
CallStack (from HasCallStack):
  error, called at /Users/ec2-user/stack/test/integration/lib/StackTest.hs:63:34 in main:StackTest
  stack, called at /Users/ec2-user/stack/test/integration/tests/script-extra-dep/Main.hs:4:8 in main:Main

End of log for script-extra-dep
Failed tests:
- script-extra-dep - ExitFailure 1

@mpilgrem
Author

@psibi, I also ran into posix_spawnp problems on macOS but cured them. The 'cure' was to use sinkProcessStderrStdout rather than sinkProcessStdout in Stack.PackageDump.ghcPkgCmdArgs. I'll explain further in your PR on commercialhaskell/stack.

@psibi
Contributor

psibi commented Jun 14, 2022

@mpilgrem Thanks, not sure if we both found the same cure - but I did apply a similar workaround: https://github.com/commercialhaskell/stack/blob/98797cc93431ba970aa72ae891fb0783d5c16cc3/src/Stack/PackageDump.hs#L70 (I'm doing my experimental changes in a separate branch now)

While the workaround resolved the majority of the failing tests, I'm still getting that issue in two integration test failures, for which I have yet to find a proper solution.

@bgamari
Contributor

bgamari commented Jun 14, 2022

@snoyberg, apologies for not seeing this until now.

It seems like there are a few issues being reported on this ticket which are likely due to different causes. @mpilgrem, could you open separate tickets, ideally with minimal reproducers, for the distinct issues you have seen? #214 indeed sounds quite relevant here but without being able to reproduce the issue it is very hard to say what is going on.

@bgamari
Contributor

bgamari commented Jun 14, 2022

I believe the dup2 issue described here should be fixed by #250.

@psibi
Contributor

psibi commented Jun 15, 2022

@bgamari's patch fixes the dup2-based failures in the Stack codebase. The only remaining issue is the posix_spawnp issue that's still lingering on macOS:

$ ~/.local/bin/stack-integration-test -m script
.....
/Users/ec2-user/.stack/programs/x86_64-osx/ghc-9.2.3/bin/ghc-pkg-9.2.3: startProcess: posix_spawnp: invalid argument (Bad file descriptor)
Main.hs: Exited with exit code: ExitFailure 1
CallStack (from HasCallStack):
  error, called at /Users/ec2-user/stack/test/integration/lib/StackTest.hs:63:34 in main:StackTest
  stack, called at /Users/ec2-user/stack/test/integration/tests/script-extra-dep/Main.hs:4:8 in main:Main

@mpilgrem
Author

I will close this then.

@bgamari
Contributor

bgamari commented Jun 15, 2022

Thanks for the confirmation, @psibi.

Can you open a new issue to track the posix_spawnp issue? It would be great if you could provide a dtruss trace from the failing process.

@psibi
Contributor

psibi commented Jun 16, 2022

@bgamari Thanks, I have opened an issue here: #251

I have also attached the dtruss logs. Do let me know if you need anything more and I would be happy to help you.
