cabal 3.4.0.0 is broken on Windows 7 (use_process_jobs vs. hsc2hs.exe) #7309
Hi, I've tried to reproduce it on Windows 7 with no luck. I've tried to build with cabal-3.4.0.0
Here are fully reproducible instructions:
OS: Windows 7 Professional (running inside VirtualBox 6.1)
Download:
Extract all 4 to Note: The Now you have these directories:
Now open windows
Output
Note that it completely hangs at the end, and if you open Windows Task Manager you will see that there is a "zombie"
Many thanks for the detailed instructions. On my end I've tested and successfully built zlib:
I tried this binary and it does not exhibit the bug. This is because it was built with the "process" package < 1.6.9. See:
This is confirmed when I run
The rc4 binary I linked above, https://oleg.fi/cabal-install-3.4.0.0-rc4/cabal-install-3.4.0.0-x86_64-windows.zip, shows the bug. Using the same "strings" trick I can see that it indeed uses

Also, I compiled cabal-install myself and it has the bug (same as the rc4 binary).

It turns out MSYS is not needed. Here are simpler instructions to reproduce:
Make sure to use the exact cabal-install rc4 binary given here, or build cabal-install yourself using process >= 1.6.9.
Download:
Extract all 3 to Note: The Now you have these directories:
Now open windows
The output is the same as I pasted above. Thank you
That packaging error in the RCs ended up in the final release too 😟, but it was corrected.
Well, fortunately the bug is gone in the final release, without any issue having been reported at that point 😅
Well, then the real bug is that the official cabal 3.4.0.0 final release Windows binary was compiled with an old version of "process". This means that Windows users aren't getting the fix that was added in #6536, which means that they "may see sporatic build failures" (quote from that pull request).

I don't agree that this has anything to do with rc4/final. Like I mentioned previously, if you compile cabal-install 3.4.0.0 (final) yourself, the default is that it will use a recent version of "process", and then you will still encounter the bug (on Windows 7).
Oh, sorry, I thought the newer one doesn't trigger the error, but it is the other way around.
Great diagnosis, @bitc. Indeed I hadn't appreciated that a process can be associated with only a single job. This significantly complicates matters for us, sadly. I really don't know how we can reliably track potential forking under this restriction. We could try to set

What I do not see in the MSDN documentation is what behavior Windows 8 and later will exhibit when one attempts to associate a process with multiple jobs. Given that jobs cannot be "nested", I suppose the last assignment wins? If so, this sounds like it also carries the potential for brokenness (e.g. if a process crashes before its children have finished; that being said, we already cut some necessary corners in the face of crashing since we cannot robustly determine which process within a job is the true child of the fork).
@Mistuke, perhaps you know the answer to the question I pose in the second paragraph of my previous post? |
Actually, never mind. To its credit, the documentation does specify the behavior for later versions of Windows:
I have opened ghc#19473 to track this on the GHC side, as this issue affects far more than just Cabal.

Ultimately I'm not entirely sure what to do here. Windows 7 reached end of life in January 2020, so it doesn't seem terribly high priority to retain support for it. Moreover, as far as I can tell it is essentially impossible to robustly support

As far as I can tell there are no good options for dealing with Windows 7:
Unfortunately this isn't true.
Yes, this is possible. However, the result is potentially fragile in the case of crashing. I would only want to do this where nested jobs truly aren't supported (which we would need to determine at runtime, since it is common to ship binaries around on Windows).
Yes, nested jobs on older Windows versions are indeed somewhat broken. I will have to think about this some more, but I think it's safe for process to specify

The problem we're trying to catch is for programs that use
One thing to check is whether resource cleanup still works as expected, i.e. if cabal sends a SIGKILL, do the children of GHC still die? (I believe they should, since we specify kill child on parent exit, but this needs checking.)
So you can think of any Haskell program as a synchronization barrier: each Haskell program can safely break away from the parent without affecting correctness (we don't use jobs for resource restrictions, aka cgroups). Remember that the issue is programs that use

So I'm convinced it'll work for execution correctness. But resource leaks should still be checked.
It may be worth reproducing here the last comment from the GHC issue:
While cabal-install continues to be buildable with a GHC version that supports process < 1.6.9.0 (GHC < 8.8.4), a workaround could be to build cabal-install from source with

I've just tried with cabal HEAD and it seems to work.
Describe the bug
cabal 3.4.0.0 uses the use_process_jobs flag for all created processes. See: #6529 and #6536

The problem is that hsc2hs.exe also uses the use_process_jobs flag to create its sub-processes (like when it calls gcc.exe). See: https://github.com/haskell/hsc2hs/blob/24100ea521596922d3edc8370b3d9f7b845ae4cf/Common.hs#L43
This is a problem because on Windows 7 a process can be associated with only a single job, which is documented by Microsoft here: https://docs.microsoft.com/en-us/windows/win32/api/jobapi2/nf-jobapi2-assignprocesstojobobject (note that on Windows 8 and above there is no problem).
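To make the Windows 7 restriction concrete, here is a toy, pure model of the sequence explained below. It is a simplification under stated assumptions: no Win32 API is called, and WindowsVersion, Proc, assignToJob, spawnChild and scenario are hypothetical names. It encodes only two rules from this thread: a child starts in its parent's job, and Windows 7 rejects assigning a process that is already in a job, while Windows 8 and later allow nesting.

```haskell
-- Toy model (hypothetical, not a Win32 binding): tracks only job membership.
data WindowsVersion = Windows7 | Windows8OrLater
  deriving (Eq, Show)

data Proc = Proc { procName :: String, procJobs :: [String] }
  deriving (Eq, Show)

-- Simplified model of AssignProcessToJobObject: Windows 7 refuses a process
-- that already belongs to a job; Windows 8+ nests the new job on top.
assignToJob :: WindowsVersion -> String -> Proc -> Either String Proc
assignToJob ver job p = case procJobs p of
  [] -> Right p { procJobs = [job] }
  (current : _)
    | ver == Windows8OrLater -> Right p { procJobs = job : procJobs p }
    | otherwise -> Left (procName p ++ " is already in job " ++ current)

-- A child process inherits the job(s) of the process that created it.
spawnChild :: String -> Proc -> Proc
spawnChild name parent = Proc name (procJobs parent)

-- cabal.exe puts hsc2hs.exe in a job; hsc2hs.exe launches gcc.exe and
-- then tries to move gcc.exe into a job of its own.
scenario :: WindowsVersion -> Either String Proc
scenario ver = do
  hsc2hs <- assignToJob ver "cabal-job" (Proc "hsc2hs.exe" [])
  let gcc = spawnChild "gcc.exe" hsc2hs
  assignToJob ver "hsc2hs-job" gcc
```

Evaluating scenario Windows7 gives Left "gcc.exe is already in job cabal-job", mirroring the failing AssignProcessToJobObject call, while scenario Windows8OrLater succeeds with gcc.exe in the nested jobs ["hsc2hs-job", "cabal-job"].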
What ends up happening is that inside hsc2hs.exe, inside createProcess, the call to the Win32 function AssignProcessToJobObject fails:

Explanation: hsc2hs.exe is running inside a "job" (that was created by cabal.exe). hsc2hs.exe launches gcc.exe, and now gcc.exe is also running inside the same "job". Immediately after launching gcc.exe, hsc2hs.exe creates a new "job" and tries to put gcc.exe into this new job (AssignProcessToJobObject), but this call fails because gcc.exe is already part of a "job" (the one it inherited from hsc2hs.exe).

AssignProcessToJobObject is called here: https://github.com/haskell/process/blob/5a0cbd46eca6d30b78726688058b7fd258a2253d/cbits/runProcess.c#L742

Solution
When cabal calls hsc2hs.exe it should set use_process_jobs = False, so that hsc2hs.exe will be able to use this flag itself. This is valid because hsc2hs.exe doesn't use exec; it properly waits for its own children.

When cabal calls ghc.exe, gcc.exe, or other tools, it may set use_process_jobs = True (because they themselves don't set the use_process_jobs flag).

Another Solution
It may be possible to utilize CREATE_BREAKAWAY_FROM_JOB somehow (either in cabal or in hsc2hs). See: https://docs.microsoft.com/en-us/windows/win32/api/jobapi2/nf-jobapi2-assignprocesstojobobject

Alternative Solution
cabal should officially drop support for Windows 7 (a shame)
System information
Thank you