Runtime failure on Windows on version 0.7.0.0 #65

Closed
L0neGamer opened this issue Mar 7, 2024 · 29 comments · Fixed by #70

Comments

@L0neGamer

L0neGamer commented Mar 7, 2024

To observe the issue, do the following on Windows (having installed GHC 9.2.8+).

Have an example.cabal with the following contents:

cabal-version: 3.8
name: example
version: 0.1
executable example
  build-depends: base, zlib == 0.7.0.0
  main-is: Main.hs

Have a file Main.hs with the following contents:

module Main where
import Codec.Compression.Zlib.Raw
main = do
    putStrLn "Test"
c = compress

Run cabal build, then cabal exec example. For some reason, Test is not printed to stdout. Running echo $lastexitcode shows that the exit code is -1073741701, which, from a quick search, typically indicates incorrect linking. Note that this is a runtime failure, not a build failure.
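
As a side check on the exit code above, the signed value that PowerShell prints can be reinterpreted as an unsigned 32-bit NTSTATUS code. The following is a minimal Haskell sketch; the helper name toNtStatus is hypothetical and exists only for illustration. The value it produces, 0xc000007b, is the code Windows names STATUS_INVALID_IMAGE_FORMAT.

```haskell
import Data.Int (Int32)
import Data.Word (Word32)
import Numeric (showHex)

-- Hypothetical helper: reinterpret the signed 32-bit exit code
-- as the unsigned NTSTATUS value (wraps modulo 2^32).
toNtStatus :: Int32 -> Word32
toNtStatus = fromIntegral

main :: IO ()
main = putStrLn ("0x" ++ showHex (toNtStatus (-1073741701)) "")
```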

Changing the zlib version to 0.6.3.0 (which is the previous version) means that this program works.

This is probably related to "Do not force bundled-c-zlib on Windows, but force it for WASM." in the previous release, if I had to guess.

This error arose when similar code was written using a library massively downstream of zlib (discord-haskell, with code as below). This is even more surprising, since I'm pretty sure that restCall shouldn't directly reference compress or similar values.

module Main where

import Discord

main :: IO ()
main = do
    putStrLn "Test"

a :: (Request (r a), FromJSON a) => r a -> DiscordHandler (Either RestCallErrorCode a)
a = restCall

Another note: my Windows Haskell setup is entirely fresh and was made specifically to test this, so it's unlikely to be an issue with my machine (especially since someone else brought this issue to me).

@Bodigrim
Contributor

Bodigrim commented Mar 7, 2024

@L0neGamer is it possible to reproduce the issue with other versions of GHC, newer than 9.2.8?

We have a CI job for Windows + GHC 9.2.8 which seems to succeed, so I'm at a loss as to what's up.

@L0neGamer
Author

Sorry I wasn't clear: when I said 9.2.8+ I meant that version and onwards. I also tested on 9.4.8 and 9.6.4.

@Bodigrim
Contributor

Bodigrim commented Mar 7, 2024

That's very weird. Can you contribute a reproducer expressed as a CI job?

@Bodigrim
Contributor

Bodigrim commented Mar 7, 2024

Also, what's the Cabal version you are using?

@L0neGamer
Author

cabal --version -> 3.10.2.1

I'm not sure how I'd do the CI job thing, but I can try to look into it. It'd probably be best for someone else to do it, though.

@L0neGamer
Author

L0neGamer commented Mar 7, 2024

Looking at the CI jobs, the only two relating to Windows that I can immediately see are one that builds and one that runs with bundled-c-zlib enabled, which is likely the issue here.

Can confirm that running cabal run -c 'zlib +bundled-c-zlib' results in the correct behaviour (that is, Test prints).

@Bodigrim
Contributor

Bodigrim commented Mar 7, 2024

Well, but the job without bundled-c-zlib also succeeds in CI environment, right? If it runs tests, it means that it linked successfully.

@L0neGamer
Author

True. I don't know enough about how this stuff works or what the Windows environment looks like; if you have reading material or a suggestion of where to read up, I can have a go at some stage.

@Bodigrim
Contributor

Bodigrim commented Mar 8, 2024

The thing is that zlib links fine on a Windows machine I have access to. So I cannot investigate any further without a portable reproducer.

It might be worth raising the issue at https://gitlab.haskell.org/ghc/ghc/-/issues: it's GHC's responsibility to link correctly (or abort compilation if it's impossible to do so).

@L0neGamer
Author

I'll look into raising it over there soon; at the very least maybe I'll be able to get a reproducer for here from them.

@fendor

fendor commented Mar 9, 2024

I was also bitten by this on my Windows 10 machine. I was able to reproduce the issue while building cabal HEAD with GHC 9.4.8 and cabal 3.10.2.1.

@Bodigrim
Contributor

Bodigrim commented Mar 9, 2024

@fendor please give me a reproducer in the form of a CI job.

@fendor

fendor commented Mar 9, 2024

No Windows runner supported by GitHub (there are just windows-2019 and windows-2022) seems to be able to reproduce the issue right now.

@Bodigrim
Contributor

Bodigrim commented Mar 9, 2024

@fendor you can also try flipping the pkg-config flag: I suspect GHA runners are likely to have pkg-config pre-installed, but your local environment probably does not.

Otherwise file a GHC issue please.

@fendor

fendor commented Mar 9, 2024

With the pkg-config flag:

$ cabal repl exes --constraint="zlib +pkg-config"
Resolving dependencies...
Error: cabal-3.10.2.1.exe: Could not resolve dependencies:
[__0] trying: zlib-ghc-windows-0.1 (user goal)
[__1] trying: zlib-0.7.0.0 (dependency of zlib-ghc-windows)
[__2] trying: zlib:-bundled-c-zlib
[__3] rejecting: zlib:+pkg-config (conflict: pkg-config package zlib-any, not
found in the pkg-config database)
[__3] rejecting: zlib:-pkg-config (constraint from command line flag requires
opposite flag selection)
[__3] fail (backjumping, conflict set: zlib, zlib:bundled-c-zlib,
zlib:pkg-config)
After searching the rest of the dependency tree exhaustively, these were the
goals I've had most trouble fulfilling: zlib, zlib-ghc-windows,
zlib:bundled-c-zlib, zlib:pkg-config
Try running with --minimize-conflict-set to improve the error message.    

I will file a ghc issue either way.

@Bodigrim
Contributor

Bodigrim commented Mar 9, 2024

@fendor I think ultimately it's either Cabal's or GHC's responsibility: if extra-libraries: z is not available or is no good to link with, they should say so loudly instead of producing segfaulting artefacts.

@fendor

fendor commented Mar 9, 2024

I agree. I am looking into it a little bit.

@fendor

fendor commented Mar 11, 2024

Tracking this issue in ghc: https://gitlab.haskell.org/ghc/ghc/-/issues/24531

@andreasabel
Member

@mpilgrem

mpilgrem commented Apr 20, 2024

From recollection, MSYS2 does not come with pkg-config.exe by default and you have to manually install https://packages.msys2.org/package/mingw-w64-x86_64-pkgconf. EDIT: Recollection confirmed with a fresh Stack-supplied MSYS2:

❯ stack exec -- where.exe pkg-config
INFO: Could not find files for the given pattern(s).

@Bodigrim
Contributor

Bodigrim commented Apr 20, 2024

@andreasabel I think that one is an orthogonal, Stack-specific issue, not quite related to the error here (which is that pkg-config exists and advertises the zlib C library as available, but linking eventually fails).

@mpilgrem

mpilgrem commented Apr 20, 2024

@Bodigrim, this may be off-topic for this particular issue (EDIT: perhaps on topic for #64), but why has zlib-0.7 chosen to make the default for its Cabal flag pkg-config true on Windows? If I set the flag to false, zlib-0.7 works fine 'out of the box' on Windows.

The problem I have is: if I have a dependency on zlib (as I do in stack.cabal), and I am using Windows, how do I specify that its pkg-config Cabal flag needs to be set to false? I don't think you can do that with Cabal, and Stack's flags configuration option is not conditional on operating system. Is the only solution to set the pkg-config flag to false for all operating systems (EDIT: that is, using Stack's flags configuration option)?

@Bodigrim
Contributor

> The problem I have is: if I have a dependency on zlib (as I do in stack.cabal), and I am using Windows, how do I specify that its pkg-config Cabal flag needs to be set to false? I don't think you can do that with Cabal, and Stack's flags configuration option is not conditional on operating system.

pkg-config is an automatic flag and Cabal is happy to solve it depending on the environment, so normally there is nothing to specify. Even if it were not automatic, cabal.project supports conditions based on OS.
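
For illustration, an OS-conditional cabal.project might look like the sketch below. It is an assumption-laden example rather than a recommended configuration: it assumes a cabal-install recent enough to support conditionals in cabal.project (3.8 or later, to the best of my knowledge), and it pins the bundled-c-zlib flag, which is the workaround reported earlier in this thread, rather than the pkg-config flag.

```cabal
-- cabal.project (sketch only)
packages: .

-- Assumption: on Windows, build the C sources bundled with the zlib package,
-- mirroring the `cabal run -c 'zlib +bundled-c-zlib'` workaround above.
if os(windows)
  constraints: zlib +bundled-c-zlib
```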

As I said in commercialhaskell/stack#6557, Stackage snapshots should set pkg-config to false uniformly, yes.

@RyanGlScott
Member

RyanGlScott commented Apr 21, 2024

As noted in https://gitlab.haskell.org/ghc/ghc/-/issues/24531#note_559785, the situation on Windows is a little complicated. GHC always links against <ghc-install-dir>/mingw/lib, as this contains libraries that are needed for GHC's RTS (among other things). However, this directory also contains libz.dll.a, an import library that tells GHC to dynamically load the zlib1.dll shared library at runtime. As far as the linker is concerned, the presence of libz.dll.a at link time means that everything is working as expected.

Where things go wrong is when you actually run the executable. Due to how dynamic linking works on Windows, the loader can't know ahead of time where zlib1.dll is (there are no rpaths on Windows), so the loader instead searches your PATH for zlib1.dll. There is a zlib1.dll file located in <ghc-install-dir>/mingw/bin, but most users won't have that on their PATH (and it's unclear if that would be advisable in general). Therefore, the executable will fail at runtime when it can't find zlib1.dll.

Many GHC users also have MinGW-w64 installed (via MSYS2), and when you run something in an MSYS2 shell, it will add a directory to your PATH that contains another copy of zlib1.dll. As such, this issue may not occur for you locally if you are running in MSYS2. If that is the case, try running the same commands in PowerShell (and make sure that you didn't add any MSYS2 directories to your PATH).
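
To see which copies of zlib1.dll are visible from a given shell, a small check along the lines of where.exe can help. The following is a minimal Haskell sketch, not part of zlib or GHC tooling; the helper name report is hypothetical. It uses findFiles from the directory package and getSearchPath from the filepath package, both of which ship with GHC. It only inspects PATH, whereas the Windows loader also looks in other places first (such as the executable's own directory and the system directories).

```haskell
import System.Directory (findFiles)
import System.FilePath (getSearchPath)

-- Hypothetical helper: turn the list of matches into the lines to print.
report :: [FilePath] -> [String]
report []   = ["zlib1.dll not found on PATH"]
report hits = hits

main :: IO ()
main = do
  dirs <- getSearchPath                 -- directories listed in PATH, in order
  hits <- findFiles dirs "zlib1.dll"    -- every match, in PATH order
  mapM_ putStrLn (report hits)
```

Running it once in PowerShell and once in an MSYS2 shell should show whether the two environments expose different copies.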

@RyanGlScott
Member

Having said all of that, it's unclear to me what can be done about this on the GHC side. I am not a Windows GHC expert, so I presume that there is a good reason for including libz.dll.a in <ghc-install-dir>/mingw/lib, but it does have the unfortunate side effect of messing with .cabal files that depend on extra-libraries: z.

A workaround would be to compile zlib using the bundled-c-zlib or pkg-config flags. I wonder if bundled-c-zlib should be the default on Windows until we figure out how to resolve https://gitlab.haskell.org/ghc/ghc/-/issues/24531.

@Bodigrim
Contributor

> However, this directory also contains libz.dll.a, an import library that tells GHC to dynamically load the zlib1.dll shared library at runtime.

@RyanGlScott is there any way to force static linking? Or is libz.dll.a only dynamically-linkable?

> it does have the unfortunate side effect of messing with .cabal files that depend on extra-libraries: z.

Is my understanding correct that we can never trust extra-libraries: z, because we do not know whether it is a static or dynamic library?

@RyanGlScott
Member

RyanGlScott commented Apr 21, 2024

> is there any way to force static linking?

In principle, yes, although I haven't managed to figure out its quirks. GHC accepts the -l:libXYZ.a syntax, which instructs the linker to link against a specific file. With this, you can tell GHC to link against libz.a (a static archive) instead of defaulting to the libz.dll.a import library (which is what would happen if you passed -lz).

That being said, this appears to be somewhat buggy in practice. I tried modifying zlib.cabal like so:

diff --git a/zlib.cabal b/zlib.cabal
index 24e2595..22aff8b 100644
--- a/zlib.cabal
+++ b/zlib.cabal
@@ -118,7 +118,7 @@ library
       pkgconfig-depends: zlib
     else
       -- On Windows zlib is shipped with GHC starting from 7.10
-      extra-libraries: z
+      extra-libraries: :libz.a

 test-suite tests
   type: exitcode-stdio-1.0

But that fails with a different linker error when building the executable:

Building executable 'example' for zlib-ghc-windows-0.1..
[2 of 2] Linking C:\\Users\\winferno\\Documents\\Hacking\\Haskell\\zlib-ghc-windows-65\\dist-newstyle\\build\\x86_64-windows\\ghc-9.4.8\\zlib-ghc-windows-0.1\\x\\example\\build\\example\\example.exe
ld.lld: warning: ignoring unknown argument: -exclude-symbols:zcalloc
ld.lld: warning: ignoring unknown argument: -exclude-symbols:zcfree
ld.lld: error: -exclude-symbols:zcalloc is not allowed in .drectve
ld.lld: error: -exclude-symbols:zcfree is not allowed in .drectve
ld.lld: warning: ignoring unknown argument: -exclude-symbols:_tr_init
ld.lld: warning: ignoring unknown argument: -exclude-symbols:_tr_stored_block
ld.lld: warning: ignoring unknown argument: -exclude-symbols:_tr_flush_bits
ld.lld: warning: ignoring unknown argument: -exclude-symbols:_tr_align
ld.lld: warning: ignoring unknown argument: -exclude-symbols:_tr_flush_block
ld.lld: warning: ignoring unknown argument: -exclude-symbols:_tr_tally
ld.lld: warning: ignoring unknown argument: -exclude-symbols:_dist_code
ld.lld: warning: ignoring unknown argument: -exclude-symbols:_length_code
ld.lld: error: -exclude-symbols:_tr_init is not allowed in .drectve
ld.lld: error: -exclude-symbols:_tr_stored_block is not allowed in .drectve
ld.lld: error: -exclude-symbols:_tr_flush_bits is not allowed in .drectve
ld.lld: error: -exclude-symbols:_tr_align is not allowed in .drectve
ld.lld: error: -exclude-symbols:_tr_flush_block is not allowed in .drectve
ld.lld: error: -exclude-symbols:_tr_tally is not allowed in .drectve
ld.lld: error: -exclude-symbols:_dist_code is not allowed in .drectve
ld.lld: error: -exclude-symbols:_length_code is not allowed in .drectve
ld.lld: warning: ignoring unknown argument: -exclude-symbols:inflate_table
ld.lld: error: -exclude-symbols:inflate_table is not allowed in .drectve
ld.lld: warning: ignoring unknown argument: -exclude-symbols:inflate_fast
ld.lld: error: -exclude-symbols:inflate_fast is not allowed in .drectve
clang: error: linker command failed with exit code 1 (use -v to see invocation)
ghc-9.4.8.exe: `clang.exe' failed in phase `Linker'. (Exit code: 1)

> Is my understanding correct that we can never trust extra-libraries: z, because we do not know whether it is a static or dynamic library?

The issue isn't really static vs. dynamic libraries, but rather dynamic libraries that are on your runtime search path (e.g., MinGW-w64 libraries) versus ones that aren't (e.g., libraries that are bundled with GHC). Using a dynamically linked libz is perfectly fine provided that the dynamic loader knows where it is at runtime, and this is precisely why the pkg-config option works most of the time.

@mpilgrem

mpilgrem commented Apr 21, 2024

On Windows, in the Stack environment, I think the GHC-supplied zlib1.dll is always on the PATH (and first on the PATH). For example, on my system:

❯ stack --snapshot ghc-9.6.5 exec -- where.exe zlib*
C:\Users\mike\AppData\Local\Programs\stack\x86_64-windows\ghc-9.6.5\mingw\bin\zlib1.dll
C:\Program Files\gnuplot\bin\zlib1.dll
C:\Program Files (x86)\gnupg\bin\zlib1.dll
C:\Program Files\Inkscape\bin\zlib1.dll

As indicated above, a number of applications that I use put a copy of zlib1.dll on the PATH. In the past, outside of the Stack environment, I have had problems with Haskell code picking up an out-of-date version of zlib1.dll on the PATH (fixed by replacing it with an up-to-date version).

@Bodigrim
Contributor

Thanks for the investigation @RyanGlScott!
