PackageTests/NewBuild/T3827 fails on MacOS on GHC 8.10.7 #8032

Open
robx opened this issue Mar 3, 2022 · 26 comments

Comments

@robx
Collaborator

robx commented Mar 3, 2022

Affected:

  • macOS with GHC >= 8.10.7
  • Linux with GHC = 9.0.2

PackageTests/NewBuild/T3827 fails in CI on MacOS on GHC 8.10.7:

[1 of 1] Compiling P                ( P.hs, /Users/runner/work/cabal/cabal/cabal-testsuite/PackageTests/NewBuild/T3827/cabal.dist/work/dist/build/x86_64-osx/ghc-8.10.7/p-1.0/build/P.p_o )

P.hs:1:8: error:
Error:     Could not find module ‘Prelude’
    Perhaps you haven't installed the profiling libraries for package ‘base-4.14.3.0’?
    Use -v (or `:set -v` in ghci) to see a list of the files searched for.
  |
1 | module P where
  |        ^
CallStack (from HasCallStack):
  withMetadata, called at src/Distribution/Simple/Utils.hs:374:14 in Cabal-3.7.0.0-inplace:Distribution.Simple.Utils
-----BEGIN CABAL OUTPUT-----
Error: cabal: Failed to build p-1.0-inplace.
Failed to build q-1.0 because it depends on p-1.0 which itself failed to build.
-----END CABAL OUTPUT-----

*** unexpected failure for PackageTests/NewBuild/T3827/cabal.test.hs

https://github.com/haskell/cabal/runs/5412517972?check_suite_focus=true

It looks like a profiling build of base isn't available, might be a CI environment issue?

robx added the platform: mac and type: testing labels Mar 3, 2022
@robx
Collaborator Author

robx commented Mar 3, 2022

The test seems to pass for older GHC versions, while for newer versions those particular tests aren't run in CI.

robx added a commit to robx/cabal that referenced this issue Mar 3, 2022
@Mikolaj
Member

Mikolaj commented Mar 3, 2022

I can't find the ticket now, but I think indeed some newer GHCs in ghcup don't have profiling libs bundled. Tough luck.

@robx
Collaborator Author

robx commented Mar 3, 2022

Hmm, there's this: https://gitlab.haskell.org/ghc/ghc/-/issues/20707 but it seems to be a new 9.2 issue (as opposed to 9.0 even)

@Mikolaj
Member

Mikolaj commented Mar 3, 2022

Yes, I think in a cabal ticket somebody said that also happened for some other GHCs. I might have misremembered though.

robx added a commit to robx/cabal that referenced this issue Mar 4, 2022
@jneira
Member

jneira commented Mar 7, 2022

After enabling the cli-suite in CI for ghc-9.0.2, I've observed that this test also fails on Linux with ghc-9.0.2 installed with ghcup (but not with ghc-9.0.1 or ghc-9.2.1): https://github.com/jneira/cabal/runs/5445996258?check_suite_focus=true#step:15:289

P.hs:1:8: error:
Error:     Could not find module `Prelude'
    Perhaps you haven't installed the profiling libraries for package `base-4.15.1.0'?
    Use -v (or `:set -v` in ghci) to see a list of the files searched for.
  |
1 | module P where
  |        ^
CallStack (from HasCallStack):
  withMetadata, called at src/Distribution/Simple/Utils.hs:374:14 in Cabal-3.7.0.0-inplace:Distribution.Simple.Utils

It seems that this GHC version has no profiled boot libraries.
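One way to confirm that diagnosis for a given installation is to look for the profiling interface file of Prelude under the compiler's library directory. The following is only a sketch under stated assumptions: it assumes profiled modules ship as `*.p_hi` files somewhere below the directory printed by `ghc --print-libdir`, and the helper names (`isProfiledPrelude`, `listFilesRecursive`, `hasProfiledBase`) are hypothetical, not part of cabal or its test suite. It relies on the `directory`, `filepath`, and `process` packages.

```haskell
import Control.Monad (filterM, forM)
import System.Directory (doesDirectoryExist, listDirectory, pathIsSymbolicLink)
import System.FilePath ((</>), takeFileName)
import System.Process (readProcess)

-- | True for the profiling interface file of Prelude.
-- Assumption: profiled modules are installed as "*.p_hi" files.
isProfiledPrelude :: FilePath -> Bool
isProfiledPrelude path = takeFileName path == "Prelude.p_hi"

-- | List all files below a directory, skipping symbolic links
-- so that a link cycle cannot cause endless recursion.
listFilesRecursive :: FilePath -> IO [FilePath]
listFilesRecursive dir = do
  entries <- map (dir </>) <$> listDirectory dir
  realEntries <- filterM (fmap not . pathIsSymbolicLink) entries
  nested <- forM realEntries $ \entry -> do
    isDir <- doesDirectoryExist entry
    if isDir then listFilesRecursive entry else pure [entry]
  pure (concat nested)

-- | Ask the given compiler for its library directory and search it
-- for Prelude.p_hi. The argument is the compiler executable, e.g. "ghc".
hasProfiledBase :: FilePath -> IO Bool
hasProfiledBase ghc = do
  out <- readProcess ghc ["--print-libdir"] ""
  -- Keep only the first line of output, without the line terminator.
  let libdir = takeWhile (`notElem` "\r\n") out
  any isProfiledPrelude <$> listFilesRecursive libdir
```

Calling `hasProfiledBase "ghc"` on a CI runner before the test runs would show whether the installed bindist is one of the affected ones, provided the layout assumption above holds for that bindist.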

jneira added a commit to jneira/cabal that referenced this issue Mar 7, 2022
@robx
Collaborator Author

robx commented Mar 7, 2022

Thanks, I've updated the ticket to try to cover the affected versions.

@jneira
Member

jneira commented Mar 7, 2022

Sorry, the affected version is 9.0.2; I've corrected the comment above.

jneira added a commit to jneira/cabal that referenced this issue Mar 7, 2022
jneira added a commit to jneira/cabal that referenced this issue Mar 12, 2022
jneira added a commit to jneira/cabal that referenced this issue Mar 14, 2022
jneira added a commit to jneira/cabal that referenced this issue Mar 14, 2022
jneira added a commit to jneira/cabal that referenced this issue Mar 15, 2022
jneira added a commit to jneira/cabal that referenced this issue Mar 15, 2022
Kleidukos pushed a commit to Kleidukos/cabal that referenced this issue Mar 30, 2022
andreabedini pushed a commit to andreabedini/cabal that referenced this issue May 5, 2022
ulysses4ever added a commit to ulysses4ever/cabal that referenced this issue Jun 24, 2022
ulysses4ever added a commit to ulysses4ever/cabal that referenced this issue Jun 24, 2022
ulysses4ever added a commit to ulysses4ever/cabal that referenced this issue Jun 25, 2022
@ulysses4ever
Collaborator

I was playing with it and bumped into something strange. Here's my idea: this issue is not our bug, rather it's a GHC (packaging) bug in certain versions of GHC. I thought we can specify exactly which versions of GHC are affected and close it. To that end I made the following change in the test:

-  missesProfilingLinux <- isGhcVersion ">= 9.0.2"
+  missesProfilingLinux <- isGhcVersion "== 9.0.2"
...
-  missesProfilingOsx <- isGhcVersion ">= 8.10.7"
+  missesProfilingOsx <- isGhcVersion "== 8.10.7"

(ulysses4ever@316bc21) I expected this change to go green on CI after the switch from 9.2.1 to 9.2.3 (6ce5118) because, as I verified locally, GHC 9.2.3 does have profiling libraries (also, according to the GHC bug tracker, this was even fixed in 9.2.2). But the CI went red again (on 9.2.3) because the profiling libraries were missing again! Does anyone have an idea how that is possible?
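To make the difference between the two constraints concrete, here is a minimal sketch using `Data.Version` from `base`. It is an illustration only: the `Constraint` type and the functions `satisfies`, `oldLinuxCheck`, and `newLinuxCheck` are hypothetical stand-ins, not the `isGhcVersion` helper from the cabal test suite.

```haskell
import Data.Version (Version, makeVersion)

-- | Hypothetical stand-in for a constraint such as ">= 9.0.2" or "== 9.0.2".
data Constraint
  = AtLeast Version
  | Exactly Version

-- | Does the given GHC version satisfy the constraint?
satisfies :: Version -> Constraint -> Bool
satisfies v (AtLeast bound) = v >= bound
satisfies v (Exactly pin)   = v == pin

-- | Old condition: every Linux GHC from 9.0.2 onwards is treated as
-- lacking profiling libraries.
oldLinuxCheck :: Version -> Bool
oldLinuxCheck v = v `satisfies` AtLeast (makeVersion [9, 0, 2])

-- | Proposed condition: only 9.0.2 itself is treated that way.
newLinuxCheck :: Version -> Bool
newLinuxCheck v = v `satisfies` Exactly (makeVersion [9, 0, 2])
```

In this model, 9.2.3 satisfies the old condition but not the new one, so with the proposed change the test is expected to pass on 9.2.3 and is only expected to be broken on 9.0.2.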

@Mikolaj
Member

Mikolaj commented Jun 29, 2022

Impossible. Perhaps the version CI uses has lost the profiling libraries? Is it obtained from GHA or ghcup or where?

@ulysses4ever
Collaborator

This is the same CI that haskell/cabal has, so: GitHub Action haskell/action/setup@v1 which, in turn, uses ghcup.

@Mikolaj
Member

Mikolaj commented Jun 30, 2022

I can only find a report about 9.2.2, not 9.2.3

https://gitlab.haskell.org/ghc/ghc/-/issues/21190

but perhaps ghcup repackages those (and fixes 9.2.2 and, implausibly, breaks 9.2.3)? I haven't looked at the ghcup bug tracker (open or closed tickets).

@ulysses4ever
Collaborator

@hasufell do you have an idea how it's possible that I get "profiling libraries not found" with 9.2.3? https://github.com/ulysses4ever/cabal/runs/7083135188?check_suite_focus=true

@hasufell
Member

Does anyone have an idea how that is possible?

The haskell setup action makes it hard to see which bindist exactly is installed. I've since then switched to just using ghcup directly, especially since it's pre-installed on all github actions images (and a recent version, unlike the haskell setup action).

It's possible that only some bindists are affected. E.g. hadrian has been incredibly buggy and some releases have a mixture of make and hadrian assembled bindists. Not sure about 9.2.3.

@Mikolaj
Member

Mikolaj commented Aug 3, 2022

This is now a heisenbug on GHC 9.2.3: #8336 (and known to cause problems on 9.4), so I'm going to disable the test altogether for GHC >= 9.2. This is most probably a GHA/GHC/packaging bug, not anything to do with cabal.

@ulysses4ever
Collaborator

@Mikolaj following your link, it's "Unexpected OK" now instead of "Unexpected FAIL". And that's no wonder because of the test itself:

  missesProfilingLinux <- isGhcVersion ">= 9.0.2"
...
  missesProfilingOsx <- isGhcVersion ">= 8.10.7"
  expectBrokenIf (linux && missesProfilingLinux || osx && missesProfilingOsx) 8032 $
...

It is not true that all GHCs above those versions are missing profiling libs, so we expectBroken when we shouldn't.

The change I discussed above

-  missesProfilingLinux <- isGhcVersion ">= 9.0.2"
+  missesProfilingLinux <- isGhcVersion "== 9.0.2"
...
-  missesProfilingOsx <- isGhcVersion ">= 8.10.7"
+  missesProfilingOsx <- isGhcVersion "== 8.10.7"

was never implemented, but it could have fixed this: we only need to list the GHCs with missing profiling libs.
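The "Unexpected OK" result follows from combining the expect-broken flag with the actual test result. The sketch below models that combination as a pure function. It is a simplified, hypothetical model (the `Outcome` type, `classify`, and `markedBrokenFor` are made up for illustration) and not the actual cabal-testsuite implementation of `expectBrokenIf`.

```haskell
-- | Hypothetical model of how a harness reports a test result.
data Outcome
  = Ok              -- not marked broken, test passed
  | Fail            -- not marked broken, test failed
  | ExpectedBroken  -- marked broken, test failed as predicted
  | UnexpectedOk    -- marked broken, but the test passed
  deriving (Eq, Show)

-- | Combine the "expect broken" flag with whether the test passed.
classify :: Bool -> Bool -> Outcome
classify markedBroken passed = case (markedBroken, passed) of
  (False, True)  -> Ok
  (False, False) -> Fail
  (True,  False) -> ExpectedBroken
  (True,  True)  -> UnexpectedOk

-- | The condition quoted from the test, with the platform and version
-- checks passed in as plain booleans.
markedBrokenFor :: Bool -> Bool -> Bool -> Bool -> Bool
markedBrokenFor linux missesProfilingLinux osx missesProfilingOsx =
  linux && missesProfilingLinux || osx && missesProfilingOsx
```

With `>= 9.0.2` on Linux, a run on a GHC that does ship profiling libraries is still marked broken, so a passing run is classified as `UnexpectedOk`. Narrowing the version check to the GHCs that actually lack the libraries turns the flag off for such a run, and the same pass is classified as `Ok`.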

@Mikolaj
Member

Mikolaj commented Aug 3, 2022

Right, but it's a heisenbug. It sometimes passes and sometimes fails with the same GHC (9.2.3, but other 9.2.* versions are likely affected too, and 9.4 is possible as well). I don't think it's worthwhile to create an exhaustive list of GHCs currently broken by GHA (or whatever the underlying cause may be).

@ulysses4ever
Collaborator

If it's nondeterministic, then yes. It's just that the current code perfectly matches the error you referenced. If there are other failures, then I don't see a better solution. I'd still change those >= constraints to something closer to reality, though.

@hasufell
Member

hasufell commented Aug 3, 2022

Again: Use ghcup directly in your github workflow, not the haskell setup action, because it reuses existing GHCs, which can be random bindists.

@jneira
Member

jneira commented Aug 3, 2022

Hmm, I would bet it uses some bindist deterministically, for better or worse (yeah, sometimes for the better, like when it installed a fixed downstream bindist from chocolatey).

Mikolaj added a commit that referenced this issue Aug 4, 2022
#8338)

* Turn off T3827 for new GHCs due to heisenbugs not caused by cabal

* Disable the test totally on Linux until we stop taking GHC from GHA

See #8032 (comment)

* It failed on OSX now, so let's disable it everywhere except on Windows

Who would have thought.
@ulysses4ever
Collaborator

@jneira https://github.com/haskell/actions/tree/main/setup says:

The GitHub runners come with pre-installed versions of GHC and Cabal. Those will be used whenever possible. For all other versions, this action utilizes ppa:hvr/ghc, ghcup, and chocolatey.

This doesn’t strike me as a very deterministic (so to speak) algorithm. E.g. the pre-installed versions may probably change over time.


@hasufell do you have a good example of a purely ghcup-based setup in mind? I guess, some system-dependent boilerplate will be required?

@hasufell
Member

hasufell commented Aug 4, 2022

@jneira https://github.com/haskell/actions/tree/main/setup says:

The GitHub runners come with pre-installed versions of GHC and Cabal. Those will be used whenever possible. For all other versions, this action utilizes ppa:hvr/ghc, ghcup, and chocolatey.

This doesn’t strike me as a very deterministic (so to speak) algorithm. E.g. the pre-installed versions may probably change over time.


@hasufell do you have a good example of a purely ghcup-based setup in mind? I guess, some system-dependent boilerplate will be required?

https://www.haskell.org/ghcup/guide/#continuous-integration

https://github.com/hasufell/stack2cabal/blob/master/.github/workflows/haskell.yml

https://github.com/haskell/unix/blob/a4c6a0c0a7477dfe12727c2a58f143e9f6bbf22e/.github/workflows/ci.yml#L64

@ulysses4ever
Collaborator

https://github.com/hasufell/stack2cabal/blob/master/.github/workflows/haskell.yml

Seems to use haskell/actions/setup.

https://github.com/haskell/unix/blob/a4c6a0c0a7477dfe12727c2a58f143e9f6bbf22e/.github/workflows/ci.yml#L64

That’s nice! Doesn’t have caching of ghcup and everything that it pulls, though.

@hasufell
Member

hasufell commented Aug 4, 2022

https://github.com/hasufell/stack2cabal/blob/master/.github/workflows/haskell.yml

Seems to use haskell/actions/setup.

https://github.com/haskell/unix/blob/a4c6a0c0a7477dfe12727c2a58f143e9f6bbf22e/.github/workflows/ci.yml#L64

That’s nice! Doesn’t have caching of ghcup and everything that it pulls, though.

You don't want caching of bindists.

@ulysses4ever
Collaborator

ulysses4ever commented Aug 4, 2022

You don't want caching of bindists.

Why though?

@hasufell
Member

hasufell commented Aug 4, 2022

You don't want caching of bindists.

Why though?

Causes issues if cache is broken or bindists are fixed in-place (we don't do that usually though).

The failure mode is: if caching is enabled and the bindist exists in the cache, use that. If the hash doesn't match, fail and do nothing.

Then you get those heisenbugs.
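The failure mode described here can be written down as a small decision function. This is a hypothetical model based only on the description above, not ghcup's actual code: the `Action` and `CacheEntry` types and the `decide` function are invented for illustration, and hashes are represented as plain strings.

```haskell
-- | What happens when a bindist is requested. Hypothetical model only.
data Action
  = UseCached        -- cached bindist exists and its hash matches
  | FailHashMismatch -- cached bindist exists but its hash differs; nothing is re-downloaded
  | Download         -- caching is disabled or the cache has no entry
  deriving (Eq, Show)

-- | A cached bindist, identified here only by its hash.
newtype CacheEntry = CacheEntry { cachedHash :: String }

-- | Decide what to do given whether caching is enabled, the cache
-- contents for this bindist, and the hash the metadata expects.
decide :: Bool -> Maybe CacheEntry -> String -> Action
decide cachingEnabled cached expectedHash
  | not cachingEnabled = Download
  | otherwise = case cached of
      Nothing -> Download
      Just entry
        | cachedHash entry == expectedHash -> UseCached
        | otherwise -> FailHashMismatch
```

In this model, a bindist that was fixed in place upstream gets a new expected hash, so a stale cache entry leads to `FailHashMismatch` rather than a fresh download. The result then depends on whether a given runner happens to have the old entry cached, which is what makes the failures look nondeterministic.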

@ulysses4ever
Collaborator

I'd think that if cache invalidation is done correctly, no problem should arise. But then again: caching is the other hard problem in computer science?..

mergify bot pushed a commit that referenced this issue Aug 10, 2022
#8338)

* Turn off T3827 for new GHCs due to heisenbugs not caused by cabal

* Disable the test totally on Linux until we stop taking GHC from GHA

See #8032 (comment)

* It failed on OSX now, so let's disable it everywhere except on Windows

Who would have thought.

(cherry picked from commit 91a343f)