GitLab CI: hack to deal with GHC heisenbug #2443
Have we tried to actually do the thing in the message? E.g. run the failing executables with |
I'm not sure we did @christiaanb, so we could try it once and see if that fixes things (I kinda doubt it though). Otherwise LGTM.
Every now and then, GHC will exit with the error

```
out: mmap 131072 bytes at (nil): Cannot allocate memory
out: Try specifying an address with +RTS -xm<addr> -RTS
out: internal error: m32_allocator_init: Failed to map
    (GHC version 9.0.2 for x86_64_unknown_linux)
    Please report this as a GHC bug:  https://www.haskell.org/ghc/reportabug
```

(when the binary is named `out`). For some reason this problem has become more pronounced for us. Since we invoke GHC/Clash a very large number of times in some of our CI tests, the chances of hitting it in one of those invocations are really high. Additionally, it seems some binaries have really high odds of exhibiting the issue.

This commit wraps the `ghc`, `ghci`, `clash` and `clashi` binaries in a Bash script that will try up to a total of twenty(!) times when this error message is observed. The number of tries can be configured with the "-t" option argument.

However, the test suite also compiles Haskell code to a binary and then runs that binary. These binaries have the same issue, but they don't come from the PATH, so we can't intercept them like we can for things that are on the PATH. For this, we introduce a new Tasty test provider that also tries up to twenty times when the heisenbug's error message is observed.

We need both solutions because we are also seeing the problem on `doctests`, which don't involve our Tasty test providers, so these need to be covered by the script approach. Any `clash` invocations from Tasty are not retried since the Bash script already does that.

We think this problem occurs on every combination of GHC version and Linux kernel version, but we are seeing it (almost?) exclusively on GHC 9.0.2.
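For readers who want a feel for the retry approach described in the commit message, here is a minimal Bash sketch. It is not the script from this PR: the function name `retry_on_heisenbug` is hypothetical, and the choice to buffer stderr in a temporary file and match on the `m32_allocator_init: Failed to map` line is an assumption made for illustration. Only the default of twenty attempts and the `-t` option come from the description above.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: rerun a command when the heisenbug's error message
# appears on its stderr. Not the actual wrapper from this PR.

retry_on_heisenbug() {
  local tries=20
  # Optional "-t N" sets the total number of attempts.
  if [[ "${1:-}" == "-t" ]]; then
    tries="$2"
    shift 2
  fi

  local errfile attempt status=0
  errfile="$(mktemp)"
  for (( attempt = 1; attempt <= tries; attempt++ )); do
    status=0
    # Capture stderr so it can be inspected, then replay it for the caller.
    "$@" 2> "$errfile" || status=$?
    cat "$errfile" >&2
    # Stop on success, or on any failure that is not the heisenbug.
    if (( status == 0 )) || ! grep -q 'm32_allocator_init: Failed to map' "$errfile"; then
      rm -f "$errfile"
      return "$status"
    fi
    echo "heisenbug detected on attempt $attempt of $tries" >&2
  done
  rm -f "$errfile"
  return "$status"
}
```

A wrapper placed ahead of the real binary on the PATH could then call something like `retry_on_heisenbug -t 20 /path/to/real/ghc "$@"` (the path is a placeholder). Note that this sketch replays stderr only after the command finishes and does not discard stdout from failed attempts, which a production wrapper might want to handle differently.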
Interestingly, compiling a binary with It is interesting how the doc seems to suggest that this problem is not about compiled Haskell binaries at all:
GHCi has nothing to do with the problem we see here whatsoever.
And indeed instead of compiling with |
In very specific tests in GitLab CI we are affected by GHC bug #19421. We can work around the issue by passing `-with-rtsopts=-xm20000000` when compiling an affected binary. This is a stopgap measure until the real bug is fixed.

We have seen the bug in:
- In `clash-testsuite` in `clashLibTest`s
- In `ffi:example` in the `clash` binary itself
- In `prelude:doctests`, probably in the `doctests` binary itself, although this is not certain.

This workaround was applied only to those cases that were observed to go wrong, although as a consequence the `clash` binary is now always built with the RTS option.
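As an illustration of how such a flag might be baked into a binary at build time, a Cabal stanza could look roughly like the fragment below. This is a sketch, not the change made in this PR: the executable name `my-affected-binary` is hypothetical, and the PR may pass the flag in a different place.

```
executable my-affected-binary
  main-is:     Main.hs
  -- Bake the RTS option into the binary so every run uses it by default.
  ghc-options: -with-rtsopts=-xm20000000
```

With `-with-rtsopts`, the option becomes the binary's default, so nothing needs to be passed on the command line at run time.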
Superseded by #2444