Skip to content

Commit

Permalink
GitLab CI: hack to deal with GHC heisenbug
Browse files Browse the repository at this point in the history
Every now and then, GHC will exit with the error

```
out: mmap 131072 bytes at (nil): Cannot allocate memory
out: Try specifying an address with +RTS -xm<addr> -RTS
out: internal error: m32_allocator_init: Failed to map
    (GHC version 9.0.2 for x86_64_unknown_linux)
    Please report this as a GHC bug:  https://www.haskell.org/ghc/reportabug
```

(when the binary is named `out`). For some reason this problem has
become more pronounced for us. Since we invoke GHC/Clash an awful amount
of times in some of our CI tests, the chances of hitting it in one of
those invocations are really high. Additionally, it seems some binaries
have really high odds of exhibiting the issue.

This commit wraps the `ghc`, `ghci`, `clash` and `clashi` binaries in a
Bash script that will retry for a total of twenty(!) times when this
error message is observed. The number of retries can be configured with
the "-t" option argument.

However, the test suite also compiles Haskell code to a binary and then
runs that binary. These binaries have the same issues, but they don't
come from the PATH, so we can't intercept them like we can for things
that are on the PATH. For this, we introduce a new Tasty test provider
that also tries up to twenty times when the heisenbug's error message is
observed.

We need both solutions because we are also seeing the problem on
`doctests` wich don't involve our Tasty test providers, so these need to
be covered by the script approach. Any `clash` invocations from Tasty
are not retried since the Bash script already does that.

We think this problem occurs on every combination of GHC version and
Linux kernel version, but we are seeing it (almost?) exclusively on GHC
9.0.2.
  • Loading branch information
DigitalBrains1 committed Mar 26, 2023
1 parent 4791875 commit 5389ecc
Show file tree
Hide file tree
Showing 10 changed files with 213 additions and 9 deletions.
1 change: 1 addition & 0 deletions .ci/gitlab/common.yml
Original file line number Diff line number Diff line change
Expand Up @@ -24,6 +24,7 @@
- export
- tar -xf cache.tar.zst -C / || true
- .ci/setup.sh
- PATH="$(readlink -f .ci/retry-ghc-heisenbug/bin):$PATH"
after_script:
- tar -cf - $(ls -d /root/.cabal /root/.stack || true) | zstd -T${THREADS} -3 > cache.tar.zst

Expand Down
4 changes: 3 additions & 1 deletion .ci/gitlab/test.yml
Original file line number Diff line number Diff line change
Expand Up @@ -20,6 +20,7 @@ stages:
- export clash_cosim_datadir=$(pwd)/clash-cosim/
- export
- tar -xf dist.tar.zst -C /
- PATH="$(readlink -f .ci/retry-ghc-heisenbug/bin):$PATH"

# Not all package in cache get packed into dist.tar.zst, so we need to
# regenerate the package database
Expand All @@ -41,6 +42,7 @@ stages:
- tar -xf cache.tar.zst -C / || true
- tar -xf dist.tar.zst -C /
- .ci/setup.sh
- PATH="$(readlink -f .ci/retry-ghc-heisenbug/bin):$PATH"

# Not all package in cache get packed into dist.tar.zst, so we need to
# regenerate the package database
Expand Down Expand Up @@ -94,7 +96,7 @@ lib:unittests:
prelude:doctests:
extends: .test-nocache
script:
- ./dist-newstyle/build/*/*/clash-prelude-*/t/doctests/build/doctests/doctests -j${THREADS}
- .ci/retry-ghc-heisenbug/script -v ./dist-newstyle/build/*/*/clash-prelude-*/t/doctests/build/doctests/doctests -j${THREADS}

# Tests run on local fast machines:

Expand Down
1 change: 1 addition & 0 deletions .ci/retry-ghc-heisenbug/bin/clash
1 change: 1 addition & 0 deletions .ci/retry-ghc-heisenbug/bin/clashi
1 change: 1 addition & 0 deletions .ci/retry-ghc-heisenbug/bin/ghc
1 change: 1 addition & 0 deletions .ci/retry-ghc-heisenbug/bin/ghci
75 changes: 75 additions & 0 deletions .ci/retry-ghc-heisenbug/script
Original file line number Diff line number Diff line change
@@ -0,0 +1,75 @@
#!/bin/bash

# Every now and then, GHC will exit with the error
#
# out: mmap 131072 bytes at (nil): Cannot allocate memory
# out: Try specifying an address with +RTS -xm<addr> -RTS
# out: internal error: m32_allocator_init: Failed to map
# (GHC version 9.0.2 for x86_64_unknown_linux)
# Please report this as a GHC bug: https://www.haskell.org/ghc/reportabug
#
# (when the binary is named `out`).
#
# Since we invoke GHC/Clash an awful amount of times in some of our CI tests,
# the chances of hitting it in one of those invocations are really high.
# Additionally, it seems some binaries have really high odds of exhibiting the
# issue.
#
# This script wraps the relevant binaries and will retry for a total of
# twenty(!) times when this error message is observed. The number of retries
# can be configured with the "-t" option argument.
#
# The mmap error message itself will not be observed on stdout and stderr of
# this script, but anything printed before that by the wrapped binary will
# occur multiple times as the binary is run again. Given that we usually test
# either that stdout or stderr is empty or that it contains or does not
# contain a certain string, duplication does not affect the outcome of the
# test.
#
# We think this problem occurs on every combination of GHC version and
# Linux kernel version, but we are seeing it (almost?) exclusively on GHC
# 9.0.2.

unset VERBOSE MAXTRIES PROG TRIES STATUS

MAXTRIES=20
while true; do
case "$1" in
-v)
VERBOSE=1
;;
-t)
MAXTRIES="$2"
shift
;;
*)
break
;;
esac
shift
done

PROG="$1"
shift

exec 3>&1

TRIES=1
[[ -n $VERBOSE ]] && echo "retry-ghc-heisenbug: Run $PROG" >&2
while true; do
"$PROG" "$@" 2>&1 >&3 | \
awk '
/mmap 131072 bytes at \(nil\): Cannot allocate memory/ \
{ exit 1 }
{ print }' >&2
STATUS=("${PIPESTATUS[@]}")
[[ ${STATUS[1]} -eq 0 ]] && exit ${STATUS[0]}
((TRIES++))
[[ $TRIES -gt $MAXTRIES ]] && break
[[ -n $VERBOSE ]] && echo "retry-ghc-heisenbug: RETRYING due to heisenbug (try: $TRIES)" >&2
done

exec 3>&-

echo "retry-ghc-heisenbug: Heisenbug limit EXCEEDED" >&2
exit -6
21 changes: 21 additions & 0 deletions .ci/retry-ghc-heisenbug/wrap
Original file line number Diff line number Diff line change
@@ -0,0 +1,21 @@
#!/bin/bash

unset PROG PROGDIR REALPROG

PROG="$(basename $0)"
PROGDIR="$(dirname $0)"

if [[ -n "$RETRY_GHC_HEISENBUG_LOOP_DETECT" ]]; then
echo "retry-ghc-heisenbug: Loop detected, aborting" >&2
exit 1
fi
export RETRY_GHC_HEISENBUG_LOOP_DETECT=1

REALPROG="$(which -a $PROG | tail -n+2 | head -n1)"

if [[ -z "$REALPROG" ]]; then
echo "retry-ghc-heisenbug: Real $PROG not found, aborting" >&2
exit 1
fi

"$PROGDIR/../script" "$REALPROG" "$@"
2 changes: 1 addition & 1 deletion tests/src/Test/Tasty/Clash.hs
Original file line number Diff line number Diff line change
Expand Up @@ -307,7 +307,7 @@ instance IsTest ClashBinaryTest where
]

execProgram oDir =
TestProgram (oDir </> "out") (oDir:cbExtraExecArgs) NoGlob PrintStdErr False Nothing []
TestHeisenbugProgram (oDir </> "out") (oDir:cbExtraExecArgs) NoGlob PrintStdErr False Nothing []

testOptions = coerce (testOptions @TestProgram)

Expand Down
115 changes: 108 additions & 7 deletions tests/src/Test/Tasty/Program.hs
Original file line number Diff line number Diff line change
Expand Up @@ -71,12 +71,15 @@ module Test.Tasty.Program (
, ExpectOutput(..)
, TestProgram(..)
, TestFailingProgram(..)
, TestHeisenbugProgram(..)
, TestFailingHeisenbugProgram(..)
) where

import qualified Clash.Util.Interpolate as I
import qualified Data.List as List

import Control.Applicative ( Alternative (..) )
import Control.Monad.Extra ( firstJustM, unless )
import Data.Typeable ( Typeable )
import Data.Maybe ( fromMaybe, isNothing, listToMaybe )
import System.FilePath.Glob ( globDir1, compile )
Expand All @@ -96,6 +99,8 @@ import Text.Regex.TDFA.Text ( Regex, execute )

import qualified Data.Text as T

import Debug.Trace (traceIO)

data ExpectOutput a
= ExpectStdOut a
| ExpectStdErr a
Expand Down Expand Up @@ -138,6 +143,24 @@ data TestProgram =
-- ^ Additional environment variables
deriving (Typeable)

data TestHeisenbugProgram =
TestHeisenbugProgram
String
-- ^ Executable
[String]
-- ^ Executable args
GlobArgs
-- ^ Whether to interpret glob patterns in arguments
PrintOutput
-- ^ What output to print on test success
Bool
-- ^ Whether a non-empty stdout means failure
(Maybe FilePath)
-- ^ Work directory
[(String, String)]
-- ^ Additional environment variables
deriving (Typeable)

data TestFailingProgram =
TestFailingProgram
Bool
Expand All @@ -162,6 +185,30 @@ data TestFailingProgram =
-- ^ Additional environment variables
deriving (Typeable)

data TestFailingHeisenbugProgram =
TestFailingHeisenbugProgram
Bool
-- ^ Test exit code
String
-- ^ Executable
[String]
-- ^ Executable args
GlobArgs
-- ^ Whether to interpret glob patterns in arguments
PrintOutput
-- ^ What output to print on test success
Bool
-- ^ Whether an empty stderr means test failure
(Maybe Int)
-- ^ Expected return code
(ExpectOutput T.Text)
-- ^ Expected string in stderr
(Maybe FilePath)
-- ^ Work directory
[(String, String)]
-- ^ Additional environment variables
deriving (Typeable)

testOutput
:: PrintOutput
-- ^ What output to return
Expand Down Expand Up @@ -275,7 +322,20 @@ instance IsTest TestProgram where
-- Execute program
case execFound of
Nothing -> return $ execNotFoundFailure program
Just progPath -> runProgram progPath args' stdO stdF workDir addEnv
Just progPath -> runProgram False progPath args' stdO stdF workDir addEnv

testOptions = return []

instance IsTest TestHeisenbugProgram where
run opts (TestHeisenbugProgram program args glob stdO stdF workDir addEnv) _ = do
execFound <- findExecutableAlt program

args' <- globArgs glob workDir args

-- Execute program
case execFound of
Nothing -> return $ execNotFoundFailure program
Just progPath -> runProgram True progPath args' stdO stdF workDir addEnv

testOptions = return []

Expand All @@ -288,14 +348,29 @@ instance IsTest TestFailingProgram where
-- Execute program
case execFound of
Nothing -> return $ execNotFoundFailure program
Just progPath -> runFailingProgram testExitCode progPath args stdO stdF errCode expectedOutput workDir addEnv
Just progPath -> runFailingProgram False testExitCode progPath args stdO stdF errCode expectedOutput workDir addEnv

testOptions = return []

instance IsTest TestFailingHeisenbugProgram where
run _opts (TestFailingHeisenbugProgram testExitCode program args glob stdO stdF errCode expectedOutput workDir addEnv) _ = do
execFound <- findExecutableAlt program

args' <- globArgs glob workDir args

-- Execute program
case execFound of
Nothing -> return $ execNotFoundFailure program
Just progPath -> runFailingProgram True testExitCode progPath args stdO stdF errCode expectedOutput workDir addEnv

testOptions = return []

-- | Run a program with given options and optional working directory.
-- Return success if program exits with success code.
runProgram
:: String
:: Bool
-- ^ Should we retry when we see the heisenbug error msg?
-> String
-- ^ Program name
-> [String]
-- ^ Program options
Expand All @@ -308,10 +383,11 @@ runProgram
-> [(String, String)]
-- ^ Additional environment variables
-> IO Result
runProgram program args stdO stdF workDir addEnv = do
runProgram heisen program args stdO stdF workDir addEnv = do
e <- getEnvironment
let cp = (proc program args) { cwd = workDir, env = Just (addEnv ++ e) }
(exitCode, stdout, stderr) <- readCreateProcessWithExitCode cp ""
(exitCode, stdout, stderr) <-
filterHeisenbugProcess heisen $ readCreateProcessWithExitCode cp ""

-- For debugging: Uncomment this to print executable and and its arguments
--putStrLn $ show program ++ " " ++ concatMap (++ " ") args
Expand All @@ -333,6 +409,8 @@ runProgram program args stdO stdF workDir addEnv = do
-- all.
runFailingProgram
:: Bool
-- ^ Should we retry when we see the heisenbug error msg?
-> Bool
-- ^ Test exit code?
-> String
-- ^ Program name
Expand All @@ -352,10 +430,11 @@ runFailingProgram
-> [(String, String)]
-- ^ Additional environment variables
-> IO Result
runFailingProgram testExitCode program args stdO errOnEmptyStderr expectedCode expectedStderr workDir addEnv = do
runFailingProgram heisen testExitCode program args stdO errOnEmptyStderr expectedCode expectedStderr workDir addEnv = do
e <- getEnvironment
let cp = (proc program args) { cwd = workDir, env = Just (addEnv ++ e) }
(exitCode0, stdout, stderr) <- readCreateProcessWithExitCode cp ""
(exitCode0, stdout, stderr) <-
filterHeisenbugProcess heisen $ readCreateProcessWithExitCode cp ""

-- For debugging: Uncomment this to print executable and and its arguments
--putStrLn $ show program ++ " " ++ concatMap (++ " ") args
Expand Down Expand Up @@ -406,6 +485,28 @@ runFailingProgram testExitCode program args stdO errOnEmptyStderr expectedCode e
else
passed

filterHeisenbugProcess
:: Bool
-- ^ Should we retry when we see the heisenbug error msg?
-> IO (ExitCode, String, String)
-> IO (ExitCode, String, String)
filterHeisenbugProcess heisen process =
if heisen then do
res <- firstJustM skipHeisenbug [(1 :: Int) .. 20]
case res of
Just res0 -> pure res0
Nothing -> pure (ExitFailure (-6), "", "Heisenbug limit exceeded")
else do
process
where
skipHeisenbug cnt = do
unless (cnt == 1) $
traceIO ("Retrying due to heisenbug (try: " <> show cnt <> ")")
res@(_exitCode, _stdout, stderr) <- process
if heisenMsg `T.isInfixOf` (T.pack stderr)
then pure Nothing
else pure $ Just res
heisenMsg = T.pack "mmap 131072 bytes at (nil): Cannot allocate memory"

-- | Indicates that program does not exist in the path
execNotFoundFailure :: String -> Result
Expand Down

0 comments on commit 5389ecc

Please sign in to comment.