From 554098387200495fd520146b59d943e9968150ff Mon Sep 17 00:00:00 2001
From: Renato Costa
Date: Thu, 24 Aug 2023 16:52:06 -0400
Subject: [PATCH] build/roachtest: do not exit with code 11 on cluster creation failure

roachtest exits with code 11 if creating any cluster during a test
run fails. However, that is not ideal for a few reasons:

* Cluster creation often fails, partly because of temporary
  unavailability of a resource type in a data center, and partly
  because of issues in roachtest itself (see #104029).

* Exiting with code 11 causes the build to be marked and reported as
  a failure on TeamCity/Slack, which is disruptive. We already get
  cluster creation failure notifications on GitHub. By reporting them
  as build failures on TeamCity, we mask actually serious issues like
  the test runner crashing in the middle of the build and not running
  every test (for a recent example, see #109279).

For these reasons, this commit updates the script used by TeamCity to
invoke roachtest to also ignore exit code 11 (just like it currently
does for exit code 10). This makes roachtest build failures stand out
more, as they will mean roachtest was unable to run all tests.

Epic: none

Release note: None
---
 build/teamcity-roachtest-invoke.sh | 14 ++++++++------
 pkg/cmd/roachtest/main.go          |  5 +++++
 2 files changed, 13 insertions(+), 6 deletions(-)

diff --git a/build/teamcity-roachtest-invoke.sh b/build/teamcity-roachtest-invoke.sh
index cf4275ea89f3..f4a0e3bfefb8 100755
--- a/build/teamcity-roachtest-invoke.sh
+++ b/build/teamcity-roachtest-invoke.sh
@@ -11,12 +11,14 @@ bin/roachtest run \
 code=$?
 set -e
 
-if [[ ${code} -eq 10 ]]; then
-  # Exit code 10 indicates that some tests failed, but that roachtest
-  # as a whole passed. We want to exit zero in this case so that we
-  # can let TeamCity report failing tests without also failing the
-  # build. That way, build failures can be used to notify about serious
-  # problems that prevent tests from being invoked in the first place.
+if [[ ${code} == 10 || ${code} == 11 ]]; then
+  # Exit code 10 indicates that some tests failed; exit code 11
+  # indicates that cluster creation failed for some test in the
+  # run. In both cases, roachtest as a whole passed. We want to exit
+  # zero in this case so that we can let TeamCity report failing tests
+  # without also failing the build. That way, build failures can be
+  # used to notify about serious problems that prevent tests from
+  # being invoked in the first place (typically exit code 1).
   code=0
 fi
 
diff --git a/pkg/cmd/roachtest/main.go b/pkg/cmd/roachtest/main.go
index daeb652aeef4..d1de2ccd89d0 100644
--- a/pkg/cmd/roachtest/main.go
+++ b/pkg/cmd/roachtest/main.go
@@ -43,6 +43,11 @@ import (
 	"github.com/spf13/pflag"
 )
 
+// Note that the custom exit codes below are not exposed when running
+// roachtest on TeamCity. See `teamcity-roachtest-invoke.sh` for more
+// details. Also, if the exit codes here change, they need to be updated
+// on that script accordingly.
+
 // ExitCodeTestsFailed is the exit code that results from a run of
 // roachtest in which the infrastructure worked, but at least one
 // test failed.