-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
roachtest: Allow roachtests to opt out of failing in post validation … #102162
Conversation
…when dead nodes are detected. Epic: none Fixes: cockroachdb#102131 Release note: None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @herkolategan, @renatolabs, and @smg260)
pkg/cmd/roachtest/cluster.go
line 1354 at r1 (raw file):
// If there's an error, it means either that the monitor command failed // completely, or that it found a dead node worth complaining about. if n.Err != nil || strings.HasPrefix(n.Msg, "dead") {
Just want to make sure this prefix is invariant? Where is its source?
pkg/cmd/roachtest/test_runner.go
line 1088 at r1 (raw file):
if err := c.assertNoDeadNode(ctx, t); err != nil { // Some tests expect dead nodes, so they may opt out of this check. if t.spec.SkipPostValidations®istry.PostValidationNoDeadNodes == 0 {
Could we log the fact that this validation was skipped?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @herkolategan, @renatolabs, and @srosenberg)
pkg/cmd/roachtest/cluster.go
line 1354 at r1 (raw file):
Previously, srosenberg (Stan Rosenberg) wrote…
Just want to make sure this prefix is invariant? Where is its source?
yep it's part of c.Monitor
...
if [ "${pid}" != "${lastpid}" ]; then
if [ "${lastpid}" != 0 ]; then
if [ "${pid}" != 0 ]; then
# If the PID changed but neither is zero, then the status refers to
# the new incarnation. We lost the actual exit status of the old PID.
status="unknown"
fi
echo "dead (exit status ${status})"
fi
if [ "${pid}" != 0 ]; then
echo "${pid}"
fi
lastpid=${pid}
fi
...
I removed the previous check which defaulted to assuming dead, because sometimes we are unable to verify the node for unrelated reasons (SSH) and end up failing an otherwise passing test.
pkg/cmd/roachtest/test_runner.go
line 1088 at r1 (raw file):
Previously, srosenberg (Stan Rosenberg) wrote…
Could we log the fact that this validation was skipped?
The opt-out in this case won't actually skip reporting the status of each node, but just whether we record an error. The t.L()...
below logs the case where we are skipping recording an error despite detecting a dead node.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @herkolategan, @renatolabs, and @smg260)
pkg/cmd/roachtest/cluster.go
line 1354 at r1 (raw file):
Previously, smg260 (Miral Gadani) wrote…
yep it's part of
c.Monitor
... if [ "${pid}" != "${lastpid}" ]; then if [ "${lastpid}" != 0 ]; then if [ "${pid}" != 0 ]; then # If the PID changed but neither is zero, then the status refers to # the new incarnation. We lost the actual exit status of the old PID. status="unknown" fi echo "dead (exit status ${status})" fi if [ "${pid}" != 0 ]; then echo "${pid}" fi lastpid=${pid} fi ...
I removed the previous check which defaulted to assuming dead, because sometimes we are unable to verify the node for unrelated reasons (SSH) and end up failing an otherwise passing test.
Nice, thanks!
pkg/cmd/roachtest/test_runner.go
line 1088 at r1 (raw file):
Previously, smg260 (Miral Gadani) wrote…
The opt-out in this case won't actually skip reporting the status of each node, but just whether we record an error. The
t.L()...
below logs the case where we are skipping recording an error despite detecting a dead node.
Yep, I missed the line below.
TFTR bors r=srosenberg |
Build succeeded: |
blathers backport 23.1 |
Encountered an error creating backports. Some common things that can go wrong:
You might need to create your backport manually using the backport tool. error creating merge commit from bfbc704 to blathers/backport-release-23.1-102162: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict [] you may need to manually resolve merge conflicts with the backport tool. Backport to branch 23.1 failed. See errors above. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
…when
dead nodes are detected.
Skips dead node assertion failures in all roachtests in #102131.
Also, the change explicitly checks for a message starting with "dead" to avoid failing on a flake during teardown.
Epic: none
Fixes: #102131
Release note: None