Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

roachtest: Allow roachtests to opt out of failing in post validation … #102162

Merged
merged 1 commit into from
Apr 24, 2023

Conversation

smg260
Copy link
Contributor

@smg260 smg260 commented Apr 24, 2023

…when

dead nodes are detected.

Skips dead node assertion failures in all roachtests in #102131.

Also, the change explicitly checks for a message starting with "dead" to avoid failing on a flake during teardown.

Epic: none
Fixes: #102131

Release note: None

…when

dead nodes are detected.

Epic: none
Fixes: cockroachdb#102131

Release note: None
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@smg260 smg260 marked this pull request as ready for review April 24, 2023 17:48
@smg260 smg260 requested a review from a team as a code owner April 24, 2023 17:48
@smg260 smg260 requested review from srosenberg, renatolabs and herkolategan and removed request for a team April 24, 2023 17:48
Copy link
Member

@srosenberg srosenberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @herkolategan, @renatolabs, and @smg260)


pkg/cmd/roachtest/cluster.go line 1354 at r1 (raw file):

		// If there's an error, it means either that the monitor command failed
		// completely, or that it found a dead node worth complaining about.
		if n.Err != nil || strings.HasPrefix(n.Msg, "dead") {

Just want to make sure this prefix is invariant? Where is its source?


pkg/cmd/roachtest/test_runner.go line 1088 at r1 (raw file):

		if err := c.assertNoDeadNode(ctx, t); err != nil {
			// Some tests expect dead nodes, so they may opt out of this check.
			if t.spec.SkipPostValidations&registry.PostValidationNoDeadNodes == 0 {

Could we log the fact that this validation was skipped?

Copy link
Contributor Author

@smg260 smg260 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @herkolategan, @renatolabs, and @srosenberg)


pkg/cmd/roachtest/cluster.go line 1354 at r1 (raw file):

Previously, srosenberg (Stan Rosenberg) wrote…

Just want to make sure this prefix is invariant? Where is its source?

yep it's part of c.Monitor

...
  if [ "${pid}" != "${lastpid}" ]; then
    if [ "${lastpid}" != 0 ]; then
      if [ "${pid}" != 0 ]; then
        # If the PID changed but neither is zero, then the status refers to
        # the new incarnation. We lost the actual exit status of the old PID.
        status="unknown"
      fi
    	echo "dead (exit status ${status})"
    fi
		if [ "${pid}" != 0 ]; then
			echo "${pid}"
    fi
    lastpid=${pid}
  fi
...

I removed the previous check which defaulted to assuming dead, because sometimes we are unable to verify the node for unrelated reasons (SSH) and end up failing an otherwise passing test.


pkg/cmd/roachtest/test_runner.go line 1088 at r1 (raw file):

Previously, srosenberg (Stan Rosenberg) wrote…

Could we log the fact that this validation was skipped?

The opt-out in this case won't actually skip reporting the status of each node, but just whether we record an error. The t.L()... below logs the case where we are skipping recording an error despite detecting a dead node.

Copy link
Member

@srosenberg srosenberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @herkolategan, @renatolabs, and @smg260)


pkg/cmd/roachtest/cluster.go line 1354 at r1 (raw file):

Previously, smg260 (Miral Gadani) wrote…

yep it's part of c.Monitor

...
  if [ "${pid}" != "${lastpid}" ]; then
    if [ "${lastpid}" != 0 ]; then
      if [ "${pid}" != 0 ]; then
        # If the PID changed but neither is zero, then the status refers to
        # the new incarnation. We lost the actual exit status of the old PID.
        status="unknown"
      fi
    	echo "dead (exit status ${status})"
    fi
		if [ "${pid}" != 0 ]; then
			echo "${pid}"
    fi
    lastpid=${pid}
  fi
...

I removed the previous check which defaulted to assuming dead, because sometimes we are unable to verify the node for unrelated reasons (SSH) and end up failing an otherwise passing test.

Nice, thanks!


pkg/cmd/roachtest/test_runner.go line 1088 at r1 (raw file):

Previously, smg260 (Miral Gadani) wrote…

The opt-out in this case won't actually skip reporting the status of each node, but just whether we record an error. The t.L()... below logs the case where we are skipping recording an error despite detecting a dead node.

Yep, I missed the line below.

@smg260
Copy link
Contributor Author

smg260 commented Apr 24, 2023

TFTR

bors r=srosenberg

@craig
Copy link
Contributor

craig bot commented Apr 24, 2023

Build succeeded:

@kvoli
Copy link
Collaborator

kvoli commented Jun 15, 2023

blathers backport 23.1

@blathers-crl
Copy link

blathers-crl bot commented Jun 15, 2023

Encountered an error creating backports. Some common things that can go wrong:

  1. The backport branch might have already existed.
  2. There was a merge conflict.
  3. The backport branch contained merge commits.

You might need to create your backport manually using the backport tool.


error creating merge commit from bfbc704 to blathers/backport-release-23.1-102162: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 23.1 failed. See errors above.


🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

roachtest: allow tests to opt out of dead node validation
4 participants