Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

roachtest: string match for transient errors as a fallback #132702

Merged
merged 1 commit into from
Oct 25, 2024

Conversation

DarrylWong
Copy link
Contributor

The require package is commonly used through roachtest to assert that no error occured. i.e. require.NoError(t, err) However, this function does not preserve the error object. This causes our transient error flake detection to not work. Since require is an upstream dependency, we cannot easily change this.

This change adds a fallback to our flake detection that string matches for the TRANSIENT_ERROR message we add. If found it will mark the error as a flake to reduce noise.

However, we have seen other cases where we do not preserve the error object but the code lives somewhere that is easily changeable for us. In those cases, we ideally should fix the code instead of resorting to this fallback.

To make sure we still do that, the fallback also explicity checks for a message that require.NoError prepends to all errors. If we find additional cases that require this fallback, we can review and add them on a case by case basis.

Fixes: #131094
Epic: none
Release note: none

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@DarrylWong DarrylWong force-pushed the transient-error-string-match branch 3 times, most recently from a7042cd to e2f6f96 Compare October 17, 2024 17:28
Copy link

blathers-crl bot commented Oct 17, 2024

Your pull request contains more than 1000 changes. It is strongly encouraged to split big PRs into smaller chunks.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@DarrylWong DarrylWong force-pushed the transient-error-string-match branch from e2f6f96 to 1e3f6ff Compare October 17, 2024 17:31
@DarrylWong
Copy link
Contributor Author

First two commits are part of #132795 as the newly added tests build off of it. The core logic behind this PR is in transientErrorOwnershipFallback and should be reviewable without knowing the context of the other PR.

@DarrylWong DarrylWong force-pushed the transient-error-string-match branch from 1e3f6ff to cbc9880 Compare October 17, 2024 21:26
@DarrylWong DarrylWong marked this pull request as ready for review October 17, 2024 21:31
@DarrylWong DarrylWong requested review from a team as code owners October 17, 2024 21:31
@DarrylWong DarrylWong requested review from srosenberg and vidit-bhat and removed request for a team October 17, 2024 21:31
Copy link
Member

@srosenberg srosenberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to add a (mocked) roachtest using the require package s.t. we have e2e coverage. Otherwise, LGTM!

@srosenberg srosenberg requested review from herkolategan and removed request for a team October 22, 2024 19:19
@DarrylWong DarrylWong force-pushed the transient-error-string-match branch from cbc9880 to 2c2eaf7 Compare October 24, 2024 22:28
@DarrylWong
Copy link
Contributor Author

It would be good to add a (mocked) roachtest using the require package s.t. we have e2e coverage.

Good idea, added.

The `require` package is commonly used through roachtest to
assert that no error occured. i.e. `require.NoError(t, err)`
However, this function does not preserve the error object. This
causes our transient error flake detection to not work. Since
`require` is an upstream dependency, we cannot easily change this.

This change adds a fallback to our flake detection that string
matches for the `TRANSIENT_ERROR` message we add. If found it
will mark the error as a flake to reduce noise.

However, we have seen other cases where we do not preserve the
error object but the code lives somewhere that is easily changeable for us.
In those cases, we ideally should fix the code instead of resorting
to this fallback.

To make sure we still do that, the fallback also explicity checks for
a message that `require.NoError` prepends to all errors. If we find
additional cases that require this fallback, we can review and add
them on a case by case basis.
@DarrylWong DarrylWong force-pushed the transient-error-string-match branch from 2c2eaf7 to c547767 Compare October 24, 2024 23:03
})

// Now test that if the transient error is not handled by the `require` package,
// but similarly lost due to casting to a string, the test runner *won't* mark
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

Copy link
Member

@srosenberg srosenberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@DarrylWong
Copy link
Contributor Author

TFTR!

bors r=srosenberg

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

roachtest: flake detection does not work when used with require API
3 participants