Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

roachprod: classify dns errors #110831

Merged
merged 2 commits into from
Sep 21, 2023

Conversation

herkolategan
Copy link
Collaborator

Previously DNS errors were not marked making it hard to classify these errors
later on. This change marks all DNS related errors in order to better report on
the errors.

roachtest then classifies these errors as infra flake to help with triaging.

Epic: None
Release Note: None

@herkolategan herkolategan requested a review from a team as a code owner September 18, 2023 16:22
@herkolategan herkolategan requested review from rachitgsrivastava, DarrylWong and renatolabs and removed request for a team September 18, 2023 16:22
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Contributor

@smg260 smg260 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @DarrylWong, @herkolategan, @rachitgsrivastava, and @renatolabs)


pkg/cmd/roachtest/github.go line 155 at r2 (raw file):

		messagePrefix = fmt.Sprintf("test %s was skipped due to ", testName)
		infraFlake = true
	case failureContainsError(firstFailure, rperrors.ErrSSH255):

These 2 cases can probably be collapsed into one. Might be good to add a test case for this. (There are already a bunch).

Also this will result in a brand new umbrella github issue for just dns_problem (if that's the way we want to go)


pkg/roachprod/vm/gce/dns.go line 271 at r1 (raw file):

// markDNSOperationError should be used to mark any external DNS API or Google
// Cloud DNS CLI errors as DNS operation errors.
func markDNSOperationError(err error) error {

nit: not sure this needs to be a separate function

Copy link
Collaborator Author

@herkolategan herkolategan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @DarrylWong, @rachitgsrivastava, @renatolabs, and @smg260)


pkg/cmd/roachtest/github.go line 155 at r2 (raw file):

Previously, smg260 (Miral Gadani) wrote…

These 2 cases can probably be collapsed into one. Might be good to add a test case for this. (There are already a bunch).

Also this will result in a brand new umbrella github issue for just dns_problem (if that's the way we want to go)

Yes, we want a new umbrella issue, as intermittent network issues seem to be causing some failures and we want to track it. I like to have the cases explicit as it's easier to reference.
I'll add a test.


pkg/roachprod/vm/gce/dns.go line 271 at r1 (raw file):

Previously, smg260 (Miral Gadani) wrote…

nit: not sure this needs to be a separate function

Probably not, but I felt like it made wrapping the errors a bit cleaner and easier to see what's going on.

@herkolategan herkolategan requested a review from smg260 September 19, 2023 13:45
smg260

This comment was marked as duplicate.

Copy link
Contributor

@smg260 smg260 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @DarrylWong, @rachitgsrivastava, and @renatolabs)

@herkolategan
Copy link
Collaborator Author

TFTR!
bors r=smg260

@herkolategan
Copy link
Collaborator Author

bors r-

@craig
Copy link
Contributor

craig bot commented Sep 20, 2023

Canceled.

Previously DNS errors were not marked making it hard to classify these errors
later on. This change marks all DNS related errors in order to better report on
the errors.

Epic: None
Release Note: None
Previously DNS errors would propagate the same way as errors that arise during a
test and be triaged through the normal route. This change is to check for errors
marked as DNS errors and classify it as an infra flake and assign it to the
test-eng team.

Epic: None
Release Note: None
@herkolategan
Copy link
Collaborator Author

bors r=smg260

@craig
Copy link
Contributor

craig bot commented Sep 21, 2023

Build succeeded:

@craig craig bot merged commit a3dccc6 into cockroachdb:master Sep 21, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants