Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

roachtest: increase consistency check timeout, and ignore errors #69285

Merged

Conversation

erikgrinaker
Copy link
Contributor

@erikgrinaker erikgrinaker commented Aug 24, 2021

This bumps the consistency check timeout to 5 minutes. There are
indications that a recent libpq upgrade unmasked previously ignored
context cancellation errors, caused by the timeout here being too low.
It also ignores errors during the consistency check, since it is
best-effort anyway.

Resolves #68883.

Release justification: non-production code changes
Release note: None

@erikgrinaker erikgrinaker requested review from rafiss and a team August 24, 2021 07:58
@erikgrinaker erikgrinaker self-assigned this Aug 24, 2021
@erikgrinaker erikgrinaker requested review from otan and removed request for a team August 24, 2021 07:58
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Member

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we bump it up a bit more though? It also seems as though we may want to not return an error if we hit the context cancellation. The consistency check is best-effort after all, and we don't want problems with it to post issues for an otherwise healthy test.

@@ -1278,6 +1278,7 @@ WHERE t.status NOT IN ('RANGE_CONSISTENT', 'RANGE_INDETERMINATE')`)
l.Printf("consistency check failed with %v; ignoring", err)
return nil
}
defer rows.Close()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice. A quick check using the sqlclosecheck shows we have a few more cases of omitting a Close(). In many cases it likely doesn't matter because we are following up with t.Fatal(). (No need to handle them here, I'll open an issue)

@erikgrinaker erikgrinaker force-pushed the roachtest-consistency-timeout branch from 10d516f to 9d7e56f Compare August 24, 2021 11:59
@erikgrinaker erikgrinaker requested a review from a team as a code owner August 24, 2021 11:59
This bumps the consistency check timeout to 5 minutes. There are
indications that a recent libpq upgrade unmasked previously ignored
context cancellation errors, caused by the timeout here being too low.
It also ignores errors during the consistency check, since it is
best-effort anyway.

Release justification: non-production code changes
Release note: None
@erikgrinaker erikgrinaker force-pushed the roachtest-consistency-timeout branch from 9d7e56f to 6b89733 Compare August 24, 2021 12:00
@erikgrinaker erikgrinaker changed the title roachtest: increase consistency check timeout roachtest: increase consistency check timeout, and ignore errors Aug 24, 2021
@erikgrinaker
Copy link
Contributor Author

Should we bump it up a bit more though?

Sure, bumped to 5 minutes.

It also seems as though we may want to not return an error if we hit the context cancellation. The consistency check is best-effort after all, and we don't want problems with it to post issues for an otherwise healthy test.

Makes sense, used the existing convention of outputting a warning for any errors.

Copy link
Member

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you!

@erikgrinaker
Copy link
Contributor Author

bors r=tbg

@craig
Copy link
Contributor

craig bot commented Aug 24, 2021

Build succeeded:

@craig craig bot merged commit 7c36a9d into cockroachdb:master Aug 24, 2021
@erikgrinaker erikgrinaker deleted the roachtest-consistency-timeout branch August 27, 2021 09:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

roachtest: kv/splits/nodes=3/quiesce=true failed
4 participants