roachtest: wait for ranges to replicate before filling disk #78456

Merged: 1 commit, Mar 25, 2022


nicktrav
Collaborator

Currently, the `disk-full` roachtest creates a cluster and immediately
places a ballast file on one node, which causes that node to crash. If the
node holds the only replica of a range containing a system table, certain
system queries may not complete once it has crashed. The test then cannot
make forward progress: the dead node prevents a system query from
completing, and that query in turn prevents the node from being restarted.

Wait for all ranges to have at least two replicas before placing the
ballast file on the one node.

Touches #78337, #78270.

Release note: None.

@nicktrav nicktrav requested review from itsbilal and jbowens March 24, 2022 22:20
@cockroach-teamcity
Member

This change is Reviewable

@nicktrav nicktrav force-pushed the nickt.disk-full-replication-wait branch from 9492efd to b2900ab Compare March 25, 2022 16:44
@nicktrav
Collaborator Author

TFTR!

bors r=tbg

@craig
Contributor

craig bot commented Mar 25, 2022

Build failed (retrying...):

@craig
Contributor

craig bot commented Mar 25, 2022

Build succeeded:

@craig craig bot merged commit 2635dc1 into cockroachdb:master Mar 25, 2022
@nicktrav nicktrav deleted the nickt.disk-full-replication-wait branch March 26, 2022 00:27
nicktrav added a commit to nicktrav/cockroach that referenced this pull request Mar 28, 2022
Improve on cockroachdb#78456 by waiting for 3x replication, rather than 2x.

Release note: None.
craig bot pushed a commit that referenced this pull request Mar 28, 2022
76516: server: improve visibility of ranges that fail to move during decommissioning r=knz,aayushshah15 a=cameronnunez

Fixes #76249. Informs #74158.

This patch makes it so that when a decommission is slow or stalls, the 
descriptions of some "stuck" replicas are printed to the operator.

Release note (cli change): if decommissioning is slow or stalls, decommissioning
replicas are printed to the operator.

Release justification: low risk, high benefit changes to existing functionality

78433: ci: report failure to generate code as a test failure to teamcity r=rail a=rickystewart

Closes #78368.

Release note: None

78495: cluster: Revert "cluster: use WaitConditionNextExit" r=otan a=rickystewart

This reverts commit 6543749.
That commit was an (unsuccessful) attempt to fix #58955, and in the
presence of this change the `acceptance` tests are very likely to hang
forever under Ubuntu 20.04 due to a race condition where the container
exits before we begin waiting on it.

Release note: None

78525: dev: add `ui clean` subcommand, update `Makefile` to point to `dev` r=rail a=rickystewart

Add `ui clean` and `ui clean --all`; the former does approximately the
same as `make ui-clean`, the latter does approximately the same as
`make ui-maintainer-clean`.

Release note: None

78561: sqlsmith: fix deadlock during generation r=rharding6373 a=rharding6373

Fixes a deadlock when sqlsmith generates join expressions.

Fixes: #78555

Release note: None

78602: roachpb,kvserver: tweak ReplicaUnavailableError rendering r=erikgrinaker a=tbg

Release justification: minor UX tweak to an error message
Release note: None


78617: roachtest: wait for 3x replication in disk-full r=tbg a=nicktrav

Improve on #78456 by waiting for 3x replication, rather than 2x.

Release note: None.

78629: ci: make sure we're streaming test output when `stress`ing via Bazel r=mari-crl a=rickystewart

Without this, you don't see the helpful "ran X iterations so far, Y
failures" messages from `stress`.

Release note: None

Co-authored-by: Cameron Nunez
Co-authored-by: Ricky Stewart
Co-authored-by: rharding6373
Co-authored-by: Tobias Grieger
Co-authored-by: Nick Travers
nicktrav added a commit that referenced this pull request Mar 28, 2022
Improve on #78456 by waiting for 3x replication, rather than 2x.

Release note: None.
nicktrav added a commit that referenced this pull request Mar 29, 2022
Improve on #78456 by waiting for 3x replication, rather than 2x.

Release note: None.
3 participants