roachtest: wait for ranges to replicate before filling disk #78456
Merged: craig merged 1 commit into cockroachdb:master from nicktrav:nickt.disk-full-replication-wait on Mar 25, 2022.
tbg reviewed and approved these changes on Mar 25, 2022.
Currently, the `disk-full` roachtest creates a cluster and immediately places a ballast file on one node, which causes it to crash. If this node holds the only replica of a range containing a system table, certain system queries may not complete once the node crashes due to a full disk. The test then cannot make forward progress: the dead node prevents a system query from completing, and that query in turn prevents the node from being restarted.

Wait for all ranges to have at least two replicas before placing the ballast file on the one node.

Touches cockroachdb#78337, cockroachdb#78270.

Release note: None.
nicktrav force-pushed the nickt.disk-full-replication-wait branch from 9492efd to b2900ab on March 25, 2022 16:44.
This was referenced Mar 25, 2022
TFTR!

bors r=tbg

Build failed (retrying...)

Build succeeded
nicktrav added a commit to nicktrav/cockroach that referenced this pull request on Mar 28, 2022:

Improve on cockroachdb#78456 by waiting for 3x replication, rather than 2x. Release note: None.
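The commit message does not say why 3x is better than 2x. One plausible reading, assuming every replica is a Raft voter, is quorum arithmetic: a range stays available only while a majority of its replicas is live, and with two replicas, losing the ballast node leaves one of two, which is not a majority. The hypothetical helpers `quorum` and `survivesOneFailure` below just compute this:

```go
package main

import "fmt"

// quorum returns the Raft majority size for a range with n voting replicas.
func quorum(n int) int { return n/2 + 1 }

// survivesOneFailure reports whether a range with n voting replicas still
// has a quorum after losing exactly one replica (e.g. the full-disk node).
func survivesOneFailure(n int) bool { return n-1 >= quorum(n) }

func main() {
	for n := 1; n <= 3; n++ {
		fmt.Printf("replicas=%d quorum=%d survives one failure: %v\n",
			n, quorum(n), survivesOneFailure(n))
	}
}
```

Running it shows that only the three-replica case keeps a quorum after one node is lost.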
craig bot pushed a commit that referenced this pull request on Mar 28, 2022:

76516: server: improve visibility of ranges that fail to move during decommissioning r=knz,aayushshah15 a=cameronnunez

Fixes #76249. Informs #74158. This patch makes it so that when a decommission is slow or stalls, the descriptions of some "stuck" replicas are printed to the operator.

Release note (cli change): if decommissioning is slow or stalls, decommissioning replicas are printed to the operator.

Release justification: low risk, high benefit changes to existing functionality

78433: ci: report failure to generate code as a test failure to teamcity r=rail a=rickystewart

Closes #78368.

Release note: None

78495: cluster: Revert "cluster: use WaitConditionNextExit" r=otan a=rickystewart

This reverts commit 6543749. That commit was an (unsuccessful) attempt to fix #58955, and in the presence of this change the `acceptance` tests are very likely to hang forever under Ubuntu 20.04 due to a race condition where the container exits before we begin waiting on it.

Release note: None

78525: dev: add `ui clean` subcommand, update `Makefile` to point to `dev` r=rail a=rickystewart

Add `ui clean` and `ui clean --all`; the former does approximately the same as `make ui-clean`, the latter does approximately the same as `make ui-maintainer-clean`.

Release note: None

78561: sqlsmith: fix deadlock during generation r=rharding6373 a=rharding6373

Fixes a deadlock when sqlsmith generates join expressions.

Fixes: #78555

Release note: None

78602: roachpb,kvserver: tweak ReplicaUnavailableError rendering r=erikgrinaker a=tbg

Release justification: minor UX tweak to an error message

Release note: None

78617: roachtest: wait for 3x replication in disk-full r=tbg a=nicktrav

Improve on #78456 by waiting for 3x replication, rather than 2x.

Release note: None.

78629: ci: make sure we're streaming test output when `stress`ing via Bazel r=mari-crl a=rickystewart

Without this, you don't see the helpful "ran X iterations so far, Y failures" messages from `stress`.

Release note: None

Co-authored-by: Cameron Nunez <[email protected]>
Co-authored-by: Ricky Stewart <[email protected]>
Co-authored-by: rharding6373 <[email protected]>
Co-authored-by: Tobias Grieger <[email protected]>
Co-authored-by: Nick Travers <[email protected]>
nicktrav added a commit that referenced this pull request on Mar 28, 2022:

Improve on #78456 by waiting for 3x replication, rather than 2x. Release note: None.
nicktrav added a commit that referenced this pull request on Mar 29, 2022:

Improve on #78456 by waiting for 3x replication, rather than 2x. Release note: None.