roachtest: replicate/wide failed #50729
This timed out in the last step: decrease the replication factor to 5 and verify that the replicas per range falls again (128e2dc).
Darin, mind if I assign this to you? Glancing very briefly at the logs and at the area of the codebase I would suspect responsible, it looks to be the allocator.
Also consider #50865, which is a bug around the split queue being wedged for some reason. I'm not sure that it has any bearing here, but just to keep it on your radar.
(roachtest).replicate/wide failed on master@a16eb55ed96239dcd288aa1c2f80f306559f0f0b
(roachtest).replicate/wide failed on master@3edbe4aeb3c7300e6690cb2222a8d5c01e920bf4
(roachtest).replicate/wide failed on master@ebd5c732f83009acc9c6f5859ca95e74e5453a1c
(roachtest).replicate/wide failed on master@bbbedabbf6ea0b1ff6fc799a0c04a75295a9f4c2
(roachtest).replicate/wide failed on master@69ffd78d5bbab0d8f77cf1f2254e8a5fcbdf902f
(roachtest).replicate/wide failed on master@7425e857e62fe4280f614f9076f310322cc78649
(roachtest).replicate/wide failed on master@8b91062f9351d18f9104aff567cb152df162021e
Symptom: replication fails to converge after 10 min.
+cc @nvanbenschoten, @tbg for triage/routing.
(roachtest).replicate/wide failed on master@57e160b1fcc41dd12b595953729728007fd3fbda
(roachtest).replicate/wide failed on master@38115d0cc366243bcbae1658057cb0438e23565e
(roachtest).replicate/wide failed on master@dc5544839735faaa04075e0d9e021ddba721f3bb
56735: kvserverpb: move quorum safeguard into execChangeReplicasTxn r=aayushshah15 a=tbg

This used to live in the replicate queue, but there are other entry points to replication changes, notably the store rebalancer, which caused #54444. Move the check into the guts of replication changes, where it is guaranteed to be invoked.

Fixes #50729
Touches #54444 (release-20.2)

@aayushshah15 only requesting your review since you're in the area. Feel free to opt out.

Release note (bug fix): in rare situations, an automated replication change could result in a loss of quorum. This would require down nodes and a simultaneous change in the replication factor. Note that a change in the replication factor can occur automatically if the cluster comprises fewer than five available nodes. Experimentally, the likelihood of encountering this issue, even under contrived conditions, was small.

Co-authored-by: Tobias Grieger <[email protected]>
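The safeguard the PR moves can be sketched as a pre-flight check on the proposed voter set: refuse any replication change whose resulting voters would lack a live quorum. This is a hypothetical illustration of the idea, not CockroachDB's actual implementation; `checkQuorumSafety` and its liveness map are assumptions for the sketch.

```go
package main

import "fmt"

// checkQuorumSafety rejects a replication change if the proposed voter set
// would not retain a quorum of live replicas. liveness maps node IDs to
// whether the node is currently up. Illustrative sketch only.
func checkQuorumSafety(newVoters []int, liveness map[int]bool) error {
	live := 0
	for _, id := range newVoters {
		if liveness[id] {
			live++
		}
	}
	quorum := len(newVoters)/2 + 1
	if live < quorum {
		return fmt.Errorf("change would lose quorum: %d live of %d voters, need %d",
			live, len(newVoters), quorum)
	}
	return nil
}

func main() {
	liveness := map[int]bool{1: true, 2: true, 3: false, 4: false, 5: true}
	// Down-replicating to voters {1, 3, 4} while nodes 3 and 4 are down
	// leaves one live voter of three: rejected.
	fmt.Println(checkQuorumSafety([]int{1, 3, 4}, liveness))
	// Voters {1, 2, 5} are all live: allowed.
	fmt.Println(checkQuorumSafety([]int{1, 2, 5}, liveness))
}
```

Placing such a check at the single choke point through which all replication changes pass (as the PR does with execChangeReplicasTxn) guarantees it applies to the replicate queue, the store rebalancer, and any future caller alike.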
(roachtest).replicate/wide failed on master@8f768ad14cfb3f514db6d40465b2dd60ee1f2890
Artifacts: /replicate/wide
See this test on roachdash
powered by pkg/cmd/internal/issues