TestStoreRangeRebalance failed under stress #10193
Comments
Actually, this is not #10156.
Rather than the somewhat complicated rebalancing scenario, use a simple scenario in which we perform up-replication of range 1 from 1 to 3 nodes. We check that this up-replication is performed using preemptive snapshots. The more complicated scenario was very fragile, frequently being broken by innocuous changes. Fixes cockroachdb#10497 Fixes cockroachdb#10193 Fixes cockroachdb#10156 Fixes cockroachdb#9395
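To make the simplified scenario concrete, here is a rough sketch of the kind of check the commit message describes: up-replicate one range from 1 to 3 stores and confirm that every added replica was seeded by a preemptive snapshot. This is a toy model, not CockroachDB code. `Range`, `SnapshotKind`, `RaftInitiated`, and `UpReplicate` are hypothetical names invented for illustration, and the rule "one preemptive snapshot per added replica" is an assumption of the model.

```go
package main

import "fmt"

// SnapshotKind distinguishes the two toy snapshot types in this sketch.
type SnapshotKind int

const (
	Preemptive    SnapshotKind = iota // sent before the store joins the range
	RaftInitiated                     // sent by Raft to catch up an existing member
)

// Range is a toy range: the stores holding a replica plus snapshot counters.
type Range struct {
	Replicas  []int
	Snapshots map[SnapshotKind]int
}

// hasReplica reports whether the given store already holds a replica.
func (r *Range) hasReplica(store int) bool {
	for _, s := range r.Replicas {
		if s == store {
			return true
		}
	}
	return false
}

// UpReplicate adds replicas on the given stores until target is reached.
// In this model (an assumption), each addition is preceded by exactly one
// preemptive snapshot.
func UpReplicate(r *Range, stores []int, target int) {
	for _, s := range stores {
		if len(r.Replicas) >= target {
			return
		}
		if r.hasReplica(s) {
			continue
		}
		r.Snapshots[Preemptive]++
		r.Replicas = append(r.Replicas, s)
	}
}

func main() {
	r := &Range{Replicas: []int{1}, Snapshots: map[SnapshotKind]int{}}
	UpReplicate(r, []int{1, 2, 3}, 3)
	// Replicas, preemptive snapshots, Raft-initiated snapshots.
	fmt.Println(len(r.Replicas), r.Snapshots[Preemptive], r.Snapshots[RaftInitiated]) // 3 2 0
}
```

The assertion a test would make is the last line: three replicas, two preemptive snapshots, and no Raft-initiated ones.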
This error message is certainly confusing. It is being generated by We see this on every node processing the raft command. That's good. Here is the log message I added:
And the output:
@andreimatei Can you take a look at this? It seems like we proposed a raft command when the lease wasn't valid. Or perhaps we proposed it but later transferred the lease making the in-flight proposal invalid. Somewhat curious,
diff --git a/pkg/storage/client_raft_test.go b/pkg/storage/client_raft_test.go
index cee381c..e8e8c89 100644
--- a/pkg/storage/client_raft_test.go
+++ b/pkg/storage/client_raft_test.go
@@ -2054,6 +2054,7 @@ func TestStoreRangeRebalance(t *testing.T) {
 	mtc.Start(t, 6)
 	defer mtc.Stop()
+	stopNodeLivenessHeartbeats(mtc)
 	splitKey := roachpb.Key("split")
 	splitArgs := adminSplitArgs(roachpb.KeyMin, splitKey)
And the stress invocation:
Will look. Just to verify - you're saying that the problem is reproducible both before and after my main change in #10420, right?
No, this isn't reproducible after #10420 (specifically, not after 3d508a1) because it seems to be masked by another problem. Or perhaps 3d508a1 does fix the problem. I'm not sure. The question is whether this error (
I added some more logging and I can see that we're proposing the TransferLease when the current lease is invalid:
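For readers following along, the second hypothesis above (a command proposed under one lease, then invalidated by a lease transfer while still in flight) can be sketched with a toy model. This is not the CockroachDB implementation. `Lease`, `Proposal`, `Replica`, and the `Sequence` counter are hypothetical names made up for this illustration, and comparing the proposer's lease with the current lease at apply time is an assumed check.

```go
package main

import (
	"errors"
	"fmt"
)

// Lease is a toy stand-in for a range lease. Sequence is a hypothetical
// counter that increases every time the lease changes hands.
type Lease struct {
	Holder   int
	Sequence int
}

// Proposal is a toy command that remembers the lease it was proposed under.
type Proposal struct {
	Name          string
	ProposerLease Lease
}

var errLeaseMismatch = errors.New("command proposed under a lease that is no longer current")

// Replica holds the lease it currently believes to be in effect.
type Replica struct {
	current Lease
}

// Propose stamps the command with the lease in effect at proposal time.
func (r *Replica) Propose(name string) Proposal {
	return Proposal{Name: name, ProposerLease: r.current}
}

// TransferLease hands the lease to a new holder and bumps the sequence.
func (r *Replica) TransferLease(to int) {
	r.current = Lease{Holder: to, Sequence: r.current.Sequence + 1}
}

// Apply rejects a command whose proposer lease differs from the current one.
func (r *Replica) Apply(p Proposal) error {
	if p.ProposerLease != r.current {
		return errLeaseMismatch
	}
	return nil
}

func main() {
	r := &Replica{current: Lease{Holder: 1, Sequence: 1}}

	inFlight := r.Propose("put") // proposed while holder 1 has the lease
	r.TransferLease(2)           // lease moves before the command applies
	fmt.Println(r.Apply(inFlight))

	fresh := r.Propose("put") // proposed under the new lease
	fmt.Println(r.Apply(fresh))
}
```

In this model the first `Apply` returns the mismatch error and the second returns `nil`. The first hypothesis (proposing while the lease is already invalid) would need an extra validity check at `Propose` time, which the sketch leaves out.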
I think the problem might be that It looks possible to fix
Rather than the somewhat complicated rebalancing scenario, use a simple scenario in which we perform up-replication of range 1 from 1 to 3 nodes. We check that this up-replication is performed using preemptive snapshots. The more complicated scenario was very fragile, frequently being broken by innocuous changes. Fixes cockroachdb#10193 Fixes cockroachdb#10156 Fixes cockroachdb#9395
Right, I was just typing the same.
Well I bet that this line has something to do with the flakiness:
Does anybody know why we need such things in this transport? What would happen if we didn't have any of this code around here that advances the clock? |
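As background for the question, one way a clock advance inside test plumbing could plausibly interact with leases is sketched below. This is a guess at the mechanism, not something the thread confirms, and it is not the actual transport code. `ManualClock`, `Lease`, and `ValidAt` are toy types defined here for illustration, and the lease is assumed to be valid only while the clock reads earlier than its expiration.

```go
package main

import "fmt"

// ManualClock is a toy clock that only moves when a test tells it to.
type ManualClock struct {
	nanos int64
}

// Now returns the current toy time.
func (c *ManualClock) Now() int64 { return c.nanos }

// Advance moves the toy clock forward by d.
func (c *ManualClock) Advance(d int64) { c.nanos += d }

// Lease is a toy expiration-based lease: valid while now < Expiration.
type Lease struct {
	Holder     int
	Expiration int64
}

// ValidAt reports whether the lease is still valid at the given time.
func (l Lease) ValidAt(now int64) bool { return now < l.Expiration }

func main() {
	clock := &ManualClock{}
	lease := Lease{Holder: 1, Expiration: 100}

	fmt.Println(lease.ValidAt(clock.Now())) // true

	// A helper that bumps the clock as a side effect (as a test transport
	// might) can silently push the lease past its expiration.
	clock.Advance(150)
	fmt.Println(lease.ValidAt(clock.Now())) // false
}
```

If something like this is happening, any command proposed after the hidden `Advance` would be proposed under a lease that has already expired, which would match the symptom reported earlier in the thread.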
In
Add the check that preemptive snapshots are being used to TestStoreRangeUpReplicate. Add TestReplicateQueueRebalance for testing that basic rebalancing is working. Fixes cockroachdb#10193 Fixes cockroachdb#10156 Fixes cockroachdb#9395
SHA: https://github.com/cockroachdb/cockroach/commits/ca89f456766a8f0381815e58aa7abfe5d3ece741
Stress build found a failed test: