Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storage: crash with snapshot widens existing replica, but no replica exists for subsumed key #40257

Closed
andreimatei opened this issue Aug 27, 2019 · 6 comments
Assignees
Labels
A-kv-replication Relating to Raft, consensus, and coordination. C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. C-test-failure Broken test (automatically or manually discovered). S-1 High impact: many users impacted, serious risk of high unavailability or data loss

Comments

@andreimatei
Copy link
Contributor

I was stressing the repartitioning tests, as one does.
@tbg or @nvanbenschoten do you have any guesses here? If not, I'm as good as any to look, but in a few days.

make stress PKG=./pkg/ccl/partitionccl TESTS=TestRepartitioning

=== RUN   TestRepartitioning/single_col_range_partitioning/unpartitioned
....
F190827 18:54:51.894556 264384 storage/replica_raft.go:1486  [n2,s2,r192/3:/Table/94/1/{5-6}] snapshot widens existing replica, but no replica exists for subsumed key /Table/94/1/6
goroutine 264384 [running]:
github.com/cockroachdb/cockroach/pkg/util/log.getStacks(0xc000424300, 0xc0004243c0, 0x0, 0x6)
	/home/andrei/go/src/github.com/cockroachdb/cockroach/pkg/util/log/clog.go:1016 +0xb1
github.com/cockroachdb/cockroach/pkg/util/log.(*loggingT).outputLogEntry(0x70bc4e0, 0xc000000004, 0x6a3f13e, 0x17, 0x5ce, 0xc00178c800, 0x77)
	/home/andrei/go/src/github.com/cockroachdb/cockroach/pkg/util/log/clog.go:872 +0x948
github.com/cockroachdb/cockroach/pkg/util/log.addStructured(0x484fa80, 0xc0076bd710, 0xc000000004, 0x2, 0x40095f8, 0x4b, 0xc003dd4b18, 0x1, 0x1)
	/home/andrei/go/src/github.com/cockroachdb/cockroach/pkg/util/log/structured.go:66 +0x2cc
github.com/cockroachdb/cockroach/pkg/util/log.logDepth(0x484fa80, 0xc0076bd710, 0x1, 0xc000000004, 0x40095f8, 0x4b, 0xc003dd4b18, 0x1, 0x1)
	/home/andrei/go/src/github.com/cockroachdb/cockroach/pkg/util/log/log.go:69 +0x8c
github.com/cockroachdb/cockroach/pkg/util/log.Fatalf(...)
	/home/andrei/go/src/github.com/cockroachdb/cockroach/pkg/util/log/log.go:180
github.com/cockroachdb/cockroach/pkg/storage.(*Replica).maybeAcquireSnapshotMergeLock(0xc003c49800, 0x484fa80, 0xc0076bd710, 0x664b41c61ec58c54, 0xe3d23aadb1a1dd8d, 0xc0076f8f40, 0x0, 0x0, 0x0, 0xc005f08c80, ...)
	/home/andrei/go/src/github.com/cockroachdb/cockroach/pkg/storage/replica_raft.go:1486 +0x294
github.com/cockroachdb/cockroach/pkg/storage.(*Replica).handleRaftReadyRaftMuLocked(0xc003c49800, 0x484fa80, 0xc0076bd710, 0x664b41c61ec58c54, 0xe3d23aadb1a1dd8d, 0xc0076f8f40, 0x0, 0x0, 0x0, 0xc005f08c80, ...)
	/home/andrei/go/src/github.com/cockroachdb/cockroach/pkg/storage/replica_raft.go:521 +0x1893
github.com/cockroachdb/cockroach/pkg/storage.(*Store).processRaftSnapshotRequest.func1(0x484fa80, 0xc0076bd710, 0xc003c49800, 0x0)
	/home/andrei/go/src/github.com/cockroachdb/cockroach/pkg/storage/store.go:3490 +0x19d
github.com/cockroachdb/cockroach/pkg/storage.(*Store).withReplicaForRequest(0xc000ce0700, 0x484fa80, 0xc0076bd710, 0xc005f08a48, 0xc007b1f4d8, 0x0)
	/home/andrei/go/src/github.com/cockroachdb/cockroach/pkg/storage/store.go:3370 +0x150
github.com/cockroachdb/cockroach/pkg/storage.(*Store).processRaftSnapshotRequest(0xc000ce0700, 0x484fa80, 0xc005a90d80, 0xc005f08a00, 0x664b41c61ec58c54, 0xe3d23aadb1a1dd8d, 0xc0076f8f40, 0x0, 0x0, 0x0, ...)
	/home/andrei/go/src/github.com/cockroachdb/cockroach/pkg/storage/store.go:3432 +0x1a4
github.com/cockroachdb/cockroach/pkg/storage.(*Store).receiveSnapshot(0xc000ce0700, 0x484fa80, 0xc005a90d80, 0xc005f08a00, 0x7f3371250df8, 0xc004a2c870, 0x0, 0x0)
	/home/andrei/go/src/github.com/cockroachdb/cockroach/pkg/storage/store_snapshot.go:871 +0x6d3
github.com/cockroachdb/cockroach/pkg/storage.(*Store).HandleSnapshot(0xc000ce0700, 0xc005f08a00, 0x7f3371250dc8, 0xc004a2c870, 0xc004a2c870, 0x30)
	/home/andrei/go/src/github.com/cockroachdb/cockroach/pkg/storage/store.go:3241 +0x201
github.com/cockroachdb/cockroach/pkg/storage.(*RaftTransport).RaftSnapshot.func1.1(0x48a46e0, 0xc004a2c870, 0xc001294e10, 0x484fa80, 0xc005a90cc0, 0x707e7f, 0xc003d695b0)
	/home/andrei/go/src/github.com/cockroachdb/cockroach/pkg/storage/raft_transport.go:415 +0x138
github.com/cockroachdb/cockroach/pkg/storage.(*RaftTransport).RaftSnapshot.func1(0x484fa80, 0xc005a90cc0)
	/home/andrei/go/src/github.com/cockroachdb/cockroach/pkg/storage/raft_transport.go:416 +0x5d
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTask.func1(0xc001ca2480, 0x484fa80, 0xc005a90cc0, 0xc005120400, 0x32, 0x0, 0x0, 0xc005a90cf0)
	/home/andrei/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:321 +0xe6
created by github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTask
	/home/andrei/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:316 +0x131


419 runs completed, 1 failures, over 22m10s
@andreimatei andreimatei added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. C-test-failure Broken test (automatically or manually discovered). A-kv-replication Relating to Raft, consensus, and coordination. S-1 High impact: many users impacted, serious risk of high unavailability or data loss labels Aug 27, 2019
@andreimatei
Copy link
Contributor Author

andreimatei commented Aug 27, 2019 via email

@tbg
Copy link
Member

tbg commented Aug 27, 2019

Ugh. Maybe #36611? See if you can corroborate or refute this.

Also, does this repro, perhaps on a large roachprod-stress cluster? A reliable repro doubles as a guarantee that we can fix this within at most a week of effort I'd say

@andreimatei
Copy link
Contributor Author

Got another report from Darin, while he was playing around.

@tbg
Copy link
Member

tbg commented Sep 3, 2019

Just happened on master too: #40423

Now that I think about it, this might be the same as #40367. To verify this, look at the logs: the node that blows up should have been removed from the range and re-added recently, but without applying the learner snapshot (the snap would still be sent, but we wouldn't see it get applied).

@ajwerner
Copy link
Contributor

ajwerner commented Sep 4, 2019

It does indeed seem like this is #40367. It reproduces readily. In all the cases I've seen we've had a replica get a learner snapshot soon after being added and then removed.

I190903 18:32:50.297745 241003 storage/replica_command.go:1430  [n3,s3,r187/1:/Table/94/1/{5-6}] change replicas (add [(n2,s2):2LEARNER] remove []): existing descriptor r187:/Table/94/1/{5-6} [(n3,s3):1, next=2, gen=142]                                                                                                
I190903 18:32:50.312835 241003 storage/replica_raft.go:291  [n3,s3,r187/1:/Table/94/1/{5-6}] proposing ADD_REPLICA[(n2,s2):2LEARNER]: after=[(n3,s3):1 (n2,s2):2LEARNER] next=3                                                                                                                                             
I190903 18:32:50.315539 241003 storage/store_snapshot.go:995  [n3,s3,r187/1:/Table/94/1/{5-6}] sending LEARNER snapshot 1bbc9d31 at applied index 23
I190903 18:32:50.315693 241003 storage/store_snapshot.go:1038  [n3,s3,r187/1:/Table/94/1/{5-6}] streamed snapshot to (n2,s2):2LEARNER: kv pairs: 8, log entries: 0, rate-limit: 8.0 MiB/sec, 0.00s                                                                                                                          
I190903 18:32:50.316810 241498 storage/replica_raftstorage.go:793  [n2,s2,r187/2:{-}] applying LEARNER snapshot [id=1bbc9d31 index=23]
I190903 18:32:50.329484 241498 storage/replica_raftstorage.go:814  [n2,s2,r187/2:/Table/94/1/{5-6}] applied LEARNER snapshot [total=11ms ingestion=4@11ms id=1bbc9d31 index=23]                                                                                                                                             
I190903 18:32:50.330283 241003 storage/replica_command.go:1430  [n3,s3,r187/1:/Table/94/1/{5-6}] change replicas (add [(n2,s2):2] remove []): existing descriptor r187:/Table/94/1/{5-6} [(n3,s3):1, (n2,s2):2LEARNER, next=3, gen=143]                                                                                     
I190903 18:32:50.336131 241003 storage/replica_raft.go:291  [n3,s3,r187/1:/Table/94/1/{5-6}] proposing ADD_REPLICA[(n2,s2):2]: after=[(n3,s3):1 (n2,s2):2] next=3                                                                                                                                                           
E190903 18:32:50.356208 241406 storage/queue.go:1027  [n2,replicate,s2,r187/2:/Table/94/1/{5-6}] no removable replicas from range that needs a removal: [1:0, 2*:27]                                                                                                                                                        
I190903 18:32:50.469525 241975 storage/replica_command.go:1430  [n2,s2,r187/2:/Table/94/1/{5-6}] change replicas (add [] remove [(n3,s3):1]): existing descriptor r187:/Table/94/1/{5-6} [(n3,s3):1, (n2,s2):2, next=3, gen=144]                                                                                            
I190903 18:32:50.485261 241975 storage/replica_raft.go:291  [n2,s2,r187/2:/Table/94/1/{5-6}] proposing REMOVE_REPLICA[(n3,s3):1]: after=[(n2,s2):2] next=3
I190903 18:32:50.488300 241111 storage/replica_command.go:609  [n2,merge,s2,r186/2:/Table/94/1/{4-5}] initiating a merge of r187:/Table/94/1/{5-6} [(n2,s2):2, next=3, gen=145] into this range (lhs+rhs has (size=0 B+0 B qps=0.00+0.00 --> 0.00qps) below threshold (size=0 B, qps=0.00))                                 
I190903 18:32:50.593128 242769 storage/replica_command.go:609  [n2,merge,s2,r187/2:/Table/94/1/{5-6}] initiating a merge of r188:/{Table/94/1/6-Max} [(n2,s2):3, next=4, gen=150] into this range (lhs+rhs has (size=0 B+0 B qps=0.00+0.00 --> 0.00qps) below threshold (size=0 B, qps=0.00))                               
I190903 18:32:50.656216 1537 storage/store.go:2593  [n2,s2,r187/2:/Table/94/1/{5-6}] removing replica r188/3
I190903 18:32:50.668577 244165 storage/replica_command.go:1430  [n2,s2,r187/2:/{Table/94/1/5-Max}] change replicas (add [(n3,s3):3LEARNER] remove []): existing descriptor r187:/{Table/94/1/5-Max} [(n2,s2):2, next=3, gen=151]                                                                                            
I190903 18:32:50.680770 244165 storage/replica_raft.go:291  [n2,s2,r187/2:/{Table/94/1/5-Max}] proposing ADD_REPLICA[(n3,s3):3LEARNER]: after=[(n2,s2):2 (n3,s3):3LEARNER] next=4                                                                                                                                           
I190903 18:32:50.682048 244165 storage/store_snapshot.go:995  [n2,s2,r187/2:/{Table/94/1/5-Max}] sending LEARNER snapshot f9f0b80c at applied index 35
I190903 18:32:50.682316 244165 storage/store_snapshot.go:1038  [n2,s2,r187/2:/{Table/94/1/5-Max}] streamed snapshot to (n3,s3):3LEARNER: kv pairs: 41, log entries: 0, rate-limit: 8.0 MiB/sec, 0.00s                                                                                                                       
F190903 18:32:50.683220 242722 storage/replica_raft.go:1486  [n3,s3,r187/1:/Table/94/1/{5-6}] snapshot widens existing replica, but no replica exists for subsumed key /Table/94/1/6     

out.1567534743.txt
out.1567535970.txt
out.1567540180.txt

I've got more failures. I futzed around trying to come up with a fix. I posted the probably too simple #40459. It survives roachprod-stress better but eventually does seem to fail a SucceedsSoon.

@petermattis
Copy link
Collaborator

FYI, I saw this while running clearrange/checks=true. Seems like this is readily reproducible, though.

ajwerner added a commit to ajwerner/cockroach that referenced this issue Sep 23, 2019
We've seen instability recently due to invariants being violated as
replicas catch up across periods of being removed and re-added to a range.
Due to learner replicas and their rollback behavior this is now a relatively
common case. Rather than handle all of these various scenarios this PR prevents
them from occuring by actively removing replicas when we determine that they
must have been removed.

Here's a high level overview of the change:

 * Once a Replica object has a non-zero Replica.mu.replicaID it will not
   change.
   * In this commit however, if a node crashes it may forget that it learned
     about a replica ID.
 * If a raft message or snapshot addressed to a higher replica ID is received
   the current replica will be removed completely.
 * If a replica sees a ChangeReplicasTrigger which removes it then it
   completely removes itself while applying that command.
 * Replica.mu.destroyStatus is used to meaningfully signify the removal state
   of a Replica. Replicas about to be synchronously removed are in
   destroyReasonRemovalPending.

This hopefully gives us some new invariants:

 * There is only ever at most 1 Replica which IsAlive() for a range on a Store
   at a time.
 * Once a Replica has a non-zero ReplicaID is never changes.
   * This applies only to the in-memory object, not the store itself.
 * Once a Replica applies a command as a part of the range descriptor it will
   never apply another command as a different Replica ID or outside of the
   Range.
   * Corrolary: a Replica created as a learner will only ever apply commands
     while that replica is in the range.

The change also introduces some new complexity. Namely we now allow removal of
uninitialized replicas, including their hard state. This allows us to catch up
across a split even when we know the RHS must have been removed.

Fixes cockroachdb#40367.

Issue cockroachdb#38772 (comment)
manifests itself as the RHS not being found for a merge. This happens because
the Replica is processing commands to catch itself up while it is not in the
range. This is no longer possible.

Fixes cockroachdb#40257.

Issue cockroachdb#40257 is another case of a replica processing commands while it is not
in the range.

Fixes cockroachdb#40470.

Issue cockroachdb#40470 is caused by a RHS learning about its existence and removal
prior to a LHS processing a split. This case is now handled properly and is
tested.

Release justification: This commit is safe for 19.2 because it fixes release
blockers.

Release note (bug fix): Fix crashes by preventing replica ID change.
ajwerner added a commit to ajwerner/cockroach that referenced this issue Sep 23, 2019
We've seen instability recently due to invariants being violated as
replicas catch up across periods of being removed and re-added to a range.
Due to learner replicas and their rollback behavior this is now a relatively
common case. Rather than handle all of these various scenarios this PR prevents
them from occuring by actively removing replicas when we determine that they
must have been removed.

Here's a high level overview of the change:

 * Once a Replica object has a non-zero Replica.mu.replicaID it will not
   change.
   * In this commit however, if a node crashes it may forget that it learned
     about a replica ID.
 * If a raft message or snapshot addressed to a higher replica ID is received
   the current replica will be removed completely.
 * If a replica sees a ChangeReplicasTrigger which removes it then it
   completely removes itself while applying that command.
 * Replica.mu.destroyStatus is used to meaningfully signify the removal state
   of a Replica. Replicas about to be synchronously removed are in
   destroyReasonRemovalPending.

This hopefully gives us some new invariants:

 * There is only ever at most 1 Replica which IsAlive() for a range on a Store
   at a time.
 * Once a Replica has a non-zero ReplicaID is never changes.
   * This applies only to the in-memory object, not the store itself.
 * Once a Replica applies a command as a part of the range descriptor it will
   never apply another command as a different Replica ID or outside of the
   Range.
   * Corrolary: a Replica created as a learner will only ever apply commands
     while that replica is in the range.

The change also introduces some new complexity. Namely we now allow removal of
uninitialized replicas, including their hard state. This allows us to catch up
across a split even when we know the RHS must have been removed.

Fixes cockroachdb#40367.

Issue cockroachdb#38772 (comment)
manifests itself as the RHS not being found for a merge. This happens because
the Replica is processing commands to catch itself up while it is not in the
range. This is no longer possible.

Fixes cockroachdb#40257.

Issue cockroachdb#40257 is another case of a replica processing commands while it is not
in the range.

Fixes cockroachdb#40470.

Issue cockroachdb#40470 is caused by a RHS learning about its existence and removal
prior to a LHS processing a split. This case is now handled properly and is
tested.

Release justification: This commit is safe for 19.2 because it fixes release
blockers.

Release note (bug fix): Fix crashes by preventing replica ID change.
ajwerner added a commit to ajwerner/cockroach that referenced this issue Sep 23, 2019
We've seen instability recently due to invariants being violated as
replicas catch up across periods of being removed and re-added to a range.
Due to learner replicas and their rollback behavior this is now a relatively
common case. Rather than handle all of these various scenarios this PR prevents
them from occuring by actively removing replicas when we determine that they
must have been removed.

Here's a high level overview of the change:

 * Once a Replica object has a non-zero Replica.mu.replicaID it will not
   change.
   * In this commit however, if a node crashes it may forget that it learned
     about a replica ID.
 * If a raft message or snapshot addressed to a higher replica ID is received
   the current replica will be removed completely.
 * If a replica sees a ChangeReplicasTrigger which removes it then it
   completely removes itself while applying that command.
 * Replica.mu.destroyStatus is used to meaningfully signify the removal state
   of a Replica. Replicas about to be synchronously removed are in
   destroyReasonRemovalPending.

This hopefully gives us some new invariants:

 * There is only ever at most 1 Replica which IsAlive() for a range on a Store
   at a time.
 * Once a Replica has a non-zero ReplicaID is never changes.
   * This applies only to the in-memory object, not the store itself.
 * Once a Replica applies a command as a part of the range descriptor it will
   never apply another command as a different Replica ID or outside of the
   Range.
   * Corrolary: a Replica created as a learner will only ever apply commands
     while that replica is in the range.

The change also introduces some new complexity. Namely we now allow removal of
uninitialized replicas, including their hard state. This allows us to catch up
across a split even when we know the RHS must have been removed.

Fixes cockroachdb#40367.

Issue cockroachdb#38772 (comment)
manifests itself as the RHS not being found for a merge. This happens because
the Replica is processing commands to catch itself up while it is not in the
range. This is no longer possible.

Fixes cockroachdb#40257.

Issue cockroachdb#40257 is another case of a replica processing commands while it is not
in the range.

Fixes cockroachdb#40470.

Issue cockroachdb#40470 is caused by a RHS learning about its existence and removal
prior to a LHS processing a split. This case is now handled properly and is
tested.

Release justification: This commit is safe for 19.2 because it fixes release
blockers.

Release note (bug fix): Fix crashes by preventing replica ID change.
ajwerner added a commit to ajwerner/cockroach that referenced this issue Sep 23, 2019
We've seen instability recently due to invariants being violated as
replicas catch up across periods of being removed and re-added to a range.
Due to learner replicas and their rollback behavior this is now a relatively
common case. Rather than handle all of these various scenarios this PR prevents
them from occuring by actively removing replicas when we determine that they
must have been removed.

Here's a high level overview of the change:

 * Once a Replica object has a non-zero Replica.mu.replicaID it will not
   change.
   * In this commit however, if a node crashes it may forget that it learned
     about a replica ID.
 * If a raft message or snapshot addressed to a higher replica ID is received
   the current replica will be removed completely.
 * If a replica sees a ChangeReplicasTrigger which removes it then it
   completely removes itself while applying that command.
 * Replica.mu.destroyStatus is used to meaningfully signify the removal state
   of a Replica. Replicas about to be synchronously removed are in
   destroyReasonRemovalPending.

This hopefully gives us some new invariants:

 * There is only ever at most 1 Replica which IsAlive() for a range on a Store
   at a time.
 * Once a Replica has a non-zero ReplicaID is never changes.
   * This applies only to the in-memory object, not the store itself.
 * Once a Replica applies a command as a part of the range descriptor it will
   never apply another command as a different Replica ID or outside of the
   Range.
   * Corrolary: a Replica created as a learner will only ever apply commands
     while that replica is in the range.

The change also introduces some new complexity. Namely we now allow removal of
uninitialized replicas, including their hard state. This allows us to catch up
across a split even when we know the RHS must have been removed.

Fixes cockroachdb#40367.

Issue cockroachdb#38772 (comment)
manifests itself as the RHS not being found for a merge. This happens because
the Replica is processing commands to catch itself up while it is not in the
range. This is no longer possible.

Fixes cockroachdb#40257.

Issue cockroachdb#40257 is another case of a replica processing commands while it is not
in the range.

Fixes cockroachdb#40470.

Issue cockroachdb#40470 is caused by a RHS learning about its existence and removal
prior to a LHS processing a split. This case is now handled properly and is
tested.

Release justification: This commit is safe for 19.2 because it fixes release
blockers.

Release note (bug fix): Avoid internal re-use of Replica objects to fix the following crashes:

    cockroachdb#38772 "found rXXX:{-} [, next=0, gen=0?] in place of the RHS"
    cockroachdb#39796 "replica descriptor of local store not found in right hand side of split"
    cockroachdb#40470 "split trigger found right-hand side with tombstone"
    cockroachdb#40257 "snapshot widens existing replica, but no replica exists for subsumed key"
ajwerner added a commit to ajwerner/cockroach that referenced this issue Sep 23, 2019
We've seen instability recently due to invariants being violated as
replicas catch up across periods of being removed and re-added to a range.
Due to learner replicas and their rollback behavior this is now a relatively
common case. Rather than handle all of these various scenarios this PR prevents
them from occuring by actively removing replicas when we determine that they
must have been removed.

Here's a high level overview of the change:

 * Once a Replica object has a non-zero Replica.mu.replicaID it will not
   change.
   * In this commit however, if a node crashes it may forget that it learned
     about a replica ID.
 * If a raft message or snapshot addressed to a higher replica ID is received
   the current replica will be removed completely.
 * If a replica sees a ChangeReplicasTrigger which removes it then it
   completely removes itself while applying that command.
 * Replica.mu.destroyStatus is used to meaningfully signify the removal state
   of a Replica. Replicas about to be synchronously removed are in
   destroyReasonRemovalPending.

This hopefully gives us some new invariants:

 * There is only ever at most 1 Replica which IsAlive() for a range on a Store
   at a time.
 * Once a Replica has a non-zero ReplicaID is never changes.
   * This applies only to the in-memory object, not the store itself.
 * Once a Replica applies a command as a part of the range descriptor it will
   never apply another command as a different Replica ID or outside of the
   Range.
   * Corrolary: a Replica created as a learner will only ever apply commands
     while that replica is in the range.

The change also introduces some new complexity. Namely we now allow removal of
uninitialized replicas, including their hard state. This allows us to catch up
across a split even when we know the RHS must have been removed.

Fixes cockroachdb#40367.

Issue cockroachdb#38772 (comment)
manifests itself as the RHS not being found for a merge. This happens because
the Replica is processing commands to catch itself up while it is not in the
range. This is no longer possible.

Fixes cockroachdb#40257.

Issue cockroachdb#40257 is another case of a replica processing commands while it is not
in the range.

Fixes cockroachdb#40470.

Issue cockroachdb#40470 is caused by a RHS learning about its existence and removal
prior to a LHS processing a split. This case is now handled properly and is
tested.

Release justification: This commit is safe for 19.2 because it fixes release
blockers.

Release note (bug fix): Avoid internal re-use of Replica objects to fix the following crashes:

    cockroachdb#38772 "found rXXX:{-} [, next=0, gen=0?] in place of the RHS"
    cockroachdb#39796 "replica descriptor of local store not found in right hand side of split"
    cockroachdb#40470 "split trigger found right-hand side with tombstone"
    cockroachdb#40257 "snapshot widens existing replica, but no replica exists for subsumed key"
ajwerner added a commit to ajwerner/cockroach that referenced this issue Sep 23, 2019
We've seen instability recently due to invariants being violated as
replicas catch up across periods of being removed and re-added to a range.
Due to learner replicas and their rollback behavior this is now a relatively
common case. Rather than handle all of these various scenarios this PR prevents
them from occuring by actively removing replicas when we determine that they
must have been removed.

Here's a high level overview of the change:

 * Once a Replica object has a non-zero Replica.mu.replicaID it will not
   change.
   * In this commit however, if a node crashes it may forget that it learned
     about a replica ID.
 * If a raft message or snapshot addressed to a higher replica ID is received
   the current replica will be removed completely.
 * If a replica sees a ChangeReplicasTrigger which removes it then it
   completely removes itself while applying that command.
 * Replica.mu.destroyStatus is used to meaningfully signify the removal state
   of a Replica. Replicas about to be synchronously removed are in
   destroyReasonRemovalPending.

This hopefully gives us some new invariants:

 * There is only ever at most 1 Replica which IsAlive() for a range on a Store
   at a time.
 * Once a Replica has a non-zero ReplicaID is never changes.
   * This applies only to the in-memory object, not the store itself.
 * Once a Replica applies a command as a part of the range descriptor it will
   never apply another command as a different Replica ID or outside of the
   Range.
   * Corrolary: a Replica created as a learner will only ever apply commands
     while that replica is in the range.

The change also introduces some new complexity. Namely we now allow removal of
uninitialized replicas, including their hard state. This allows us to catch up
across a split even when we know the RHS must have been removed.

Fixes cockroachdb#40367.

Issue cockroachdb#38772 (comment)
manifests itself as the RHS not being found for a merge. This happens because
the Replica is processing commands to catch itself up while it is not in the
range. This is no longer possible.

Fixes cockroachdb#40257.

Issue cockroachdb#40257 is another case of a replica processing commands while it is not
in the range.

Fixes cockroachdb#40470.

Issue cockroachdb#40470 is caused by a RHS learning about its existence and removal
prior to a LHS processing a split. This case is now handled properly and is
tested.

Release justification: This commit is safe for 19.2 because it fixes release
blockers.

Release note (bug fix): Avoid internal re-use of Replica objects to fix the following crashes:

    cockroachdb#38772 "found rXXX:{-} [, next=0, gen=0?] in place of the RHS"
    cockroachdb#39796 "replica descriptor of local store not found in right hand side of split"
    cockroachdb#40470 "split trigger found right-hand side with tombstone"
    cockroachdb#40257 "snapshot widens existing replica, but no replica exists for subsumed key"
ajwerner added a commit to ajwerner/cockroach that referenced this issue Sep 23, 2019
We've seen instability recently due to invariants being violated as
replicas catch up across periods of being removed and re-added to a range.
Due to learner replicas and their rollback behavior this is now a relatively
common case. Rather than handle all of these various scenarios this PR prevents
them from occuring by actively removing replicas when we determine that they
must have been removed.

Here's a high level overview of the change:

 * Once a Replica object has a non-zero Replica.mu.replicaID it will not
   change.
   * In this commit however, if a node crashes it may forget that it learned
     about a replica ID.
 * If a raft message or snapshot addressed to a higher replica ID is received
   the current replica will be removed completely.
 * If a replica sees a ChangeReplicasTrigger which removes it then it
   completely removes itself while applying that command.
 * Replica.mu.destroyStatus is used to meaningfully signify the removal state
   of a Replica. Replicas about to be synchronously removed are in
   destroyReasonRemovalPending.

This hopefully gives us some new invariants:

 * There is only ever at most 1 Replica which IsAlive() for a range on a Store
   at a time.
 * Once a Replica has a non-zero ReplicaID is never changes.
   * This applies only to the in-memory object, not the store itself.
 * Once a Replica applies a command as a part of the range descriptor it will
   never apply another command as a different Replica ID or outside of the
   Range.
   * Corrolary: a Replica created as a learner will only ever apply commands
     while that replica is in the range.

The change also introduces some new complexity. Namely we now allow removal of
uninitialized replicas, including their hard state. This allows us to catch up
across a split even when we know the RHS must have been removed.

Fixes cockroachdb#40367.

Issue cockroachdb#38772 (comment)
manifests itself as the RHS not being found for a merge. This happens because
the Replica is processing commands to catch itself up while it is not in the
range. This is no longer possible.

Fixes cockroachdb#40257.

Issue cockroachdb#40257 is another case of a replica processing commands while it is not
in the range.

Fixes cockroachdb#40470.

Issue cockroachdb#40470 is caused by a RHS learning about its existence and removal
prior to a LHS processing a split. This case is now handled properly and is
tested.

Release justification: This commit is safe for 19.2 because it fixes release
blockers.

Release note (bug fix): Avoid internal re-use of Replica objects to fix the following crashes:

    cockroachdb#38772 "found rXXX:{-} [, next=0, gen=0?] in place of the RHS"
    cockroachdb#39796 "replica descriptor of local store not found in right hand side of split"
    cockroachdb#40470 "split trigger found right-hand side with tombstone"
    cockroachdb#40257 "snapshot widens existing replica, but no replica exists for subsumed key"
ajwerner added a commit to ajwerner/cockroach that referenced this issue Sep 24, 2019
We've seen instability recently due to invariants being violated as
replicas catch up across periods of being removed and re-added to a range.
Due to learner replicas and their rollback behavior this is now a relatively
common case. Rather than handle all of these various scenarios this PR prevents
them from occuring by actively removing replicas when we determine that they
must have been removed.

Here's a high level overview of the change:

 * Once a Replica object has a non-zero Replica.mu.replicaID it will not
   change.
   * In this commit however, if a node crashes it may forget that it learned
     about a replica ID.
 * If a raft message or snapshot addressed to a higher replica ID is received
   the current replica will be removed completely.
 * If a replica sees a ChangeReplicasTrigger which removes it then it
   completely removes itself while applying that command.
 * Replica.mu.destroyStatus is used to meaningfully signify the removal state
   of a Replica. Replicas about to be synchronously removed are in
   destroyReasonRemovalPending.

This hopefully gives us some new invariants:

 * There is only ever at most 1 Replica which IsAlive() for a range on a Store
   at a time.
 * Once a Replica has a non-zero ReplicaID is never changes.
   * This applies only to the in-memory object, not the store itself.
 * Once a Replica applies a command as a part of the range descriptor it will
   never apply another command as a different Replica ID or outside of the
   Range.
   * Corrolary: a Replica created as a learner will only ever apply commands
     while that replica is in the range.

The change also introduces some new complexity. Namely we now allow removal of
uninitialized replicas, including their hard state. This allows us to catch up
across a split even when we know the RHS must have been removed.

Fixes cockroachdb#40367.

Issue cockroachdb#38772 (comment)
manifests itself as the RHS not being found for a merge. This happens because
the Replica is processing commands to catch itself up while it is not in the
range. This is no longer possible.

Fixes cockroachdb#40257.

Issue cockroachdb#40257 is another case of a replica processing commands while it is not
in the range.

Fixes cockroachdb#40470.

Issue cockroachdb#40470 is caused by a RHS learning about its existence and removal
prior to a LHS processing a split. This case is now handled properly and is
tested.

Release justification: This commit is safe for 19.2 because it fixes release
blockers.

Release note (bug fix): Avoid internal re-use of Replica objects to fix the following crashes:

    cockroachdb#38772 "found rXXX:{-} [, next=0, gen=0?] in place of the RHS"
    cockroachdb#39796 "replica descriptor of local store not found in right hand side of split"
    cockroachdb#40470 "split trigger found right-hand side with tombstone"
    cockroachdb#40257 "snapshot widens existing replica, but no replica exists for subsumed key"
ajwerner added a commit to ajwerner/cockroach that referenced this issue Sep 24, 2019
We've seen instability recently due to invariants being violated as
replicas catch up across periods of being removed and re-added to a range.
Due to learner replicas and their rollback behavior this is now a relatively
common case. Rather than handle all of these various scenarios this PR prevents
them from occuring by actively removing replicas when we determine that they
must have been removed.

Here's a high level overview of the change:

 * Once a Replica object has a non-zero Replica.mu.replicaID it will not
   change.
   * In this commit however, if a node crashes it may forget that it learned
     about a replica ID.
 * If a raft message or snapshot addressed to a higher replica ID is received
   the current replica will be removed completely.
 * If a replica sees a ChangeReplicasTrigger which removes it then it
   completely removes itself while applying that command.
 * Replica.mu.destroyStatus is used to meaningfully signify the removal state
   of a Replica. Replicas about to be synchronously removed are in
   destroyReasonRemovalPending.

This hopefully gives us some new invariants:

 * There is only ever at most 1 Replica which IsAlive() for a range on a Store
   at a time.
 * Once a Replica has a non-zero ReplicaID is never changes.
   * This applies only to the in-memory object, not the store itself.
 * Once a Replica applies a command as a part of the range descriptor it will
   never apply another command as a different Replica ID or outside of the
   Range.
   * Corrolary: a Replica created as a learner will only ever apply commands
     while that replica is in the range.

The change also introduces some new complexity. Namely we now allow removal of
uninitialized replicas, including their hard state. This allows us to catch up
across a split even when we know the RHS must have been removed.

Fixes cockroachdb#40367.

Issue cockroachdb#38772 (comment)
manifests itself as the RHS not being found for a merge. This happens because
the Replica is processing commands to catch itself up while it is not in the
range. This is no longer possible.

Fixes cockroachdb#40257.

Issue cockroachdb#40257 is another case of a replica processing commands while it is not
in the range.

Fixes cockroachdb#40470.

Issue cockroachdb#40470 is caused by a RHS learning about its existence and removal
prior to a LHS processing a split. This case is now handled properly and is
tested.

Release justification: This commit is safe for 19.2 because it fixes release
blockers.

Release note (bug fix): Avoid internal re-use of Replica objects to fix the following crashes:

    cockroachdb#38772 "found rXXX:{-} [, next=0, gen=0?] in place of the RHS"
    cockroachdb#39796 "replica descriptor of local store not found in right hand side of split"
    cockroachdb#40470 "split trigger found right-hand side with tombstone"
    cockroachdb#40257 "snapshot widens existing replica, but no replica exists for subsumed key"
ajwerner added a commit to ajwerner/cockroach that referenced this issue Sep 24, 2019
We've seen instability recently due to invariants being violated as
replicas catch up across periods of being removed and re-added to a range.
Due to learner replicas and their rollback behavior this is now a relatively
common case. Rather than handle all of these various scenarios this PR prevents
them from occuring by actively removing replicas when we determine that they
must have been removed.

Here's a high level overview of the change:

 * Once a Replica object has a non-zero Replica.mu.replicaID it will not
   change.
   * In this commit however, if a node crashes it may forget that it learned
     about a replica ID.
 * If a raft message or snapshot addressed to a higher replica ID is received
   the current replica will be removed completely.
 * If a replica sees a ChangeReplicasTrigger which removes it then it
   completely removes itself while applying that command.
 * Replica.mu.destroyStatus is used to meaningfully signify the removal state
   of a Replica. Replicas about to be synchronously removed are in
   destroyReasonRemovalPending.

This hopefully gives us some new invariants:

 * There is only ever at most 1 Replica which IsAlive() for a range on a Store
   at a time.
 * Once a Replica has a non-zero ReplicaID is never changes.
   * This applies only to the in-memory object, not the store itself.
 * Once a Replica applies a command as a part of the range descriptor it will
   never apply another command as a different Replica ID or outside of the
   Range.
   * Corrolary: a Replica created as a learner will only ever apply commands
     while that replica is in the range.

The change also introduces some new complexity. Namely we now allow removal of
uninitialized replicas, including their hard state. This allows us to catch up
across a split even when we know the RHS must have been removed.

Fixes cockroachdb#40367.

Issue cockroachdb#38772 (comment)
manifests itself as the RHS not being found for a merge. This happens because
the Replica is processing commands to catch itself up while it is not in the
range. This is no longer possible.

Fixes cockroachdb#40257.

Issue cockroachdb#40257 is another case of a replica processing commands while it is not
in the range.

Fixes cockroachdb#40470.

Issue cockroachdb#40470 is caused by a RHS learning about its existence and removal
prior to a LHS processing a split. This case is now handled properly and is
tested.

Release justification: This commit is safe for 19.2 because it fixes release
blockers.

Release note (bug fix): Avoid internal re-use of Replica objects to fix the following crashes:

    cockroachdb#38772 "found rXXX:{-} [, next=0, gen=0?] in place of the RHS"
    cockroachdb#39796 "replica descriptor of local store not found in right hand side of split"
    cockroachdb#40470 "split trigger found right-hand side with tombstone"
    cockroachdb#40257 "snapshot widens existing replica, but no replica exists for subsumed key"
ajwerner added a commit to ajwerner/cockroach that referenced this issue Sep 24, 2019
We've seen instability recently due to invariants being violated as
replicas catch up across periods of being removed and re-added to a range.
Due to learner replicas and their rollback behavior this is now a relatively
common case. Rather than handle all of these various scenarios this PR prevents
them from occuring by actively removing replicas when we determine that they
must have been removed.

Here's a high level overview of the change:

 * Once a Replica object has a non-zero Replica.mu.replicaID it will not
   change.
   * In this commit however, if a node crashes it may forget that it learned
     about a replica ID.
 * If a raft message or snapshot addressed to a higher replica ID is received
   the current replica will be removed completely.
 * If a replica sees a ChangeReplicasTrigger which removes it then it
   completely removes itself while applying that command.
 * Replica.mu.destroyStatus is used to meaningfully signify the removal state
   of a Replica. Replicas about to be synchronously removed are in
   destroyReasonRemovalPending.

This hopefully gives us some new invariants:

 * There is only ever at most 1 Replica which IsAlive() for a range on a Store
   at a time.
 * Once a Replica has a non-zero ReplicaID is never changes.
   * This applies only to the in-memory object, not the store itself.
 * Once a Replica applies a command as a part of the range descriptor it will
   never apply another command as a different Replica ID or outside of the
   Range.
   * Corrolary: a Replica created as a learner will only ever apply commands
     while that replica is in the range.

The change also introduces some new complexity. Namely we now allow removal of
uninitialized replicas, including their hard state. This allows us to catch up
across a split even when we know the RHS must have been removed.

Fixes cockroachdb#40367.

Issue cockroachdb#38772 (comment)
manifests itself as the RHS not being found for a merge. This happens because
the Replica is processing commands to catch itself up while it is not in the
range. This is no longer possible.

Fixes cockroachdb#40257.

Issue cockroachdb#40257 is another case of a replica processing commands while it is not
in the range.

Fixes cockroachdb#40470.

Issue cockroachdb#40470 is caused by a RHS learning about its existence and removal
prior to a LHS processing a split. This case is now handled properly and is
tested.

Release justification: This commit is safe for 19.2 because it fixes release
blockers.

Release note (bug fix): Avoid internal re-use of Replica objects to fix the following crashes:

    cockroachdb#38772 "found rXXX:{-} [, next=0, gen=0?] in place of the RHS"
    cockroachdb#39796 "replica descriptor of local store not found in right hand side of split"
    cockroachdb#40470 "split trigger found right-hand side with tombstone"
    cockroachdb#40257 "snapshot widens existing replica, but no replica exists for subsumed key"
ajwerner added a commit to ajwerner/cockroach that referenced this issue Sep 24, 2019
We've seen instability recently due to invariants being violated as
replicas catch up across periods of being removed and re-added to a range.
Due to learner replicas and their rollback behavior this is now a relatively
common case. Rather than handle all of these various scenarios this PR prevents
them from occuring by actively removing replicas when we determine that they
must have been removed.

Here's a high level overview of the change:

 * Once a Replica object has a non-zero Replica.mu.replicaID it will not
   change.
   * In this commit however, if a node crashes it may forget that it learned
     about a replica ID.
 * If a raft message or snapshot addressed to a higher replica ID is received
   the current replica will be removed completely.
 * If a replica sees a ChangeReplicasTrigger which removes it then it
   completely removes itself while applying that command.
 * Replica.mu.destroyStatus is used to meaningfully signify the removal state
   of a Replica. Replicas about to be synchronously removed are in
   destroyReasonRemovalPending.

This hopefully gives us some new invariants:

 * There is only ever at most 1 Replica which IsAlive() for a range on a Store
   at a time.
 * Once a Replica has a non-zero ReplicaID is never changes.
   * This applies only to the in-memory object, not the store itself.
 * Once a Replica applies a command as a part of the range descriptor it will
   never apply another command as a different Replica ID or outside of the
   Range.
   * Corrolary: a Replica created as a learner will only ever apply commands
     while that replica is in the range.

The change also introduces some new complexity. Namely we now allow removal of
uninitialized replicas, including their hard state. This allows us to catch up
across a split even when we know the RHS must have been removed.

Fixes cockroachdb#40367.

Issue cockroachdb#38772 (comment)
manifests itself as the RHS not being found for a merge. This happens because
the Replica is processing commands to catch itself up while it is not in the
range. This is no longer possible.

Fixes cockroachdb#40257.

Issue cockroachdb#40257 is another case of a replica processing commands while it is not
in the range.

Fixes cockroachdb#40470.

Issue cockroachdb#40470 is caused by a RHS learning about its existence and removal
prior to a LHS processing a split. This case is now handled properly and is
tested.

Release justification: This commit is safe for 19.2 because it fixes release
blockers.

Release note (bug fix): Avoid internal re-use of Replica objects to fix the following crashes:

    cockroachdb#38772 "found rXXX:{-} [, next=0, gen=0?] in place of the RHS"
    cockroachdb#39796 "replica descriptor of local store not found in right hand side of split"
    cockroachdb#40470 "split trigger found right-hand side with tombstone"
    cockroachdb#40257 "snapshot widens existing replica, but no replica exists for subsumed key"
ajwerner added a commit to ajwerner/cockroach that referenced this issue Sep 24, 2019
We've seen instability recently due to invariants being violated as
replicas catch up across periods of being removed and re-added to a range.
Due to learner replicas and their rollback behavior this is now a relatively
common case. Rather than handle all of these various scenarios this PR prevents
them from occuring by actively removing replicas when we determine that they
must have been removed.

Here's a high level overview of the change:

 * Once a Replica object has a non-zero Replica.mu.replicaID it will not
   change.
   * In this commit however, if a node crashes it may forget that it learned
     about a replica ID.
 * If a raft message or snapshot addressed to a higher replica ID is received
   the current replica will be removed completely.
 * If a replica sees a ChangeReplicasTrigger which removes it then it
   completely removes itself while applying that command.
 * Replica.mu.destroyStatus is used to meaningfully signify the removal state
   of a Replica. Replicas about to be synchronously removed are in
   destroyReasonRemovalPending.

This hopefully gives us some new invariants:

 * There is only ever at most 1 Replica which IsAlive() for a range on a Store
   at a time.
 * Once a Replica has a non-zero ReplicaID is never changes.
   * This applies only to the in-memory object, not the store itself.
 * Once a Replica applies a command as a part of the range descriptor it will
   never apply another command as a different Replica ID or outside of the
   Range.
   * Corrolary: a Replica created as a learner will only ever apply commands
     while that replica is in the range.

The change also introduces some new complexity. Namely we now allow removal of
uninitialized replicas, including their hard state. This allows us to catch up
across a split even when we know the RHS must have been removed.

Fixes cockroachdb#40367.

Issue cockroachdb#38772 (comment)
manifests itself as the RHS not being found for a merge. This happens because
the Replica is processing commands to catch itself up while it is not in the
range. This is no longer possible.

Fixes cockroachdb#40257.

Issue cockroachdb#40257 is another case of a replica processing commands while it is not
in the range.

Fixes cockroachdb#40470.

Issue cockroachdb#40470 is caused by a RHS learning about its existence and removal
prior to a LHS processing a split. This case is now handled properly and is
tested.

Release justification: This commit is safe for 19.2 because it fixes release
blockers.

Release note (bug fix): Avoid internal re-use of Replica objects to fix the following crashes:

    cockroachdb#38772 "found rXXX:{-} [, next=0, gen=0?] in place of the RHS"
    cockroachdb#39796 "replica descriptor of local store not found in right hand side of split"
    cockroachdb#40470 "split trigger found right-hand side with tombstone"
    cockroachdb#40257 "snapshot widens existing replica, but no replica exists for subsumed key"
@craig craig bot closed this as completed in 2020115 Sep 24, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-kv-replication Relating to Raft, consensus, and coordination. C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. C-test-failure Broken test (automatically or manually discovered). S-1 High impact: many users impacted, serious risk of high unavailability or data loss
Projects
None yet
Development

No branches or pull requests

4 participants