Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storage: avoid (one) fatal error from splitPostApply #39796

Merged
merged 2 commits into from
Sep 6, 2019

Conversation

tbg
Copy link
Member

@tbg tbg commented Aug 21, 2019

This is the next band-aid on top of #39658 and #39571.
The descriptor lookup I added sometimes fails because replicas can
process a split trigger in which they're not a member of the range:

F190821 15:14:28.241623 312191 storage/store.go:2172
[n2,s2,r21/3:/{Table/54-Max}] replica descriptor of local store not
found in right hand side of split

I saw this randomly in make test PKG=./pkg/ccl/partitionccl.

@danhhz could you take a stab at replicating this in the test you added
recently? It's probably too ambitious but it'd be nice to verify this is
actually fixed (I didn't). I haven't seen this fail on master (though I also
didn't look very hard) so maybe this is rare (i.e. we have time), but then I
just got it doing nothing related in particular.

Release note: None

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@tbg tbg requested a review from danhhz August 21, 2019 15:42
@danhhz
Copy link
Contributor

danhhz commented Aug 22, 2019

@danhhz could you take a stab at replicating this in the test you added
recently? It's probably too ambitious but it'd be nice to verify this is
actually fixed (I didn't). I haven't seen this fail on master (though I also
didn't look very hard) so maybe this is rare (i.e. we have time), but then I
just got it doing nothing related in particular.

I wouldn't count on me getting a chance to update the test anytime soon, unfortunately. I can't say I particularly understand the subtitles involved in this bit of code yet, so if there's no hurry, I'm inclined to skip the sciencedog review and wait for someone that does understand them to look at it.

@tbg
Copy link
Member Author

tbg commented Sep 4, 2019

Slightly different crash seen by @knz in #40470

I190904 14:33:33.261425 180 server/status/runtime.go:498  [n1] runtime stats: 4.7 GiB RSS, 407 goroutines, 2.6 GiB/718 MiB/3.4 GiB GO alloc/idle/total, 1.1 GiB/1.3 GiB CGO alloc/total, 210805.4 CGO/sec, 381.5/16.5 %(u/s)time, 0.0 %gc (3x),
 223 MiB/85 MiB (r/w)net
I190904 14:33:33.964062 11702 storage/replica_raft.go:291  [n1,s1,r261/1:/Table/56{-/1}] proposing REMOVE_REPLICA[(n4,s4):3]: after=[(n1,s1):1 (n3,s3):2 (n2,s2):5] next=6
W190904 14:33:34.137913 11847 storage/replica_raft.go:105  [n1,s1,r323/1:/Table/60/1/2{633/2/…-766/4/…}] context canceled before proposing: 1 HeartbeatTxn
I190904 14:33:34.489669 12200 storage/replica_command.go:1521  [n1,replicate,s1,r247/1:/{Table/61/3-Max}] change replicas (add [] remove [(n3,s3):2]): existing descriptor r247:/{Table/61/3-Max} [(n1,s1):1, (n3,s3):2, (n2,s2):3, (n4,s4):5,
next=6, gen=20]
I190904 14:33:34.669612 11610 storage/replica_raftstorage.go:793  [n1,s1,r313/4:{-}] applying LEARNER snapshot [id=faf20096 index=15]
I190904 14:33:34.944105 11610 storage/replica_raftstorage.go:814  [n1,s1,r313/4:/Table/58/1/{2525/9…-3750/1…}] applied LEARNER snapshot [total=274ms ingestion=4@217ms id=faf20096 index=15]
I190904 14:33:35.053208 12088 storage/split_queue.go:149  [n1,split,s1,r307/1:/Table/54/1/125{3/119…-5/522…}] split saw concurrent descriptor modification; maybe retrying
W190904 14:33:35.202730 12090 storage/replica_raft.go:105  [n1,s1,r304/1:/Table/53/1/250{3/7/-…-5/3/-…}] context canceled before proposing: 1 HeartbeatTxn
I190904 14:33:35.232699 12190 storage/replica_command.go:395  [n1,split,s1,r307/1:/Table/54/1/125{3/119…-5/522…}] initiating a split of this range at key /Table/54/1/1254/11837 [r329] (77 MiB above threshold size 64 MiB)
F190904 14:33:35.276986 138 storage/store.go:2148  [n1,s1,r313/4:/Table/58/1/{2525/9…-3750/1…}] split trigger found right-hand side with tombstone {NextReplicaID:5}: [n1,s1,r316/?:{-}]
goroutine 138 [running]:
github.com/cockroachdb/cockroach/pkg/util/log.getStacks(0xc000448301, 0xc000448360, 0x0, 0x7c662a)
        /go/src/github.com/cockroachdb/cockroach/pkg/util/log/clog.go:1016 +0xb1
github.com/cockroachdb/cockroach/pkg/util/log.(*loggingT).outputLogEntry(0x7c04a40, 0xc000000004, 0x73d2595, 0x10, 0x864, 0xc0006e43c0, 0x89)
        /go/src/github.com/cockroachdb/cockroach/pkg/util/log/clog.go:874 +0x93e
github.com/cockroachdb/cockroach/pkg/util/log.addStructured(0x4e5e460, 0xc03ccc99e0, 0x4, 0x2, 0x4563081, 0x3a, 0xc003394730, 0x2, 0x2)
        /go/src/github.com/cockroachdb/cockroach/pkg/util/log/structured.go:66 +0x2cc
github.com/cockroachdb/cockroach/pkg/util/log.logDepth(0x4e5e460, 0xc03ccc99e0, 0x1, 0xc000000004, 0x4563081, 0x3a, 0xc003394730, 0x2, 0x2)
        /go/src/github.com/cockroachdb/cockroach/pkg/util/log/log.go:69 +0x8c
github.com/cockroachdb/cockroach/pkg/util/log.Fatalf(...)
        /go/src/github.com/cockroachdb/cockroach/pkg/util/log/log.go:180
github.com/cockroachdb/cockroach/pkg/storage.splitPostApply(0x4e5e460, 0xc03ccc99e0, 0x0, 0x15c142ce810fb293, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
        /go/src/github.com/cockroachdb/cockroach/pkg/storage/store.go:2148 +0xb3e

@tbg tbg force-pushed the fix/split-trigger-non-member branch from 28d5d6e to 704162a Compare September 6, 2019 09:34
@tbg tbg changed the title storage: avoid fatal error from splitPostApply storage: avoid (one) fatal error from splitPostApply Sep 6, 2019
@tbg
Copy link
Member Author

tbg commented Sep 6, 2019

I added a test that reproduces this. @ajwerner, mind reviewing this now that you're dealing with similar problems?

@tbg tbg force-pushed the fix/split-trigger-non-member branch 2 times, most recently from daafca6 to 68d6486 Compare September 6, 2019 12:15
tbg added a commit to tbg/cockroach that referenced this pull request Sep 6, 2019
Without the preceding commit, this fatals as seen in cockroachdb#39796:

> F190821 15:14:28.241623 312191 storage/store.go:2172
> [n2,s2,r21/3:/{Table/54-Max}] replica descriptor of local store not
> found in right hand side of split

Note that cockroachdb#40459 will break these tests since it will not allow the
replicaID to change without an intermittent GC cycle, but we now rely on
catching up via the raft log across a removal, split, and then re-add.
At that point, the only way in which a split trigger could execute while
in non-member status is if a replica got removed but would still receive
a later split trigger. This could happen as follows:

The log is `[..., remove, split]`

a) the follower receives `split`
b) it learns that `split` is committed so that
c) it will apply both `remove` and then `split`.

a) seems possible; if the removed follower is trailing a bit, the leader
may append both entries at the same time before it (the leader) applies
the replication change that removes the follower from the leader's
config.
b) seems possible like above, at least once we do async command
application, if said append carries a commit index encompassing the
append itself.
c) will then naturally happen in today's code.

There's no way we can reasonably enact this scenario in our testing at
this point, though. The suggestion for cockroachdb#40459 is to essentially revert
this commit, so that we at least retain the coverage related to the
RHS of this test.

Release note (bug fix): Avoid crashes with message "replica descriptor
of local store not found".
Copy link
Contributor

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r1, 5 of 5 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @danhhz and @tbg)


pkg/storage/store.go, line 2171 at r2 (raw file):

			// not in the exact scenario above) but it sounds like it could be
			// in general. So if it's zero, we leave it at zero and if it's not
			// zero we wind it back to one. The code below achieves this.

Can you note that this replica id of 1 is just used for in-memory Replica.mu.replicaID field and won't actually make it to raft? I was puzzled for a second by how it could be safe to just assume that you have replica id 1.

It seems from the test that 1 is a problematic value as it may overlap with an existing replica (see your test failure). What about using something else inside like math.MaxInt32?

Copy link
Contributor

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @danhhz and @tbg)


pkg/storage/store.go, line 2171 at r2 (raw file):

Previously, ajwerner wrote…

Can you note that this replica id of 1 is just used for in-memory Replica.mu.replicaID field and won't actually make it to raft? I was puzzled for a second by how it could be safe to just assume that you have replica id 1.

It seems from the test that 1 is a problematic value as it may overlap with an existing replica (see your test failure). What about using something else inside like math.MaxInt32?

Actually -1 seems better than MaxInt32

This is the next band-aid on top of cockroachdb#39658 and cockroachdb#39571.
The descriptor lookup I added sometimes fails because replicas can
process a split trigger in which they're not a member of the range:

> F190821 15:14:28.241623 312191 storage/store.go:2172
> [n2,s2,r21/3:/{Table/54-Max}] replica descriptor of local store not
> found in right hand side of split

I saw this randomly in `make test PKG=./pkg/ccl/partitionccl`.

There is one more bug here that is not addressed by this change,
namely the fact that there might be a tombstone for the right
hand side. This continues to be tracked in cockroachdb#40470.

See also
cockroachdb#40367 (comment)
for more work we need to do to establish sanity.

Release note: None
Without the preceding commit, this fatals as seen in cockroachdb#39796:

> F190821 15:14:28.241623 312191 storage/store.go:2172
> [n2,s2,r21/3:/{Table/54-Max}] replica descriptor of local store not
> found in right hand side of split

Note that cockroachdb#40459 will break these tests since it will not allow the
replicaID to change without an intermittent GC cycle, but we now rely on
catching up via the raft log across a removal, split, and then re-add.
At that point, the only way in which a split trigger could execute while
in non-member status is if a replica got removed but would still receive
a later split trigger. This could happen as follows:

The log is `[..., remove, split]`

a) the follower receives `split`
b) it learns that `split` is committed so that
c) it will apply both `remove` and then `split`.

a) seems possible; if the removed follower is trailing a bit, the leader
may append both entries at the same time before it (the leader) applies
the replication change that removes the follower from the leader's
config.
b) seems possible like above, at least once we do async command
application, if said append carries a commit index encompassing the
append itself.
c) will then naturally happen in today's code.

There's no way we can reasonably enact this scenario in our testing at
this point, though. The suggestion for cockroachdb#40459 is to essentially revert
this commit, so that we at least retain the coverage related to the
RHS of this test.

Release note (bug fix): Avoid crashes with message "replica descriptor
of local store not found".
@tbg tbg force-pushed the fix/split-trigger-non-member branch from 68d6486 to 4cdf0f9 Compare September 6, 2019 15:27
Copy link
Member Author

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ajwerner could you give this another look and bors it for me? You should be able to push to the branch in case you're finding it necessary to amend

Copy link
Contributor

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: that explanation is clearer. I look forward to adding the assertion back.

bors r+

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner, @danhhz, and @tbg)

@craig
Copy link
Contributor

craig bot commented Sep 6, 2019

Build failed

@ajwerner
Copy link
Contributor

ajwerner commented Sep 6, 2019

flaked on #39321

bors r+

craig bot pushed a commit that referenced this pull request Sep 6, 2019
39796: storage: avoid (one) fatal error from splitPostApply r=ajwerner a=tbg

This is the next band-aid on top of #39658 and #39571.
The descriptor lookup I added sometimes fails because replicas can
process a split trigger in which they're not a member of the range:

> F190821 15:14:28.241623 312191 storage/store.go:2172
> [n2,s2,r21/3:/{Table/54-Max}] replica descriptor of local store not
> found in right hand side of split

I saw this randomly in `make test PKG=./pkg/ccl/partitionccl`.

@danhhz could you take a stab at replicating this in the test you added
recently? It's probably too ambitious but it'd be nice to verify this is
actually fixed (I didn't). I haven't seen this fail on master (though I also
didn't look very hard) so maybe this is rare (i.e. we have time), but then I
just got it doing nothing related in particular.

Release note: None

Co-authored-by: Tobias Schottdorf <[email protected]>
@craig
Copy link
Contributor

craig bot commented Sep 6, 2019

Build succeeded

@craig craig bot merged commit 4cdf0f9 into cockroachdb:master Sep 6, 2019
ajwerner added a commit to ajwerner/cockroach that referenced this pull request Sep 23, 2019
We've seen instability recently due to invariants being violated as
replicas catch up across periods of being removed and re-added to a range.
Due to learner replicas and their rollback behavior this is now a relatively
common case. Rather than handle all of these various scenarios this PR prevents
them from occuring by actively removing replicas when we determine that they
must have been removed.

Here's a high level overview of the change:

 * Once a Replica object has a non-zero Replica.mu.replicaID it will not
   change.
   * In this commit however, if a node crashes it may forget that it learned
     about a replica ID.
 * If a raft message or snapshot addressed to a higher replica ID is received
   the current replica will be removed completely.
 * If a replica sees a ChangeReplicasTrigger which removes it then it
   completely removes itself while applying that command.
 * Replica.mu.destroyStatus is used to meaningfully signify the removal state
   of a Replica. Replicas about to be synchronously removed are in
   destroyReasonRemovalPending.

This hopefully gives us some new invariants:

 * There is only ever at most 1 Replica which IsAlive() for a range on a Store
   at a time.
 * Once a Replica has a non-zero ReplicaID is never changes.
   * This applies only to the in-memory object, not the store itself.
 * Once a Replica applies a command as a part of the range descriptor it will
   never apply another command as a different Replica ID or outside of the
   Range.
   * Corrolary: a Replica created as a learner will only ever apply commands
     while that replica is in the range.

The change also introduces some new complexity. Namely we now allow removal of
uninitialized replicas, including their hard state. This allows us to catch up
across a split even when we know the RHS must have been removed.

Fixes cockroachdb#40367.

Issue cockroachdb#38772 (comment)
manifests itself as the RHS not being found for a merge. This happens because
the Replica is processing commands to catch itself up while it is not in the
range. This is no longer possible.

Fixes cockroachdb#40257.

Issue cockroachdb#40257 is another case of a replica processing commands while it is not
in the range.

Fixes cockroachdb#40470.

Issue cockroachdb#40470 is caused by a RHS learning about its existence and removal
prior to a LHS processing a split. This case is now handled properly and is
tested.

Release justification: This commit is safe for 19.2 because it fixes release
blockers.

Release note (bug fix): Avoid internal re-use of Replica objects to fix the following crashes:

    cockroachdb#38772 "found rXXX:{-} [, next=0, gen=0?] in place of the RHS"
    cockroachdb#39796 "replica descriptor of local store not found in right hand side of split"
    cockroachdb#40470 "split trigger found right-hand side with tombstone"
    cockroachdb#40257 "snapshot widens existing replica, but no replica exists for subsumed key"
ajwerner added a commit to ajwerner/cockroach that referenced this pull request Sep 23, 2019
We've seen instability recently due to invariants being violated as
replicas catch up across periods of being removed and re-added to a range.
Due to learner replicas and their rollback behavior this is now a relatively
common case. Rather than handle all of these various scenarios this PR prevents
them from occuring by actively removing replicas when we determine that they
must have been removed.

Here's a high level overview of the change:

 * Once a Replica object has a non-zero Replica.mu.replicaID it will not
   change.
   * In this commit however, if a node crashes it may forget that it learned
     about a replica ID.
 * If a raft message or snapshot addressed to a higher replica ID is received
   the current replica will be removed completely.
 * If a replica sees a ChangeReplicasTrigger which removes it then it
   completely removes itself while applying that command.
 * Replica.mu.destroyStatus is used to meaningfully signify the removal state
   of a Replica. Replicas about to be synchronously removed are in
   destroyReasonRemovalPending.

This hopefully gives us some new invariants:

 * There is only ever at most 1 Replica which IsAlive() for a range on a Store
   at a time.
 * Once a Replica has a non-zero ReplicaID is never changes.
   * This applies only to the in-memory object, not the store itself.
 * Once a Replica applies a command as a part of the range descriptor it will
   never apply another command as a different Replica ID or outside of the
   Range.
   * Corrolary: a Replica created as a learner will only ever apply commands
     while that replica is in the range.

The change also introduces some new complexity. Namely we now allow removal of
uninitialized replicas, including their hard state. This allows us to catch up
across a split even when we know the RHS must have been removed.

Fixes cockroachdb#40367.

Issue cockroachdb#38772 (comment)
manifests itself as the RHS not being found for a merge. This happens because
the Replica is processing commands to catch itself up while it is not in the
range. This is no longer possible.

Fixes cockroachdb#40257.

Issue cockroachdb#40257 is another case of a replica processing commands while it is not
in the range.

Fixes cockroachdb#40470.

Issue cockroachdb#40470 is caused by a RHS learning about its existence and removal
prior to a LHS processing a split. This case is now handled properly and is
tested.

Release justification: This commit is safe for 19.2 because it fixes release
blockers.

Release note (bug fix): Avoid internal re-use of Replica objects to fix the following crashes:

    cockroachdb#38772 "found rXXX:{-} [, next=0, gen=0?] in place of the RHS"
    cockroachdb#39796 "replica descriptor of local store not found in right hand side of split"
    cockroachdb#40470 "split trigger found right-hand side with tombstone"
    cockroachdb#40257 "snapshot widens existing replica, but no replica exists for subsumed key"
ajwerner added a commit to ajwerner/cockroach that referenced this pull request Sep 23, 2019
We've seen instability recently due to invariants being violated as
replicas catch up across periods of being removed and re-added to a range.
Due to learner replicas and their rollback behavior this is now a relatively
common case. Rather than handle all of these various scenarios this PR prevents
them from occuring by actively removing replicas when we determine that they
must have been removed.

Here's a high level overview of the change:

 * Once a Replica object has a non-zero Replica.mu.replicaID it will not
   change.
   * In this commit however, if a node crashes it may forget that it learned
     about a replica ID.
 * If a raft message or snapshot addressed to a higher replica ID is received
   the current replica will be removed completely.
 * If a replica sees a ChangeReplicasTrigger which removes it then it
   completely removes itself while applying that command.
 * Replica.mu.destroyStatus is used to meaningfully signify the removal state
   of a Replica. Replicas about to be synchronously removed are in
   destroyReasonRemovalPending.

This hopefully gives us some new invariants:

 * There is only ever at most 1 Replica which IsAlive() for a range on a Store
   at a time.
 * Once a Replica has a non-zero ReplicaID is never changes.
   * This applies only to the in-memory object, not the store itself.
 * Once a Replica applies a command as a part of the range descriptor it will
   never apply another command as a different Replica ID or outside of the
   Range.
   * Corrolary: a Replica created as a learner will only ever apply commands
     while that replica is in the range.

The change also introduces some new complexity. Namely we now allow removal of
uninitialized replicas, including their hard state. This allows us to catch up
across a split even when we know the RHS must have been removed.

Fixes cockroachdb#40367.

Issue cockroachdb#38772 (comment)
manifests itself as the RHS not being found for a merge. This happens because
the Replica is processing commands to catch itself up while it is not in the
range. This is no longer possible.

Fixes cockroachdb#40257.

Issue cockroachdb#40257 is another case of a replica processing commands while it is not
in the range.

Fixes cockroachdb#40470.

Issue cockroachdb#40470 is caused by a RHS learning about its existence and removal
prior to a LHS processing a split. This case is now handled properly and is
tested.

Release justification: This commit is safe for 19.2 because it fixes release
blockers.

Release note (bug fix): Avoid internal re-use of Replica objects to fix the following crashes:

    cockroachdb#38772 "found rXXX:{-} [, next=0, gen=0?] in place of the RHS"
    cockroachdb#39796 "replica descriptor of local store not found in right hand side of split"
    cockroachdb#40470 "split trigger found right-hand side with tombstone"
    cockroachdb#40257 "snapshot widens existing replica, but no replica exists for subsumed key"
ajwerner added a commit to ajwerner/cockroach that referenced this pull request Sep 23, 2019
We've seen instability recently due to invariants being violated as
replicas catch up across periods of being removed and re-added to a range.
Due to learner replicas and their rollback behavior this is now a relatively
common case. Rather than handle all of these various scenarios this PR prevents
them from occuring by actively removing replicas when we determine that they
must have been removed.

Here's a high level overview of the change:

 * Once a Replica object has a non-zero Replica.mu.replicaID it will not
   change.
   * In this commit however, if a node crashes it may forget that it learned
     about a replica ID.
 * If a raft message or snapshot addressed to a higher replica ID is received
   the current replica will be removed completely.
 * If a replica sees a ChangeReplicasTrigger which removes it then it
   completely removes itself while applying that command.
 * Replica.mu.destroyStatus is used to meaningfully signify the removal state
   of a Replica. Replicas about to be synchronously removed are in
   destroyReasonRemovalPending.

This hopefully gives us some new invariants:

 * There is only ever at most 1 Replica which IsAlive() for a range on a Store
   at a time.
 * Once a Replica has a non-zero ReplicaID is never changes.
   * This applies only to the in-memory object, not the store itself.
 * Once a Replica applies a command as a part of the range descriptor it will
   never apply another command as a different Replica ID or outside of the
   Range.
   * Corrolary: a Replica created as a learner will only ever apply commands
     while that replica is in the range.

The change also introduces some new complexity. Namely we now allow removal of
uninitialized replicas, including their hard state. This allows us to catch up
across a split even when we know the RHS must have been removed.

Fixes cockroachdb#40367.

Issue cockroachdb#38772 (comment)
manifests itself as the RHS not being found for a merge. This happens because
the Replica is processing commands to catch itself up while it is not in the
range. This is no longer possible.

Fixes cockroachdb#40257.

Issue cockroachdb#40257 is another case of a replica processing commands while it is not
in the range.

Fixes cockroachdb#40470.

Issue cockroachdb#40470 is caused by a RHS learning about its existence and removal
prior to a LHS processing a split. This case is now handled properly and is
tested.

Release justification: This commit is safe for 19.2 because it fixes release
blockers.

Release note (bug fix): Avoid internal re-use of Replica objects to fix the following crashes:

    cockroachdb#38772 "found rXXX:{-} [, next=0, gen=0?] in place of the RHS"
    cockroachdb#39796 "replica descriptor of local store not found in right hand side of split"
    cockroachdb#40470 "split trigger found right-hand side with tombstone"
    cockroachdb#40257 "snapshot widens existing replica, but no replica exists for subsumed key"
ajwerner added a commit to ajwerner/cockroach that referenced this pull request Sep 24, 2019
We've seen instability recently due to invariants being violated as
replicas catch up across periods of being removed and re-added to a range.
Due to learner replicas and their rollback behavior this is now a relatively
common case. Rather than handle all of these various scenarios this PR prevents
them from occuring by actively removing replicas when we determine that they
must have been removed.

Here's a high level overview of the change:

 * Once a Replica object has a non-zero Replica.mu.replicaID it will not
   change.
   * In this commit however, if a node crashes it may forget that it learned
     about a replica ID.
 * If a raft message or snapshot addressed to a higher replica ID is received
   the current replica will be removed completely.
 * If a replica sees a ChangeReplicasTrigger which removes it then it
   completely removes itself while applying that command.
 * Replica.mu.destroyStatus is used to meaningfully signify the removal state
   of a Replica. Replicas about to be synchronously removed are in
   destroyReasonRemovalPending.

This hopefully gives us some new invariants:

 * There is only ever at most 1 Replica which IsAlive() for a range on a Store
   at a time.
 * Once a Replica has a non-zero ReplicaID is never changes.
   * This applies only to the in-memory object, not the store itself.
 * Once a Replica applies a command as a part of the range descriptor it will
   never apply another command as a different Replica ID or outside of the
   Range.
   * Corrolary: a Replica created as a learner will only ever apply commands
     while that replica is in the range.

The change also introduces some new complexity. Namely we now allow removal of
uninitialized replicas, including their hard state. This allows us to catch up
across a split even when we know the RHS must have been removed.

Fixes cockroachdb#40367.

Issue cockroachdb#38772 (comment)
manifests itself as the RHS not being found for a merge. This happens because
the Replica is processing commands to catch itself up while it is not in the
range. This is no longer possible.

Fixes cockroachdb#40257.

Issue cockroachdb#40257 is another case of a replica processing commands while it is not
in the range.

Fixes cockroachdb#40470.

Issue cockroachdb#40470 is caused by a RHS learning about its existence and removal
prior to a LHS processing a split. This case is now handled properly and is
tested.

Release justification: This commit is safe for 19.2 because it fixes release
blockers.

Release note (bug fix): Avoid internal re-use of Replica objects to fix the following crashes:

    cockroachdb#38772 "found rXXX:{-} [, next=0, gen=0?] in place of the RHS"
    cockroachdb#39796 "replica descriptor of local store not found in right hand side of split"
    cockroachdb#40470 "split trigger found right-hand side with tombstone"
    cockroachdb#40257 "snapshot widens existing replica, but no replica exists for subsumed key"
ajwerner added a commit to ajwerner/cockroach that referenced this pull request Sep 24, 2019
We've seen instability recently due to invariants being violated as
replicas catch up across periods of being removed and re-added to a range.
Due to learner replicas and their rollback behavior this is now a relatively
common case. Rather than handle all of these various scenarios this PR prevents
them from occuring by actively removing replicas when we determine that they
must have been removed.

Here's a high level overview of the change:

 * Once a Replica object has a non-zero Replica.mu.replicaID it will not
   change.
   * In this commit however, if a node crashes it may forget that it learned
     about a replica ID.
 * If a raft message or snapshot addressed to a higher replica ID is received
   the current replica will be removed completely.
 * If a replica sees a ChangeReplicasTrigger which removes it then it
   completely removes itself while applying that command.
 * Replica.mu.destroyStatus is used to meaningfully signify the removal state
   of a Replica. Replicas about to be synchronously removed are in
   destroyReasonRemovalPending.

This hopefully gives us some new invariants:

 * There is only ever at most 1 Replica which IsAlive() for a range on a Store
   at a time.
 * Once a Replica has a non-zero ReplicaID is never changes.
   * This applies only to the in-memory object, not the store itself.
 * Once a Replica applies a command as a part of the range descriptor it will
   never apply another command as a different Replica ID or outside of the
   Range.
   * Corrolary: a Replica created as a learner will only ever apply commands
     while that replica is in the range.

The change also introduces some new complexity. Namely we now allow removal of
uninitialized replicas, including their hard state. This allows us to catch up
across a split even when we know the RHS must have been removed.

Fixes cockroachdb#40367.

Issue cockroachdb#38772 (comment)
manifests itself as the RHS not being found for a merge. This happens because
the Replica is processing commands to catch itself up while it is not in the
range. This is no longer possible.

Fixes cockroachdb#40257.

Issue cockroachdb#40257 is another case of a replica processing commands while it is not
in the range.

Fixes cockroachdb#40470.

Issue cockroachdb#40470 is caused by a RHS learning about its existence and removal
prior to a LHS processing a split. This case is now handled properly and is
tested.

Release justification: This commit is safe for 19.2 because it fixes release
blockers.

Release note (bug fix): Avoid internal re-use of Replica objects to fix the following crashes:

    cockroachdb#38772 "found rXXX:{-} [, next=0, gen=0?] in place of the RHS"
    cockroachdb#39796 "replica descriptor of local store not found in right hand side of split"
    cockroachdb#40470 "split trigger found right-hand side with tombstone"
    cockroachdb#40257 "snapshot widens existing replica, but no replica exists for subsumed key"
ajwerner added a commit to ajwerner/cockroach that referenced this pull request Sep 24, 2019
We've seen instability recently due to invariants being violated as
replicas catch up across periods of being removed and re-added to a range.
Due to learner replicas and their rollback behavior this is now a relatively
common case. Rather than handle all of these various scenarios this PR prevents
them from occuring by actively removing replicas when we determine that they
must have been removed.

Here's a high level overview of the change:

 * Once a Replica object has a non-zero Replica.mu.replicaID it will not
   change.
   * In this commit however, if a node crashes it may forget that it learned
     about a replica ID.
 * If a raft message or snapshot addressed to a higher replica ID is received
   the current replica will be removed completely.
 * If a replica sees a ChangeReplicasTrigger which removes it then it
   completely removes itself while applying that command.
 * Replica.mu.destroyStatus is used to meaningfully signify the removal state
   of a Replica. Replicas about to be synchronously removed are in
   destroyReasonRemovalPending.

This hopefully gives us some new invariants:

 * There is only ever at most 1 Replica which IsAlive() for a range on a Store
   at a time.
 * Once a Replica has a non-zero ReplicaID is never changes.
   * This applies only to the in-memory object, not the store itself.
 * Once a Replica applies a command as a part of the range descriptor it will
   never apply another command as a different Replica ID or outside of the
   Range.
   * Corrolary: a Replica created as a learner will only ever apply commands
     while that replica is in the range.

The change also introduces some new complexity. Namely we now allow removal of
uninitialized replicas, including their hard state. This allows us to catch up
across a split even when we know the RHS must have been removed.

Fixes cockroachdb#40367.

Issue cockroachdb#38772 (comment)
manifests itself as the RHS not being found for a merge. This happens because
the Replica is processing commands to catch itself up while it is not in the
range. This is no longer possible.

Fixes cockroachdb#40257.

Issue cockroachdb#40257 is another case of a replica processing commands while it is not
in the range.

Fixes cockroachdb#40470.

Issue cockroachdb#40470 is caused by a RHS learning about its existence and removal
prior to a LHS processing a split. This case is now handled properly and is
tested.

Release justification: This commit is safe for 19.2 because it fixes release
blockers.

Release note (bug fix): Avoid internal re-use of Replica objects to fix the following crashes:

    cockroachdb#38772 "found rXXX:{-} [, next=0, gen=0?] in place of the RHS"
    cockroachdb#39796 "replica descriptor of local store not found in right hand side of split"
    cockroachdb#40470 "split trigger found right-hand side with tombstone"
    cockroachdb#40257 "snapshot widens existing replica, but no replica exists for subsumed key"
ajwerner added a commit to ajwerner/cockroach that referenced this pull request Sep 24, 2019
We've seen instability recently due to invariants being violated as
replicas catch up across periods of being removed and re-added to a range.
Due to learner replicas and their rollback behavior this is now a relatively
common case. Rather than handle all of these various scenarios this PR prevents
them from occuring by actively removing replicas when we determine that they
must have been removed.

Here's a high level overview of the change:

 * Once a Replica object has a non-zero Replica.mu.replicaID it will not
   change.
   * In this commit however, if a node crashes it may forget that it learned
     about a replica ID.
 * If a raft message or snapshot addressed to a higher replica ID is received
   the current replica will be removed completely.
 * If a replica sees a ChangeReplicasTrigger which removes it then it
   completely removes itself while applying that command.
 * Replica.mu.destroyStatus is used to meaningfully signify the removal state
   of a Replica. Replicas about to be synchronously removed are in
   destroyReasonRemovalPending.

This hopefully gives us some new invariants:

 * There is only ever at most 1 Replica which IsAlive() for a range on a Store
   at a time.
 * Once a Replica has a non-zero ReplicaID is never changes.
   * This applies only to the in-memory object, not the store itself.
 * Once a Replica applies a command as a part of the range descriptor it will
   never apply another command as a different Replica ID or outside of the
   Range.
   * Corrolary: a Replica created as a learner will only ever apply commands
     while that replica is in the range.

The change also introduces some new complexity. Namely we now allow removal of
uninitialized replicas, including their hard state. This allows us to catch up
across a split even when we know the RHS must have been removed.

Fixes cockroachdb#40367.

Issue cockroachdb#38772 (comment)
manifests itself as the RHS not being found for a merge. This happens because
the Replica is processing commands to catch itself up while it is not in the
range. This is no longer possible.

Fixes cockroachdb#40257.

Issue cockroachdb#40257 is another case of a replica processing commands while it is not
in the range.

Fixes cockroachdb#40470.

Issue cockroachdb#40470 is caused by a RHS learning about its existence and removal
prior to a LHS processing a split. This case is now handled properly and is
tested.

Release justification: This commit is safe for 19.2 because it fixes release
blockers.

Release note (bug fix): Avoid internal re-use of Replica objects to fix the following crashes:

    cockroachdb#38772 "found rXXX:{-} [, next=0, gen=0?] in place of the RHS"
    cockroachdb#39796 "replica descriptor of local store not found in right hand side of split"
    cockroachdb#40470 "split trigger found right-hand side with tombstone"
    cockroachdb#40257 "snapshot widens existing replica, but no replica exists for subsumed key"
ajwerner added a commit to ajwerner/cockroach that referenced this pull request Sep 24, 2019
We've seen instability recently due to invariants being violated as
replicas catch up across periods of being removed and re-added to a range.
Due to learner replicas and their rollback behavior this is now a relatively
common case. Rather than handle all of these various scenarios this PR prevents
them from occuring by actively removing replicas when we determine that they
must have been removed.

Here's a high level overview of the change:

 * Once a Replica object has a non-zero Replica.mu.replicaID it will not
   change.
   * In this commit however, if a node crashes it may forget that it learned
     about a replica ID.
 * If a raft message or snapshot addressed to a higher replica ID is received
   the current replica will be removed completely.
 * If a replica sees a ChangeReplicasTrigger which removes it then it
   completely removes itself while applying that command.
 * Replica.mu.destroyStatus is used to meaningfully signify the removal state
   of a Replica. Replicas about to be synchronously removed are in
   destroyReasonRemovalPending.

This hopefully gives us some new invariants:

 * There is only ever at most 1 Replica which IsAlive() for a range on a Store
   at a time.
 * Once a Replica has a non-zero ReplicaID is never changes.
   * This applies only to the in-memory object, not the store itself.
 * Once a Replica applies a command as a part of the range descriptor it will
   never apply another command as a different Replica ID or outside of the
   Range.
   * Corrolary: a Replica created as a learner will only ever apply commands
     while that replica is in the range.

The change also introduces some new complexity. Namely we now allow removal of
uninitialized replicas, including their hard state. This allows us to catch up
across a split even when we know the RHS must have been removed.

Fixes cockroachdb#40367.

Issue cockroachdb#38772 (comment)
manifests itself as the RHS not being found for a merge. This happens because
the Replica is processing commands to catch itself up while it is not in the
range. This is no longer possible.

Fixes cockroachdb#40257.

Issue cockroachdb#40257 is another case of a replica processing commands while it is not
in the range.

Fixes cockroachdb#40470.

Issue cockroachdb#40470 is caused by a RHS learning about its existence and removal
prior to a LHS processing a split. This case is now handled properly and is
tested.

Release justification: This commit is safe for 19.2 because it fixes release
blockers.

Release note (bug fix): Avoid internal re-use of Replica objects to fix the following crashes:

    cockroachdb#38772 "found rXXX:{-} [, next=0, gen=0?] in place of the RHS"
    cockroachdb#39796 "replica descriptor of local store not found in right hand side of split"
    cockroachdb#40470 "split trigger found right-hand side with tombstone"
    cockroachdb#40257 "snapshot widens existing replica, but no replica exists for subsumed key"
ajwerner added a commit to ajwerner/cockroach that referenced this pull request Sep 24, 2019
We've seen instability recently due to invariants being violated as
replicas catch up across periods of being removed and re-added to a range.
Due to learner replicas and their rollback behavior this is now a relatively
common case. Rather than handle all of these various scenarios this PR prevents
them from occuring by actively removing replicas when we determine that they
must have been removed.

Here's a high level overview of the change:

 * Once a Replica object has a non-zero Replica.mu.replicaID it will not
   change.
   * In this commit however, if a node crashes it may forget that it learned
     about a replica ID.
 * If a raft message or snapshot addressed to a higher replica ID is received
   the current replica will be removed completely.
 * If a replica sees a ChangeReplicasTrigger which removes it then it
   completely removes itself while applying that command.
 * Replica.mu.destroyStatus is used to meaningfully signify the removal state
   of a Replica. Replicas about to be synchronously removed are in
   destroyReasonRemovalPending.

This hopefully gives us some new invariants:

 * There is only ever at most 1 Replica which IsAlive() for a range on a Store
   at a time.
 * Once a Replica has a non-zero ReplicaID is never changes.
   * This applies only to the in-memory object, not the store itself.
 * Once a Replica applies a command as a part of the range descriptor it will
   never apply another command as a different Replica ID or outside of the
   Range.
   * Corrolary: a Replica created as a learner will only ever apply commands
     while that replica is in the range.

The change also introduces some new complexity. Namely we now allow removal of
uninitialized replicas, including their hard state. This allows us to catch up
across a split even when we know the RHS must have been removed.

Fixes cockroachdb#40367.

Issue cockroachdb#38772 (comment)
manifests itself as the RHS not being found for a merge. This happens because
the Replica is processing commands to catch itself up while it is not in the
range. This is no longer possible.

Fixes cockroachdb#40257.

Issue cockroachdb#40257 is another case of a replica processing commands while it is not
in the range.

Fixes cockroachdb#40470.

Issue cockroachdb#40470 is caused by a RHS learning about its existence and removal
prior to a LHS processing a split. This case is now handled properly and is
tested.

Release justification: This commit is safe for 19.2 because it fixes release
blockers.

Release note (bug fix): Avoid internal re-use of Replica objects to fix the following crashes:

    cockroachdb#38772 "found rXXX:{-} [, next=0, gen=0?] in place of the RHS"
    cockroachdb#39796 "replica descriptor of local store not found in right hand side of split"
    cockroachdb#40470 "split trigger found right-hand side with tombstone"
    cockroachdb#40257 "snapshot widens existing replica, but no replica exists for subsumed key"
ajwerner added a commit to ajwerner/cockroach that referenced this pull request Sep 24, 2019
We've seen instability recently due to invariants being violated as
replicas catch up across periods of being removed and re-added to a range.
Due to learner replicas and their rollback behavior this is now a relatively
common case. Rather than handle all of these various scenarios this PR prevents
them from occuring by actively removing replicas when we determine that they
must have been removed.

Here's a high level overview of the change:

 * Once a Replica object has a non-zero Replica.mu.replicaID it will not
   change.
   * In this commit however, if a node crashes it may forget that it learned
     about a replica ID.
 * If a raft message or snapshot addressed to a higher replica ID is received
   the current replica will be removed completely.
 * If a replica sees a ChangeReplicasTrigger which removes it then it
   completely removes itself while applying that command.
 * Replica.mu.destroyStatus is used to meaningfully signify the removal state
   of a Replica. Replicas about to be synchronously removed are in
   destroyReasonRemovalPending.

This hopefully gives us some new invariants:

 * There is only ever at most 1 Replica which IsAlive() for a range on a Store
   at a time.
 * Once a Replica has a non-zero ReplicaID is never changes.
   * This applies only to the in-memory object, not the store itself.
 * Once a Replica applies a command as a part of the range descriptor it will
   never apply another command as a different Replica ID or outside of the
   Range.
   * Corrolary: a Replica created as a learner will only ever apply commands
     while that replica is in the range.

The change also introduces some new complexity. Namely we now allow removal of
uninitialized replicas, including their hard state. This allows us to catch up
across a split even when we know the RHS must have been removed.

Fixes cockroachdb#40367.

Issue cockroachdb#38772 (comment)
manifests itself as the RHS not being found for a merge. This happens because
the Replica is processing commands to catch itself up while it is not in the
range. This is no longer possible.

Fixes cockroachdb#40257.

Issue cockroachdb#40257 is another case of a replica processing commands while it is not
in the range.

Fixes cockroachdb#40470.

Issue cockroachdb#40470 is caused by a RHS learning about its existence and removal
prior to a LHS processing a split. This case is now handled properly and is
tested.

Release justification: This commit is safe for 19.2 because it fixes release
blockers.

Release note (bug fix): Avoid internal re-use of Replica objects to fix the following crashes:

    cockroachdb#38772 "found rXXX:{-} [, next=0, gen=0?] in place of the RHS"
    cockroachdb#39796 "replica descriptor of local store not found in right hand side of split"
    cockroachdb#40470 "split trigger found right-hand side with tombstone"
    cockroachdb#40257 "snapshot widens existing replica, but no replica exists for subsumed key"
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants