storage: eager GC for replicas blocking snapshots #10426
Conversation
This is based on @tschottdorf's #8944, with the primary difference being the addition of a test exercising the scenario described in the commit message.
Reviewed 2 of 2 files at r1.

pkg/storage/client_raft_test.go, line 687 at r1 (raw file):
this needs a rebase

pkg/storage/client_raft_test.go, line 688 at r1 (raw file):
deserves a comment

pkg/storage/client_raft_test.go, line 692 at r1 (raw file):
s/rng/rep/

pkg/storage/client_raft_test.go, line 745 at r1 (raw file):
"unexpected error %v"?

pkg/storage/client_raft_test.go, line 747 at r1 (raw file):
comment?

pkg/storage/store.go, line 3516 at r1 (raw file):
remove space after the colon

pkg/storage/store.go, line 3520 at r1 (raw file):
verb for exReplica shouldn't have changed, should be %s still as it is elsewhere
Review status: all files reviewed at latest revision, 7 unresolved discussions, some commit checks failed.
I think you should make this close #8942.

Review status: all files reviewed at latest revision, 9 unresolved discussions, some commit checks failed.

pkg/storage/client_raft_test.go, line 684 at r1 (raw file):
Brief synopsis of the test would be good.

pkg/storage/client_raft_test.go, line 715 at r1 (raw file):
Shouldn't you look up the right hand side of the split's
Force-pushed from 6f66401 to 70ef8d5
Review status: 0 of 2 files reviewed at latest revision, 9 unresolved discussions.

pkg/storage/client_raft_test.go, line 684 at r1 (raw file):
Force-pushed from 70ef8d5 to 6f42385
Reviewed 2 of 2 files at r1, 2 of 2 files at r2.

pkg/storage/store.go, line 3520 at r1 (raw file):
Consider the following situation on a Range X:

- the range lives on nodes 1,2,3
- it gets rebalanced to nodes 1,2,4 while node 3 is down
- node 3 restarts, but the (now removed) replica remains quiescent
- the range splits
- the right hand side of the split gets rebalanced to nodes 1,2,3

In order to receive a snapshot in the last step, node 3 needs to have garbage-collected its old pre-split replica. If it weren't for node 3's downtime, this would normally happen eagerly, as its peers would inform it of its fate. In this scenario, however, one would have to wait until a) a client request creates the Raft group (which is unlikely, as it isn't being addressed any more) or b) a queue picks it up (which can take a long time).

Instead, when returning an overlap and the overlap appears to be inactive, we add it to the GC queue (which in turn activates the replica). For the situation above, we could also hope to only create the Raft group (as that would likely, but not necessarily, bring it into contact with its ex-peers). However, none of the old members may still be around in larger clusters, so going for the GC queue directly is the better option.
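To make the mechanism concrete, here is a minimal Go sketch of the idea, not the actual CockroachDB code: the types `storeSketch`, `replicaSketch`, and `gcQueue`, the `handleOverlappingSnapshot` function, and the inactivity threshold are all hypothetical stand-ins for illustration.

```go
// Sketch only: when an incoming snapshot intersects an existing replica,
// reject the snapshot, and if the overlapping replica looks inactive,
// eagerly hand it to the replica GC queue instead of waiting for a client
// request or a periodic scanner pass to wake it up.
package main

import (
	"fmt"
	"time"
)

type replicaSketch struct {
	rangeID      int64
	lastActivity time.Time // when this replica last saw Raft traffic
}

type gcQueue struct{ pending []int64 }

func (q *gcQueue) add(id int64) { q.pending = append(q.pending, id) }

type storeSketch struct{ gc gcQueue }

func (s *storeSketch) handleOverlappingSnapshot(overlap *replicaSketch, now time.Time, inactiveAfter time.Duration) error {
	if overlap == nil {
		return nil // no conflicting replica; the snapshot can proceed
	}
	if now.Sub(overlap.lastActivity) > inactiveAfter {
		// Likely abandoned (e.g. node 3's pre-split replica above):
		// let the GC queue decide its fate.
		s.gc.add(overlap.rangeID)
	}
	return fmt.Errorf("snapshot intersects existing replica of range %d", overlap.rangeID)
}

func main() {
	s := &storeSketch{}
	old := &replicaSketch{rangeID: 7, lastActivity: time.Now().Add(-time.Hour)}
	fmt.Println(s.handleOverlappingSnapshot(old, time.Now(), 10*time.Minute))
	fmt.Println("queued for GC:", s.gc.pending)
}
```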
Force-pushed from 6f42385 to 809149d
Reviewed 1 of 2 files at r2, 1 of 1 files at r3.

pkg/storage/store.go, line 3520 at r1 (raw file):
If a store receives a snapshot that overlaps an existing replica, we take it as a sign that the local replica may no longer be a member of its range and queue it for processing in the replica GC queue.

When this code was added (cockroachdb#10426), the replica GC queue was quite aggressive about processing replicas, and so the implementation was careful to only queue a replica if it looked "inactive." Unfortunately, this inactivity check rotted when epoch-based leases were introduced a month later (cockroachdb#10305). An inactive replica with an epoch-based lease can incorrectly be considered active, even if all of the other members of the range have stopped sending it messages, because the epoch-based lease will continue to be heartbeated by the node itself. (With an expiration-based lease, the replica's local copy of the lease would quickly expire if the other members of the range stopped sending it messages.)

Fixing the inactivity check to work with epoch-based leases is rather tricky. Quiescent replicas are nearly indistinguishable from abandoned replicas. This commit just removes the inactivity check and unconditionally queues replicas for GC if they intersect an incoming snapshot. The replica GC queue check is relatively cheap (one or two meta2 lookups), and overlapping snapshot situations are not expected to last for very long.

Release note: None
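A hedged sketch of that later simplification, reusing the hypothetical types from the earlier sketch: the inactivity heuristic is dropped entirely, since an epoch-based lease kept alive by the node's own liveness heartbeats can make an abandoned replica look active, and every overlapping replica goes straight to the GC queue, whose check is cheap.

```go
// Drop-in replacement for the earlier sketch's handleOverlappingSnapshot:
// no inactivity check, just queue the overlapping replica unconditionally
// and let the replica GC queue's meta2 lookup decide whether it is garbage.
func (s *storeSketch) handleOverlappingSnapshot(overlap *replicaSketch) error {
	if overlap == nil {
		return nil
	}
	s.gc.add(overlap.rangeID) // unconditional: the GC queue check is cheap
	return fmt.Errorf("snapshot intersects existing replica of range %d", overlap.rangeID)
}
```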
Consider the following situation on a Range X:

- the range lives on nodes 1,2,3
- it gets rebalanced to nodes 1,2,4 while node 3 is down
- node 3 restarts, but the (now removed) replica remains quiescent
- the range splits
- the right hand side of the split gets rebalanced to nodes 1,2,3

In order to receive a snapshot in the last step, node 3 needs to have garbage-collected its old pre-split replica. If it weren't for node 3's downtime, this would normally happen eagerly, as its peers would inform it of its fate. In this scenario, however, one would have to wait until a) a client request creates the Raft group (which is unlikely, as it isn't being addressed any more) or b) a queue picks it up (which can take a long time).

Instead, when returning an overlap and the overlap appears to be inactive, we add it to the GC queue (which in turn activates the replica). For the situation above, we could also hope to only create the Raft group (as that would likely, but not necessarily, bring it into contact with its ex-peers). However, none of the old members may still be around in larger clusters, so going for the GC queue directly is the better option.
Closes #8942