Epoch-based range leases implementation #10305
+cc @andreimatei |
0aded85
to
c76b6b3
Compare
@tschottdorf definitely needs to put his eyes on this. Review status: 0 of 41 files reviewed at latest revision, 8 unresolved discussions, some commit checks failed. build/npm.installed, line 0 at r1 (raw file): pkg/roachpb/data.proto, line 313 at r1 (raw file):
I think we discussed not using pkg/roachpb/data_test.go, line 593 at r1 (raw file):
I think this might be clearer if you define helper methods to create epoch and expiration based leases. For example:
I'm asking for this because it was visually difficult to distinguish between the prefixes pkg/storage/below_raft_protos_test.go, line 92 at r1 (raw file):
Isn't it problematic that this is changing? Will this PR require a freeze-cluster upgrade? pkg/storage/replica.go, line 735 at r1 (raw file):
Any reason you changed these to values? Using pkg/storage/replica_range_lease.go, line 267 at r1 (raw file):
I thought the previous stasis period was present so that we could adjust the max clock offset without fouling up previous leases. Guess I'm not following why it is safe to remove that field now. pkg/storage/replica_range_lease.go, line 283 at r1 (raw file):
Why the named return value? Those are generally frowned upon, especially when there is only a single return value like here. pkg/storage/storagebase/proposer_kv.proto, line 34 at r1 (raw file):
I realize using Comments from Reviewable |
5337712
to
f5ddf81
Compare
Review status: 0 of 41 files reviewed at latest revision, 8 unresolved discussions. build/npm.installed, line at r1 (raw file):
|
4814c4f
to
240a377
Compare
it's nice how not many things had to change :) Review status: 0 of 41 files reviewed at latest revision, 17 unresolved discussions, some commit checks failed. pkg/storage/replica.go, line 785 at r2 (raw file):
"current timestamp" seems stale. Also the new But... is the fact that we're completely shifting to using a request timestamp a good thing? For example, don't we want to renew a lease if it's about to expire even if the current request has an older timestamp? pkg/storage/replica.go, line 881 at r2 (raw file):
Why not pkg/storage/replica.go, line 2732 at r2 (raw file):
I think the commit message should call out that you've fixed the previously possible (I think...) situation: a replica proposes a command under lease A, then quickly transfers A away, then gets another lease B, and then applies the command (the application's check of the lease would succeed). pkg/storage/replica.go, line 2817 at r2 (raw file):
nit: s/lease at/proposed under lease pkg/storage/replica_range_lease.go, line 33 at r3 (raw file):
I think this comment needs to be expanded now that this struct handles two types of leases. That's magical. pkg/storage/replica_range_lease.go, line 132 at r3 (raw file):
how about extracting this closure in a top-level function, to keep functions reasonably-sized? pkg/storage/replica_range_lease.go, line 136 at r3 (raw file):
I think you've created an accessor for pkg/storage/replica_range_lease.go, line 151 at r3 (raw file):
as we discussed, you only wanna set this pkg/storage/replica_state.go, line 142 at r3 (raw file):
maybe replace this with a Comments from Reviewable |
This has come up in cockroachdb#10305: we have a test that sets the MaxOffset to a very high value because it pushes the clock and doesn't want the MaxOffset checks, which assert that commands are not coming from the (far) future, to complain. Unfortunately that doesn't quite work: the MaxOffset is also used for the leases' stasis period (which apparently only 10305 starts checking), so the leases would always be in stasis. Also moved the MaxOffset to the TestingKnobs, so that people are not encouraged to use it. Also cleaned up some MaxOffsets that are no longer needed since we stopped starting new leases at curTime + MaxOffset.
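To make the stasis problem concrete, here is a minimal sketch, not the actual code in this PR. It assumes timestamps are plain `int64` nanoseconds and that the stasis period is the final `maxOffset` nanoseconds before a lease's expiration; `leaseState` and `stateAt` are hypothetical names.

```go
package main

// leaseState is a hypothetical, simplified classification of an
// expiration-based lease at a given timestamp.
type leaseState int

const (
	leaseValid leaseState = iota
	leaseStasis
	leaseExpired
)

// stateAt classifies a lease that expires at expiration (nanoseconds) for a
// request at ts. Assumption: the last maxOffset nanoseconds before
// expiration form the stasis period, during which the lease is not used
// to serve requests.
func stateAt(ts, expiration, maxOffset int64) leaseState {
	switch {
	case ts >= expiration:
		return leaseExpired
	case ts >= expiration-maxOffset:
		return leaseStasis
	default:
		return leaseValid
	}
}
```

With a very large `maxOffset`, `expiration-maxOffset` falls below any realistic timestamp, so every unexpired lease lands in the stasis branch. That is the failure mode the commit message describes.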
Reviewed 26 of 41 files at r1, 2 of 8 files at r2, 1 of 1 files at r3, 12 of 12 files at r4. pkg/roachpb/data.proto, line 313 at r1 (raw file):
|
Only gave this a high-level overview so far, looks roughly like I expected (but then again, the devil is in the details, and I haven't looked at those). Definitely needs a lot of tests written. What's the plan going forward w.r.t. #10327? I strongly suspect that #10327 will go in first, and then this PR would be the natural one for fixing up proposer-evaluated KV's lease handling (but either way it would have to be compatible with #10327 and thus be rebased on top of that, though likely worth holding off for a few days hoping that my PR can merge by then). Reviewed 26 of 41 files at r1, 2 of 8 files at r2, 1 of 1 files at r3, 12 of 12 files at r4. pkg/roachpb/data.go, line 976 at r4 (raw file):
pkg/storage/client_raft_test.go, line 974 at r4 (raw file):
@tamird: ideally we would stress this test in CI since its contents changed significantly. I don't think that's currently happening, right? Interesting problem, but I think the diff hunks show the method header for each hunk and perhaps you can grep those? Or that's already happening. pkg/storage/helpers_test.go, line 195 at r4 (raw file):
nit: pkg/storage/node_liveness.go, line 149 at r4 (raw file):
Looks like this context should have a timeout of pkg/storage/replica.go, line 735 at r1 (raw file):
|
Review status: all files reviewed at latest revision, 41 unresolved discussions, some commit checks failed. pkg/sql/txn_restart_test.go, line 1136 at r4 (raw file):
|
@bdarnell I'm getting started again on this because of course it's rotted to brown pulp. I did the massive merge last night and am working on tests now. I just had some spare time, but my real impetus is for proposer evaluated KV. That feels like it might end up being a crucial step before GA. I'm wondering whether you're actively working on it or not to gauge timing. Clearly I'll need to have this merged and well debugged before we take that next step. |
Yeah, if things will just stop being on fire then propEvalKV is next on my plate. I think I'll be starting with the command queue changes (#10413) before I get to anything lease-related, though (#10414). I agree that propEvalKV (or rather the zero-downtime upgrades that it enables) will be a requirement for GA. |
Review status: all files reviewed at latest revision, 41 unresolved discussions, some commit checks failed. pkg/roachpb/data.go, line 976 at r4 (raw file): Previously, tschottdorf (Tobias Schottdorf) wrote…
Changed to pkg/roachpb/data.proto, line 313 at r1 (raw file): Previously, bdarnell (Ben Darnell) wrote…
Done. pkg/sql/txn_restart_test.go, line 1136 at r4 (raw file): Previously, andreimatei (Andrei Matei) wrote…
Done. pkg/storage/below_raft_protos_test.go, line 92 at r1 (raw file): Previously, bdarnell (Ben Darnell) wrote…
What's latest on this? @andreimatei changed this already and now I'm changing it again...and have to. pkg/storage/client_raft_test.go, line 974 at r4 (raw file): Previously, tschottdorf (Tobias Schottdorf) wrote…
I've done significant stress tests on existing unittests. I'll leave the rest of this question to @tamird. pkg/storage/client_raft_test.go, line 1047 at r4 (raw file): Previously, bdarnell (Ben Darnell) wrote…
Done. Didn't fail after 1000 runs. pkg/storage/client_test.go, line 1019 at r4 (raw file): Previously, bdarnell (Ben Darnell) wrote…
Good point. Changed. pkg/storage/helpers_test.go, line 195 at r4 (raw file): Previously, tschottdorf (Tobias Schottdorf) wrote…
Done. pkg/storage/node_liveness.go, line 82 at r4 (raw file): Previously, bdarnell (Ben Darnell) wrote…
Done. pkg/storage/node_liveness.go, line 149 at r4 (raw file): Previously, tschottdorf (Tobias Schottdorf) wrote…
Done. pkg/storage/node_liveness.go, line 181 at r4 (raw file): Previously, bdarnell (Ben Darnell) wrote…
Done. pkg/storage/node_liveness.go, line 203 at r4 (raw file): Previously, bdarnell (Ben Darnell) wrote…
The supplied pkg/storage/node_liveness.go, line 204 at r4 (raw file): Previously, bdarnell (Ben Darnell) wrote…
Oops, fixed. pkg/storage/replica.go, line 735 at r1 (raw file): Previously, tschottdorf (Tobias Schottdorf) wrote…
Why does it need to be? There's no hacking necessary. Just calling pkg/storage/replica.go, line 785 at r2 (raw file): Previously, tschottdorf (Tobias Schottdorf) wrote…
OK and cleaned up comment a bit. pkg/storage/replica.go, line 881 at r2 (raw file): Previously, andreimatei (Andrei Matei) wrote…
This is a useful piece of info. Why put it at verbose level 2? pkg/storage/replica.go, line 2732 at r2 (raw file): Previously, andreimatei (Andrei Matei) wrote…
Done. pkg/storage/replica.go, line 2817 at r2 (raw file): Previously, andreimatei (Andrei Matei) wrote…
Done. pkg/storage/replica.go, line 756 at r4 (raw file): Previously, tschottdorf (Tobias Schottdorf) wrote…
It has to stay here in order to accommodate the pkg/storage/replica.go, line 901 at r4 (raw file): Previously, bdarnell (Ben Darnell) wrote…
Done. pkg/storage/replica.go, line 909 at r4 (raw file): Previously, bdarnell (Ben Darnell) wrote…
Done. pkg/storage/replica.go, line 1190 at r4 (raw file): Previously, bdarnell (Ben Darnell) wrote…
Done. pkg/storage/replica.go, line 1695 at r4 (raw file): Previously, bdarnell (Ben Darnell) wrote…
Good idea. Done. pkg/storage/replica.go, line 1768 at r4 (raw file): Previously, tschottdorf (Tobias Schottdorf) wrote…
Probably better to remove it, but it's iffy because there are commands without leases and what I must do now is set the pkg/storage/replica.go, line 2818 at r4 (raw file): Previously, bdarnell (Ben Darnell) wrote…
They definitely don't skip the lease check. pkg/storage/replica_proposal.go, line 130 at r4 (raw file): Previously, tschottdorf (Tobias Schottdorf) wrote…
The pkg/storage/replica_range_lease.go, line 33 at r3 (raw file): Previously, andreimatei (Andrei Matei) wrote…
Done. pkg/storage/replica_range_lease.go, line 132 at r3 (raw file): Previously, andreimatei (Andrei Matei) wrote…
Done. pkg/storage/replica_range_lease.go, line 136 at r3 (raw file): Previously, andreimatei (Andrei Matei) wrote…
Done. pkg/storage/replica_range_lease.go, line 151 at r3 (raw file): Previously, andreimatei (Andrei Matei) wrote…
Yes quite right. Will add a test. This code path gets tested plenty by our unittests, btw. It works because the replica gets the not lease holder error and retries. pkg/storage/replica_range_lease.go, line 150 at r4 (raw file): Previously, bdarnell (Ben Darnell) wrote…
No, that was an error. The code still worked in unittests because the replica would just retry on receipt of the pkg/storage/replica_range_lease.go, line 230 at r4 (raw file): Previously, bdarnell (Ben Darnell) wrote…
Done. pkg/storage/replica_range_lease.go, line 238 at r4 (raw file): Previously, bdarnell (Ben Darnell) wrote…
Done. pkg/storage/replica_range_lease.go, line 244 at r4 (raw file): Previously, bdarnell (Ben Darnell) wrote…
Done. pkg/storage/replica_range_lease.go, line 283 at r4 (raw file): Previously, tschottdorf (Tobias Schottdorf) wrote…
It can't. We need the replica's pkg/storage/replica_state.go, line 142 at r3 (raw file): Previously, andreimatei (Andrei Matei) wrote…
Done. Comments from Reviewable |
ba8bf9f
to
69f8a58
Compare
Reviewed 1 of 41 files at r1, 25 of 49 files at r5. pkg/storage/client_raft_test.go, line 913 at r5 (raw file):
use a subtest instead - this test already wraps itself in a function, so the change will be trivial. pkg/storage/client_raft_test.go, line 973 at r5 (raw file):
revert this when you go to a subtest here. pkg/storage/replica_state.go, line 142 at r3 (raw file): Previously, spencerkimball (Spencer Kimball) wrote…
Haven't the semantics here changed, now? It seems odd for this function to swallow empty leases, and this makes it quite difficult to see which callers of this function may encounter this scenario. pkg/storage/stores.go, line 84 at r5 (raw file):
why change this? pkg/storage/storagebase/proposer_kv.proto, line 119 at r5 (raw file):
why rely on empty here? seems like you want it to be nil when a lease is not required. pkg/storage/storagebase/state.proto, line 53 at r5 (raw file):
isn't this what's causing the below raft protos check to fail? why do you need to change this nullability? it seems like you've just swapped it for a Comments from Reviewable |
Reviewed 15 of 43 files at r6. pkg/storage/below_raft_protos_test.go, line 92 at r1 (raw file): Previously, spencerkimball (Spencer Kimball) wrote…
So this still changed? pkg/storage/client_raft_test.go, line 920 at r6 (raw file):
you can call Comments from Reviewable |
29cb6a8
to
c7e137d
Compare
Review status: 17 of 45 files reviewed at latest revision, 39 unresolved discussions, some commit checks failed. pkg/storage/below_raft_protos_test.go, line 92 at r1 (raw file): Previously, tamird (Tamir Duberstein) wrote…
Well we now have an pkg/storage/client_raft_test.go, line 920 at r6 (raw file): Previously, tamird (Tamir Duberstein) wrote…
Done. Comments from Reviewable |
65d8b7e
to
31ad27d
Compare
Reviewed 1 of 8 files at r2. pkg/storage/replica.go, line 881 at r2 (raw file): Previously, spencerkimball (Spencer Kimball) wrote…
pkg/storage/replica.go, line 1694 at r7 (raw file):
I suggest reverting this change and not passing status to pkg/storage/replica.go, line 2173 at r7 (raw file):
please document the new param pkg/storage/replica.go, line 3213 at r7 (raw file):
pkg/storage/replica_range_lease.go, line 43 at r7 (raw file):
nit: double space pkg/storage/replica_range_lease.go, line 45 at r7 (raw file):
the last sentence scares the reader. Also the implementation of pkg/storage/replica_range_lease.go, line 153 at r7 (raw file):
it's a bit subtle why it's correct to append to pkg/storage/replica_range_lease.go, line 172 at r7 (raw file):
what does it mean for pkg/storage/replica_range_lease.go, line 288 at r7 (raw file):
// state of the lease in relationship to pkg/storage/replica_range_lease.go, line 289 at r7 (raw file):
mmm this pattern seems weird to me... You're tying a timestamp to a pkg/storage/replica_range_lease.go, line 295 at r7 (raw file):
... in relationship with pkg/storage/replica_range_lease.go, line 352 at r7 (raw file):
Can pkg/storage/replica_range_lease.go, line 431 at r7 (raw file):
the new param should be documented. But I don't think it should have been introduced at all. Why not let pkg/storage/replica_range_lease.go, line 269 at r8 (raw file):
instead of this method, can we have a simple member boolean initialized to the constant answer? pkg/storage/replica_range_lease.go, line 270 at r8 (raw file):
what does it mean for Comments from Reviewable |
Review status: 13 of 45 files reviewed at latest revision, 52 unresolved discussions, some commit checks failed. pkg/storage/replica.go, line 881 at r2 (raw file): Previously, andreimatei (Andrei Matei) wrote…
OK, but I'd like to just make the point that this stuff has become a mess. There should be only one way to do things, for good or bad. pkg/storage/replica.go, line 1694 at r7 (raw file): Previously, andreimatei (Andrei Matei) wrote…
See later comment. pkg/storage/replica.go, line 2173 at r7 (raw file): Previously, andreimatei (Andrei Matei) wrote…
Done. pkg/storage/replica.go, line 3213 at r7 (raw file): Previously, andreimatei (Andrei Matei) wrote…
There is no lease specified if that flag is true on the batch request (i.e. the lease is neither checked pre-Raft or post-Raft). Added a comment. pkg/storage/replica_range_lease.go, line 43 at r7 (raw file): Previously, andreimatei (Andrei Matei) wrote…
Done. pkg/storage/replica_range_lease.go, line 45 at r7 (raw file): Previously, andreimatei (Andrei Matei) wrote…
The reader is easily scared. pkg/storage/replica_range_lease.go, line 153 at r7 (raw file): Previously, andreimatei (Andrei Matei) wrote…
I'm going to leave a TODO here for you. pkg/storage/replica_range_lease.go, line 172 at r7 (raw file): Previously, andreimatei (Andrei Matei) wrote…
Removed the check, though I'm going to keep pkg/storage/replica_range_lease.go, line 288 at r7 (raw file): Previously, andreimatei (Andrei Matei) wrote…
Done. pkg/storage/replica_range_lease.go, line 289 at r7 (raw file): Previously, andreimatei (Andrei Matei) wrote…
They're all necessary through the stack at one place or another. Obviously we can't use the timestamp anymore. We need the lease primarily, but turns out we also need the timestamp and of course we need the state (although we could probably trim that one and return two variables). The liveness is necessary to increment the epoch if required when we request the lease. This is the bundle of stuff that's necessary, like it or not. Returning a bunch of values sounds worse to me. pkg/storage/replica_range_lease.go, line 295 at r7 (raw file): Previously, andreimatei (Andrei Matei) wrote…
Done. pkg/storage/replica_range_lease.go, line 352 at r7 (raw file): Previously, andreimatei (Andrei Matei) wrote…
Yes. When evaluating a lease held by another replica, it's possible that gossip is not as up-to-date as Raft's propagation of the actual lease. It's not such a big deal as you really only care about this for the owner of the lease and the owner of the lease is not dependent on gossip. However, we obviously don't want nodes panicking when checking if another node's lease is valid (say for metrics). pkg/storage/replica_range_lease.go, line 431 at r7 (raw file): Previously, andreimatei (Andrei Matei) wrote…
Because then it would really need to verify it again. That's the job of pkg/storage/replica_range_lease.go, line 269 at r8 (raw file): Previously, andreimatei (Andrei Matei) wrote…
I really hate adding new member variables to pkg/storage/replica_range_lease.go, line 270 at r8 (raw file): Previously, andreimatei (Andrei Matei) wrote…
Unittests. Comments from Reviewable |
87a50dd
to
7952470
Compare
Review status: 13 of 46 files reviewed at latest revision, 37 unresolved discussions, some commit checks pending. pkg/storage/replica_range_lease.go, line 172 at r7 (raw file): Previously, spencerkimball (Spencer Kimball) wrote…
Actually, this is a very important predicate. When I took it out all hell broke loose. This is a subtle predicate. What you have to keep in mind is that the status of the current lease has already been determined to need updating. What this is meant to do is check whether that prior lease was epoch-based and is now expired. If we're also requesting an epoch-based lease now, then we either need to heartbeat our own liveness record or else increment the prior owner's epoch. I think what was confusing was using Comments from Reviewable |
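A rough sketch of the predicate described in the comment above, under stated assumptions: the types and names (`lease`, `livenessAction`, `actionBeforeRequest`) are hypothetical, an epoch of 0 stands in for "expiration-based", and the mapping of "own prior lease" to heartbeat versus "someone else's prior lease" to epoch increment is an interpretation of the comment, not a quote of the PR's code.

```go
package main

// lease is a hypothetical, pared-down lease record.
type lease struct {
	ownerNodeID int
	epoch       int64 // 0 means expiration-based in this sketch
}

// livenessAction is what the requester must do to the node liveness table
// before requesting a new epoch-based lease.
type livenessAction int

const (
	noAction livenessAction = iota
	heartbeatOwnLiveness
	incrementPriorOwnerEpoch
)

// actionBeforeRequest applies the predicate: only when the prior lease was
// epoch-based, is now expired, and the new request is also epoch-based does
// the liveness table need to be touched.
func actionBeforeRequest(prior lease, priorExpired bool, requesterNodeID int, requestEpochLease bool) livenessAction {
	if prior.epoch == 0 || !priorExpired || !requestEpochLease {
		return noAction
	}
	if prior.ownerNodeID == requesterNodeID {
		// Assumed: our own liveness record lapsed, so refresh it.
		return heartbeatOwnLiveness
	}
	// Assumed: the prior owner is no longer live, so invalidate its lease
	// by bumping its epoch.
	return incrementPriorOwnerEpoch
}
```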
7952470
to
97f1e4c
Compare
I've done another pass over this monster to make sure it's safe without freeze-cluster; it looks good except for my comment about OriginReplica in replica.go Reviewed 33 of 43 files at r6, 4 of 9 files at r8, 7 of 7 files at r9. pkg/roachpb/data.proto, line 313 at r9 (raw file):
s/field/field is/ pkg/storage/below_raft_protos_test.go, line 92 at r1 (raw file):
Yeah, but the lesson is a bit ambiguous. A half-finished freeze-cluster implementation was a waste of time in retrospect, but if we had finished it and used it routinely we could have avoided the bugs around the 20161110 release (and probably simplified some other transitions too). I had assumed from the beginning that propEvalKV would require freeze-cluster because we'd be unable to support a feature-flagged transition, but Tobi made that work. pkg/storage/replica.go, line 843 at r9 (raw file):
s/have/has/ pkg/storage/replica.go, line 3220 at r9 (raw file):
OriginLease will be nil for log entries proposed before this change; we must validate them using the origin replica and timestamp instead. pkg/storage/replica.go, line 3222 at r9 (raw file):
When would this happen? (r.mu.state.Lease is nil but it's not a SkipLease command and we want to proceed) pkg/storage/replica.go, line 3248 at r9 (raw file):
Move this comment to the definition of verifyLease. pkg/storage/replica_command.go, line 1638 at r9 (raw file):
We need to keep this in to avoid a freeze-cluster. (OK, so it's a long shot that any log entries this old will still be around, but still. We'll do a big purge of this kind of stuff after propEvalKV is in) pkg/storage/store.go, line 830 at r9 (raw file):
This pattern means that it's impossible for StoreConfig to set EnableEpochRangeLeases to false when the env var (or default) is true. That didn't matter much for EnableCoaleascedHeartbeats but it's going to break the new "switcheroo" test you added when we flip the default. pkg/storage/storagebase/proposer_kv.proto, line 121 at r9 (raw file):
Add "If the command was proposed prior to the introduction of epoch leases, origin_lease will be nil, but the combination of origin_replica and timestamp are sufficient to verify an expiration-based lease." Comments from Reviewable |
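The backwards-compatibility rule from this review (a nil origin lease for old log entries, validated by origin replica and timestamp instead) could look roughly like the sketch below. All type and function names are hypothetical and the fields are simplified; the real protos carry more information.

```go
package main

// lease is a hypothetical, simplified lease. An epoch of 0 means the lease
// is expiration-based in this sketch.
type lease struct {
	replicaID  int
	start      int64
	expiration int64
	epoch      int64
}

// command is a hypothetical Raft command as seen at apply time.
type command struct {
	originLease   *lease // nil for entries proposed before this change
	originReplica int
	timestamp     int64
}

// verifyAtApply reports whether cmd may be applied under the current lease.
// New-style commands must carry exactly the lease that is current at apply
// time. Old-style commands fall back to checking that the current
// expiration-based lease is held by the proposer and covers the timestamp.
func verifyAtApply(cmd command, current lease) bool {
	if cmd.originLease != nil {
		return *cmd.originLease == current
	}
	return current.epoch == 0 &&
		current.replicaID == cmd.originReplica &&
		current.start <= cmd.timestamp &&
		cmd.timestamp < current.expiration
}
```

Comparing the whole lease rather than only its holder is what rejects a command proposed under an earlier lease after the same replica has re-acquired a different one.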
Review status: 45 of 46 files reviewed at latest revision, 45 unresolved discussions, some commit checks failed. pkg/storage/replica.go, line 3213 at r7 (raw file): Previously, spencerkimball (Spencer Kimball) wrote…
This is what I was just referring to in the meeting. IsSingleSkipLeaseCheckRequest is true for RequestLease and TransferLease, so in propEvalKV, if the proposer had stale information, it could propose an improper lease which would be blindly applied by all followers. That's not new with this PR; it's tracked separately in #10414. Comments from Reviewable |
97f1e4c
to
4a9f024
Compare
Review status: 45 of 46 files reviewed at latest revision, 45 unresolved discussions, some commit checks failed. pkg/roachpb/data.proto, line 313 at r9 (raw file): Previously, bdarnell (Ben Darnell) wrote…
Done. pkg/storage/replica.go, line 3213 at r7 (raw file): Previously, bdarnell (Ben Darnell) wrote…
OK, I was missing something there. pkg/storage/replica.go, line 843 at r9 (raw file): Previously, bdarnell (Ben Darnell) wrote…
Done. pkg/storage/replica.go, line 3220 at r9 (raw file): Previously, bdarnell (Ben Darnell) wrote…
Done. pkg/storage/replica.go, line 3222 at r9 (raw file): Previously, bdarnell (Ben Darnell) wrote…
This is no longer necessary with the changes to restore the previous checks for non-epoch-based commands. pkg/storage/replica.go, line 3248 at r9 (raw file): Previously, bdarnell (Ben Darnell) wrote…
Done. pkg/storage/replica_command.go, line 1638 at r9 (raw file): Previously, bdarnell (Ben Darnell) wrote…
Done. pkg/storage/store.go, line 830 at r9 (raw file): Previously, bdarnell (Ben Darnell) wrote…
Any suggestions? I considered an enum, and creating a pkg/storage/storagebase/proposer_kv.proto, line 121 at r9 (raw file): Previously, bdarnell (Ben Darnell) wrote…
Done. Comments from Reviewable |
Review status: 35 of 46 files reviewed at latest revision, 31 unresolved discussions, some commit checks failed. pkg/storage/replica.go, line 3213 at r7 (raw file): Previously, spencerkimball (Spencer Kimball) wrote…
Ack, but I think the TODO should be more generic, explaining that we need to verify some lease, not call for removal of pkg/storage/replica_range_lease.go, line 45 at r7 (raw file): Previously, spencerkimball (Spencer Kimball) wrote…
OK, but then I think you should hint that the "switching" happens between server restarts, otherwise it sounds like it's a more dynamic decision. pkg/storage/replica_range_lease.go, line 295 at r7 (raw file): Previously, spencerkimball (Spencer Kimball) wrote…
nit: I find the comment confusing. "currently held lease at the specified ts" seems to be contradictory. pkg/storage/replica_range_lease.go, line 269 at r8 (raw file): Previously, spencerkimball (Spencer Kimball) wrote…
well, but if it is a constant... Comments from Reviewable |
4a9f024
to
c620d4a
Compare
Review status: 35 of 46 files reviewed at latest revision, 31 unresolved discussions, some commit checks failed. pkg/storage/replica.go, line 3213 at r7 (raw file): Previously, andreimatei (Andrei Matei) wrote…
Done. pkg/storage/replica_range_lease.go, line 45 at r7 (raw file): Previously, andreimatei (Andrei Matei) wrote…
Done. pkg/storage/replica_range_lease.go, line 295 at r7 (raw file): Previously, andreimatei (Andrei Matei) wrote…
Done. pkg/storage/replica_range_lease.go, line 269 at r8 (raw file): Previously, andreimatei (Andrei Matei) wrote…
OK, done. Comments from Reviewable |
c620d4a
to
eb4ccef
Compare
Reviewed 5 of 11 files at r10, 13 of 13 files at r11. pkg/storage/store.go, line 830 at r9 (raw file): Previously, spencerkimball (Spencer Kimball) wrote…
Yeah, we'll eventually remove it. I don't have any good suggestions; all the alternatives seem ugly. Comments from Reviewable |
Introduce new epoch-based range leases. These are designed to use the same machinery as the expiration-based leases but use epochs from the node liveness table instead of expirations. The same Lease protobuf is utilized for both types of leases, but there's now an optional Epoch. Previously, the lease proto had a "stasis" timestamp that's now removed and replaced by logic in the replica to evaluate the state of a lease. In order to evaluate whether a lease is valid at command apply time (downstream of Raft), we evaluate the lease upstream of Raft and send it with every Raft command to be compared to the lease at apply time. See: https://github.com/cockroachdb/cockroach/blob/master/docs/RFCS/range_leases.md Epoch-based leases can be enabled or disabled with the `COCKROACH_ENABLE_EPOCH_LEASES` environment variable. This change fixes a previously possible loophole in lease verification for expiration-based leases. In this scenario, a node could propose a command with an older lease, transfer it away, receive a new lease, and then execute the command.
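As a hedged illustration of how one lease type can serve both schemes, the sketch below uses an optional epoch. It deliberately ignores the stasis period and clock offsets, and `liveness`, `lease`, and `validAt` are hypothetical names rather than the definitions in this PR.

```go
package main

// liveness is a hypothetical node liveness record: the node's current epoch
// and the time until which its last heartbeat keeps that epoch live.
type liveness struct {
	epoch      int64
	expiration int64
}

// lease is a hypothetical lease with an optional epoch. A nil epoch means
// the lease is expiration-based.
type lease struct {
	expiration int64  // used only by expiration-based leases
	epoch      *int64 // set only for epoch-based leases
}

// validAt reports whether the lease is usable at ts. An expiration-based
// lease depends only on its own expiration. An epoch-based lease is usable
// while the holder's liveness record is unexpired and still at the epoch
// recorded in the lease.
func (l lease) validAt(ts int64, holder liveness) bool {
	if l.epoch == nil {
		return ts < l.expiration
	}
	return *l.epoch == holder.epoch && ts < holder.expiration
}
```

Under this model, incrementing a node's epoch in the liveness table invalidates all of its epoch-based leases at once, without touching any individual range.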
eb4ccef
to
4c2b798
Compare
If a store receives a snapshot that overlaps an existing replica, we take it as a sign that the local replica may no longer be a member of its range and queue it for processing in the replica GC queue. When this code was added (cockroachdb#10426), the replica GC queue was quite aggressive about processing replicas, and so the implementation was careful to only queue a replica if it looked "inactive." Unfortunately, this inactivity check rotted when epoch-based leases were introduced a month later (cockroachdb#10305). An inactive replica with an epoch-based lease can incorrectly be considered active, even if all of the other members of the range have stopped sending it messages, because the epoch-based lease will continue to be heartbeated by the node itself. (With an expiration-based lease, the replica's local copy of the lease would quickly expire if the other members of the range stopped sending it messages.) Fixing the inactivity check to work with epoch-based leases is rather tricky. Quiescent replicas are nearly indistinguishable from abandoned replicas. This commit just removes the inactivity check and unconditionally queues replicas for GC if they intersect an incoming snapshot. The replica GC queue check is relatively cheap (one or two meta2 lookups), and overlapping snapshot situations are not expected to last for very long. Release note: None
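The simplified behavior described in this commit message (no inactivity check, queue anything that intersects the snapshot) can be sketched as follows. This assumes half-open `[start, end)` key spans compared as plain strings; `span`, `replica`, and `replicasToGC` are hypothetical names.

```go
package main

// span is a hypothetical half-open key range [start, end).
type span struct {
	start, end string
}

// overlaps reports whether two half-open spans share at least one key.
func (a span) overlaps(b span) bool {
	return a.start < b.end && b.start < a.end
}

// replica is a hypothetical local replica with its range ID and key span.
type replica struct {
	rangeID int
	keys    span
}

// replicasToGC returns the range IDs of every local replica whose span
// intersects the incoming snapshot. No attempt is made to judge whether a
// replica looks inactive; the GC queue's own check decides what to remove.
func replicasToGC(existing []replica, snapshot span) []int {
	var ids []int
	for _, r := range existing {
		if r.keys.overlaps(snapshot) {
			ids = append(ids, r.rangeID)
		}
	}
	return ids
}
```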