-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
kvserver: allow lease extensions from followers in propBuf #61982
kvserver: allow lease extensions from followers in propBuf #61982
Conversation
91f0e3a
to
1239745
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei, @nvanbenschoten, and @tbg)
pkg/roachpb/api.go, line 281 at r1 (raw file):
func (rlr *RequestLeaseRequest) IsExtension() bool { return rlr.PrevLease.Replica.StoreID == rlr.Lease.Replica.StoreID && rlr.PrevLease.Equivalent(rlr.Lease)
I had to add Equivalent()
as a condition here, such that a lease request in a new epoch is not considered an extension of a lease in a previous epoch. Not sure if this makes sense or what the implications are, would appreciate some extra eyes on it.
This was added to make TestRequestsOnLaggingReplica
pass -- otherwise, when a partitioned leader rejoined the cluster in a later epoch it would consider its lease request to be an extension of its previous lease (in the previous epoch), causing a request to the old leader to block rather than redirect to the new leader. I believe this scenario was the entire motivation for rejecting lease acquisition proposals in the first place.
Equivalent()
does not check whether expiration-based leases have expired (only epoch-based leases), which I think could cause the same problem for system ranges. We could e.g. check it against the current time, or even just compare with the current local lease in propBuf.leaseStatus
to check whether the lease extension proposal is equivalent and still valid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 3 of 3 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei and @nvanbenschoten)
pkg/roachpb/api.go, line 281 at r1 (raw file):
Previously, erikgrinaker (Erik Grinaker) wrote…
I had to add
Equivalent()
as a condition here, such that a lease request in a new epoch is not considered an extension of a lease in a previous epoch. Not sure if this makes sense or what the implications are, would appreciate some extra eyes on it.This was added to make
TestRequestsOnLaggingReplica
pass -- otherwise, when a partitioned leader rejoined the cluster in a later epoch it would consider its lease request to be an extension of its previous lease (in the previous epoch), causing a request to the old leader to block rather than redirect to the new leader. I believe this scenario was the entire motivation for rejecting lease acquisition proposals in the first place.
Equivalent()
does not check whether expiration-based leases have expired (only epoch-based leases), which I think could cause the same problem for system ranges. We could e.g. check it against the current time, or even just compare with the current local lease inpropBuf.leaseStatus
to check whether the lease extension proposal is equivalent and still valid.
This makes sense, and you have a good point regarding expiration-based leases. It almost seems as though we should include an epoch even in expiration-based leases, unfortunately this isn't going to work since a zero epoch is what encodes a lease as expiration-based:
Lines 1902 to 1904 in e924d91
if l.Epoch == 0 { | |
return LeaseExpiration | |
} |
Comparing against anything from the local replica state isn't going to help us with the problem of requesting a lease based on a stale lease (& leaving requests waiting for our replica to catch up), so it would have to be a timestamp-based route that we're taking. This is mildly annoying as well, as there's no obvious way to distinguish a lease extension (after the range has been dormant for a while) from the kind of extension we want to block. @nvanbenschoten do you have an idea? Maybe adding an auxiliary epoch field that's only used for expiration-based leases? I don't like the look of that but it seems better than trying to use timestamps.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please make the commit message explicit about the fact that the change is only about expiration-based leases (right?)
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @erikgrinaker and @nvanbenschoten)
pkg/kv/kvserver/replica_proposal_buf.go, line 581 at r1 (raw file):
// also send lease extensions for an existing leaseholder. if isExtension { log.VEventf(ctx, 4, "proposing lease extension even though we're not the leader")
nit: 4
is to high, generally only used for extremely verbose messages in loops. Use the more usual 2
.
pkg/kv/kvserver/replica_proposal_buf_test.go, line 269 at r1 (raw file):
var data []byte if leaseReq { pd, data = pc.newLeaseProposal(roachpb.Lease{}, false)
false /* newLeaseExtension */
to be in compliance with the style guide
But much better to give them names:
type leaseOpt bool
const (
brandNewLease leaseOpt = false
extension = true
)
pkg/roachpb/api.go, line 278 at r1 (raw file):
// IsExtension returns whether the lease request is an extension of an existing // lease or whether it is claiming the lease away from a previous owner.
nit: s/or whether it is claiming the lease away from a previous owner//
The disjunction is confusing because it's not clear if it refers to the cases where IsExtension
returns true
or if it refers to the distinction between true or false. Consider clarifying, or better yet just remove the latter part because it's obvious enough.
pkg/roachpb/api.go, line 281 at r1 (raw file):
Equivalent() does not check whether expiration-based leases have expired (only epoch-based leases), which I think could cause the same problem for system ranges. We could e.g. check it against the current time, or even just compare with the current local lease in propBuf.leaseStatus to check whether the lease extension proposal is equivalent and still valid.
Equivalent
doesn't check whether either type of lease has expired - it doesn't even have access to a clock.
Line 1927 in e924d91
func (l Lease) Equivalent(newL Lease) bool { |
But otherwise I think you're right - I think we should check whether the current lease is valid. That's what it should mean for a "lease" to be an extension. One can't extend a lease it no longer has.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 3 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei and @erikgrinaker)
pkg/kv/kvserver/replica_proposal_buf_test.go, line 269 at r1 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
false /* newLeaseExtension */
to be in compliance with the style guide
But much better to give them names:type leaseOpt bool const ( brandNewLease leaseOpt = false extension = true )
+1 for false /* newLeaseExtension */
. Please don't introduce an enum for a test helper function that's called 4 times, all in the same file.
pkg/kv/kvserver/replica_proposal_buf_test.go, line 591 at r1 (raw file):
}, { name: "follower, lease extension to known eligible leader",
This name seems a little off. The follower is the one who holds the lease and is performing an extension, right? So should it be follower, lease extension despite known eligible leader
or something along those lines?
pkg/roachpb/api.go, line 281 at r1 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
Equivalent() does not check whether expiration-based leases have expired (only epoch-based leases), which I think could cause the same problem for system ranges. We could e.g. check it against the current time, or even just compare with the current local lease in propBuf.leaseStatus to check whether the lease extension proposal is equivalent and still valid.
Equivalent
doesn't check whether either type of lease has expired - it doesn't even have access to a clock.
Line 1927 in e924d91
func (l Lease) Equivalent(newL Lease) bool {
But otherwise I think you're right - I think we should check whether the current lease is valid. That's what it should mean for a "lease" to be an extension. One can't extend a lease it no longer has.
This is an interesting dilemma, as it strikes at the question of what constitutes a lease transfer and what lease requests are safe from getting wedged. I don't think the two sets are equivalent.
We consider a lease "extension" at different parts of the code to be one that does not change the lease's owner. We make no attempt to say that the existing lease is still valid - if it is not and has already been replaced, the lease extension will fail.
An argument could be made that a lease that is replaced by a new lease with the same owner but a new sequence is not a true "extension", because writes proposed under the old lease will be rejected if reordered and applying after the new lease. This is the case during a node restart, where the node's minLeaseProposedTS
is bumped, triggering this logic to force a new lease sequence number - the increased start time leads to !Equivalent()
here. If we wanted to detect this case specifically from the lease request, we could compare PrevLease.Start
to MinProposedTS
.
But neither of these definitions help classify when a lease request is safe from getting wedged or not. In either case, it would be possible for the leaseholder to have been partitioned, replaced non-cooperatively, and fallen far behind on its log. What it sounds like you're getting at is that to feel sufficiently comfortable ignoring the lease-request-on-follower check, we'd want to make sure that the leaseholder couldn't have lost its lease non-cooperatively to some other replica. This gives us a reasonable assurance that the proposing replica doesn't have a wall of log entries waiting for it behind a network partition that it will need to wait on before hearing back about its lease request.
If that's the case, then maybe we shouldn't be worrying about the request and whether it looks like an extension at all. Maybe we should just be checking (*Replica).ownsValidLeaseRLocked
in FlushLockedWithRaftGroup
to determine whether we should allow a lease request to proceed on a follower.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please make the commit message explicit about the fact that the change is only about expiration-based leases
It's about epoch-based leases too. The initial suggestion was to forward all lease extensions (as defined by the previous and current lease having the same store ID in the lease request). However, this broke TestRequestsOnLaggingReplica
since an epoch-based lease was being reclaimed by an old leaseholder in a new epoch, which was considered an extension, blocking requests to the old leaseholder.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei, @erikgrinaker, @nvanbenschoten, and @tbg)
pkg/kv/kvserver/replica_proposal_buf.go, line 581 at r1 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
nit:
4
is to high, generally only used for extremely verbose messages in loops. Use the more usual2
.
Sure. I figured this was a fairly benign scenario compared to the others, and might be unnecessarily noisy since it'd be output on every lease extension, but will bump it to 2.
pkg/kv/kvserver/replica_proposal_buf_test.go, line 269 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
+1 for
false /* newLeaseExtension */
. Please don't introduce an enum for a test helper function that's called 4 times, all in the same file.
Will do.
pkg/kv/kvserver/replica_proposal_buf_test.go, line 591 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
This name seems a little off. The follower is the one who holds the lease and is performing an extension, right? So should it be
follower, lease extension despite known eligible leader
or something along those lines?
Makes sense, thanks.
pkg/roachpb/api.go, line 281 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
This is an interesting dilemma, as it strikes at the question of what constitutes a lease transfer and what lease requests are safe from getting wedged. I don't think the two sets are equivalent.
We consider a lease "extension" at different parts of the code to be one that does not change the lease's owner. We make no attempt to say that the existing lease is still valid - if it is not and has already been replaced, the lease extension will fail.
An argument could be made that a lease that is replaced by a new lease with the same owner but a new sequence is not a true "extension", because writes proposed under the old lease will be rejected if reordered and applying after the new lease. This is the case during a node restart, where the node's
minLeaseProposedTS
is bumped, triggering this logic to force a new lease sequence number - the increased start time leads to!Equivalent()
here. If we wanted to detect this case specifically from the lease request, we could comparePrevLease.Start
toMinProposedTS
.But neither of these definitions help classify when a lease request is safe from getting wedged or not. In either case, it would be possible for the leaseholder to have been partitioned, replaced non-cooperatively, and fallen far behind on its log. What it sounds like you're getting at is that to feel sufficiently comfortable ignoring the lease-request-on-follower check, we'd want to make sure that the leaseholder couldn't have lost its lease non-cooperatively to some other replica. This gives us a reasonable assurance that the proposing replica doesn't have a wall of log entries waiting for it behind a network partition that it will need to wait on before hearing back about its lease request.
If that's the case, then maybe we shouldn't be worrying about the request and whether it looks like an extension at all. Maybe we should just be checking
(*Replica).ownsValidLeaseRLocked
inFlushLockedWithRaftGroup
to determine whether we should allow a lease request to proceed on a follower.
Yep, great summary. I think ownsValidLeaseRLocked()
might be the right solution here, but I wasn't sure if we could sometimes be proxying lease requests from other replicas or if the proposal buffer was exclusively for proposals the node generated itself. I'll try implementing this.
1239745
to
26584e4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei, @erikgrinaker, @nvanbenschoten, and @tbg)
pkg/roachpb/api.go, line 281 at r1 (raw file):
Previously, erikgrinaker (Erik Grinaker) wrote…
Yep, great summary. I think
ownsValidLeaseRLocked()
might be the right solution here, but I wasn't sure if we could sometimes be proxying lease requests from other replicas or if the proposal buffer was exclusively for proposals the node generated itself. I'll try implementing this.
Updated the PR, let me know what you think.
26584e4
to
1beb8c6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please make the commit message explicit about the fact that the change is only about expiration-based leases
It's about epoch-based leases too. The initial suggestion was to forward all lease extensions (as defined by the previous and current lease having the same store ID in the lease request). However, this broke TestRequestsOnLaggingReplica since an epoch-based lease was being reclaimed by an old leaseholder in a new epoch, which was considered an extension, blocking requests to the old leaseholder.
But we've now established that "as defined by the previous and current lease having the same store ID" is the wrong definition for an extension, right? So what remains only affects expiration-based leases, I think? Like, there's no such thing as "an extension" for epoch-based leases, is there? For epoch-based ranges, a replica with a valid lease never proposes a new lease (does it?).
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei, @erikgrinaker, @nvanbenschoten, and @tbg)
pkg/kv/kvserver/replica_proposal_buf.go, line 567 at r2 (raw file):
// lease. In that case, we have no choice but to continue with the proposal. // // Lease extensions for a currently held lease always go through.
please explain why they always go through in this comment
pkg/kv/kvserver/replica_proposal_buf.go, line 571 at r2 (raw file):
req := p.Request.Requests[0].GetRequestLease() leaderKnownAndEligible := leaderInfo.leaderKnown && leaderInfo.leaderEligibleForLease isValidExtension := req.PrevLease.Replica.StoreID == req.Lease.Replica.StoreID &&
nit: I'd strike the "valid" from isValidExtension
. What's an "invalid extension"?
pkg/kv/kvserver/replica_proposal_buf.go, line 571 at r2 (raw file):
req := p.Request.Requests[0].GetRequestLease() leaderKnownAndEligible := leaderInfo.leaderKnown && leaderInfo.leaderEligibleForLease isValidExtension := req.PrevLease.Replica.StoreID == req.Lease.Replica.StoreID &&
is the req.PrevLease.Replica.StoreID == req.Lease.Replica.StoreID
part necessary? What would go wrong if we just removed it?
RequestLeaseRequest
always asks for a lease for the sender of the respective request, not for some other rando (unless you know otherwise?).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @erikgrinaker!
I think we can also address a TODO in setupLeaseTransferTest
that references this issue. But when doing so, please stress TestLeaseExpirationBelowFutureTimeRequest
a bit to make sure we aren't introducing any flakiness.
make stress PKG=./pkg/kv/kvserver TESTS= TestLeaseExpirationBelowFutureTimeRequest
Reviewed 2 of 3 files at r2.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei and @erikgrinaker)
pkg/kv/kvserver/replica_proposal_buf.go, line 581 at r1 (raw file):
Previously, erikgrinaker (Erik Grinaker) wrote…
Sure. I figured this was a fairly benign scenario compared to the others, and might be unnecessarily noisy since it'd be output on every lease extension, but will bump it to 2.
Should we explain why we're proposing like we do in other cases? ; we hold the current lease
pkg/kv/kvserver/replica_proposal_buf.go, line 571 at r2 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
is the
req.PrevLease.Replica.StoreID == req.Lease.Replica.StoreID
part necessary? What would go wrong if we just removed it?
RequestLeaseRequest
always asks for a lease for the sender of the respective request, not for some other rando (unless you know otherwise?).
I was thinking the same thing. I don't see why we still need to look at the request at all.
pkg/kv/kvserver/replica_proposal_buf.go, line 571 at r2 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
nit: I'd strike the "valid" from
isValidExtension
. What's an "invalid extension"?
I'd also rename this to ownsCurrentLease
or something along those lines to reflect the salient point here.
pkg/kv/kvserver/replica_proposal_buf.go, line 1029 at r2 (raw file):
} func (rp *replicaProposer) ownsValidLeaseRLocked(ctx context.Context, now hlc.ClockTimestamp) bool {
nit: let's keep this below leaderStatusRLocked
so that these method impls remain ordered.
pkg/kv/kvserver/replica_proposal_buf_test.go, line 269 at r1 (raw file):
Previously, erikgrinaker (Erik Grinaker) wrote…
Will do.
Should we remove this now?
pkg/roachpb/api.go, line 281 at r1 (raw file):
Previously, erikgrinaker (Erik Grinaker) wrote…
Updated the PR, let me know what you think.
Looks great.
1beb8c6
to
1311f0c
Compare
TFTRs, I've addressed the comments. I tried removing the
I find this pretty odd, because if I call |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 2 files at r3.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @andreimatei and @erikgrinaker)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but consider clarifying the commit message to say that this patch is only about expiration-based leases (if you buy what I said above).
I find this pretty odd, because if I call replica1.GetLease() it does indeed say that n2 is the leaseholder, and yet the request fails saying we're not. However, the current lease in the request above has seq=0 start=0,0 exp= -- could this be the initial lease, such that n2 somehow hasn't fully picked up the lease transfer yet?
The empty lease in the error makes me think that the replica that generated the error did not actually know the lease. Sometimes we generate NotLeaseholderErrors
without having a lease on hand; I don't remember the exact cases.
Nit: it's convention to always respond with a "done" on the review comments after you've addressed them. Otherwise I don't know if they're still pending or not.
Reviewable status: complete! 2 of 0 LGTMs obtained (waiting on @andreimatei and @erikgrinaker)
The Raft proposal buffer currently rejects lease acquisition proposals from followers if there is an eligible leader, to avoid interfering with it trying to claim the lease. However, this logic unintentionally rejected lease extension proposals from followers as well, which could cause a leaseholder to lose its lease. This patch changes the proposal buffer to allow lease extension proposals from followers for extension-based leases as long as they still hold a valid lease. Release note: None
1311f0c
to
a59fd79
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The empty lease in the error makes me think that the replica that generated the error did not actually know the lease.
Ah, of course -- the error was from the other replica. Removed the test knob and fixed TestLeaseExpirationBelowFutureTimeRequest
by waiting for both replicas to pick up on the lease transfer.
it's convention to always respond with a "done" on the review comments after you've addressed them.
Got it, will do! Still getting used to Reviewable.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @andreimatei and @erikgrinaker)
pkg/kv/kvserver/replica_proposal_buf.go, line 571 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
I was thinking the same thing. I don't see why we still need to look at the request at all.
Done.
pkg/kv/kvserver/replica_proposal_buf_test.go, line 269 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Should we remove this now?
Done.
bors r=nvanbenschoten,andreimatei,tbg |
Build failed (retrying...): |
Build failed (retrying...): |
Build succeeded: |
The Raft proposal buffer currently rejects lease acquisition proposals
from followers if there is an eligible leader, to avoid interfering with
it trying to claim the lease. However, this logic unintentionally
rejected lease extension proposals from followers as well, which could
cause a leaseholder to lose its lease.
This patch changes the proposal buffer to allow lease extension
proposals from followers for extension-based leases as long as they
still hold a valid lease.
Resolves #59179. Follow-up to #55148.
Release note: None