kvserver: always transfer expiration-based leases #85629
Conversation
Force-pushed from 1fa26f1 to ee3d18f.
Force-pushed from 62a6a38 to b47f808.
Reviewed 5 of 6 files at r1, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @irfansharif and @nvanbenschoten)
pkg/kv/kvserver/client_split_test.go
line 2751 at r1 (raw file):
// Increment the manual clock and do a write to increase the qps above zero.
manualClock.Increment(int64(replicastats.MinStatsDuration))
testutils.SucceedsSoon(t, func() error {
Why is this now needed?
pkg/kv/kvserver/replica_proposal.go
line 351 at r1 (raw file):
// in serializability violations.
r.mu.state.Lease = newLease
requiresExpirationBasedLease := r.requiresExpiringLeaseRLocked()
It's odd that this is checking an attribute of the range instead of just checking whether the newly acquired lease is expiration-based or not. Could this be newLease.Type() == roachpb.LeaseExpiration? Doing that would enqueue the lease to be renewed.
pkg/kv/kvserver/replica_range_lease.go
line 258 at r1 (raw file):
// more efficient epoch-based ones. But by transferring an
// expiration-based lease, we can limit the effect of an ill-advised
// lease transfer since the in coming leaseholder needs to recognize
s/in coming/incoming/
pkg/kv/kvserver/replica_range_lease.go
line 260 at r1 (raw file):
// lease transfer since the in coming leaseholder needs to recognize
// itself as such within a few seconds; if it doesn't (we accidentally
// sent the lease to a replica in need of a snapshot), the lease is up
"in need of a snapshot or far behind on its log"
pkg/kv/kvserver/replica_range_lease.go
line 262 at r1 (raw file):
// sent the lease to a replica in need of a snapshot), the lease is up
// for grabs. If we simply transferred epoch based leases, it's possible
// for the new leaseholder in need of a snapshot to maintain its lease
s/in need of a snapshot/that is delayed in applying the lease transfer/
pkg/kv/kvserver/replica_range_lease.go
line 1083 at r1 (raw file):
) error {
	var leaseRenewal time.Duration
	if r.leaseStatusAtRLocked(ctx, now).Lease.Type() == roachpb.LeaseExpiration {
I don't think we want to call leaseStatusAtRLocked again. We could pass the lease in from the caller, but it may be invalid.
I think we should actually keep the logic as it was. This is a sanity check that avoids an infinite lease acquisition loop if a request is so far in the future that it wouldn't be satisfiable even after the current lease was renewed. We know that after a lease transfer, a !requiresExpiringLeaseRLocked()
range will upgrade to an epoch-based lease on renewal.
pkg/kv/kvserver/replica_rangefeed_test.go
line 1287 at r1 (raw file):
// Run up the clock to upgrade the expiration based lease to an epoch based
// one.
Consider mentioning that this is needed because the test intends to expire the lease by pausing node liveness.
Force-pushed from b47f808 to b1a0828.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)
pkg/kv/kvserver/client_split_test.go
line 2751 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Why is this now needed?
Flaky now with the following; the proposal itself observes a lease upgrade underneath it:
client_split_test.go:2753: [NotLeaseHolderError] stale proposal: command was proposed under lease #2 but is being applied under lease: repl=(n2,s2):2 seq=3 start=1660158045.473318000,1 epo=1 pro=1660158050.473837000,0; r52: replica not lease holder; current lease is repl=(n2,s2):2 seq=3 start=1660158045.473318000,1 epo=1 pro=1660158050.473837000,0
pkg/kv/kvserver/replica_proposal.go
line 351 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
It's odd that this is checking an attribute of the range instead of just checking whether the newly acquired lease is expiration-based or not. Could this be newLease.Type() == roachpb.LeaseExpiration? Doing that would enqueue the lease to be renewed.
Wasn't sure here, figured if we didn't, we'd still renew (+ upgrade) on the first request the replica received. But I've added this tracking (and subsequent untracking once the lease does in fact change to an epoch based one). I also added a check in TestLeaseTransfersUseExpirationLeasesAndBumpToEpochBasedOnes to make sure the untracking (and consequently, tracking) occurs after a renewal attempt. PTAL.
pkg/kv/kvserver/replica_range_lease.go
line 1083 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
I don't think we want to call leaseStatusAtRLocked again. We could pass the lease in from the caller, but it may be invalid. I think we should actually keep the logic as it was. This is a sanity check that avoids an infinite lease acquisition loop if a request is so far in the future that it wouldn't be satisfiable even after the current lease was renewed. We know that after a lease transfer, a !requiresExpiringLeaseRLocked() range will upgrade to an epoch-based lease on renewal.
Reverted. I understand what this check is for, so keeping it as-is is fine, but could you explain why we don't like peeking into the current lease type -- i.e. why's leaseStatusAtRLocked inappropriate here? (It didn't look that expensive.)
Reviewed 1 of 6 files at r1, 5 of 6 files at r2, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @irfansharif and @nvanbenschoten)
pkg/kv/kvserver/client_split_test.go
line 2751 at r1 (raw file):
Previously, irfansharif (irfan sharif) wrote…
Flakey now with the following; the proposal itself observes a lease upgrade underneath it:
client_split_test.go:2753: [NotLeaseHolderError] stale proposal: command was proposed under lease #2 but is being applied under lease: repl=(n2,s2):2 seq=3 start=1660158045.473318000,1 epo=1 pro=1660158050.473837000,0; r52: replica not lease holder; current lease is repl=(n2,s2):2 seq=3 start=1660158045.473318000,1 epo=1 pro=1660158050.473837000,0
Ah, I see. Because we're passing through the store's TestSender and not a real DistSender, NotLeaseHolderErrors aren't retried.
pkg/kv/kvserver/replica_range_lease.go
line 1083 at r1 (raw file):
Previously, irfansharif (irfan sharif) wrote…
Reverted. I understand what this check is for, so keeping it as-is is fine, but could you explain why we don't like peeking into the current lease type -- i.e. why's leaseStatusAtRLocked inappropriate here? (It didn't look that expensive.)
I was thinking that it would require an additional mutex acquisition and risk inconsistency with the previous call, but I see now that we already hold the RLock.
The other reason is that the lease might not even be valid at this point, so we can't trust the lease status.
pkg/kv/kvserver/store.go
line 2250 at r2 (raw file):
	}
	s.renewableLeases.Delete(k)
} else if leaseStatus.Lease.Type() == roachpb.LeaseEpoch {
Good catch!
Now I'm confused about what happens when a range in this map is destroyed and removed from a node. Is it also removed from the map? I don't see where that happens. Do we rely on the next call to redirectOnOrAcquireLease returning an error and that causing us to hit the s.renewableLeases.Delete(k) path above?
Maybe we don't see issues with this in practice because a leaseholder will never remove itself, and we'll usually hit a NotLeaseHolderError above when we periodically try to renew the lease before we're removed.
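The map lifecycle being questioned here can be sketched with a toy version (all names are illustrative; the real kvserver types and methods differ): a concurrent map from range ID to replica, with the entry deleted when the replica is unlinked from the store, so the renewal loop never touches a dead replica.

```go
package main

import (
	"fmt"
	"sync"
)

// toyStore models the renewableLeases bookkeeping under discussion.
type toyStore struct {
	renewableLeases sync.Map // rangeID (int64) -> replica name (string)
}

// trackForRenewal registers a range whose expiration lease should be
// periodically renewed (or upgraded) by the background loop.
func (s *toyStore) trackForRenewal(rangeID int64, repl string) {
	s.renewableLeases.Store(rangeID, repl)
}

// unlinkReplicaByRangeID mirrors the idea of also removing the range from
// the renewal map when its replica is destroyed and removed from the store.
func (s *toyStore) unlinkReplicaByRangeID(rangeID int64) {
	s.renewableLeases.Delete(rangeID)
}

func main() {
	s := &toyStore{}
	s.trackForRenewal(52, "repl-52")
	_, ok := s.renewableLeases.Load(52)
	fmt.Println(ok) // true
	s.unlinkReplicaByRangeID(52)
	_, ok = s.renewableLeases.Load(52)
	fmt.Println(ok) // false
}
```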
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)
pkg/kv/kvserver/store.go
line 2250 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Good catch!
Now I'm confused about what happens when a range in this map is destroyed and removed from a node. Is it also removed from the map? I don't see where that happens. Do we rely on the next call to redirectOnOrAcquireLease returning an error and that causing us to hit the s.renewableLeases.Delete(k) path above? Maybe we don't see issues with this in practice because a leaseholder will never remove itself, and we'll usually hit a NotLeaseHolderError above when we periodically try to renew the lease before we're removed.
This all seems very fragile (though it has been this way since '18, so maybe not), what do you want to do here? With this PR specifically, we're increasing the likelihood that this map contains a replica that's been destroyed and removed from the store. So we could just not track these replicas-temporarily-using-expiration-leases in this map for now, until it's made more robust. We're relying on a timely incoming request to renew + upgrade the lease, I think. What happens if no such request comes in? The lease is up for grabs until the next request is routed somewhere?
(Gentle nudge, @nvanbenschoten)
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @irfansharif and @nvanbenschoten)
pkg/kv/kvserver/store.go
line 2250 at r2 (raw file):
Previously, irfansharif (irfan sharif) wrote…
This all seems very fragile (but has been this way since '18, so maybe not), what do you want to do here? With this PR specifically, we're increasing the likelihood that this map contains a replica that's destroyed from the store. So we could just not track these replicas-temporarily-using-expiration-leases in this map for now, until it's made more robust. We're relying on an incoming timely request to renew + upgrade the lease I think. What happens if no such request comes in? It's up for grabs until the next request is routed ~ somewhere?
I think we can add s.renewableLeases.Delete(rangeID) to Store.unlinkReplicaByRangeIDLocked and then feel better about this. We should probably do that.
However, I am now a little concerned about what happens when this map gets too large (e.g. after a flood of new leases). Could that prevent a timely extension of an important lease like node liveness?
What do you think about skipping this map and promoting the lease from expiration to epoch immediately upon its reception? For instance, we could call maybeExtendLeaseAsync (or a new maybeExtendLeaseAsyncLocked) directly in leasePostApplyLocked when we have an expiration-based lease but !requiresExpiringLeaseRLocked.
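The immediate-upgrade idea suggested above can be sketched as a toy state machine (all types and names are illustrative stand-ins, not the real kvserver API): after applying a newly received lease, a range that doesn't require an expiration lease immediately requests an async extension, which upgrades it to an epoch lease, rather than waiting for the renewal loop or the next request.

```go
package main

import "fmt"

type leaseType int

const (
	expiration leaseType = iota
	epoch
)

type replica struct {
	requiresExpirationLease bool // true for node liveness and earlier ranges
	lease                   leaseType
	upgradeRequested        bool
}

// leasePostApply models the suggested flow: on applying a lease, trigger
// the upgrade right away when an expiration lease isn't required.
func (r *replica) leasePostApply(newLease leaseType) {
	r.lease = newLease
	if r.lease == expiration && !r.requiresExpirationLease {
		r.maybeExtendLeaseAsync()
	}
}

// maybeExtendLeaseAsync stands in for the real async renewal; here we just
// record the intent and perform the upgrade synchronously.
func (r *replica) maybeExtendLeaseAsync() {
	r.upgradeRequested = true
	if !r.requiresExpirationLease {
		r.lease = epoch
	}
}

func main() {
	r := &replica{requiresExpirationLease: false}
	r.leasePostApply(expiration) // transferred lease arrives
	fmt.Println(r.lease == epoch, r.upgradeRequested)
}
```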
Force-pushed from b1a0828 to e521bc2.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)
pkg/kv/kvserver/store.go
line 2250 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
I think we can add s.renewableLeases.Delete(rangeID) to Store.unlinkReplicaByRangeIDLocked and then feel better about this. We should probably do that. However, I am now a little concerned about what happens when this map gets too large (e.g. after a flood of new leases). Could that prevent a timely extension of an important lease like node liveness?
What do you think about skipping this map and promoting the lease from expiration to epoch immediately upon its reception? For instance, we could call maybeExtendLeaseAsync (or a new maybeExtendLeaseAsyncLocked) directly in leasePostApplyLocked when we have an expiration-based lease but !requiresExpiringLeaseRLocked.
That makes more sense; I'd prefer not mucking with this map, for all the reasons listed above. Upgrading the lease immediately after applying it instead of waiting for the first request to the new leaseholder also makes sense, so I've done that now -- PTAL.
I'll send the renewableLeases map cleanup in a follow-up PR. It's trivial, but I'm being paranoid about getting just this one change in to start. That map holds at most a few keys today since only ranges up to and including liveness are tracked in there, so it's not an immediate concern.
Force-pushed from e521bc2 to 5dd866d.
Reviewed 8 of 8 files at r3, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @irfansharif)
pkg/kv/kvserver/replica_proposal.go
line 361 at r3 (raw file):
// before it expires.
if leaseChangingHands && iAmTheLeaseHolder && expirationBasedLease && r.ownsValidLeaseRLocked(ctx, now) {
	r.store.renewableLeases.Store(int64(r.RangeID), unsafe.Pointer(r))
Did we want to get rid of this?
pkg/kv/kvserver/replica_proposal.go
line 368 at r3 (raw file):
}
if hasExpirationBasedLease && !requiresExpirationBasedLease {
Should this also have an && r.ownsValidLeaseRLocked(ctx, now) condition, to guard against the case where the lease is applied after it has expired? Also, does it need an iAmTheLeaseHolder condition? The other two branches have that in addition to ownsValidLeaseRLocked, though that feels redundant.
Force-pushed from 5dd866d to 463258d.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)
pkg/kv/kvserver/replica_proposal.go
line 361 at r3 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Did we want to get rid of this?
Gah, no. Reverted. Sorry for being sloppy.
pkg/kv/kvserver/replica_proposal.go
line 368 at r3 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Should this also have an && r.ownsValidLeaseRLocked(ctx, now) condition, to guard against the case where the lease is applied after it has expired? Also, does it need an iAmTheLeaseHolder condition? The other two branches have that in addition to ownsValidLeaseRLocked, though that feels redundant.
Added the expiration check + a comment. The iAmTheLeaseHolder condition felt redundant for that same reason. I'll keep the conditional as is but added a test-only assertion to see if it in fact is redundant.
Reviewed 1 of 1 files at r4, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @irfansharif)
pkg/kv/kvserver/replica_proposal.go
line 371 at r4 (raw file):
}
if hasExpirationBasedLease && !requiresExpirationBasedLease && r.ownsValidLeaseRLocked(ctx, now) {
nit: depending on whether you want another pass through CI, it might be easier to understand this structure if it was written as:
if leaseChangingHands && iAmTheLeaseHolder && hasExpirationBasedLease && r.ownsValidLeaseRLocked(ctx, now) {
if requiresExpirationBasedLease {
r.store.renewableLeases.Store(int64(r.RangeID), unsafe.Pointer(r))
...
} else {
...
r.maybeExtendLeaseAsyncLocked(ctx, st)
}
}
Force-pushed from 463258d to fa1b387.
TFTR! Will bors on green.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @nvanbenschoten)
pkg/kv/kvserver/replica_proposal.go
line 371 at r4 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
nit: depending on whether you want another pass through CI, it might be easier to understand this structure if it was written as:
if leaseChangingHands && iAmTheLeaseHolder && hasExpirationBasedLease && r.ownsValidLeaseRLocked(ctx, now) {
	if requiresExpirationBasedLease {
		r.store.renewableLeases.Store(int64(r.RangeID), unsafe.Pointer(r))
		...
	} else {
		...
		r.maybeExtendLeaseAsyncLocked(ctx, st)
	}
}
Done. This skewed with a test Erik added in #84865; it needed to be updated to similarly run up a clock to upgrade a transferred expiration lease and increment liveness epochs in a more robust way.
Fixes cockroachdb#81764. In addition to ranges that unconditionally require expiration-based leases (node liveness and earlier), we also use them during lease transfers for all other ranges. After acquiring such expiration-based leases, the leaseholders are expected to soon upgrade them to the more efficient epoch-based ones. By transferring an expiration-based lease, we can limit the effect of an ill-advised lease transfer since the incoming leaseholder needs to recognize itself as such within a few seconds; if it doesn't (we accidentally sent the lease to a replica in need of a snapshot), the lease is up for grabs. If we simply transferred epoch-based leases, it would be possible for the new leaseholder in need of a snapshot to maintain its lease if the node it was on is able to heartbeat its liveness record.
Release note: None
Release justification: low risk, high benefit change
bors r+
Build succeeded:
Accidentally included as part of cockroachdb#85629.
Release note: None
Release justification: logging fix
Randomly saw this. It was accidentally introduced in cockroachdb#85629.
Release justification: low-risk logging fix
Release note: None
86644: kvserver: remove accidentally committed debug logging r=erikgrinaker a=tbg Randomly saw this. It was accidentally introduced in #85629. Release justification: low-risk logging fix Release note: None Co-authored-by: Tobias Grieger <[email protected]>
In cockroachdb#85629 we changed our lease transfer protocol to only ever transfer expiration-based leases, and have recipients later upgrade them to the more efficient epoch-based ones. This was done to limit the effects of ill-advised lease transfers, since the incoming leaseholder would need to recognize itself as such within a few seconds -- so we wanted this upgrade to happen after having received the lease. In cockroachdb#83261 however we noticed that the upgrade was not immediate -- we were waiting until the current lease's expiration was within its renewal duration (4.5s). When the lease was eventually renewed the upgrade did happen, but it was not immediate. We fix this here and remove the manual clock advancement in the supporting test that had masked this issue. The test now demonstrates that we're no longer relying on upgrades happening as part of the (slow) renewal process. Release note: None
85859: pgwire: Add support for cursors with special characters r=knz,rafiss a=rimadeodhar In CockroachDB and Postgres, it is possible to declare cursors with special characters enclosed within double quotes, e.g. "1-2-3". Currently, we store the name as an unescaped string, which causes problems during the pgwire DESCRIBE step when looking up the cursor. We should be using the tree.Name datatype for the cursor name while storing and looking up cursors. This PR updates the code to start using tree.Name instead of raw strings for handling cursor names. This fixes the issue where the pgwire DESCRIBE step fails while attempting to look up cursors with names containing special characters. Resolves #84261 Release note (bug fix): The pgwire DESCRIBE step no longer fails with an error while attempting to look up cursors declared with names containing special characters. 86492: bulk: perform meta lookup on range cache miss during index backfill r=dt a=nvanbenschoten Fixes #84290. This commit addresses the sustained slowdown described in #84290 by replacing the call in `SSTBatcher.flushIfNeeded` to `RangeCache.GetCached` with a call to `RangeCache.Lookup`. The former method checks the cache but returns no range descriptor if the cache currently has no overlapping key. This is possible if the descriptor was recently evicted because it was stale. The latter method performs a meta lookup if the cache currently has no overlapping key, so it should always return _some_ range descriptor. There's a question of whether we should be logging a warning but proceeding if this meta lookup fails. For now, I've decided not to change that behavior. Release justification: None. Don't merge yet. 87885: kv: remove broken attempt to reject lease acquisitions on draining nodes r=nvanbenschoten a=nvanbenschoten Related to #83261. This commit removes "protection" that avoided lease acquisitions on draining nodes.
This protection had already been effectively disabled by acc1ad1, which allowed Raft leaders to bypass the check. As the comment here (added in 5ffaa9e) explained, Raft followers are already unable to acquire the lease. If leaders bypass the check and followers (and candidates) don't need it, the check is useless, so we remove it. The commit also removes `TestReplicaDrainLease`, which was supposed to test this behavior. We remove the test not because it started failing after the change, but because it did not. It must not have been testing anything real anymore after acc1ad1. Release justification: low risk change related to release blocker. Release note: None 88205: kvserver: (partially) deflake transfer-leases/drain-other-node r=irfansharif a=irfansharif In #85629 we changed our lease transfer protocol to only ever transfer expiration-based leases, and have recipients later upgrade them to the more efficient epoch-based ones. This was done to limit the effects of ill-advised lease transfers, since the incoming leaseholder would need to recognize itself as such within a few seconds -- so we wanted this upgrade to happen after having received the lease. In #83261 however we noticed that the upgrade was not immediate -- we were waiting until the current lease's expiration was within its renewal duration (4.5s). When the lease was eventually renewed the upgrade did happen, but it was not immediate. We fix this here and remove the manual clock advancement in the supporting test that had masked this issue. The test now demonstrates that we're no longer relying on upgrades happening as part of the (slow) renewal process. Release note: None Co-authored-by: rimadeodhar <[email protected]> Co-authored-by: Nathan VanBenschoten <[email protected]> Co-authored-by: irfan sharif <[email protected]>
Starting a lease as an epoch lease is risky. The problem is the lease may be marked as transferred to the new node, but the node can take time to realize it is the leaseholder. This was partially addressed as part of cockroachdb#85629, however there were still cases where this can occur, specifically in the case of non-cooperative lease acquisition. While we don't have an example of this causing a problem, because the range is likely idle at this point, it is certainly possible to construct a contrived case where this happens. Epic: none Release note: None
Fixes #81764.
In addition to ranges that unconditionally require expiration-based
leases (node liveness and earlier), we now also use them during lease
transfers for all other ranges. After acquiring such expiration-based
leases, the leaseholders are expected to soon upgrade them to the more
efficient epoch-based ones. By transferring an expiration-based lease,
we can limit the effect of an ill-advised lease transfer since the incoming
leaseholder needs to recognize itself as such within a few
seconds; if it doesn't (we accidentally sent the lease to a replica in
need of a snapshot), the lease is up for grabs. If we simply transferred
epoch-based leases, it would be possible for the new leaseholder in need
of a snapshot to maintain its lease if the node it was on is able to
heartbeat its liveness record.
Release note: None.
Release justification: stability work
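The protocol the commit message describes can be condensed into a toy decision sketch (function and type names are hypothetical, not the real kvserver API): transfers always hand out an expiration-based lease, so an ill-advised transfer self-corrects within the expiration interval, while ordinary acquisitions (including the post-transfer upgrade) use epoch leases unless the range requires expiration-based ones.

```go
package main

import "fmt"

type leaseType string

const (
	expirationLease leaseType = "expiration"
	epochLease      leaseType = "epoch"
)

// leaseTypeForTransfer: transfers always use expiration-based leases,
// regardless of what the range would normally use.
func leaseTypeForTransfer() leaseType { return expirationLease }

// leaseTypeForAcquisition: acquisitions and upgrades use epoch leases,
// except for ranges (node liveness and earlier) that require expiration
// leases unconditionally.
func leaseTypeForAcquisition(requiresExpiration bool) leaseType {
	if requiresExpiration {
		return expirationLease
	}
	return epochLease
}

func main() {
	fmt.Println(leaseTypeForTransfer())         // expiration
	fmt.Println(leaseTypeForAcquisition(false)) // epoch
	fmt.Println(leaseTypeForAcquisition(true))  // expiration
}
```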