Skip to content

Commit

Permalink
kvserver: check PROSCRIBED lease status over UNUSABLE
Browse files Browse the repository at this point in the history
The PROSCRIBED lease status, just like EXPIRED, puts a lease to a definitely
invalid state. The UNUSABLE state (when request timestamp is in stasis period)
is less of a clear cut: we still own the lease but callers may use or not use
it depending on context.

For example, the closed timestamp side-transport ignores the UNUSABLE state
(because we still own the lease), and takes it as usable for its purposes.
Because of the order in which the checks were made, this has lead to a bug: a
PROSCRIBED lease is reported as UNUSABLE during stasis periods, the closed
timestamp side-transport then considers it usable, and updates closed
timestamps when it shouldn't.

This commit fixes the bug by swapping the order of checks in the leaseStatus
method. The order now goes from "hard" checks like EXPIRED and PROSCRIBED, to
"softer" UNUSABLE, and (when the softness is put to the limit) VALID.

Fixes #98698
Fixes #99931
Fixes #100101
Epic: none

Release note (bug fix): a bug is fixed in closed timestamp updates within its
side-transport. Previously, during asymmetric partitions, a node that transfers
a lease away, and misses a liveness heartbeat, could then erroneously update
the closed timestamp during the stasis period of its liveness. This could lead
to closed timestamp invariant violation, and node crashes; in extreme cases
this could lead to inconsistencies in read-only queries.
  • Loading branch information
pav-kv committed Apr 27, 2023
1 parent 37513c1 commit 578a603
Showing 1 changed file with 21 additions and 16 deletions.
37 changes: 21 additions & 16 deletions pkg/kv/kvserver/replica_range_lease.go
Original file line number Diff line number Diff line change
Expand Up @@ -623,21 +623,21 @@ func (p *pendingLeaseRequest) newResolvedHandle(pErr *roachpb.Error) *leaseReque
// EXPIRED status is that it reflects that a new lease with a start timestamp
// greater than or equal to now can be acquired non-cooperatively.
//
// If the lease is not EXPIRED, the request timestamp is taken into account. The
// expiration timestamp is adjusted for clock offset; if the request timestamp
// falls into the so-called "stasis period" at the end of the lifetime of the
// lease, or if the request timestamp is beyond the end of the lifetime of the
// lease, the status is UNUSABLE. Callers typically want to react to an UNUSABLE
// lease status by extending the lease, if they are in a position to do so.
// If the lease is not EXPIRED, the lease's start timestamp is checked against
// the minProposedTimestamp. This timestamp indicates the oldest timestamp that
// a lease can have as its start time and still be used by the node. It is set
// both in cooperative lease transfers and to prevent reuse of leases across
// node restarts (which would result in latching violations). Leases with start
// times preceding this timestamp are assigned a status of PROSCRIBED and can
// not be used. Instead, a new lease should be acquired by callers.
//
// For request timestamps falling before the lease's stasis period, the lease's
// start timestamp is checked against the minProposedTimestamp. This timestamp
// indicates the oldest timestamp that a lease can have as its start time and
// still be used by the node. It is set both in cooperative lease transfers and
// to prevent reuse of leases across node restarts (which would result in
// latching violations). Leases with start times preceding this timestamp are
// assigned a status of PROSCRIBED and can not be not be used. Instead, a new
// lease should be acquired by callers.
// If the lease is not EXPIRED or PROSCRIBED, the request timestamp is taken
// into account. The expiration timestamp is adjusted for clock offset; if the
// request timestamp falls into the so-called "stasis period" at the end of the
// lifetime of the lease, or if the request timestamp is beyond the end of the
// lifetime of the lease, the status is UNUSABLE. Callers typically want to
// react to an UNUSABLE lease status by extending the lease, if they are in a
// position to do so.
//
// Finally, for requests timestamps falling before the stasis period of a lease
// that is not EXPIRED and also not PROSCRIBED, the status is VALID.
Expand Down Expand Up @@ -721,14 +721,19 @@ func (r *Replica) leaseStatus(
maxOffset := r.store.Clock().MaxOffset()
stasis := expiration.Add(-int64(maxOffset), 0)
ownedLocally := lease.OwnedBy(r.store.StoreID())
// NB: the order of these checks is important, and goes from stronger to
// weaker reasons why the lease may be considered invalid. For example,
// EXPIRED or PROSCRIBED must take precedence over UNUSABLE, because some
// callers consider UNUSABLE as valid. For an example issue that this ordering
// may cause, see https://github.com/cockroachdb/cockroach/issues/100101.
if expiration.LessEq(now.ToTimestamp()) {
status.State = kvserverpb.LeaseState_EXPIRED
} else if stasis.LessEq(reqTS) {
status.State = kvserverpb.LeaseState_UNUSABLE
} else if ownedLocally && lease.ProposedTS != nil && lease.ProposedTS.Less(minProposedTS) {
// If the replica owns the lease, additional verify that the lease's
// proposed timestamp is not earlier than the min proposed timestamp.
status.State = kvserverpb.LeaseState_PROSCRIBED
} else if stasis.LessEq(reqTS) {
status.State = kvserverpb.LeaseState_UNUSABLE
} else {
status.State = kvserverpb.LeaseState_VALID
}
Expand Down

0 comments on commit 578a603

Please sign in to comment.