Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kv: acquire clock reading for lease request under lock #135042

Merged

Conversation

nvanbenschoten
Copy link
Member

Fixes #134171.

This commit adds validation that lease construction that the min lease proposed timestamp is never in advance of the current time.

The commit then fixes a case where this could occur by acquiring a clock reading under the replica mutex in redirectOnOrAcquireLeaseForRequest.

While I was never able to reproduce the issue in TestRequestsOnLaggingReplica, I did independently and reliably hit the same issue when adding lease type changes to kvnemesis (#125260, PR to follow soon) and confirmed that this change fixes the issue.

Epic: None

This commit adds a pair of assertions near evaluation-time lease
manipulation which demonstrate why the changes to the new lease are
safe.

Epic: None
Release note: None
Informs cockroachdb#134171.

This commit updates the handling of prevLeaseManipulation during lease
construction to not revoke the previous lease if it has already expired
when switching lease types. This was causing unnecessary bumps to the
MinLeaseProposedTS in tests like TestRequestsOnLaggingReplica (cockroachdb#134171).

Release note: None
Fixes cockroachdb#134171.

This commit adds validation that lease construction that the min
lease proposed timestamp is never in advance of the current time.

The commit then fixes a case where this could occur by acquiring
a clock reading under the replica mutex in redirectOnOrAcquireLeaseForRequest.

While I was never able to reproduce the issue in TestRequestsOnLaggingReplica,
I did independently and reliably hit the same issue when adding lease
type changes to kvnemesis (cockroachdb#125260, PR to follow soon) and confirmed
that this change fixes the issue.

Epic: None
@nvanbenschoten nvanbenschoten added the backport-24.3.x Flags PRs that need to be backported to 24.3 label Nov 12, 2024
@nvanbenschoten nvanbenschoten requested a review from a team as a code owner November 12, 2024 23:51
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Collaborator

@arulajmani arulajmani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 3 of 3 files at r1, 2 of 2 files at r2, 3 of 3 files at r3, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten and @tbg)


pkg/kv/kvserver/leases/build.go line 607 at r2 (raw file):

		prevType := i.PrevLease.Type()
		indirectExp := prevType != roachpb.LeaseExpiration
		switchingType := prevType != nextLease.Type()

Were you able to figure out why switchingType was true? Were we acquiring an expiration based lease for some reason?

Copy link
Member Author

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @arulajmani and @tbg)


pkg/kv/kvserver/leases/build.go line 607 at r2 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

Were you able to figure out why switchingType was true? Were we acquiring an expiration based lease for some reason?

What I think I saw happen was that a leader lease expired when a raft leadership term ended. The replica then attempted to acquire the lease again, and hit this path because it was a follower:

if i.RaftStatus.RaftState != raftpb.StateLeader || i.RaftStatus.LeadTransferee != raft.None {

So once the leader lease expired, it tried to acquire an expiration based lease. I think we will allow this until we get kv.lease.reject_on_leader_unknown.enabled defaulted to true (#118435).

Copy link
Collaborator

@arulajmani arulajmani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten and @tbg)


pkg/kv/kvserver/leases/build.go line 607 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

What I think I saw happen was that a leader lease expired when a raft leadership term ended. The replica then attempted to acquire the lease again, and hit this path because it was a follower:

if i.RaftStatus.RaftState != raftpb.StateLeader || i.RaftStatus.LeadTransferee != raft.None {

So once the leader lease expired, it tried to acquire an expiration based lease. I think we will allow this until we get kv.lease.reject_on_leader_unknown.enabled defaulted to true (#118435).

Nice!

pkg/kv/kvserver/replica_range_lease.go Show resolved Hide resolved
@nvanbenschoten
Copy link
Member Author

TFTR!

bors r+

@craig craig bot merged commit 1759878 into cockroachdb:master Nov 13, 2024
23 checks passed
@nvanbenschoten nvanbenschoten deleted the nvanbenschoten/leaseNowInPast branch November 14, 2024 18:14
@cockroachdb cockroachdb deleted a comment from blathers-crl bot Nov 14, 2024
@nvanbenschoten
Copy link
Member Author

blathers backport 24.3.0-rc

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-24.3.x Flags PRs that need to be backported to 24.3
Projects
None yet
Development

Successfully merging this pull request may close these issues.

kv/kvserver: TestRequestsOnLaggingReplica failed
4 participants