-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
kvserver: don't assign closed timestamps to lease requests #65624
kvserver: don't assign closed timestamps to lease requests #65624
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r1, 1 of 1 files at r2, 4 of 4 files at r3, 11 of 11 files at r4.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15 and @andreimatei)
pkg/kv/kvserver/closed_timestamp_test.go, line 607 at r4 (raw file):
newLease, err := tc.MoveRangeLeaseNonCooperatively(rhsDesc, target, clock) require.NoError(t, err) log.Infof(ctx, "test: new RHS lease: %s", newLease)
Didn't we also add a log line right at the end of MoveRangeLeaseNonCooperatively
?
pkg/kv/kvserver/closed_timestamp_test.go, line 732 at r4 (raw file):
maxClosed, ok := r.MaxClosed(ctx) require.True(t, ok) require.True(t, maxClosed.LessEq(freezeStartTimestamp),
This will not be true for global_read ranges, and that's ok because we ship the closed timestamp in the subsume response. Consider at least adding a comment about this so that it doesn't look like this is a more general invariant.
pkg/kv/kvserver/closed_timestamp_test.go, line 1011 at r4 (raw file):
} func getTargetStoreOrFatal(
Aren't there methods on TestCluster
for this?
pkg/kv/kvserver/replica_follower_read.go, line 169 at r2 (raw file):
maxClosed.Forward(initialMaxClosed) // If the range has not upgraded to the new closed timestamp system,
Do you plan to backport the removal of this condition?
pkg/kv/kvserver/replica_proposal_buf.go, line 702 at r2 (raw file):
// // This shouldn't be called for reproposals; we don't want to update the closed // timestamp they carry.
Want to say why?
pkg/kv/kvserver/replica_proposal_buf.go, line 856 at r3 (raw file):
// OnLeaseChangeLocked is called when a new lease is applied to this range. // closedTimestamp is the range's closed timestamp after the new lease was
closedTS
pkg/kv/kvserver/replica_range_lease.go, line 1066 at r4 (raw file):
} // AcquireLease is redirectOnOrAcquireLease exposed for tests.
TestingAcquireLease
?
pkg/kv/kvserver/testing_knobs.go, line 364 at r4 (raw file):
func (NodeLivenessTestingKnobs) ModuleTestingKnobs() {} // PinnedLeasesKnob is a testing know for controlling what store can acquire a
How does this impact lease transfers? We still allow the pinned leaseholder to transfer the lease away, right?
pkg/roachpb/batch.go, line 600 at r4 (raw file):
if et, ok := req.(*EndTxnRequest); ok { h := req.Header() split := et.InternalCommitTrigger.SplitTrigger != nil
Can we use the (InternalCommitTrigger).Kind
method we just added instead of this custom logic?
Also, isn't this a pointer? Why is this not causing a NPE? Do we never call this method in tests?
pkg/roachpb/batch.go, line 616 at r4 (raw file):
} else { h := req.Header() if req.Method() == PushTxn {
nit: this code structure is strange. We type switch, then switch on req.Method, then type assert. Is there a better way? Maybe just a single type switch?
pkg/testutils/serverutils/test_cluster_shim.go, line 158 at r4 (raw file):
// ownerless lease. Most tests should use the cooperative version of this // method, TransferRangeLease. MoveRangeLeaseNonCooperatively(
Mention what the return value is in the comment.
pkg/testutils/testcluster/testcluster.go, line 929 at r4 (raw file):
} if prevLease.Replica.StoreID == dest.StoreID { return nil
Do we want to return a nil lease if the destination store is the current leaseholder? Probably not, right?
5e7d4e6
to
7dc9485
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15 and @nvanbenschoten)
pkg/kv/kvserver/closed_timestamp_test.go, line 607 at r4 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Didn't we also add a log line right at the end of
MoveRangeLeaseNonCooperatively
?
ok, removed
pkg/kv/kvserver/closed_timestamp_test.go, line 732 at r4 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
This will not be true for global_read ranges, and that's ok because we ship the closed timestamp in the subsume response. Consider at least adding a comment about this so that it doesn't look like this is a more general invariant.
done
pkg/kv/kvserver/closed_timestamp_test.go, line 1011 at r4 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Aren't there methods on
TestCluster
for this?
not that I can find
pkg/kv/kvserver/replica_follower_read.go, line 169 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Do you plan to backport the removal of this condition?
@aayushshah15 asked me the same and he seemed motivated to do it :)
pkg/kv/kvserver/replica_proposal_buf.go, line 702 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Want to say why?
done
pkg/kv/kvserver/replica_proposal_buf.go, line 856 at r3 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
closedTS
done
pkg/kv/kvserver/replica_range_lease.go, line 1066 at r4 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
TestingAcquireLease
?
done
pkg/kv/kvserver/testing_knobs.go, line 364 at r4 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
How does this impact lease transfers? We still allow the pinned leaseholder to transfer the lease away, right?
yes. Added a comment.
pkg/roachpb/batch.go, line 600 at r4 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Can we use the
(InternalCommitTrigger).Kind
method we just added instead of this custom logic?Also, isn't this a pointer? Why is this not causing a NPE? Do we never call this method in tests?
fixed
pkg/roachpb/batch.go, line 616 at r4 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
nit: this code structure is strange. We type switch, then switch on req.Method, then type assert. Is there a better way? Maybe just a single type switch?
I'll leave it for another patch
pkg/testutils/serverutils/test_cluster_shim.go, line 158 at r4 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Mention what the return value is in the comment.
done
pkg/testutils/testcluster/testcluster.go, line 929 at r4 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Do we want to return a nil lease if the destination store is the current leaseholder? Probably not, right?
changed it to return the current lease and documented what happens if the lease start off on the destination node (i.e. this method is a no-op).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 4 of 21 files at r5, 7 of 8 files at r6, 3 of 4 files at r8, 18 of 18 files at r9.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15 and @andreimatei)
pkg/kv/kvserver/closed_timestamp_test.go, line 1011 at r4 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
not that I can find
Ah, I was thinking about FindMemberServer
and findMemberStore
.
pkg/kv/kvserver/replica_tscache.go, line 481 at r9 (raw file):
// func (r *Replica) CanCreateTxnRecord( ctx context.Context, txnID uuid.UUID, txnKey []byte, txnMinTS hlc.Timestamp,
We're not actually using this. You still want to plumb it?
pkg/kv/kvserver/replica_tscache.go, line 608 at r9 (raw file):
s.Local.LowWater, ) //tc.SetLowWater(
!!!
7dc9485
to
965bc61
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15 and @nvanbenschoten)
pkg/kv/kvserver/replica_tscache.go, line 481 at r9 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
We're not actually using this. You still want to plumb it?
I do... My policy is that if I've needed to plumb it to get tracing working for a manual test, I'll merge the plumbing.
pkg/kv/kvserver/replica_tscache.go, line 608 at r9 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
!!!
fixed by rebasing on #65741
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 3 of 31 files at r10, 7 of 8 files at r11, 3 of 4 files at r13, 18 of 18 files at r14.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @aayushshah15 and @andreimatei)
pkg/roachpb/metadata.go, line 205 at r14 (raw file):
RangeDEscriptor
RangeDescriptor
Release note: None
Release note: None
Before this patch, RequestLeaseRequests was carrying the lease start time as a closed timestamp. This matched the behavior we used to have in 20.2 where the lease start time was considered to be closed by fiat. On the surface, closing a lease's start time sounds reasonable since, if the proposing replica gets the lease, it will not evaluate writes at lower timestamps. Unfortunately, there's a problem: while it's true that the replica, and the range in general, will not permit writes at timestamps below this lease's start time, it might happen that the range is in the process of merging with its left neighbor. If this range has already been Subsumed as the RHS of a merge then, after merge, the joint range will allow writes to the former RHS's key space at timestamps above the RHS's freeze start (which is below the start time the lease in question). Thus, if the lease were to close its start timestamp while subsumed, then it'd be possible for follower reads to be served before the merge finalizes at timestamps that would become un-closed after the merge. This patch fixes the bug by having lease requests no carry closed timestamps any more. Since the hazard involves subsumed ranges, we could make a distinction between brand new leases for subsumed ranges versus other brand new leases, and let the former category close the lease start time. But, for simplicity, we don't close timestamps on any lease requests. Release note: None
Fixes cockroachdb#59448 This patch moves the test away from using the "old closed timestamp mechanism" to using the new one. The test becomes simpler in the process, and does not appear to be flaky any more (the test was skipped). Release note: None
965bc61
to
1ac67a1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @aayushshah15 and @nvanbenschoten)
pkg/roachpb/metadata.go, line 205 at r14 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
RangeDEscriptor
RangeDescriptor
done
bors r+ |
Build succeeded: |
Before this patch, RequestLeaseRequests was carrying the lease start
time as a closed timestamp. This matched the behavior we used to have
in 20.2 where the lease start time was considered to be closed by fiat.
On the surface, closing a lease's start time sounds reasonable since, if
the proposing replica gets the lease, it will not evaluate writes at
lower timestamps. Unfortunately, there's a problem: while it's true that
the replica, and the range in general, will not permit writes at
timestamps below this lease's start time, it might happen that the range
is in the process of merging with its left neighbor. If this range has
already been Subsumed as the RHS of a merge then, after merge, the joint
range will allow writes to the former RHS's key space at timestamps
above the RHS's freeze start (which is below the start time the lease in
question). Thus, if the lease were to close its start timestamp while
subsumed, then it'd be possible for follower reads to be served before
the merge finalizes at timestamps that would become un-closed after the
merge.
This patch fixes the bug by having lease requests no carry closed
timestamps any more. Since the hazard involves subsumed ranges, we could
make a distinction between brand new leases for subsumed ranges versus
other brand new leases, and let the former category close the lease
start time. But, for simplicity, we don't close timestamps on any lease
requests.
Release note: None