-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
kvserver: modernize AdminRelocateRange #75676
kvserver: modernize AdminRelocateRange #75676
Conversation
6400a7c
to
ac02157
Compare
@shralex and @nvanbenschoten: This patch does what I've individually spoken to both of you about. #74077 depends on this since without it, |
eb783a4
to
66e981e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 9 of 11 files at r1, 2 of 2 files at r2, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @aayushshah15)
66e981e
to
6525423
Compare
@nvanbenschoten, do you mind giving this another quick look? The previous revision tried to redundantly transfer the lease to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand this new change. My understanding of relocateOne
is that it should only return a lease target if either len(ops) == 0
or we are removing the current leaseholder. So I don't see how this code change makes a difference. Are you referring to code redundancy or operational redundancy?
I think this code would be easier to write/read if we either pulled the len(ops) == 0
handling entirely out of relocateOne
, or pushed the transferLeaseToFirstVoter
handling entirely into relocateOne
.
Reviewed all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @nvanbenschoten)
6525423
to
0ff129c
Compare
Yep that's correct. The previous revision was attempting a lease transfer at the end (if
Done. Much better. |
Previously, `AdminRelocateRange`'s contract was to transfer the lease away to the first voting replica in the input slice of targets. However, this was mostly a result of the fact that leaseholder replicas weren't allowed to remove themselves. Thus, the caller had to specifically tell `AdminRelocateRange` where it wanted the lease to be, and the `AdminRelocateRange` logic would ensure to transfer the lease away to this replica either before it removed the leaseholder or after it was done performing all the replica relocations. There are two internal callers of `AdminRelocateRange`, and this property of the `AdminRelocateRange` API is unfortunate for both of them. 1. The merge queue takes extra effort to avoid a lease transfer when calling into `AdminRelocateRange`. 2. The store rebalancer performs a lease transfer based on very naive logic that is not optimal. With cockroachdb#74077, leaseholder replicas will be able to remove themselves by transferring the lease away to a suitable replica after they enter the joint configuration of the replication change. This patch makes it such that, going forward, `AdminRelocateRange` will only perform an explicit lease transfer if the caller specifically asks for it. The patch also contains a migration path for this functionality. Once cockroachdb#74077 lands, a future patch will change the store rebalancer and the merge queue behavior to leverage this new functionality. Release note: None
0ff129c
to
949658b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r4, 1 of 1 files at r5, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @aayushshah15)
TFTR bors r+ |
Build succeeded: |
…elocateRange` This commit makes it such that the `mergeQueue` no longer has to take special care to avoid a lease transfer when calling into `AdminRelocateRange`. This was enabled by cockroachdb#75676. Release note: None
…elocateRange` This commit makes it such that the `mergeQueue` no longer has to take special care to avoid a lease transfer when calling into `AdminRelocateRange`. This was enabled by cockroachdb#75676. The preceding logic can be cleaned up in the 22.2 cycle. Release note: None
…ateRange` This commit makes it such that the `mergeQueue` no longer has to take special care to avoid a lease transfer when calling into `AdminRelocateRange`. This was enabled by cockroachdb#75676. The preceding logic can be cleaned up in the 22.2 cycle. Release note: None
…76205 75541: rfc: kv lock observability via `crdb_locks` r=AlexTalks a=AlexTalks _Link to [RFC text](https://github.com/AlexTalks/cockroach/blob/rfc_crdb_locks/docs/RFCS/20220104_crdb_locks.md)._ This RFC proposes a feature for observing locks (and contention on locks) via a new virtual table called `crdb_internal.crdb_locks`. Release note: None 75738: roachtest/follower_reads: unskip follower-reads/mixed-version/single-region r=andreimatei a=andreimatei It should have been unskipped with the rest of the follower reads tests when #69817 was closed. Release note: None 75898: sql: consider virtual PK columns as stored in the optimizer r=mgartner a=mgartner Allowing virtual primary key columns created a contradiction of two hard-coded assumptions of the optimizer: 1. Primary key columns are written to the primary index (and all secondary indexes). 2. Virtual columns are not written to the primary index. To prevent this contradiction from causing bugs while planning, the opt catalog presents columns that are marked as virtual PK columns in the descriptor as stored columns to the optimizer. This is valid because virtual computed PK columns behave exactly like stored computed PK columns: they are written to all indexes. The test catalog has been updated to mimic this behavior. Fixes #75147 Fixes #75874 Release note (bug fix): A bug has been fixed that caused internal errors when querying tables with virtual columns in the primary key. This bug was only present since version 22.1.0-alpha.1 and does not appear in any production releases. 75912: util/tracing: don't return recordings for non-recording spans r=andreimatei a=andreimatei This patch makes sp.GetRecording() return nil if the span was not recording. This is already what the documentation on GetRecording() was implying, but the actual behavior was different: a non-recording span (that's not a no-op span) would return a non-nil recording consisting only of the the span's metadata. This behavior is not necessarily unreasonable, but it turns out it's a bit dangerous from a performance perspective - there's a cost (e.g. some allocations) for generating this recording, and it was too easy for unsuspecting callers to pay that when they didn't want to. In particular, DistSQL was indiscriminently calling GetRecording() (through its GetTraceData* utilities), and paying this measurable cost unintentionally. Eliminating this cost is particularly important when the Tracer is configured in the "active spans registry" tracing mode; when not in that mode, all the spans would be no-ops and the GetRecording calls were shortcircuited anyway. This is seen in BenchmarkTracing//trace=on, particularly on "3node/scan" which was the benchmark with the bigest difference between tracing on and off. name old time/op new time/op delta Tracing/1node/scan/trace=on-32 214µs ± 4% 206µs ± 5% ~ (p=0.222 n=5+5) Tracing/1node/insert/trace=on-32 338µs ± 5% 334µs ± 4% ~ (p=0.690 n=5+5) Tracing/3node/scan/trace=on-32 515µs ± 3% 484µs ± 3% -6.11% (p=0.008 n=5+5) Tracing/3node/insert/trace=on-32 635µs ± 1% 627µs ± 1% ~ (p=0.095 n=5+5) name old alloc/op new alloc/op delta Tracing/1node/scan/trace=on-32 27.2kB ± 2% 25.6kB ± 2% -5.94% (p=0.008 n=5+5) Tracing/1node/insert/trace=on-32 45.2kB ± 2% 45.5kB ± 2% ~ (p=0.690 n=5+5) Tracing/3node/scan/trace=on-32 83.2kB ± 3% 79.7kB ± 5% ~ (p=0.095 n=5+5) Tracing/3node/insert/trace=on-32 102kB ± 2% 103kB ± 1% ~ (p=0.548 n=5+5) name old allocs/op new allocs/op delta Tracing/1node/scan/trace=on-32 242 ± 1% 229 ± 0% -5.37% (p=0.000 n=5+4) Tracing/1node/insert/trace=on-32 338 ± 1% 339 ± 1% ~ (p=0.563 n=5+5) Tracing/3node/scan/trace=on-32 819 ± 1% 788 ± 6% ~ (p=0.135 n=5+5) Tracing/3node/insert/trace=on-32 871 ± 0% 880 ± 0% +0.99% (p=0.016 n=4+5) Release note: None 75981: kvserver: clean-up the interaction between the mergeQueue and `AdminRelocateRange` r=aayushshah15 a=aayushshah15 This commit makes it such that the `mergeQueue` no longer has to take special care to avoid a lease transfer when calling into `AdminRelocateRange`. This was enabled by #75676. Release note: None 75999: kvserver: stop reconciling lease count imbalances in the `StoreRebalancer` r=aayushshah15 a=aayushshah15 Generally, the `StoreRebalancer` doesn't try to attempt lease transfers for replicas that account for less than 0.1% of the store's total QPS. However, it allows such lease transfers if the store has a higher than average number of leases. This is fraught because of the following reasons: 1. It doesn't adhere to any padding. It will do these tiny lease transfers until the store has less than or equal to the average number of leases. 2. Additionally, "average" here means the average across the cluster, not just the stores that have replicas for the range. This is clearly not good, as this doesn't translate well to heterogeneously loaded clusters. 3. These lease transfers also don't adhere to `kv.allocator.min_lease_transfer_interval`. This means they are not rate limited like the lease transfers done by the replicateQueue. This patch fixes this behavior. This patch is similar to #64559. Noticed while running a tpcc workload on a multi-region cluster, but on a database constrained to a single region. This was done to simulate a "heterogeneously loaded cluster". Release note: None 76078: opt: plan inner lookup joins on virtual column indexes in more cases r=mgartner a=mgartner ExtractJoinEqualities now reuses computed columns instead of synthesizing new columns when it creates projections that exactly match a computed column expression of a base table. This allows GenerateLookupJoinsWithVirtualCols to generate lookup joins in more cases. This also paves the way for exploring anti- and semi-lookup joins on indexes with virtual columns and expression indexes. Fixes #75872 Release note (performance improvement): The optimizer now plans inner lookup joins using expression indexes in more cases, resulting in more efficient query plans. 76193: optbuilder: do not create invalid casts when building CASE expressions r=mgartner a=mgartner The optbuilder no longer creates invalid casts when building CASE expressions that have branches of different types. CASE expressions that previously caused internal errors now result in user-facing errors. A similar change was made recently to UNION expressions in #75219. Note that there is more work to be done to be fully consistent with Postgres's CASE typing behavior, see #75365. Fixes #75365 Release note (bug fix): CASE expressions with branches that result in types that cannot be cast to a common type now result in a user-facing error instead of an internal error. 76200: cloud: bump orchestrator to v21.2.5 r=celiala a=Azhng Release note: None 76205: roachtest: update 22.1 version map to v21.2.5 r=celiala a=Azhng Release note: None Co-authored-by: Alex Sarkesian <[email protected]> Co-authored-by: Andrei Matei <[email protected]> Co-authored-by: Marcus Gartner <[email protected]> Co-authored-by: Aayush Shah <[email protected]> Co-authored-by: Azhng <[email protected]>
…ateRange` This commit makes it such that the `mergeQueue` no longer has to take special care to avoid a lease transfer when calling into `AdminRelocateRange`. This was enabled by cockroachdb#75676. The preceding logic can be cleaned up in the 22.2 cycle. Release note: None
Previously,
AdminRelocateRange
's contract was to transfer the lease away tothe first voting replica in the input slice of targets. However, this was
mostly a result of the fact that leaseholder replicas weren't allowed to remove
themselves. Thus, the caller had to specifically tell
AdminRelocateRange
whereit wanted the lease to be, and the
AdminRelocateRange
logic would ensure totransfer the lease away to this replica either before it removed the
leaseholder or after it was done performing all the replica relocations.
There are two internal callers of
AdminRelocateRange
, and this property ofthe
AdminRelocateRange
API is unfortunate for both of them.into
AdminRelocateRange
.cockroach/pkg/kv/kvserver/merge_queue.go
Lines 338 to 348 in 52c2df4
that is not optimal.
cockroach/pkg/kv/kvserver/store_rebalancer.go
Lines 601 to 605 in c2f7bd6
With #74077, leaseholder replicas will be able to remove themselves by
transferring the lease away to a suitable replica after they enter the joint
configuration of the replication change. This patch makes it such that, going
forward,
AdminRelocateRange
will only perform an explicit lease transfer ifthe caller specifically asks for it. The patch also contains a migration path
for this functionality.
Once #74077 lands, a future patch will change the store rebalancer and the
merge queue behavior to leverage this new functionality.
Release note: None