-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
kvserver: make AdminRelocateRange work with non-voting replicas #56197
kvserver: make AdminRelocateRange work with non-voting replicas #56197
Conversation
24db83c
to
7125b86
Compare
5a9bfde
to
0a4436c
Compare
0a4436c
to
51f993a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 21 of 21 files at r1, 17 of 17 files at r2.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15 and @andreimatei)
pkg/kv/batch.go, line 690 at r2 (raw file):
func (b *Batch) adminRelocateRange( key interface{}, voterTargets []roachpb.ReplicationTarget,
nit: it might be cleaner to do voterTargets, nonVoterTargets []roachpb.ReplicationTarget
. Same thing in a few places in this commit.
pkg/kv/kvclient/kvcoord/integration_test.go, line 27 at r1 (raw file):
"github.com/cockroachdb/cockroach/pkg/testutils" "github.com/cockroachdb/cockroach/pkg/testutils/serverutils" hlc2 "github.com/cockroachdb/cockroach/pkg/util/hlc"
Are these hlc2
changes intentional?
pkg/kv/kvserver/allocator_scorer.go, line 413 at r2 (raw file):
candidateStores StoreList, constraints constraint.AnalyzedConstraints, existing []roachpb.ReplicaDescriptor,
existingReplicas
to stay in sync with allocateTargetFromList
?
pkg/kv/kvserver/client_merge_test.go, line 3652 at r2 (raw file):
} func TestMergeQueueSeesNonVoter(t *testing.T) {
Nice test!
pkg/kv/kvserver/client_merge_test.go, line 3721 at r2 (raw file):
require.Nil(t, err) // We're going to split the dummy range created above with an empty // expiration time. Disable the merge queue before splitting so that the
Was this the only reason we added the expiration time in the previous commit? If so, I wonder if we can use an AdminUnsplitRequest
here instead.
pkg/kv/kvserver/client_replica_test.go, line 2211 at r2 (raw file):
db := tc.Servers[0].DB() require.Nil(t, db.AdminRelocateRange( ctx, key, makeReplicationTargets(1, 2, 3), makeReplicationTargets(),
Same point as elsewhere. Should we just pass nil
here? This function call isn't obviously variadic, so it's not clear what it's meant to return when reading this code.
pkg/kv/kvserver/client_replica_test.go, line 2233 at r2 (raw file):
rand.Intn(2) < 1
minor nit: rand.Intn(2) == 0
is the more canonical way to do a coin flip.
pkg/kv/kvserver/closed_timestamp_test.go, line 905 at r1 (raw file):
setupTestClusterWithDummyRange
Is this name still correct?
pkg/kv/kvserver/closed_timestamp_test.go, line 1185 at r1 (raw file):
dbName string, numNodes int, callback postSetupCallback,
What's the purpose of this callback? Can't a caller use the returned TestClusterInterface to create a new conn if they need one?
pkg/kv/kvserver/merge_queue.go, line 299 at r2 (raw file):
return false, errors.Errorf( `cannot merge because lhs contains replicas that are neither NON_VOTER nor VOTER_FULL: %v`,
Should we use the same new language about "when lhs is in a joint state" here?
pkg/kv/kvserver/merge_queue.go, line 338 at r2 (raw file):
} } // We only care about merging ranges that are aligned with regards to their
So this will try to line up lhs voters with rhs voters and lhs non-voters with rhs non-voters? That seems reasonable, though maybe not strictly necessary, right? Does the replica type on the rhs of the merge need to match up with the replica type on the lhs of the merge? Is there even a meaningful difference between replica types once a range has been subsumed and all replicas have caught up on the log?
I'm mostly interested in this if this is causing superfluous data movement. Will the following config require any movement of data or will we be able to promote and demote voters/non-voters on the RHS in place?
lhs: voters = {1}, non-voters = {2}
rhs: voters = {2}, non-voters = {1}
If we can do everything in place, then this seems fine. But if not, then it seems worth digging into whether this is all necessary.
pkg/kv/kvserver/replica_command.go, line 2128 at r2 (raw file):
rangeDesc = *newDesc if len(nonVoterTargets) > 0 {
Any reason to do the non-voters first?
pkg/kv/kvserver/replica_command.go, line 2257 at r2 (raw file):
isVoter bool, ) ([]roachpb.ReplicationChange, *roachpb.ReplicationTarget, error) { if repls := desc.Replicas(); len(repls.Learners()) != 0 || repls.InAtomicReplicationChange() {
Can we use the new predicate you added here?
pkg/kv/kvserver/replica_command.go, line 2430 at r2 (raw file):
} func targetsToRemove(
nit: you call targetsToAdd
first, so might as well keep the same order here.
pkg/kv/kvserver/replica_command.go, line 2430 at r2 (raw file):
} func targetsToRemove(
Are these the same function, just with flipped args? If so, can/should we avoid the duplication?
pkg/kv/kvserver/replica_rangefeed_test.go, line 117 at r1 (raw file):
// Split the range so that the RHS uses epoch-based leases. startKey := []byte("a") tc.SplitRangeOrFatal(t, startKey, hlc.MaxTimestamp)
Would it be worth adding a SplitRangeOrFatalWithExpiration
and having SplitRangeOrFatal
call into this with hlc.MaxTimestamp as the expiration to avoid all of these changes? It doesn't seem like very many callers want to worry about the expiration time.
pkg/kv/kvserver/testing_knobs.go, line 238 at r2 (raw file):
BeforeSnapshotSSTIngestion func(IncomingSnapshot, SnapshotRequest_Type, []string) error // BeforeRelocateOne intercepts the return values of s.relocateOne // before they're being put into effect.
Was this intentional? It doesn't look like the comment was more than 80 chars.
pkg/roachpb/metadata_replicas.go, line 259 at r2 (raw file):
// VotersAndNonVoters returns the list of VOTER_FULL/NON_VOTER replicas in the // set.
Similar to the comment on ReplicaDescriptors.Voters
, it seems worthwhile to explain the meaning of this subset of replicas here. Why is this combination special?
pkg/sql/relocate.go, line 135 at r2 (raw file):
} else { if err := params.p.ExecCfg().DB.AdminRelocateRange( params.ctx, rowKey, relocationTargets, []roachpb.ReplicationTarget{},
Can we just pass nil
here and avoid the allocation? Same thing in a few places.
pkg/kv/kvserver/merge_queue.go, line 338 at r2 (raw file): Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
The example you gave would not cause superfluous data movement. See the However, when the replica sets are not equal, I thought it made sense to try to line up the voters with voters and non-voters with non-voters because it would preserve the status quo with regards to constraints conformance. The merge queue will only get to this point in the code if the zone configs for the ranges in question are the same, which means that, unless we're already in violation of a zone config constraint, it must be legal for us to try and line up non-voter and voter sets individually. We can't say the same about other configurations that align replica sets. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15 and @nvanbenschoten)
pkg/kv/kvserver/client_relocate_range_test.go, line 157 at r2 (raw file):
// Expect two single additions, and that's it. requireNumAtomic(0, 2, func() int { return relocateAndCheck(t, tc, k, targets, []roachpb.ReplicationTarget{})
you haven't used nils on purpose here?
pkg/kv/kvserver/closed_timestamp_test.go, line 908 at r1 (raw file):
// the {dbname}.kv table and performs splits on the table's range such that the // 2 resulting ranges contain exactly one of the rows each. func splitDummyRangeInTestCluster(
This function takes in a dbName but hardcodes the table name. Is this useful? I think if should go one way or the other.
pkg/kv/kvserver/closed_timestamp_test.go, line 933 at r1 (raw file):
} tcImpl := tc.(*testcluster.TestCluster) // Split at `k` so that the `kv` table has exactly two ranges: [1,2) and [2,
Have you considered using alter table split at (2)
?
pkg/kv/kvserver/closed_timestamp_test.go, line 943 at r1 (raw file):
} leftDesc, rightDesc := tcImpl.SplitRangeOrFatal(t, k, splitExpirationTime) if tc.ReplicationMode() != base.ReplicationManual {
Why don't we do this on ReplicationManual
? Do we depend on any queues here?
pkg/kv/kvserver/closed_timestamp_test.go, line 1185 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
What's the purpose of this callback? Can't a caller use the returned TestClusterInterface to create a new conn if they need one?
+1
pkg/kv/kvserver/closed_timestamp_test.go, line 1195 at r1 (raw file):
-- forever under high load (testrace under high concurrency). SET statement_timeout='30s'; CREATE DATABASE %[1]s;
whoa I didn't know this was a thing
pkg/kv/kvserver/merge_queue.go, line 294 at r2 (raw file):
// Defensive sanity check that the ranges involved only have either VOTER_FULL // or NON_VOTER replicas.
nit: VOTER_FULL and NON_VOTER replicas
:)
pkg/kv/kvserver/merge_queue.go, line 298 at r2 (raw file):
if typ := lhsReplicas[i].GetType(); !(typ == roachpb.VOTER_FULL || typ == roachpb.NON_VOTER) { return false, errors.Errorf(
If I am to read anything into the "defensive sanity check" part, this should be an AssertionError
? Same below.
pkg/kv/kvserver/merge_queue.go, line 314 at r2 (raw file):
} if !replicaSetsEqual(lhsReplicas, rhsReplicas) {
let's rename replicaSetsEqual
to something that makes it more clear that the replica types don't matter. For example replicasCollocated
.
pkg/kv/kvserver/merge_queue.go, line 314 at r2 (raw file):
} if !replicaSetsEqual(lhsReplicas, rhsReplicas) {
please give this whole block a comment about what it'll do it !replicaSetsEqual
pkg/kv/kvserver/merge_queue.go, line 338 at r2 (raw file):
Previously, aayushshah15 (Aayush Shah) wrote…
The example you gave would not cause superfluous data movement. See the
!replicaSetsEqual
predicate above. This doesn't seem to require anything in the way of voter demotion or non-voter promotion since replica types dont need to match up between the left and right ranges.However, when the replica sets are not equal, I thought it made sense to try to line up the voters with voters and non-voters with non-voters because it would preserve the status quo with regards to constraints conformance. The merge queue will only get to this point in the code if the zone configs for the ranges in question are the same, which means that, unless we're already in violation of a zone config constraint, it must be legal for us to try and line up non-voter and voter sets individually. We can't say the same about other configurations that align replica sets.
So then the example we should be discussing is something like
lhs: voters = {1}, non-voters = {2,3}
rhs: voters = {2}, non-voters = {1,4}
It seems to me that we should make it such that only one replica needs to change here, and the voters for the lhs and rhs can stay as they are.
Btw, it'd be nice if we put a comment around here about what the conditions for the merge are. Say that voters vs non-voters doesn't matter, and say that the leaseholders can be on different nodes (right?).
pkg/kv/kvserver/merge_queue.go, line 341 at r2 (raw file):
// zone config constraints. In practice, this means that we likely only care // about aligning the individual replica sets of the Voters() and // NonVoters().
I don't really understand what this comment is trying to tell me.
Can we make the "We only care about merging ranges that are aligned ..." part more precise? Who's we and what happens if this code is asked to merge ranges with different constraints? What does it mean for a range (or for a pair of ranges?) to be "aligned with regards to their zone config constraints"? If what you want to say is that this isn't called for ranges with different zone configs, then I think it'd be much clearer to say exactly that.
And the I don't understand the "in practice" and "likely" parts :). What more could we care about?
pkg/kv/kvserver/replica_rangefeed_test.go, line 117 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Would it be worth adding a
SplitRangeOrFatalWithExpiration
and havingSplitRangeOrFatal
call into this with hlc.MaxTimestamp as the expiration to avoid all of these changes? It doesn't seem like very many callers want to worry about the expiration time.
+1
If for whatever reason you want to leave it like this, at least make a const NeverExpires = hlc.MaxTimestamp
and use it in all the callers to make them self-documenting.
51f993a
to
38b8aef
Compare
aefd3b8
to
1247a2f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15, @andreimatei, and @nvanbenschoten)
pkg/kv/batch.go, line 690 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
nit: it might be cleaner to do
voterTargets, nonVoterTargets []roachpb.ReplicationTarget
. Same thing in a few places in this commit.
done.
pkg/kv/kvclient/kvcoord/integration_test.go, line 27 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Are these
hlc2
changes intentional?
these imports are all gone now after I made the change to add a new SplitRangeWithExpiration
method.
pkg/kv/kvserver/allocator_scorer.go, line 413 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
existingReplicas
to stay in sync withallocateTargetFromList
?
done
pkg/kv/kvserver/client_merge_test.go, line 3721 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Was this the only reason we added the expiration time in the previous commit? If so, I wonder if we can use an
AdminUnsplitRequest
here instead.
I find the empty expiration bit approach cleaner, but I can make the change if you insist.
pkg/kv/kvserver/client_relocate_range_test.go, line 157 at r2 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
you haven't used nils on purpose here?
a little embarrassed to admit that I didn't know about the nuance between empty and nil slices, changed to nil slices.
pkg/kv/kvserver/client_replica_test.go, line 2211 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Same point as elsewhere. Should we just pass
nil
here? This function call isn't obviously variadic, so it's not clear what it's meant to return when reading this code.
Done.
pkg/kv/kvserver/client_replica_test.go, line 2233 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
rand.Intn(2) < 1
minor nit:
rand.Intn(2) == 0
is the more canonical way to do a coin flip.
Changed.
pkg/kv/kvserver/closed_timestamp_test.go, line 905 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
setupTestClusterWithDummyRange
Is this name still correct?
I think so, unless I'm missing something here
pkg/kv/kvserver/closed_timestamp_test.go, line 908 at r1 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
This function takes in a dbName but hardcodes the table name. Is this useful? I think if should go one way or the other.
added a tableName
param.
pkg/kv/kvserver/closed_timestamp_test.go, line 933 at r1 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
Have you considered using
alter table split at (2)
?
yes but that doesn't really make it any cleaner, right? we'd still have to hustle a bit to get the leftDesc
and rightDesc
rangeDescriptors unless I'm missing some obvious utility method.
pkg/kv/kvserver/closed_timestamp_test.go, line 943 at r1 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
Why don't we do this on
ReplicationManual
? Do we depend on any queues here?
WaitForFullReplication()
forces a scan on the replicate queue, which doesn't work with the queue disabled (because it calls maybeAdd
into the queue).
pkg/kv/kvserver/closed_timestamp_test.go, line 1185 at r1 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
+1
you're right, I dont remember why I did that. Fixed.
pkg/kv/kvserver/closed_timestamp_test.go, line 1195 at r1 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
whoa I didn't know this was a thing
glad to be of service
pkg/kv/kvserver/merge_queue.go, line 294 at r2 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
nit: VOTER_FULL and NON_VOTER replicas
:)
Done.
pkg/kv/kvserver/merge_queue.go, line 298 at r2 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
If I am to read anything into the "defensive sanity check" part, this should be an
AssertionError
? Same below.
Done.
pkg/kv/kvserver/merge_queue.go, line 299 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Should we use the same new language about "when lhs is in a joint state" here?
Changed.
pkg/kv/kvserver/merge_queue.go, line 314 at r2 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
let's rename
replicaSetsEqual
to something that makes it more clear that the replica types don't matter. For examplereplicasCollocated
.
Done.
pkg/kv/kvserver/merge_queue.go, line 314 at r2 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
please give this whole block a comment about what it'll do it
!replicaSetsEqual
Added a method TargetsToCollocateRHSDuringMerge
that should have the comment you want.
pkg/kv/kvserver/merge_queue.go, line 338 at r2 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
So then the example we should be discussing is something like
lhs: voters = {1}, non-voters = {2,3} rhs: voters = {2}, non-voters = {1,4}
It seems to me that we should make it such that only one replica needs to change here, and the voters for the lhs and rhs can stay as they are.
Btw, it'd be nice if we put a comment around here about what the conditions for the merge are. Say that voters vs non-voters doesn't matter, and say that the leaseholders can be on different nodes (right?).
I've revamped the logic here. We now only relocate replicas that are not collocated between LHS and RHS. PTAL.
pkg/kv/kvserver/merge_queue.go, line 341 at r2 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
I don't really understand what this comment is trying to tell me.
Can we make the "We only care about merging ranges that are aligned ..." part more precise? Who's we and what happens if this code is asked to merge ranges with different constraints? What does it mean for a range (or for a pair of ranges?) to be "aligned with regards to their zone config constraints"? If what you want to say is that this isn't called for ranges with different zone configs, then I think it'd be much clearer to say exactly that.And the I don't understand the "in practice" and "likely" parts :). What more could we care about?
Removed the comment, PTAL the comment over TargetsToCollocateRHSDuringMerge
.
pkg/kv/kvserver/replica_command.go, line 2128 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Any reason to do the non-voters first?
I can't particularly rationalize doing either first. Can you think of anything that supports relocating one set over the other?
pkg/kv/kvserver/replica_command.go, line 2257 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Can we use the new predicate you added here?
Done.
pkg/kv/kvserver/replica_rangefeed_test.go, line 117 at r1 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
+1
If for whatever reason you want to leave it like this, at least make aconst NeverExpires = hlc.MaxTimestamp
and use it in all the callers to make them self-documenting.
Added a SplitRangeWithExpiration
.
pkg/kv/kvserver/testing_knobs.go, line 238 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Was this intentional? It doesn't look like the comment was more than 80 chars.
I had renamed the knob to something else and then changed it back. Fixed the wrapping now.
pkg/roachpb/metadata_replicas.go, line 259 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Similar to the comment on
ReplicaDescriptors.Voters
, it seems worthwhile to explain the meaning of this subset of replicas here. Why is this combination special?
Done. Let me know if it seems lacking.
pkg/sql/relocate.go, line 135 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Can we just pass
nil
here and avoid the allocation? Same thing in a few places.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 31 of 31 files at r3, 50 of 50 files at r4, 19 of 19 files at r5.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15 and @andreimatei)
pkg/kv/kvserver/closed_timestamp_test.go, line 1158 at r3 (raw file):
// corresponding to this table. func setupTestClusterWithDummyRange( t *testing.T, clusterArgs base.TestClusterArgs, dbName string, tableName string, numNodes int,
nit: dbName, tableName string
pkg/kv/kvserver/merge_queue.go, line 374 at r5 (raw file):
} // TargetsToCollocateRHSDuringMerge computes the stores which the voters and
Does this need to be exported? Also, should it live next to replicasCollocated
?
pkg/kv/kvserver/merge_queue.go, line 383 at r5 (raw file):
// why or why not. func TargetsToCollocateRHSDuringMerge( leftRepls roachpb.ReplicaSet, rightRepls roachpb.ReplicaSet,
nit: leftRepls, rightRepls roachpb.ReplicaSet
pkg/kv/kvserver/merge_queue.go, line 384 at r5 (raw file):
func TargetsToCollocateRHSDuringMerge( leftRepls roachpb.ReplicaSet, rightRepls roachpb.ReplicaSet, ) ([]roachpb.ReplicationTarget, []roachpb.ReplicationTarget) {
nit: give these names. That will allow you to eliminate the duplicate type
pkg/kv/kvserver/merge_queue.go, line 391 at r5 (raw file):
// Sets of replicas that exist on the LHS but not on the RHS nonCollocated := leftRepls.Filter(notInRight)
I found it difficult to remember which of these 5 vars were referring to the LHS and which were referring to the RHS. Consider renaming to make that more clear
pkg/kv/kvserver/merge_queue.go, line 396 at r5 (raw file):
// We bootstrap our result set by first including the replicas (voting and // non-voting) that _are_ collocated, as these will stay unchanged.
I always find it difficult to remember what a ReplicationTarget means if the range already has a replica there. Consider answering this question here by saying something like will stay unchanged and will result in a no-op when passed through AdminRelocateRange
.
pkg/kv/kvserver/merge_queue.go, line 410 at r5 (raw file):
finalVoters.AddReplica(nonCollocatedNonVoters[0]) nonCollocatedNonVoters = nonCollocatedNonVoters[1:] }
What happens if we fall through here? Should this be possible? If so, would we loop indefinitely? If that's not possible, then we should add an assertion.
pkg/kv/kvserver/replica_command.go, line 2128 at r2 (raw file):
Previously, aayushshah15 (Aayush Shah) wrote…
I can't particularly rationalize doing either first. Can you think of anything that supports relocating one set over the other?
No strong reason, other than that this looks deliberate (because voters comes before non-voters everywhere else), which might trip up readers.
pkg/kv/kvserver/replica_command.go, line 2161 at r4 (raw file):
func checkDescsEqual(desc *roachpb.RangeDescriptor) func(*roachpb.RangeDescriptor) bool { // TODO(jeffreyxiao): This hacky fix ensures that we don't fail the
What changed to make this needed again?
pkg/kv/kvserver/replica_command.go, line 2574 at r5 (raw file):
} func diff(first, second []roachpb.ReplicationTarget ) (diff []roachpb.ReplicaDescriptor) {
Let's give this a better name. Having a diff
function in the pkg/kv/kvserver
namespace is going to be pretty confusing.
Also, give it a quick explanation. What's being diffed? How do first
and second
relate? What's the result?
pkg/roachpb/metadata_replicas.go, line 259 at r2 (raw file):
Previously, aayushshah15 (Aayush Shah) wrote…
Done. Let me know if it seems lacking.
Isn't this also the set that we allow follower reads to evaluate on? If so, maybe talk a bit about that.
Or does that set also include VOTER_INCOMING
?
pkg/roachpb/metadata_replicas.go, line 55 at r4 (raw file):
} // ReplicaSet is a set of replicas, usually the nodes/stores on which
This all seems fine, but it's also a huge diff. Do you mind talking about the motivation a bit more? You said "This is because the current naming makes it hard/confusing to add new methods that return another ReplicaDescriptors
instead of the underlying slice.". Could you give an example of where this became awkward?
pkg/roachpb/metadata_replicas.go, line 64 at r4 (raw file):
// descriptors. // // All construction of ReplicaSet is required to go through this method
nit: rewrap this line
pkg/roachpb/metadata_replicas.go, line 338 at r5 (raw file):
// Contains returns true if the set contains rDesc. func (d ReplicaSet) Contains(rDesc ReplicaDescriptor) bool { for _, repl := range d.Descriptors() {
We should avoid the memory copy of the ReplicaDescriptor
on each iteration. Instead, use
descs := d.Descriptors()
for i := range descs {
repl := &descs[i]
...
}
pkg/roachpb/metadata_replicas.go, line 469 at r5 (raw file):
func (d ReplicaSet) ReplicationTargets() (out []ReplicationTarget) { for _, r := range d.Descriptors() { out = append(out, ReplicationTarget{NodeID: r.NodeID, StoreID: r.StoreID})
Let's pre-allocate the slice's length so we're not wasting work.
And let's avoid the descriptor copy, as discussed above.
pkg/sql/relocate.go, line 135 at r2 (raw file):
Previously, aayushshah15 (Aayush Shah) wrote…
Done.
For this one, give it the /* ... */
treatment.
1247a2f
to
984795f
Compare
This commit extracts some existing testing methods to be more general. so they can be reused for testing merge queue behavior with non-voting replicas. Release note: None
This commit renames `ReplicaDescriptors`, which provides some utility methods over a `[]ReplicaDescriptor`. This is because the current naming makes it hard/confusing to add new methods that return another `ReplicaDescriptors` instead of the underlying slice. Release note: None
This commit teaches `RelocateRange` to work with non-voters. This lets the merge queue rebalance a range that has non-voters so the merge can actually proceed, as range merges require that replica sets of the LHS and RHS ranges must be aligned. Release note: None
9a5ef6f
to
bfe698a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTR!
bors r+
Reviewable status: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @aayushshah15, @andreimatei, and @nvanbenschoten)
pkg/kv/kvserver/replica_command.go, line 2182 at r20 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
nit: consider
_ error
at the end. You're not using theerr
variable, and introducing it in scope can only hurt. You're also not using the other named returns, but for those having the name in the signature is beneficial for documentation (whereas for the error it's not)
Done.
pkg/kv/kvserver/replica_command.go, line 2203 at r20 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
nit: loop through the set
Done.
Build succeeded: |
Before this patch, cockroachdb#56197 broke this test because it changed a range merge error string that let the KV nemesis validator ignore the error. This patch updates the regexp the validator uses to match the error and unskips the test. Fixes cockroachdb#58624. Release note: None
Before this patch, cockroachdb#56197 broke this test because it changed a range merge error string that let the KV nemesis validator ignore the error. This patch updates the regexp the validator uses to match the error and unskips the test. Fixes cockroachdb#58624. Release note: None
58888: sql, jobs: stop queuing schema change jobs for in-txn schema changes r=postamar a=postamar Creating a table and changing its schema within a transaction would cause a schema change job to be queued. Such jobs are not necessary and don't do anything. This patch prevents them from being queued in the first place. Fixes #45985. Release note (sql change): Creating a table and changing its schema within a transaction no longer schedules a schema change job. 58919: kvnemesis: fix and unskip TestKVNemesisMultiNode r=aayushshah15 a=aayushshah15 Before this patch, #56197 broke this test because it changed a range merge error string that let the KV nemesis validator ignore the error. This patch updates the regexp the validator uses to match the error and unskips the test. Fixes #58624. Release note: None 58982: delegate: remove is_active_region from SHOW REGIONS FROM DATABASE r=ajstorm a=otan Due to unpopular demand. Release note: None Co-authored-by: Marius Posta <[email protected]> Co-authored-by: Aayush Shah <[email protected]> Co-authored-by: Oliver Tan <[email protected]>
This commit removes the relocation logic used by the `mergeQueue` thus far to align replica sets (added in cockroachdb#56197). This logic previously existed in order to allow us to align the replica sets of a pair of ranges (which is required for the range merge to proceed), while avoiding redundant data movement. Before cockroachdb#58627 and the previous commit in this PR, `AdminRelocateRange` couldn't be directly used by the mergeQueue under various configurations of the LHS and RHS ranges. Furthermore, even when it could be used, it would involve redundant data movement. This all required us to compute relocation targets for the RHS of a merge separately, above the call to `AdminRelocateRange`, for the range merge to proceed. All these limitations have been resolved by the previous commit which teaches `AdminRelocateRange` to promote non-voters and demote voters when needed, and the aforementioned bespoke relocation logic is no longer needed. Release note: None
This commit removes the relocation logic used by the `mergeQueue` thus far to align replica sets (added in cockroachdb#56197). This logic previously existed in order to allow us to align the replica sets of a pair of ranges (which is required for the range merge to proceed), while avoiding redundant data movement. Before cockroachdb#58627 and the previous commit in this PR, `AdminRelocateRange` couldn't be directly used by the mergeQueue under various configurations of the LHS and RHS ranges. Furthermore, even when it could be used, it would involve redundant data movement. This all required us to compute relocation targets for the RHS of a merge separately, above the call to `AdminRelocateRange`, for the range merge to proceed. All these limitations have been resolved by the previous commit which teaches `AdminRelocateRange` to promote non-voters and demote voters when needed, and the aforementioned bespoke relocation logic is no longer needed. Release note: None
This commit removes the relocation logic used by the `mergeQueue` thus far to align replica sets (added in cockroachdb#56197). This logic previously existed in order to allow us to align the replica sets of a pair of ranges (which is required for the range merge to proceed), while avoiding redundant data movement. Before cockroachdb#58627 and the previous commit in this PR, `AdminRelocateRange` couldn't be directly used by the mergeQueue under various configurations of the LHS and RHS ranges. Furthermore, even when it could be used, it would involve redundant data movement. This all required us to compute relocation targets for the RHS of a merge separately, above the call to `AdminRelocateRange`, for the range merge to proceed. All these limitations have been resolved by the previous commit which teaches `AdminRelocateRange` to promote non-voters and demote voters when needed, and the aforementioned bespoke relocation logic is no longer needed. Release note: None
This commit removes the relocation logic used by the `mergeQueue` thus far to align replica sets (added in cockroachdb#56197). This logic previously existed in order to allow us to align the replica sets of a pair of ranges (which is required for the range merge to proceed), while avoiding redundant data movement. Before cockroachdb#58627 and the previous commit in this PR, `AdminRelocateRange` couldn't be directly used by the mergeQueue under various configurations of the LHS and RHS ranges. Furthermore, even when it could be used, it would involve redundant data movement. This all required us to compute relocation targets for the RHS of a merge separately, above the call to `AdminRelocateRange`, for the range merge to proceed. All these limitations have been resolved by the previous commit which teaches `AdminRelocateRange` to promote non-voters and demote voters when needed, and the aforementioned bespoke relocation logic is no longer needed. Release note: None
62631: kvserver: get rid of bespoke relocation logic used by the mergeQueue r=aayushshah15 a=aayushshah15 This PR contains two main commits: **kvserver: update `AdminRelocateRange` to leverage explicit swaps of voters to non-voters** This commit updates `AdminRelocateRange` to use explicit atomic swaps of voting replicas with non-voting replicas, that #58627 initially added support for. The patch does so by generalizing behavior that's already exercised by the `replicateQueue` when it decides to rebalance replicas. See #61239. This allows us, in the next commit, to remove bespoke relocation logic that's used by the `mergeQueue` to align replica sets for the sake of a range merge. Release note: None **kvserver: get rid of bespoke relocation logic used by the mergeQueue** This commit removes the relocation logic used by the `mergeQueue` thus far to align replica sets (added in #56197). This logic previously existed in order to allow us to align the replica sets of a pair of ranges (which is required for the range merge to proceed), while avoiding redundant data movement. Before #58627 and the previous commit in this PR, `AdminRelocateRange` couldn't be directly used by the mergeQueue under various configurations of the LHS and RHS ranges. Furthermore, even when it could be used, it would involve redundant data movement. This all required us to compute relocation targets for the RHS of a merge separately, above the call to `AdminRelocateRange`, for the range merge to proceed. All these limitations have been resolved by the previous commit which teaches `AdminRelocateRange` to promote non-voters and demote voters when needed, and the aforementioned bespoke relocation logic is no longer needed. Resolves #62370 Release note: None Co-authored-by: aayush <[email protected]> Co-authored-by: Aayush Shah <[email protected]>
This commit removes the relocation logic used by the `mergeQueue` thus far to align replica sets (added in cockroachdb#56197). This logic previously existed in order to allow us to align the replica sets of a pair of ranges (which is required for the range merge to proceed), while avoiding redundant data movement. Before cockroachdb#58627 and the previous commit in this PR, `AdminRelocateRange` couldn't be directly used by the mergeQueue under various configurations of the LHS and RHS ranges. Furthermore, even when it could be used, it would involve redundant data movement. This all required us to compute relocation targets for the RHS of a merge separately, above the call to `AdminRelocateRange`, for the range merge to proceed. All these limitations have been resolved by the previous commit which teaches `AdminRelocateRange` to promote non-voters and demote voters when needed, and the aforementioned bespoke relocation logic is no longer needed. Release note: None
This commit teaches
RelocateRange
to work with non-voters. This letsthe merge queue rebalance a range that has non-voters so the merge can
actually proceed, as range merges require that replica sets of the LHS
and RHS ranges must be aligned.
Makes progress on #51943
Release note: None