-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
storage: prep ChangeReplicasTrigger for atomic rebalancing #39485
Conversation
a932012
to
4cd451f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
all nits. Thanks for the consumable change.
Reviewed 14 of 14 files at r1, 5 of 5 files at r2.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @tbg)
docs/generated/settings/settings.html, line 125 at r2 (raw file):
<tr><td><code>trace.lightstep.token</code></td><td>string</td><td><code></code></td><td>if set, traces go to Lightstep using this token</td></tr> <tr><td><code>trace.zipkin.collector</code></td><td>string</td><td><code></code></td><td>if set, traces go to the given Zipkin instance (example: '127.0.0.1:9411'); ignored if trace.lightstep.token is set</td></tr> <tr><td><code>version</code></td><td>custom validation</td><td><code>19.1-8</code></td><td>set the active cluster version in the format '<major>.<minor>'</td></tr>
nit: it looks like this landed in the wrong commit.
pkg/roachpb/data.go, line 1310 at r1 (raw file):
if crt.Desc != nil { nextReplicaID = crt.Desc.NextReplicaID // NB: we don't want to mutate InternalReplicas, so we don't call
Will this change once #39489 is merged?
pkg/roachpb/data.go, line 1327 at r1 (raw file):
fmt.Fprintf(&chgS, "%s%s", REMOVE_REPLICA, removed) } return fmt.Sprintf("%s: after=%s next=%d", chgS.String(), afterReplicas, nextReplicaID)
nit: you might as well keep using the strings.Builder
:
fmt.Fprintf(&chgS, ": after=%s next=%d", afterReplicas, nextReplicaID)
return chgS.String()
pkg/roachpb/data.proto, line 165 at r1 (raw file):
option (gogoproto.goproto_stringer) = false; // TODO(tbg): remove when possible. 19.2 won't actively use this any more,
Mind mentioning when this will be possible?
pkg/roachpb/data.proto, line 189 at r1 (raw file):
// set when this field is used). Must not specify the same replica more than once or // overlap with the added replicas above. repeated ReplicaDescriptor internal_removed_replicas = 7 [(gogoproto.nullable) = false];
The plan is to remove the "internal" prefix from these eventually, right?
pkg/settings/cluster/cockroach_versions.go, line 509 at r1 (raw file):
}, { // VersionAtomicChangeReplicasTrigger is <TODO>.
Reminder to fill this in.
pkg/storage/replica_raft.go, line 293 at r1 (raw file):
prefix = false // Ensure that we aren't trying to remove ourselves from the range without
It seems strange to me that this lives here. I think I tried moving it down to batch evaluation at one point, but ran into issues because the commit triggers are a little panic-y. If you're working in this area and have it all paged in, do you mind taking a pass at giving this check a better home?
The implementation plan for atomic rebalancing is to first expose a testing-only API that lets us carry out general atomic membership changes (i.e. not limited to a single swap which is all that we expect to want to use in production in 19.2). This allows us to really stress the machinery in general and also prevents more API headaches down the road should we want to support more general changes. Besides, the change to an API that supports a single swap only is already similarly invasive. This is the first in a sequence of PRs/commits that plumb the API through the system (but without actually supporting atomic changes yet). The whole pipeline that needs to be plumbed is - `AdminChangeReplicasRequest` which calls into - `(*Replica).ChangeReplicas` which calls into - the internal code adding learners, sending them snapshots, etc, and which finally evaluates into a - `ChangeReplicasTrigger` that needs to eventually be translated into a - `raftpb.ConfChangeV2` (we use a v1 `ConfChange` today). This particular change deprecates the `Replica` and `ChangeType` field of the trigger, and introduces instead two slices of ReplicaDescriptors, one for added and one for removed replicas. This will be easy to translate into a `ConfChangeV2` and seemed more idiomatic (and less error-prone) than including the base desc and computing a sequence of changes from that whenever it is needed. It also matches what the higher levels of the API will do. As a drive-by, also remove ConfChangeContext.Replica, which has been unused essentially forever (<2.0). Release note: None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you! All addressed.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten)
pkg/roachpb/data.go, line 1310 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Will this change once #39489 is merged?
Yes! Added a TODO to revisit.
pkg/roachpb/data.proto, line 165 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Mind mentioning when this will be possible?
Added a reference to #39182.
pkg/roachpb/data.proto, line 189 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
The plan is to remove the "internal" prefix from these eventually, right?
Yep, added TODO to the deprecated fields.
pkg/storage/replica_raft.go, line 293 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
It seems strange to me that this lives here. I think I tried moving it down to batch evaluation at one point, but ran into issues because the commit triggers are a little panic-y. If you're working in this area and have it all paged in, do you mind taking a pass at giving this check a better home?
I share the sentiment, will take another look when I'm less pressed for time. This should be straightforward.
4cd451f
to
2e8db6c
Compare
bors r=nvanbenschoten |
39485: storage: prep ChangeReplicasTrigger for atomic rebalancing r=nvanbenschoten a=tbg The implementation plan for atomic rebalancing is to first expose a testing-only API that lets us carry out general atomic membership changes (i.e. not limited to a single swap which is all that we expect to want to use in production in 19.2). This allows us to really stress the machinery in general and also prevents more API headaches down the road should we want to support more general changes. Besides, the change to an API that supports a single swap only is already similarly invasive. This is the first in a sequence of PRs/commits that plumb the API through the system (but without actually supporting atomic changes yet). The whole pipeline that needs to be plumbed is - `AdminChangeReplicasRequest` which calls into - `(*Replica).ChangeReplicas` which calls into - the internal code adding learners, sending them snapshots, etc, and which finally evaluates into a - `ChangeReplicasTrigger` that needs to eventually be translated into a - `raftpb.ConfChangeV2` (we use a v1 `ConfChange` today). This particular change deprecates the `Replica` and `ChangeType` field of the trigger, and introduces instead two slices of ReplicaDescriptors, one for added and one for removed replicas. This will be easy to translate into a `ConfChangeV2` and seemed more idiomatic (and less error-prone) than including the base desc and computing a sequence of changes from that whenever it is needed. It also matches what the higher levels of the API will do. Release note: None Co-authored-by: Tobias Schottdorf <[email protected]>
Build succeeded |
Following the plan laid out in cockroachdb#39485, this adds API support for general replication changes to `AdminChangeReplicasRequest`. `(*DB).AtomicChangeReplicas` will now accept an arbitrary set of additions/removals, though only on paper - the changes will be executed individually. The compatibility story is straightforward since this request is never persisted. We simply populate both the deprecated and the new field (and it isn't safe to emit "mixed" changes until all nodes run 19.2 and it's not worth plumbing a setting around), and in 20.1 we remove the old fields. A maybe-snag is that as a result, there are a few months left in this release in which folks may accidentally mix additions and removals in a replica change without proper version gating. This wasn't deemed very likely; to mitigate we could add in-memory state on the request that fires a panic whenever the changeType changes. Release note: None
Following the plan laid out in cockroachdb#39485, this adds API support for general replication changes to `AdminChangeReplicasRequest`. `(*DB).AtomicChangeReplicas` will now accept an arbitrary set of additions/removals, though only on paper - the changes will be executed individually. In 19.2, production code will not use this request - it exists solely for testing. The single user of atomic replication changes will be the replicate queue, which has direct access to the replication change code on the local replicas and thus does not need to use this request type. The compatibility story is thus straightforward (this request is never persisted): We simply populate both the deprecated and the new field (and it isn't safe to emit "mixed" changes until all nodes run 19.2 and it's not worth plumbing a setting around), and in 20.1 we remove the old fields. A maybe-snag is that as a result, there are a few months left in this release in which folks may accidentally mix additions and removals in a replica change without proper version gating. This wasn't deemed very likely. Release note: None
Following the plan laid out in cockroachdb#39485, this adds API support for general replication changes to `AdminChangeReplicasRequest`. `(*DB).AtomicChangeReplicas` will now accept an arbitrary set of additions/removals, though only on paper - the changes will be executed individually. In 19.2, production code will not use this request - it exists solely for testing. The single user of atomic replication changes will be the replicate queue, which has direct access to the replication change code on the local replicas and thus does not need to use this request type. The compatibility story is thus straightforward (this request is never persisted): We simply populate both the deprecated and the new field (and it isn't safe to emit "mixed" changes until all nodes run 19.2 and it's not worth plumbing a setting around), and in 20.1 we remove the old fields. A maybe-snag is that as a result, there are a few months left in this release in which folks may accidentally mix additions and removals in a replica change without proper version gating. This wasn't deemed very likely. Release note: None
39481: build: update README r=knz a=knz Release note: None 39611: storage: rework AdminChangeReplicas for atomic membership changes r=nvanbenschoten a=tbg Following the plan laid out in #39485, this adds API support for general replication changes to `AdminChangeReplicasRequest`. `(*DB).AtomicChangeReplicas` will now accept an arbitrary set of additions/removals, though only on paper - the changes will be executed individually. The compatibility story is straightforward since this request is never persisted. We simply populate both the deprecated and the new field (and it isn't safe to emit "mixed" changes until all nodes run 19.2 and it's not worth plumbing a setting around), and in 20.1 we remove the old fields. A maybe-snag is that as a result, there are a few months left in this release in which folks may accidentally mix additions and removals in a replica change without proper version gating. This wasn't deemed very likely; to mitigate we could add in-memory state on the request that fires a panic whenever the changeType changes. Release note: None Co-authored-by: Raphael 'kena' Poss <[email protected]> Co-authored-by: Tobias Schottdorf <[email protected]>
This continues the reworking of the various replication change APIs with the goal of allowing a) testing of general atomic replication changes b) issuing replica swaps from the replicate queue (in 19.2). For previous steps, see: cockroachdb#39485 cockroachdb#39611 This change is not a pure plumbing PR. Instead, it unifies `(*Replica).addReplica` and `(*Replica).removeReplica` into a method that can do both, `(*Replica).addAndRemoveReplicas`. Given a slice of ReplicationChanges, this method first adds learner replicas corresponding to the desired new voters. After having sent snapshots to all of them, the method issues a configuration change that atomically - upgrades all learners to voters - removes any undesired replicas. Note that no atomic membership changes are *actually* carried out yet. This is because the callers of `addAndRemoveReplicas` pass in only a single change (i.e. an addition or removal), which the method also verifies. Three pieces are missing after this PR: First, we need to be able to instruct raft to carry out atomic configuration changes: https://github.com/cockroachdb/cockroach/blob/2e8db6ca53c59d3d281e64939f79d937195403d4/pkg/storage/replica_proposal_buf.go#L448-L451 which in particular requires being able to store the ConfState corresponding to a joint configuration in the unreplicated local state (under a new key). Second, we must pass the slice of changes handed to `AdminChangeReplicas` through to `addAndRemoveReplicas` without unrolling it first, see: https://github.com/cockroachdb/cockroach/blob/3b316bac6ef342590ddc68d2989714d6e126371a/pkg/storage/replica_command.go#L870-L891 and https://github.com/cockroachdb/cockroach/blob/3b316bac6ef342590ddc68d2989714d6e126371a/pkg/storage/replica.go#L1314-L1325 Third, we must to teach the replicate queue to issue the "atomic swaps"; this is the reason we're introducing atomic membership changes in the first place. Release note: None
This continues the reworking of the various replication change APIs with the goal of allowing a) testing of general atomic replication changes b) issuing replica swaps from the replicate queue (in 19.2). For previous steps, see: cockroachdb#39485 cockroachdb#39611 This change is not a pure plumbing PR. Instead, it unifies `(*Replica).addReplica` and `(*Replica).removeReplica` into a method that can do both, `(*Replica).addAndRemoveReplicas`. Given a slice of ReplicationChanges, this method first adds learner replicas corresponding to the desired new voters. After having sent snapshots to all of them, the method issues a configuration change that atomically - upgrades all learners to voters - removes any undesired replicas. Note that no atomic membership changes are *actually* carried out yet. This is because the callers of `addAndRemoveReplicas` pass in only a single change (i.e. an addition or removal), which the method also verifies. Three pieces are missing after this PR: First, we need to be able to instruct raft to carry out atomic configuration changes: https://github.com/cockroachdb/cockroach/blob/2e8db6ca53c59d3d281e64939f79d937195403d4/pkg/storage/replica_proposal_buf.go#L448-L451 which in particular requires being able to store the ConfState corresponding to a joint configuration in the unreplicated local state (under a new key). Second, we must pass the slice of changes handed to `AdminChangeReplicas` through to `addAndRemoveReplicas` without unrolling it first, see: https://github.com/cockroachdb/cockroach/blob/3b316bac6ef342590ddc68d2989714d6e126371a/pkg/storage/replica_command.go#L870-L891 and https://github.com/cockroachdb/cockroach/blob/3b316bac6ef342590ddc68d2989714d6e126371a/pkg/storage/replica.go#L1314-L1325 Third, we must to teach the replicate queue to issue the "atomic swaps"; this is the reason we're introducing atomic membership changes in the first place. Release note: None
39640: storage: unify replica addition and removal paths r=nvanbenschoten a=tbg This continues the reworking of the various replication change APIs with the goal of allowing a) testing of general atomic replication changes b) issuing replica swaps from the replicate queue (in 19.2). For previous steps, see: #39485 #39611 This change is not a pure plumbing PR. Instead, it unifies `(*Replica).addReplica` and `(*Replica).removeReplica` into a method that can do both, `(*Replica).addAndRemoveReplicas`. Given a slice of ReplicationChanges, this method first adds learner replicas corresponding to the desired new voters. After having sent snapshots to all of them, the method issues a configuration change that atomically - upgrades all learners to voters - removes any undesired replicas. Note that no atomic membership changes are *actually* carried out yet. This is because the callers of `addAndRemoveReplicas` pass in only a single change (i.e. an addition or removal), which the method also verifies. Three pieces are missing after this PR: First, we need to be able to instruct raft to carry out atomic configuration changes: https://github.com/cockroachdb/cockroach/blob/2e8db6ca53c59d3d281e64939f79d937195403d4/pkg/storage/replica_proposal_buf.go#L448-L451 which in particular requires being able to store the ConfState corresponding to a joint configuration in the unreplicated local state (under a new key). Second, we must pass the slice of changes handed to `AdminChangeReplicas` through to `addAndRemoveReplicas` without unrolling it first, see: https://github.com/cockroachdb/cockroach/blob/3b316bac6ef342590ddc68d2989714d6e126371a/pkg/storage/replica_command.go#L870-L891 and https://github.com/cockroachdb/cockroach/blob/3b316bac6ef342590ddc68d2989714d6e126371a/pkg/storage/replica.go#L1314-L1325 Third, we must to teach the replicate queue to issue the "atomic swaps"; this is the reason we're introducing atomic membership changes in the first place. Release note: None 39656: kv: init heartbeat txn log tag later r=nvanbenschoten a=tbg At init() time, the txn proto has not been populated yet. Found while investigating #39652. This change strikes me as clunky, but I don't have the bandwidth to dig deeper right now. Release note: None 39666: testutils/lint/passes: disable under nightly stress r=mjibson a=mjibson Under stress these error with "go build a: failed to cache compiled Go files". Fixes #39616 Fixes #39541 Fixes #39479 Release note: None 39669: rpc: use gRPC enforced minimum keepalive timeout r=knz a=ajwerner Before this commit we'd experience the following annoying log message from gRPC every time we create a new connection telling us that our setting is being ignored. ``` Adjusting keepalive ping interval to minimum period of 10s ``` Release note: None Co-authored-by: Tobias Schottdorf <[email protected]> Co-authored-by: Matt Jibson <[email protected]> Co-authored-by: Andrew Werner <[email protected]>
The implementation plan for atomic rebalancing is to first expose a
testing-only API that lets us carry out general atomic membership changes
(i.e. not limited to a single swap which is all that we expect to want to
use in production in 19.2).
This allows us to really stress the machinery in general and also prevents
more API headaches down the road should we want to support more general
changes. Besides, the change to an API that supports a single swap only is
already similarly invasive.
This is the first in a sequence of PRs/commits that plumb the API through
the system (but without actually supporting atomic changes yet). The whole
pipeline that needs to be plumbed is
AdminChangeReplicasRequest
which calls into(*Replica).ChangeReplicas
which calls intowhich finally evaluates into a
ChangeReplicasTrigger
that needs to eventually be translatedinto a
raftpb.ConfChangeV2
(we use a v1ConfChange
today).This particular change deprecates the
Replica
andChangeType
field ofthe trigger, and introduces instead two slices of ReplicaDescriptors, one
for added and one for removed replicas. This will be easy to translate into
a
ConfChangeV2
and seemed more idiomatic (and less error-prone) thanincluding the base desc and computing a sequence of changes from that
whenever it is needed. It also matches what the higher levels of the API
will do.
Release note: None