storage: rework AdminChangeReplicas for atomic membership changes #39611
The compatibility story is straightforward since this request is never
persisted. We simply populate both the deprecated and the new field (and
it isn't safe to emit "mixed" changes until all nodes run 19.2 and it's
not worth plumbing a setting around), and in 20.1 we remove the old
fields.
How are you planning on using these new fields? For an AdminChangeReplicas request that adds and removes replicas, we'll need to break compatibility with existing versions of CRDB, so we'll need a gated cluster version. Maybe I'm missing something.
Reviewed 14 of 14 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @tbg)
pkg/roachpb/api.go, line 1192 at r1 (raw file):
// item for each target.
func MakeReplicationChanges(
	changeType ReplicaChangeType, targets []ReplicationTarget,
nit: consider making the `targets` arg variadic.
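A minimal sketch of what the variadic form suggested in this nit could look like. The types below are simplified stand-ins for the roachpb ones (their fields and the constant names are assumptions for illustration, not copied from the PR):

```go
package main

import "fmt"

// ReplicaChangeType and friends are hypothetical, simplified stand-ins
// for the roachpb types discussed in the review.
type ReplicaChangeType int

const (
	ADD_REPLICA ReplicaChangeType = iota
	REMOVE_REPLICA
)

type ReplicationTarget struct {
	NodeID  int
	StoreID int
}

type ReplicationChange struct {
	ChangeType ReplicaChangeType
	Target     ReplicationTarget
}

// MakeReplicationChanges returns one ReplicationChange per target, all
// sharing the same change type. Taking targets as a variadic parameter
// lets callers pass a single target without building a slice first.
func MakeReplicationChanges(
	changeType ReplicaChangeType, targets ...ReplicationTarget,
) []ReplicationChange {
	chgs := make([]ReplicationChange, 0, len(targets))
	for _, t := range targets {
		chgs = append(chgs, ReplicationChange{ChangeType: changeType, Target: t})
	}
	return chgs
}

func main() {
	// A single target needs no slice literal.
	one := MakeReplicationChanges(ADD_REPLICA, ReplicationTarget{NodeID: 1, StoreID: 1})

	// An existing slice is still accepted via the spread syntax.
	existing := []ReplicationTarget{{NodeID: 2, StoreID: 2}, {NodeID: 3, StoreID: 3}}
	many := MakeReplicationChanges(REMOVE_REPLICA, existing...)

	fmt.Println(len(one), len(many))
}
```

Callers that already hold a slice pay only the `...` at the call site, so the change is mostly a convenience for the single-target case.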
pkg/roachpb/api.go, line 1209 at r1 (raw file):
Should we assert that this is consistent across all changes until we can get rid of it? This is what you meant by the following, right?
A maybe-snag is that as a result, there are a few months left
in this release in which folks may accidentally mix additions and
removals in a replica change without proper version gating. This wasn't
deemed very likely; to mitigate we could add in-memory state on the
request that fires a panic whenever the changeType changes.
What we have here seems like a footgun because we're blindly ignoring the change type of all but the first change. If we're not careful at the caller, we could end up performing a replica change that we didn't intend on.
pkg/roachpb/api.proto, line 750 at r1 (raw file):
}

message ReplicationChange {
Give this a comment.
We are not using them except for testing. The only production user of atomic replication changes in 19.2 is the replicate queue, and it doesn't use this request type; instead it calls straight into `(*Replica).addReplica`:
cockroach/pkg/storage/replicate_queue.go
Lines 763 to 765 in dd5053e
if _, err := repl.addReplica(ctx, target, desc, priority, reason, details); err != nil {
	return err
}
I should've added that bit of information to the commit message (done now).
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)
pkg/roachpb/api.go, line 1209 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Should we assert that this is consistent across all changes until we can get rid of it? This is what you meant by the following, right?
A maybe-snag is that as a result, there are a few months left
in this release in which folks may accidentally mix additions and
removals in a replica change without proper version gating. This wasn't
deemed very likely; to mitigate we could add in-memory state on the
request that fires a panic whenever the changeType changes.
What we have here seems like a footgun because we're blindly ignoring the change type of all but the first change. If we're not careful at the caller, we could end up performing a replica change that we didn't intend on.
I want to use this for testing, so I don't want to outright prevent mixing (see TestAtomicMembershipChange). But since I need it only in tests, I added what I feel is an acceptable hack: we look for a "testing" key in the context. If it isn't there, we make sure you don't mix adds and removes. Otherwise, you get to do it. We could also have two flavors of this method, but then we need to worry similarly about the general flavor being called in prod before it's safe. PTAL
1309abd to e0a09df
Reviewed 10 of 10 files at r2.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten and @tbg)
pkg/internal/client/db.go, line 538 at r2 (raw file):
// TODO(tbg): remove this when 19.2 is released.
var typ *roachpb.ReplicaChangeType
for _, chg := range chgs {
nit:
for i := range chgs {
	chg := &chgs[i]
	if typ == nil {
		typ = &chg.ChangeType
to avoid the allocation.
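A self-contained sketch of the index-based loop from this nit, under the assumption that the loop's job is to find the single change type shared by all changes. The function name and types are hypothetical. Indexing yields a pointer into the slice's backing array, whereas taking the address of the `range` value variable points at a copy, which is the allocation the nit refers to.

```go
package main

import "fmt"

type ReplicaChangeType int

const (
	ADD_REPLICA ReplicaChangeType = iota
	REMOVE_REPLICA
)

// ReplicationChange is a simplified, hypothetical stand-in.
type ReplicationChange struct {
	ChangeType ReplicaChangeType
	StoreID    int
}

// uniformChangeType returns a pointer to the change type shared by all
// changes. It returns (nil, true) for an empty slice and (nil, false)
// if the changes disagree.
func uniformChangeType(chgs []ReplicationChange) (*ReplicaChangeType, bool) {
	var typ *ReplicaChangeType
	for i := range chgs {
		chg := &chgs[i] // points at the slice element, not a copy
		if typ == nil {
			typ = &chg.ChangeType
		} else if *typ != chg.ChangeType {
			return nil, false
		}
	}
	return typ, true
}

func main() {
	chgs := []ReplicationChange{{ADD_REPLICA, 4}, {ADD_REPLICA, 5}}
	typ, ok := uniformChangeType(chgs)

	// The returned pointer aliases the first slice element.
	fmt.Println(ok, typ == &chgs[0].ChangeType)
}
```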
Following the plan laid out in cockroachdb#39485, this adds API support for general replication changes to `AdminChangeReplicasRequest`. `(*DB).AtomicChangeReplicas` will now accept an arbitrary set of additions/removals, though only on paper - the changes will be executed individually.
In 19.2, production code will not use this request - it exists solely for testing. The single user of atomic replication changes will be the replicate queue, which has direct access to the replication change code on the local replicas and thus does not need to use this request type.
The compatibility story is thus straightforward (this request is never persisted): We simply populate both the deprecated and the new field (and it isn't safe to emit "mixed" changes until all nodes run 19.2 and it's not worth plumbing a setting around), and in 20.1 we remove the old fields.
A maybe-snag is that as a result, there are a few months left in this release in which folks may accidentally mix additions and removals in a replica change without proper version gating. This wasn't deemed very likely.
Release note: None
e0a09df to 66acbc0
TFTR!
Hmm, CI hit a test failure that is probably new, I assume from https://github.com/cockroachdb/cockroach/pulls?q=is%3Apr+author%3Adanhhz+is%3Aclosed
Looks "benign" enough, let's see if I can repro it.
TestReplicateQueueDownReplicate
replicate_queue_test.go:255: expected range log event reason range over-replicated, got abandoned learner replica from info {r21:{m-/Table/SystemConfigSpan/Start} [(n2,s2):1, (n4,s4):2, (n1,s1):5, next=7, gen=16] (n5,s5):6LEARNER abandoned learner replica }
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @nvanbenschoten)
I meant #39034.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @nvanbenschoten)
make stressrace PKG=./pkg/storage/ TESTS=TestReplicateQueueDownReplicate STRESSFLAGS='-stderr=false -p 12'
2 runs completed, 1 failures, over 3m42s
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @nvanbenschoten)
Going to PR the fix separately (problem not introduced in this PR).
bors r=nvanbenschoten
39481: build: update README r=knz a=knz
Release note: None
39611: storage: rework AdminChangeReplicas for atomic membership changes r=nvanbenschoten a=tbg
Following the plan laid out in #39485, this adds API support for general replication changes to `AdminChangeReplicasRequest`. `(*DB).AtomicChangeReplicas` will now accept an arbitrary set of additions/removals, though only on paper - the changes will be executed individually.
The compatibility story is straightforward since this request is never persisted. We simply populate both the deprecated and the new field (and it isn't safe to emit "mixed" changes until all nodes run 19.2 and it's not worth plumbing a setting around), and in 20.1 we remove the old fields.
A maybe-snag is that as a result, there are a few months left in this release in which folks may accidentally mix additions and removals in a replica change without proper version gating. This wasn't deemed very likely; to mitigate we could add in-memory state on the request that fires a panic whenever the changeType changes.
Release note: None
Co-authored-by: Raphael 'kena' Poss <[email protected]>
Co-authored-by: Tobias Schottdorf <[email protected]>
Build succeeded
This continues the reworking of the various replication change APIs with the goal of allowing a) testing of general atomic replication changes b) issuing replica swaps from the replicate queue (in 19.2). For previous steps, see: cockroachdb#39485 cockroachdb#39611
This change is not a pure plumbing PR. Instead, it unifies `(*Replica).addReplica` and `(*Replica).removeReplica` into a method that can do both, `(*Replica).addAndRemoveReplicas`.
Given a slice of ReplicationChanges, this method first adds learner replicas corresponding to the desired new voters. After having sent snapshots to all of them, the method issues a configuration change that atomically
- upgrades all learners to voters
- removes any undesired replicas.
Note that no atomic membership changes are *actually* carried out yet. This is because the callers of `addAndRemoveReplicas` pass in only a single change (i.e. an addition or removal), which the method also verifies.
Three pieces are missing after this PR:
First, we need to be able to instruct raft to carry out atomic configuration changes: https://github.com/cockroachdb/cockroach/blob/2e8db6ca53c59d3d281e64939f79d937195403d4/pkg/storage/replica_proposal_buf.go#L448-L451 which in particular requires being able to store the ConfState corresponding to a joint configuration in the unreplicated local state (under a new key).
Second, we must pass the slice of changes handed to `AdminChangeReplicas` through to `addAndRemoveReplicas` without unrolling it first, see: https://github.com/cockroachdb/cockroach/blob/3b316bac6ef342590ddc68d2989714d6e126371a/pkg/storage/replica_command.go#L870-L891 and https://github.com/cockroachdb/cockroach/blob/3b316bac6ef342590ddc68d2989714d6e126371a/pkg/storage/replica.go#L1314-L1325
Third, we must teach the replicate queue to issue the "atomic swaps"; this is the reason we're introducing atomic membership changes in the first place.
Release note: None
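The three phases described in this commit message (add learners, send snapshots, then one configuration change that promotes and removes) can be sketched on a toy in-memory model. This is only an illustration under stated assumptions: the `Descriptor` map, the `Change` struct, and the `sendSnapshot` callback are hypothetical, and the real method additionally verifies that callers pass a single change, which the sketch omits.

```go
package main

import (
	"fmt"
	"sort"
)

type ReplicaType int

const (
	Voter ReplicaType = iota
	Learner
)

type ChangeType int

const (
	Add ChangeType = iota
	Remove
)

// Change and Descriptor are hypothetical, simplified models.
type Change struct {
	Type    ChangeType
	StoreID int
}

// Descriptor maps a store ID to the type of replica it holds.
type Descriptor map[int]ReplicaType

// addAndRemoveReplicas applies chgs to a copy of desc in three phases.
// The input descriptor is never modified, so an error leaves it intact.
func addAndRemoveReplicas(
	desc Descriptor, chgs []Change, sendSnapshot func(storeID int) error,
) (Descriptor, error) {
	next := make(Descriptor, len(desc))
	for id, typ := range desc {
		next[id] = typ
	}
	// Phase 1: add a learner for every desired new voter.
	for _, c := range chgs {
		if c.Type != Add {
			continue
		}
		if _, ok := next[c.StoreID]; ok {
			return nil, fmt.Errorf("store %d already has a replica", c.StoreID)
		}
		next[c.StoreID] = Learner
	}
	// Phase 2: send a snapshot to every learner.
	for _, c := range chgs {
		if c.Type == Add {
			if err := sendSnapshot(c.StoreID); err != nil {
				return nil, err
			}
		}
	}
	// Phase 3: one step that promotes learners and removes replicas.
	for _, c := range chgs {
		switch c.Type {
		case Add:
			next[c.StoreID] = Voter
		case Remove:
			if _, ok := next[c.StoreID]; !ok {
				return nil, fmt.Errorf("store %d has no replica to remove", c.StoreID)
			}
			delete(next, c.StoreID)
		}
	}
	return next, nil
}

// storeIDs returns the descriptor's store IDs in ascending order.
func storeIDs(desc Descriptor) []int {
	ids := make([]int, 0, len(desc))
	for id := range desc {
		ids = append(ids, id)
	}
	sort.Ints(ids)
	return ids
}

func main() {
	desc := Descriptor{1: Voter, 2: Voter, 3: Voter}
	swap := []Change{{Add, 4}, {Remove, 1}}
	next, err := addAndRemoveReplicas(desc, swap, func(int) error { return nil })
	fmt.Println(storeIDs(next), err) // prints "[2 3 4] <nil>"
}
```

In the toy model the final phase is trivially atomic because it mutates a private copy; in the real system that step is what requires the joint-configuration support listed as the first missing piece.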
39640: storage: unify replica addition and removal paths r=nvanbenschoten a=tbg
This continues the reworking of the various replication change APIs with the goal of allowing a) testing of general atomic replication changes b) issuing replica swaps from the replicate queue (in 19.2). For previous steps, see: #39485 #39611
This change is not a pure plumbing PR. Instead, it unifies `(*Replica).addReplica` and `(*Replica).removeReplica` into a method that can do both, `(*Replica).addAndRemoveReplicas`.
Given a slice of ReplicationChanges, this method first adds learner replicas corresponding to the desired new voters. After having sent snapshots to all of them, the method issues a configuration change that atomically
- upgrades all learners to voters
- removes any undesired replicas.
Note that no atomic membership changes are *actually* carried out yet. This is because the callers of `addAndRemoveReplicas` pass in only a single change (i.e. an addition or removal), which the method also verifies.
Three pieces are missing after this PR:
First, we need to be able to instruct raft to carry out atomic configuration changes: https://github.com/cockroachdb/cockroach/blob/2e8db6ca53c59d3d281e64939f79d937195403d4/pkg/storage/replica_proposal_buf.go#L448-L451 which in particular requires being able to store the ConfState corresponding to a joint configuration in the unreplicated local state (under a new key).
Second, we must pass the slice of changes handed to `AdminChangeReplicas` through to `addAndRemoveReplicas` without unrolling it first, see: https://github.com/cockroachdb/cockroach/blob/3b316bac6ef342590ddc68d2989714d6e126371a/pkg/storage/replica_command.go#L870-L891 and https://github.com/cockroachdb/cockroach/blob/3b316bac6ef342590ddc68d2989714d6e126371a/pkg/storage/replica.go#L1314-L1325
Third, we must teach the replicate queue to issue the "atomic swaps"; this is the reason we're introducing atomic membership changes in the first place.
Release note: None
39656: kv: init heartbeat txn log tag later r=nvanbenschoten a=tbg
At init() time, the txn proto has not been populated yet. Found while investigating #39652. This change strikes me as clunky, but I don't have the bandwidth to dig deeper right now.
Release note: None
39666: testutils/lint/passes: disable under nightly stress r=mjibson a=mjibson
Under stress these error with "go build a: failed to cache compiled Go files".
Fixes #39616
Fixes #39541
Fixes #39479
Release note: None
39669: rpc: use gRPC enforced minimum keepalive timeout r=knz a=ajwerner
Before this commit we'd experience the following annoying log message from gRPC every time we create a new connection telling us that our setting is being ignored.
```
Adjusting keepalive ping interval to minimum period of 10s
```
Release note: None
Co-authored-by: Tobias Schottdorf <[email protected]>
Co-authored-by: Matt Jibson <[email protected]>
Co-authored-by: Andrew Werner <[email protected]>
Following the plan laid out in #39485, this adds API support for
general replication changes to `AdminChangeReplicasRequest`.
`(*DB).AtomicChangeReplicas` will now accept an arbitrary set
of additions/removals, though only on paper - the changes will
be executed individually.
The compatibility story is straightforward since this request is never
persisted. We simply populate both the deprecated and the new field (and
it isn't safe to emit "mixed" changes until all nodes run 19.2 and it's
not worth plumbing a setting around), and in 20.1 we remove the old
fields. A maybe-snag is that as a result, there are a few months left
in this release in which folks may accidentally mix additions and
removals in a replica change without proper version gating. This wasn't
deemed very likely; to mitigate we could add in-memory state on the
request that fires a panic whenever the changeType changes.
Release note: None
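To illustrate the compatibility approach described above, here is a rough sketch of populating both the deprecated and the new field on a request. The struct and field names are assumptions chosen for illustration, not the actual proto definition. It also shows why "mixed" changes are unsafe before all nodes are upgraded: a reader of the deprecated fields sees only one change type for all targets.

```go
package main

import "fmt"

type ReplicaChangeType int

const (
	ADD_REPLICA ReplicaChangeType = iota
	REMOVE_REPLICA
)

type ReplicationTarget struct {
	NodeID  int
	StoreID int
}

type ReplicationChange struct {
	ChangeType ReplicaChangeType
	Target     ReplicationTarget
}

// AdminChangeReplicasRequest is a hypothetical, simplified model of the
// request: two deprecated fields plus the new list of changes.
type AdminChangeReplicasRequest struct {
	DeprecatedChangeType ReplicaChangeType
	DeprecatedTargets    []ReplicationTarget
	Changes              []ReplicationChange
}

// newRequest fills both representations so that old and new nodes can
// each read the one they understand. The deprecated change type is
// taken from the first change only.
func newRequest(chgs []ReplicationChange) AdminChangeReplicasRequest {
	req := AdminChangeReplicasRequest{Changes: chgs}
	if len(chgs) > 0 {
		req.DeprecatedChangeType = chgs[0].ChangeType
	}
	for _, c := range chgs {
		req.DeprecatedTargets = append(req.DeprecatedTargets, c.Target)
	}
	return req
}

// effectiveChanges is what an upgraded receiver would act on: the new
// field if present, otherwise a reconstruction from the deprecated
// fields sent by an older node.
func (r AdminChangeReplicasRequest) effectiveChanges() []ReplicationChange {
	if len(r.Changes) > 0 {
		return r.Changes
	}
	out := make([]ReplicationChange, 0, len(r.DeprecatedTargets))
	for _, t := range r.DeprecatedTargets {
		out = append(out, ReplicationChange{ChangeType: r.DeprecatedChangeType, Target: t})
	}
	return out
}

func main() {
	req := newRequest([]ReplicationChange{
		{ChangeType: ADD_REPLICA, Target: ReplicationTarget{NodeID: 4, StoreID: 4}},
		{ChangeType: ADD_REPLICA, Target: ReplicationTarget{NodeID: 5, StoreID: 5}},
	})
	fmt.Println(len(req.DeprecatedTargets), len(req.effectiveChanges()))
}
```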