storage: rework AdminChangeReplicas for atomic membership changes #39611

Merged (1 commit) on Aug 13, 2019

@tbg tbg commented Aug 12, 2019

Following the plan laid out in #39485, this adds API support for
general replication changes to AdminChangeReplicasRequest.

(*DB).AtomicChangeReplicas will now accept an arbitrary set
of additions/removals, though only on paper - the changes will
be executed individually.

The compatibility story is straightforward since this request is never
persisted. We simply populate both the deprecated and the new field (and
it isn't safe to emit "mixed" changes until all nodes run 19.2 and it's
not worth plumbing a setting around), and in 20.1 we remove the old
fields. A maybe-snag is that as a result, there are a few months left
in this release in which folks may accidentally mix additions and
removals in a replica change without proper version gating. This wasn't
deemed very likely; to mitigate we could add in-memory state on the
request that fires a panic whenever the changeType changes.
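
A minimal sketch of the "populate both fields" approach described above. The types are simplified stand-ins for the `roachpb` protobuf types, and the request shape, its field names, and `newRequest` are hypothetical, chosen only for illustration:

```go
package main

// ReplicaChangeType, ReplicationTarget and ReplicationChange are simplified
// stand-ins for the roachpb types; the real ones are generated from protobufs.
type ReplicaChangeType int

const (
	ADD_REPLICA ReplicaChangeType = iota
	REMOVE_REPLICA
)

type ReplicationTarget struct {
	NodeID, StoreID int
}

type ReplicationChange struct {
	ChangeType ReplicaChangeType
	Target     ReplicationTarget
}

// changeReplicasRequest is a hypothetical, trimmed-down request shape.
type changeReplicasRequest struct {
	DeprecatedChangeType ReplicaChangeType   // understood by older nodes
	DeprecatedTargets    []ReplicationTarget // understood by older nodes
	Changes              []ReplicationChange // understood by 19.2 nodes
}

// newRequest fills in both representations. It assumes that all changes
// share one change type, the only shape the deprecated fields can express.
func newRequest(chgs []ReplicationChange) changeReplicasRequest {
	req := changeReplicasRequest{Changes: chgs}
	for i, chg := range chgs {
		if i == 0 {
			req.DeprecatedChangeType = chg.ChangeType
		}
		req.DeprecatedTargets = append(req.DeprecatedTargets, chg.Target)
	}
	return req
}
```

Because the deprecated fields carry a single change type, a mixed set of changes cannot be represented faithfully for older nodes, which is the snag the description refers to.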

Release note: None

@tbg tbg requested a review from nvanbenschoten August 12, 2019 22:35
@nvanbenschoten nvanbenschoten left a comment

The compatibility story is straightforward since this request is never
persisted. We simply populate both the deprecated and the new field (and
it isn't safe to emit "mixed" changes until all nodes run 19.2 and it's
not worth plumbing a setting around), and in 20.1 we remove the old
fields.

How are you planning on using these new fields? For an AdminChangeReplicas request that adds and removes replicas, we'll need to break compatibility with existing versions of CRDB, so we'll need a gated cluster version. Maybe I'm missing something.

Reviewed 14 of 14 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @tbg)


pkg/roachpb/api.go, line 1192 at r1 (raw file):

// item for each target.
func MakeReplicationChanges(
	changeType ReplicaChangeType, targets []ReplicationTarget,

nit: consider making the targets arg variadic.
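
For illustration, a sketch of what the variadic form suggested in this nit could look like. The types are simplified stand-ins for the `roachpb` ones, and the body is an assumption rather than the code under review:

```go
package main

// Simplified stand-ins for the roachpb types of the same name.
type ReplicaChangeType int

const (
	ADD_REPLICA ReplicaChangeType = iota
	REMOVE_REPLICA
)

type ReplicationTarget struct {
	NodeID, StoreID int
}

type ReplicationChange struct {
	ChangeType ReplicaChangeType
	Target     ReplicationTarget
}

// MakeReplicationChanges returns one ReplicationChange of the given type per
// target. Taking the targets variadically lets callers list them inline.
func MakeReplicationChanges(
	changeType ReplicaChangeType, targets ...ReplicationTarget,
) []ReplicationChange {
	chgs := make([]ReplicationChange, 0, len(targets))
	for _, target := range targets {
		chgs = append(chgs, ReplicationChange{ChangeType: changeType, Target: target})
	}
	return chgs
}
```

A caller that already holds a slice would pass it as `targets...`.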


pkg/roachpb/api.go, line 1209 at r1 (raw file):
Should we assert that this is consistent across all changes until we can get rid of it? This is what you meant by the following, right?

A maybe-snag is that as a result, there are a few months left
in this release in which folks may accidentally mix additions and
removals in a replica change without proper version gating. This wasn't
deemed very likely; to mitigate we could add in-memory state on the
request that fires a panic whenever the changeType changes.

What we have here seems like a footgun because we're blindly ignoring the change type of all but the first change. If we're not careful at the caller, we could end up performing a replica change that we didn't intend on.


pkg/roachpb/api.proto, line 750 at r1 (raw file):

}

message ReplicationChange {

Give this a comment.

@tbg tbg requested a review from nvanbenschoten August 13, 2019 12:39
@tbg tbg left a comment

We are not using them except for testing. The only production user of atomic replication changes in 19.2 is the replicate queue, which doesn't use this request type and instead calls straight into (*Replica).addReplica:

if _, err := repl.addReplica(ctx, target, desc, priority, reason, details); err != nil {
	return err
}

I should've added that bit of information to the commit message (done now).

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)


pkg/roachpb/api.go, line 1209 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Should we assert that this is consistent across all changes until we can get rid of it? This is what you meant by the following, right?

A maybe-snag is that as a result, there are a few months left
in this release in which folks may accidentally mix additions and
removals in a replica change without proper version gating. This wasn't
deemed very likely; to mitigate we could add in-memory state on the
request that fires a panic whenever the changeType changes.

What we have here seems like a footgun because we're blindly ignoring the change type of all but the first change. If we're not careful at the caller, we could end up performing a replica change that we didn't intend on.

I want to use this for testing, so I don't want to outright prevent mixing (see TestAtomicMembershipChange). But since I need it only in tests, I added what I feel is an acceptable hack: we look for a "testing" key in the context. If it isn't there, we make sure you don't mix adds and removes. Otherwise, you get to do it. We could also have two flavors of this method, but then we need to worry similarly about the general flavor being called in prod before it's safe. PTAL

@tbg tbg force-pushed the atomic/changereplicas branch from 1309abd to e0a09df Compare August 13, 2019 12:39
@nvanbenschoten nvanbenschoten left a comment

:lgtm:

Reviewed 10 of 10 files at r2.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten and @tbg)


pkg/internal/client/db.go, line 538 at r2 (raw file):

		// TODO(tbg): remove this when 19.2 is released.
		var typ *roachpb.ReplicaChangeType
		for _, chg := range chgs {

nit:

for i := range chgs {
    chg := &chgs[i]
    if typ == nil {
        typ = &chg.ChangeType

to avoid the allocation.
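
The fragment above is partial. As a sketch of the idea under stated assumptions: taking the address of a field of the range variable can force that per-iteration copy onto the heap, whereas indexing yields a pointer into the existing slice. The function name `commonChangeType` and the mismatch branch are hypothetical, and the types are simplified stand-ins:

```go
package main

// Simplified stand-ins for the roachpb types of the same name.
type ReplicaChangeType int

const (
	ADD_REPLICA ReplicaChangeType = iota
	REMOVE_REPLICA
)

type ReplicationChange struct {
	ChangeType ReplicaChangeType
}

// commonChangeType returns a pointer to the change type shared by all
// changes, or false if the changes disagree. The pointer aliases the first
// slice element rather than a copy of it.
func commonChangeType(chgs []ReplicationChange) (*ReplicaChangeType, bool) {
	var typ *ReplicaChangeType
	for i := range chgs {
		chg := &chgs[i] // pointer into the slice, no copy of the element
		if typ == nil {
			typ = &chg.ChangeType
		} else if *typ != chg.ChangeType {
			return nil, false
		}
	}
	return typ, true
}
```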

Following the plan laid out in cockroachdb#39485, this adds API support for
general replication changes to `AdminChangeReplicasRequest`.

`(*DB).AtomicChangeReplicas` will now accept an arbitrary set
of additions/removals, though only on paper - the changes will
be executed individually.

In 19.2, production code will not use this request - it exists solely
for testing. The single user of atomic replication changes will be the
replicate queue, which has direct access to the replication change code
on the local replicas and thus does not need to use this request type.

The compatibility story is thus straightforward (this request is never
persisted): We simply populate both the deprecated and the new field (and
it isn't safe to emit "mixed" changes until all nodes run 19.2 and it's
not worth plumbing a setting around), and in 20.1 we remove the old
fields.  A maybe-snag is that as a result, there are a few months left
in this release in which folks may accidentally mix additions and
removals in a replica change without proper version gating. This wasn't
deemed very likely.

Release note: None
@tbg tbg force-pushed the atomic/changereplicas branch from e0a09df to 66acbc0 Compare August 13, 2019 14:27
@tbg tbg requested a review from nvanbenschoten August 13, 2019 15:04
@tbg tbg left a comment

TFTR!

Hmm, CI hit a test failure that is probably new, I assume from https://github.com/cockroachdb/cockroach/pulls?q=is%3Apr+author%3Adanhhz+is%3Aclosed

Looks "benign" enough, let's see if I can repro it.

TestReplicateQueueDownReplicate
replicate_queue_test.go:255: expected range log event reason range over-replicated, got abandoned learner replica from info {r21:{m-/Table/SystemConfigSpan/Start} [(n2,s2):1, (n4,s4):2, (n1,s1):5, next=7, gen=16] (n5,s5):6LEARNER abandoned learner replica }

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @nvanbenschoten)

@tbg tbg left a comment

I meant #39034.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @nvanbenschoten)

@tbg tbg left a comment

make stressrace PKG=./pkg/storage/ TESTS=TestReplicateQueueDownReplicate STRESSFLAGS='-stderr=false -p 12'
2 runs completed, 1 failures, over 3m42s

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @nvanbenschoten)

tbg commented Aug 13, 2019

Going to PR the fix separately (problem not introduced in this PR).

bors r=nvanbenschoten

craig bot pushed a commit that referenced this pull request Aug 13, 2019
39481: build: update README r=knz a=knz

Release note: None

39611: storage: rework AdminChangeReplicas for atomic membership changes r=nvanbenschoten a=tbg

Following the plan laid out in #39485, this adds API support for
general replication changes to `AdminChangeReplicasRequest`.

`(*DB).AtomicChangeReplicas` will now accept an arbitrary set
of additions/removals, though only on paper - the changes will
be executed individually.

The compatibility story is straightforward since this request is never
persisted. We simply populate both the deprecated and the new field (and
it isn't safe to emit "mixed" changes until all nodes run 19.2 and it's
not worth plumbing a setting around), and in 20.1 we remove the old
fields.  A maybe-snag is that as a result, there are a few months left
in this release in which folks may accidentally mix additions and
removals in a replica change without proper version gating. This wasn't
deemed very likely; to mitigate we could add in-memory state on the
request that fires a panic whenever the changeType changes.

Release note: None

Co-authored-by: Raphael 'kena' Poss <[email protected]>
Co-authored-by: Tobias Schottdorf <[email protected]>
craig bot commented Aug 13, 2019

Build succeeded

@craig craig bot merged commit 66acbc0 into cockroachdb:master Aug 13, 2019
tbg added a commit to tbg/cockroach that referenced this pull request Aug 13, 2019
This continues the reworking of the various replication change APIs with
the goal of allowing
a) testing of general atomic replication changes
b) issuing replica swaps from the replicate queue (in 19.2).

For previous steps, see:

cockroachdb#39485
cockroachdb#39611

This change is not a pure plumbing PR. Instead, it unifies
`(*Replica).addReplica` and `(*Replica).removeReplica` into a method
that can do both, `(*Replica).addAndRemoveReplicas`.

Given a slice of ReplicationChanges, this method first adds learner
replicas corresponding to the desired new voters. After having sent
snapshots to all of them, the method issues a configuration change that
atomically
- upgrades all learners to voters
- removes any undesired replicas.

Note that no atomic membership changes are *actually* carried out yet.
This is because the callers of `addAndRemoveReplicas` pass in only a
single change (i.e. an addition or removal), which the method also
verifies.
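
The flow described above can be sketched as a plan of steps. This is an illustrative model only, not the body of `addAndRemoveReplicas`: `planSteps` and `validateSingle` are hypothetical helpers, the types are simplified stand-ins, and the steps are plain strings rather than real raft or snapshot operations:

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Simplified stand-ins for the roachpb types of the same name.
type ReplicaChangeType int

const (
	ADD_REPLICA ReplicaChangeType = iota
	REMOVE_REPLICA
)

type ReplicationTarget struct {
	NodeID, StoreID int
}

func (t ReplicationTarget) String() string {
	return fmt.Sprintf("n%d,s%d", t.NodeID, t.StoreID)
}

type ReplicationChange struct {
	ChangeType ReplicaChangeType
	Target     ReplicationTarget
}

// validateSingle mirrors the temporary restriction that callers pass exactly
// one change (one addition or one removal).
func validateSingle(chgs []ReplicationChange) error {
	if len(chgs) != 1 {
		return errors.New("only a single addition or removal is supported for now")
	}
	return nil
}

// planSteps lists the steps in order: add all learners, send snapshots to
// all of them, then issue one configuration change that promotes the
// learners and removes the undesired replicas.
func planSteps(chgs []ReplicationChange) []string {
	var steps, final []string
	for _, chg := range chgs {
		if chg.ChangeType == ADD_REPLICA {
			steps = append(steps, "add learner "+chg.Target.String())
		}
	}
	for _, chg := range chgs {
		if chg.ChangeType == ADD_REPLICA {
			steps = append(steps, "send snapshot to "+chg.Target.String())
			final = append(final, "promote "+chg.Target.String())
		} else {
			final = append(final, "remove "+chg.Target.String())
		}
	}
	return append(steps, "config change: "+strings.Join(final, ", "))
}
```

For a swap that adds a replica on n4,s4 and removes the one on n1,s1, the plan ends with a single configuration change covering both the promotion and the removal, which is the atomic step the later work is meant to enable.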

Three pieces are missing after this PR: First, we need to be able to
instruct raft to carry out atomic configuration changes:

https://github.com/cockroachdb/cockroach/blob/2e8db6ca53c59d3d281e64939f79d937195403d4/pkg/storage/replica_proposal_buf.go#L448-L451

which in particular requires being able to store the ConfState
corresponding to a joint configuration in the unreplicated local state
(under a new key).

Second, we must pass the slice of changes handed to
`AdminChangeReplicas` through to `addAndRemoveReplicas` without
unrolling it first, see:

https://github.com/cockroachdb/cockroach/blob/3b316bac6ef342590ddc68d2989714d6e126371a/pkg/storage/replica_command.go#L870-L891

and

https://github.com/cockroachdb/cockroach/blob/3b316bac6ef342590ddc68d2989714d6e126371a/pkg/storage/replica.go#L1314-L1325

Third, we must teach the replicate queue to issue the "atomic
swaps"; this is the reason we're introducing atomic membership changes
in the first place.

Release note: None
tbg added a commit to tbg/cockroach that referenced this pull request Aug 14, 2019
craig bot pushed a commit that referenced this pull request Aug 14, 2019
39640: storage: unify replica addition and removal paths r=nvanbenschoten a=tbg

This continues the reworking of the various replication change APIs with
the goal of allowing a) testing of general atomic replication changes b)
issuing replica swaps from the replicate queue (in 19.2).

For previous steps, see:

#39485
#39611

This change is not a pure plumbing PR. Instead, it unifies
`(*Replica).addReplica` and `(*Replica).removeReplica` into a method that
can do both, `(*Replica).addAndRemoveReplicas`.

Given a slice of ReplicationChanges, this method first adds learner
replicas corresponding to the desired new voters. After having sent
snapshots to all of them, the method issues a configuration change that
atomically
- upgrades all learners to voters
- removes any undesired replicas.

Note that no atomic membership changes are *actually* carried out yet. This
is because the callers of `addAndRemoveReplicas` pass in only a single
change (i.e. an addition or removal), which the method also verifies.

Three pieces are missing after this PR: First, we need to be able to
instruct raft to carry out atomic configuration changes:

https://github.com/cockroachdb/cockroach/blob/2e8db6ca53c59d3d281e64939f79d937195403d4/pkg/storage/replica_proposal_buf.go#L448-L451

which in particular requires being able to store the ConfState
corresponding to a joint configuration in the unreplicated local state
(under a new key).

Second, we must pass the slice of changes handed to
`AdminChangeReplicas` through to `addAndRemoveReplicas` without unrolling
it first, see:

https://github.com/cockroachdb/cockroach/blob/3b316bac6ef342590ddc68d2989714d6e126371a/pkg/storage/replica_command.go#L870-L891

and

https://github.com/cockroachdb/cockroach/blob/3b316bac6ef342590ddc68d2989714d6e126371a/pkg/storage/replica.go#L1314-L1325

Third, we must teach the replicate queue to issue the "atomic swaps";
this is the reason we're introducing atomic membership changes in the first
place.

Release note: None

39656: kv: init heartbeat txn log tag later r=nvanbenschoten a=tbg

At init() time, the txn proto has not been populated yet.
Found while investigating #39652.

This change strikes me as clunky, but I don't have the bandwidth to dig deeper
right now.

Release note: None

39666: testutils/lint/passes: disable under nightly stress r=mjibson a=mjibson

Under stress these error with "go build a: failed to cache compiled Go files".

Fixes #39616
Fixes #39541
Fixes #39479

Release note: None

39669: rpc: use gRPC enforced minimum keepalive timeout r=knz a=ajwerner

Before this commit we'd experience the following annoying log message from gRPC
every time we create a new connection telling us that our setting is being
ignored.

```
Adjusting keepalive ping interval to minimum period of 10s
```

Release note: None

Co-authored-by: Tobias Schottdorf <[email protected]>
Co-authored-by: Matt Jibson <[email protected]>
Co-authored-by: Andrew Werner <[email protected]>