Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storage: prep ChangeReplicasTrigger for atomic rebalancing #39485

Merged
merged 1 commit into from
Aug 12, 2019

Conversation

tbg
Copy link
Member

@tbg tbg commented Aug 9, 2019

The implementation plan for atomic rebalancing is to first expose a
testing-only API that lets us carry out general atomic membership changes
(i.e. not limited to a single swap which is all that we expect to want to
use in production in 19.2).

This allows us to really stress the machinery in general and also prevents
more API headaches down the road should we want to support more general
changes. Besides, the change to an API that supports a single swap only is
already similarly invasive.

This is the first in a sequence of PRs/commits that plumb the API through
the system (but without actually supporting atomic changes yet). The whole
pipeline that needs to be plumbed is

  • AdminChangeReplicasRequest which calls into
  • (*Replica).ChangeReplicas which calls into
  • the internal code adding learners, sending them snapshots, etc, and
    which finally evaluates into a
  • ChangeReplicasTrigger that needs to eventually be translated
    into a
  • raftpb.ConfChangeV2 (we use a v1 ConfChange today).

This particular change deprecates the Replica and ChangeType field of
the trigger, and introduces instead two slices of ReplicaDescriptors, one
for added and one for removed replicas. This will be easy to translate into
a ConfChangeV2 and seemed more idiomatic (and less error-prone) than
including the base desc and computing a sequence of changes from that
whenever it is needed. It also matches what the higher levels of the API
will do.

Release note: None

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@tbg tbg requested a review from nvanbenschoten August 9, 2019 12:33
@tbg tbg force-pushed the atomic/changetrigger branch 3 times, most recently from a932012 to 4cd451f Compare August 9, 2019 21:23
Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: all nits. Thanks for the consumable change.

Reviewed 14 of 14 files at r1, 5 of 5 files at r2.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @tbg)


docs/generated/settings/settings.html, line 125 at r2 (raw file):

<tr><td><code>trace.lightstep.token</code></td><td>string</td><td><code></code></td><td>if set, traces go to Lightstep using this token</td></tr>
<tr><td><code>trace.zipkin.collector</code></td><td>string</td><td><code></code></td><td>if set, traces go to the given Zipkin instance (example: '127.0.0.1:9411'); ignored if trace.lightstep.token is set</td></tr>
<tr><td><code>version</code></td><td>custom validation</td><td><code>19.1-8</code></td><td>set the active cluster version in the format '<major>.<minor>'</td></tr>

nit: it looks like this landed in the wrong commit.


pkg/roachpb/data.go, line 1310 at r1 (raw file):

	if crt.Desc != nil {
		nextReplicaID = crt.Desc.NextReplicaID
		// NB: we don't want to mutate InternalReplicas, so we don't call

Will this change once #39489 is merged?


pkg/roachpb/data.go, line 1327 at r1 (raw file):

		fmt.Fprintf(&chgS, "%s%s", REMOVE_REPLICA, removed)
	}
	return fmt.Sprintf("%s: after=%s next=%d", chgS.String(), afterReplicas, nextReplicaID)

nit: you might as well keep using the strings.Builder:

fmt.Fprintf(&chgS, ": after=%s next=%d", afterReplicas, nextReplicaID)
return chgS.String()

pkg/roachpb/data.proto, line 165 at r1 (raw file):

  option (gogoproto.goproto_stringer) = false;

  // TODO(tbg): remove when possible. 19.2 won't actively use this any more,

Mind mentioning when this will be possible?


pkg/roachpb/data.proto, line 189 at r1 (raw file):

  // set when this field is used). Must not specify the same replica more than once or
  // overlap with the added replicas above.
  repeated ReplicaDescriptor internal_removed_replicas = 7 [(gogoproto.nullable) = false];

The plan is to remove the "internal" prefix from these eventually, right?


pkg/settings/cluster/cockroach_versions.go, line 509 at r1 (raw file):

	},
	{
		// VersionAtomicChangeReplicasTrigger is <TODO>.

Reminder to fill this in.


pkg/storage/replica_raft.go, line 293 at r1 (raw file):

		prefix = false

		// Ensure that we aren't trying to remove ourselves from the range without

It seems strange to me that this lives here. I think I tried moving it down to batch evaluation at one point, but ran into issues because the commit triggers are a little panic-y. If you're working in this area and have it all paged in, do you mind taking a pass at giving this check a better home?

The implementation plan for atomic rebalancing is to first expose a
testing-only API that lets us carry out general atomic membership
changes (i.e. not limited to a single swap which is all that we
expect to want to use in production in 19.2).

This allows us to really stress the machinery in general and also
prevents more API headaches down the road should we want to support
more general changes. Besides, the change to an API that supports
a single swap only is already similarly invasive.

This is the first in a sequence of PRs/commits that plumb the API
through the system (but without actually supporting atomic changes yet).
The whole pipeline that needs to be plumbed is

- `AdminChangeReplicasRequest` which calls into
- `(*Replica).ChangeReplicas` which calls into
- the internal code adding learners, sending them snapshots, etc, and
  which finally evaluates into a
- `ChangeReplicasTrigger` that needs to eventually be translated
  into a
- `raftpb.ConfChangeV2` (we use a v1 `ConfChange` today).

This particular change deprecates the `Replica` and `ChangeType` field
of the trigger, and introduces instead two slices of ReplicaDescriptors,
one for added and one for removed replicas. This will be easy to
translate into a `ConfChangeV2` and seemed more idiomatic (and less
error-prone) than including the base desc and computing a sequence of
changes from that whenever it is needed. It also matches what the higher
levels of the API will do.

As a drive-by, also remove ConfChangeContext.Replica, which has been
unused essentially forever (<2.0).

Release note: None
@tbg tbg requested a review from nvanbenschoten August 12, 2019 18:00
Copy link
Member Author

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! All addressed.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten)


pkg/roachpb/data.go, line 1310 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Will this change once #39489 is merged?

Yes! Added a TODO to revisit.


pkg/roachpb/data.proto, line 165 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Mind mentioning when this will be possible?

Added a reference to #39182.


pkg/roachpb/data.proto, line 189 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

The plan is to remove the "internal" prefix from these eventually, right?

Yep, added TODO to the deprecated fields.


pkg/storage/replica_raft.go, line 293 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

It seems strange to me that this lives here. I think I tried moving it down to batch evaluation at one point, but ran into issues because the commit triggers are a little panic-y. If you're working in this area and have it all paged in, do you mind taking a pass at giving this check a better home?

I share the sentiment, will take another look when I'm less pressed for time. This should be straightforward.

@tbg tbg force-pushed the atomic/changetrigger branch from 4cd451f to 2e8db6c Compare August 12, 2019 18:00
@tbg
Copy link
Member Author

tbg commented Aug 12, 2019

bors r=nvanbenschoten

craig bot pushed a commit that referenced this pull request Aug 12, 2019
39485: storage: prep ChangeReplicasTrigger for atomic rebalancing r=nvanbenschoten a=tbg

The implementation plan for atomic rebalancing is to first expose a
testing-only API that lets us carry out general atomic membership changes
(i.e. not limited to a single swap which is all that we expect to want to
use in production in 19.2).

This allows us to really stress the machinery in general and also prevents
more API headaches down the road should we want to support more general
changes. Besides, the change to an API that supports a single swap only is
already similarly invasive.

This is the first in a sequence of PRs/commits that plumb the API through
the system (but without actually supporting atomic changes yet). The whole
pipeline that needs to be plumbed is

- `AdminChangeReplicasRequest` which calls into
- `(*Replica).ChangeReplicas` which calls into
- the internal code adding learners, sending them snapshots, etc, and
which finally evaluates into a
- `ChangeReplicasTrigger` that needs to eventually be translated
into a
- `raftpb.ConfChangeV2` (we use a v1 `ConfChange` today).

This particular change deprecates the `Replica` and `ChangeType` field of
the trigger, and introduces instead two slices of ReplicaDescriptors, one
for added and one for removed replicas. This will be easy to translate into
a `ConfChangeV2` and seemed more idiomatic (and less error-prone) than
including the base desc and computing a sequence of changes from that
whenever it is needed. It also matches what the higher levels of the API
will do.

Release note: None

Co-authored-by: Tobias Schottdorf <[email protected]>
@craig
Copy link
Contributor

craig bot commented Aug 12, 2019

Build succeeded

@craig craig bot merged commit 2e8db6c into cockroachdb:master Aug 12, 2019
tbg added a commit to tbg/cockroach that referenced this pull request Aug 12, 2019
Following the plan laid out in cockroachdb#39485, this adds API support for
general replication changes to `AdminChangeReplicasRequest`.

`(*DB).AtomicChangeReplicas` will now accept an arbitrary set
of additions/removals, though only on paper - the changes will
be executed individually.

The compatibility story is straightforward since this request is never
persisted. We simply populate both the deprecated and the new field (and
it isn't safe to emit "mixed" changes until all nodes run 19.2 and it's
not worth plumbing a setting around), and in 20.1 we remove the old
fields.  A maybe-snag is that as a result, there are a few months left
in this release in which folks may accidentally mix additions and
removals in a replica change without proper version gating. This wasn't
deemed very likely; to mitigate we could add in-memory state on the
request that fires a panic whenever the changeType changes.

Release note: None
tbg added a commit to tbg/cockroach that referenced this pull request Aug 13, 2019
Following the plan laid out in cockroachdb#39485, this adds API support for
general replication changes to `AdminChangeReplicasRequest`.

`(*DB).AtomicChangeReplicas` will now accept an arbitrary set
of additions/removals, though only on paper - the changes will
be executed individually.

In 19.2, production code will not use this request - it exists solely
for testing. The single user of atomic replication changes will be the
replicate queue, which has direct access to the replication change code
on the local replicas and thus does not need to use this request type.

The compatibility story is thus straightforward (this request is never
persisted): We simply populate both the deprecated and the new field (and
it isn't safe to emit "mixed" changes until all nodes run 19.2 and it's
not worth plumbing a setting around), and in 20.1 we remove the old
fields.  A maybe-snag is that as a result, there are a few months left
in this release in which folks may accidentally mix additions and
removals in a replica change without proper version gating. This wasn't
deemed very likely.

Release note: None
tbg added a commit to tbg/cockroach that referenced this pull request Aug 13, 2019
Following the plan laid out in cockroachdb#39485, this adds API support for
general replication changes to `AdminChangeReplicasRequest`.

`(*DB).AtomicChangeReplicas` will now accept an arbitrary set
of additions/removals, though only on paper - the changes will
be executed individually.

In 19.2, production code will not use this request - it exists solely
for testing. The single user of atomic replication changes will be the
replicate queue, which has direct access to the replication change code
on the local replicas and thus does not need to use this request type.

The compatibility story is thus straightforward (this request is never
persisted): We simply populate both the deprecated and the new field (and
it isn't safe to emit "mixed" changes until all nodes run 19.2 and it's
not worth plumbing a setting around), and in 20.1 we remove the old
fields.  A maybe-snag is that as a result, there are a few months left
in this release in which folks may accidentally mix additions and
removals in a replica change without proper version gating. This wasn't
deemed very likely.

Release note: None
craig bot pushed a commit that referenced this pull request Aug 13, 2019
39481: build: update README r=knz a=knz

Release note: None

39611: storage: rework AdminChangeReplicas for atomic membership changes r=nvanbenschoten a=tbg

Following the plan laid out in #39485, this adds API support for
general replication changes to `AdminChangeReplicasRequest`.

`(*DB).AtomicChangeReplicas` will now accept an arbitrary set
of additions/removals, though only on paper - the changes will
be executed individually.

The compatibility story is straightforward since this request is never
persisted. We simply populate both the deprecated and the new field (and
it isn't safe to emit "mixed" changes until all nodes run 19.2 and it's
not worth plumbing a setting around), and in 20.1 we remove the old
fields.  A maybe-snag is that as a result, there are a few months left
in this release in which folks may accidentally mix additions and
removals in a replica change without proper version gating. This wasn't
deemed very likely; to mitigate we could add in-memory state on the
request that fires a panic whenever the changeType changes.

Release note: None

Co-authored-by: Raphael 'kena' Poss <[email protected]>
Co-authored-by: Tobias Schottdorf <[email protected]>
tbg added a commit to tbg/cockroach that referenced this pull request Aug 13, 2019
This continues the reworking of the various replication change APIs with
the goal of allowing
a) testing of general atomic replication changes
b) issuing replica swaps from the replicate queue (in 19.2).

For previous steps, see:

cockroachdb#39485
cockroachdb#39611

This change is not a pure plumbing PR. Instead, it unifies
`(*Replica).addReplica` and `(*Replica).removeReplica` into a method
that can do both, `(*Replica).addAndRemoveReplicas`.

Given a slice of ReplicationChanges, this method first adds learner
replicas corresponding to the desired new voters. After having sent
snapshots to all of them, the method issues a configuration change that
atomically
- upgrades all learners to voters
- removes any undesired replicas.

Note that no atomic membership changes are *actually* carried out yet.
This is because the callers of `addAndRemoveReplicas` pass in only a
single change (i.e. an addition or removal), which the method also
verifies.

Three pieces are missing after this PR: First, we need to be able to
instruct raft to carry out atomic configuration changes:

https://github.com/cockroachdb/cockroach/blob/2e8db6ca53c59d3d281e64939f79d937195403d4/pkg/storage/replica_proposal_buf.go#L448-L451

which in particular requires being able to store the ConfState
corresponding to a joint configuration in the unreplicated local state
(under a new key).

Second, we must pass the slice of changes handed to
`AdminChangeReplicas` through to `addAndRemoveReplicas` without
unrolling it first, see:

https://github.com/cockroachdb/cockroach/blob/3b316bac6ef342590ddc68d2989714d6e126371a/pkg/storage/replica_command.go#L870-L891

and

https://github.com/cockroachdb/cockroach/blob/3b316bac6ef342590ddc68d2989714d6e126371a/pkg/storage/replica.go#L1314-L1325

Third, we must to teach the replicate queue to issue the "atomic
swaps"; this is the reason we're introducing atomic membership changes
in the first place.

Release note: None
tbg added a commit to tbg/cockroach that referenced this pull request Aug 14, 2019
This continues the reworking of the various replication change APIs with
the goal of allowing
a) testing of general atomic replication changes
b) issuing replica swaps from the replicate queue (in 19.2).

For previous steps, see:

cockroachdb#39485
cockroachdb#39611

This change is not a pure plumbing PR. Instead, it unifies
`(*Replica).addReplica` and `(*Replica).removeReplica` into a method
that can do both, `(*Replica).addAndRemoveReplicas`.

Given a slice of ReplicationChanges, this method first adds learner
replicas corresponding to the desired new voters. After having sent
snapshots to all of them, the method issues a configuration change that
atomically
- upgrades all learners to voters
- removes any undesired replicas.

Note that no atomic membership changes are *actually* carried out yet.
This is because the callers of `addAndRemoveReplicas` pass in only a
single change (i.e. an addition or removal), which the method also
verifies.

Three pieces are missing after this PR: First, we need to be able to
instruct raft to carry out atomic configuration changes:

https://github.com/cockroachdb/cockroach/blob/2e8db6ca53c59d3d281e64939f79d937195403d4/pkg/storage/replica_proposal_buf.go#L448-L451

which in particular requires being able to store the ConfState
corresponding to a joint configuration in the unreplicated local state
(under a new key).

Second, we must pass the slice of changes handed to
`AdminChangeReplicas` through to `addAndRemoveReplicas` without
unrolling it first, see:

https://github.com/cockroachdb/cockroach/blob/3b316bac6ef342590ddc68d2989714d6e126371a/pkg/storage/replica_command.go#L870-L891

and

https://github.com/cockroachdb/cockroach/blob/3b316bac6ef342590ddc68d2989714d6e126371a/pkg/storage/replica.go#L1314-L1325

Third, we must to teach the replicate queue to issue the "atomic
swaps"; this is the reason we're introducing atomic membership changes
in the first place.

Release note: None
craig bot pushed a commit that referenced this pull request Aug 14, 2019
39640: storage: unify replica addition and removal paths r=nvanbenschoten a=tbg

This continues the reworking of the various replication change APIs with
the goal of allowing a) testing of general atomic replication changes b)
issuing replica swaps from the replicate queue (in 19.2).

For previous steps, see:

#39485
#39611

This change is not a pure plumbing PR. Instead, it unifies
`(*Replica).addReplica` and `(*Replica).removeReplica` into a method that
can do both, `(*Replica).addAndRemoveReplicas`.

Given a slice of ReplicationChanges, this method first adds learner
replicas corresponding to the desired new voters. After having sent
snapshots to all of them, the method issues a configuration change that
atomically
- upgrades all learners to voters
- removes any undesired replicas.

Note that no atomic membership changes are *actually* carried out yet. This
is because the callers of `addAndRemoveReplicas` pass in only a single
change (i.e. an addition or removal), which the method also verifies.

Three pieces are missing after this PR: First, we need to be able to
instruct raft to carry out atomic configuration changes:

https://github.com/cockroachdb/cockroach/blob/2e8db6ca53c59d3d281e64939f79d937195403d4/pkg/storage/replica_proposal_buf.go#L448-L451

which in particular requires being able to store the ConfState
corresponding to a joint configuration in the unreplicated local state
(under a new key).

Second, we must pass the slice of changes handed to
`AdminChangeReplicas` through to `addAndRemoveReplicas` without unrolling
it first, see:

https://github.com/cockroachdb/cockroach/blob/3b316bac6ef342590ddc68d2989714d6e126371a/pkg/storage/replica_command.go#L870-L891

and

https://github.com/cockroachdb/cockroach/blob/3b316bac6ef342590ddc68d2989714d6e126371a/pkg/storage/replica.go#L1314-L1325

Third, we must to teach the replicate queue to issue the "atomic swaps";
this is the reason we're introducing atomic membership changes in the first
place.

Release note: None

39656: kv: init heartbeat txn log tag later r=nvanbenschoten a=tbg

At init() time, the txn proto has not been populated yet.
Found while investigating #39652.

This change strikes me as clunky, but I don't have the bandwidth to dig deeper
right now.

Release note: None

39666: testutils/lint/passes: disable under nightly stress r=mjibson a=mjibson

Under stress these error with "go build a: failed to cache compiled Go files".

Fixes #39616
Fixes #39541
Fixes #39479

Release note: None

39669: rpc: use gRPC enforced minimum keepalive timeout r=knz a=ajwerner

Before this commit we'd experience the following annoying log message from gRPC
every time we create a new connection telling us that our setting is being
ignored.

```
Adjusting keepalive ping interval to minimum period of 10s
```

Release note: None

Co-authored-by: Tobias Schottdorf <[email protected]>
Co-authored-by: Matt Jibson <[email protected]>
Co-authored-by: Andrew Werner <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants