Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

raft: allow use of joint quorums #10914

Merged
merged 4 commits into from
Jul 23, 2019
Merged

raft: allow use of joint quorums #10914

merged 4 commits into from
Jul 23, 2019

Conversation

tbg
Copy link
Contributor

@tbg tbg commented Jul 22, 2019

This change introduces joint quorums by changing the Node and RawNode API
to accept pb.ConfChangeV2 (on top of pb.ConfChange).

pb.ConfChange continues to work as today: it allows carrying out a single
configuration change. A pb.ConfChange proposal gets added to the Raft log
as such and is thus also observed by the app during Ready handling, and fed
back to ApplyConfChange.

ConfChangeV2 allows joint configuration changes but will continue to carry
out configuration changes in "one phase" (i.e. without ever entering a
joint config) when this is possible.

@@ -102,6 +102,16 @@ func (c MajorityConfig) Describe(l AckedIndexer) string {
return buf.String()
}

// Slice returns the MajorityConfig as a sorted slice.
func (c MajorityConfig) Slice() []uint64 {
var sl []uint64
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we do the same thing as String method? sl := make([]uint64, 0, len(c))?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to return nil when len(c) == 0. But we have similar code in so many places right now that I'll take a (later) pass where I factor out some helpers for map[uint64]struct{}{}.

raft/raftpb/raft.proto Outdated Show resolved Hide resolved
raft/raftpb/raft.proto Show resolved Hide resolved
raft/raftpb/raft.proto Show resolved Hide resolved
message ConfChangeV2 {
optional ConfChangeTransition transition = 1 [(gogoproto.nullable) = false];
repeated ConfChangeSingle changes = 2 [(gogoproto.nullable) = false];
optional bytes context = 3;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need a top-level context on the ConfChangeV2, in addition to ConfChangeSingle.context? The existence of ConfChangeUpdateNode implies that this field is meant to be a mutable per-node state. Do we have a concrete use case for this non-node context or is it just copying the existing pattern?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ConfChangeSingle does not have a context, this is precisely why I introduced it. The Context field here is what holds the actual entry that piggybacks on top of the config change (at least in CRDB). Does that answer your question?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I had it in my head that ConfChangeSingle was a rename of the V1 ConfChange.

I think that unless we're getting rid of ConfChangeUpdateNode, the context needs to be on the ConfChangeSingle instead of ConfChangeV2, so that someone using ConfChangeUpdateNode would be able to bundle multiple updates into a single joint update.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the point of bundling multiple ConfChangeUpdateNode? I don't think there's ever any interest in doing that in etcd, and if you do want to do that in a new application you just encode a slice of payloads into your top-level Context. On the other hand, if you only have one Context (as CRDB will) you have to awkwardly stash it into the first single, which sucks.

If you prefer, I can make UpdateNode unsupported in joint configs, but it seems like shaving a yak that doesn't need shaving. An app can always emulate UpdateNode as a sequence of add-remove operations touching a NodeID the app knows doesn't exist.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If ConfChangeSingle doesn't have a context, what does an UpdateNode in a joint config even mean? I'm not sure how exactly etcd is using ConfChangeUpdateNode, but I think that anything but adding a context to ConfChangeSingle will be a partial move towards getting rid of it.

Anyway, this isn't a blocking comment. If anyone's going to fight for ConfChangeUpdateNode, it should be the etcd team :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but I think that anything but adding a context to ConfChangeSingle will be a partial move towards getting rid of it.

Oh yeah, that's the intention :-) But as I said, you can just use ConfChangeRemoveNode(99999) instead if you know you're never going to actually use NodeID 9999.

etcd really doesn't have to use config changes for UpdateNode, it was just a lazy thing to do because it allows them to reuse all of their plumbing for config changes to get the payload where it needs to go. Similar to that ID field which as you'll have noticed is also not available with V2.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If ConfChangeSingle doesn't have a context, what does an UpdateNode in a joint config even mean?

Say you have a ConfChangeV2 with two UpdateNode singles. You just make the context something like []myproto.NodeInfo{{Host: "node1.mynewdns"}, {Host: "node2.mynewdns"}}.Marshal() where the two legacy conf changes would've had one of each. This is similar to how in CRDB a joint conf change replacing a node will have a context encoding a change replicas trigger that has as its new desc one reflecting both a removal and addition of a voter.
This seems sort of obvious to me, but you've been asking about it twice, so I want to make sure I'm not missing something.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I get that. But this means that at the same time you transition from v1 to v2 conf changes, you have to transition your context from a single-node value to a map from node id to value.

I guess my question/confusion is why not put the context on the ConfChangeSingle? For users of UpdateNode, it's an easier transition, and I can't see any downside to it for people who don't use it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess my question/confusion is why not put the context on the ConfChangeSingle? For users of UpdateNode

One very practical reason I haven't mentioned is that if you want to be able to leave a joint state while injecting a Context, you won't be able to if it sits on the singles. Leaving a joint state has zero singles.

I also generally believe that one top-level opaque payload is the right way to do it semantically because it's straightforward to use it to get per-single payloads (you just encode a slice of len(singles) as your top-level Context), it's easy to migrate into if you want that (match on EntryConfChangeV2) and if you don't have per-single payloads, the alternative would force you into awkwardly choosing which of the multiple possible Contexts to populate (and also forces you to think about what if there's more than one, etc).

I'm also not understanding your points about UpdateNode. There's no strong reason to ever want to run more than one UpdateNode at once. I think there's nothing special about UpdateNode here, you could take the way etcd uses any other config change similarly (the payload is an encoded info blob about the affected member). But as mentioned above, the migration is easy and they need a migration already to use V2 if they ever want that. At that point they will also have to encode that "ID" field into the top-level Context (and this one is a singleton). Again, this is easy, but it just means switching to something like a struct{UniqueID uint64; Infos []NodeInfo} as the Context.

Happy to discuss this more if you feel that it's necessary.

raft/confchange/quick_test.go Show resolved Hide resolved
raft/node.go Outdated Show resolved Hide resolved
raft/raft.go Outdated Show resolved Hide resolved
@tbg
Copy link
Contributor Author

tbg commented Jul 22, 2019

I added new commits to ease the reviewing. Didn't touch the existing ones. I will squash them in before merging.

@codecov-io
Copy link

codecov-io commented Jul 23, 2019

Codecov Report

Merging #10914 into master will increase coverage by 0.18%.
The diff coverage is 87.5%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #10914      +/-   ##
==========================================
+ Coverage   62.93%   63.12%   +0.18%     
==========================================
  Files         400      400              
  Lines       37646    37689      +43     
==========================================
+ Hits        23694    23791      +97     
+ Misses      12343    12295      -48     
+ Partials     1609     1603       -6
Impacted Files Coverage Δ
raft/quorum/majority.go 100% <100%> (ø) ⬆️
raft/tracker/tracker.go 96.62% <100%> (+0.15%) ⬆️
raft/rawnode.go 72.52% <100%> (-3.21%) ⬇️
raft/bootstrap.go 63.63% <100%> (ø) ⬆️
raft/confchange/confchange.go 81.28% <77.77%> (-0.69%) ⬇️
raft/node.go 89.51% <78.94%> (-1.31%) ⬇️
raft/raft.go 90.07% <88.37%> (-0.97%) ⬇️
client/keys.go 56.78% <0%> (-34.68%) ⬇️
pkg/adt/interval_tree.go 82.58% <0%> (-7.51%) ⬇️
client/client.go 79.73% <0%> (-4.25%) ⬇️
... and 24 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update fe86a78...29fc15b. Read the comment docs.

tbg added 4 commits July 23, 2019 10:39
This change introduces joint quorums by changing the Node and RawNode
API to accept pb.ConfChangeV2 (on top of pb.ConfChange).

pb.ConfChange continues to work as today: it allows carrying out a
single configuration change. A pb.ConfChange proposal gets added to
the Raft log as such and is thus also observed by the app during Ready
handling, and fed back to ApplyConfChange.

ConfChangeV2 allows joint configuration changes but will continue to
carry out configuration changes in "one phase" (i.e. without ever
entering a joint config) when this is possible.
@tbg tbg merged commit d118342 into etcd-io:master Jul 23, 2019
@tbg tbg deleted the api branch July 23, 2019 09:49
tbg added a commit to tbg/cockroach that referenced this pull request Jul 24, 2019
This picks up Joint Consensus, see:

etcd-io/etcd#10914

Release note: None
craig bot pushed a commit to cockroachdb/cockroach that referenced this pull request Jul 25, 2019
39046: vendor: bump go.etcd.io/etcd r=nvanbenschoten a=tbg

This picks up Joint Consensus, see:

etcd-io/etcd#10914

Release note: None

Co-authored-by: Tobias Schottdorf <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

4 participants