-
Notifications
You must be signed in to change notification settings - Fork 9.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
raft: allow use of joint quorums #10914
Conversation
@@ -102,6 +102,16 @@ func (c MajorityConfig) Describe(l AckedIndexer) string { | |||
return buf.String() | |||
} | |||
|
|||
// Slice returns the MajorityConfig as a sorted slice. | |||
func (c MajorityConfig) Slice() []uint64 { | |||
var sl []uint64 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we do the same thing as String
method? sl := make([]uint64, 0, len(c))
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want to return nil
when len(c) == 0
. But we have similar code in so many places right now that I'll take a (later) pass where I factor out some helpers for map[uint64]struct{}{}
.
message ConfChangeV2 { | ||
optional ConfChangeTransition transition = 1 [(gogoproto.nullable) = false]; | ||
repeated ConfChangeSingle changes = 2 [(gogoproto.nullable) = false]; | ||
optional bytes context = 3; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need a top-level context
on the ConfChangeV2, in addition to ConfChangeSingle.context
? The existence of ConfChangeUpdateNode
implies that this field is meant to be a mutable per-node state. Do we have a concrete use case for this non-node context or is it just copying the existing pattern?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ConfChangeSingle does not have a context, this is precisely why I introduced it. The Context
field here is what holds the actual entry that piggybacks on top of the config change (at least in CRDB). Does that answer your question?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I had it in my head that ConfChangeSingle was a rename of the V1 ConfChange.
I think that unless we're getting rid of ConfChangeUpdateNode, the context needs to be on the ConfChangeSingle instead of ConfChangeV2, so that someone using ConfChangeUpdateNode would be able to bundle multiple updates into a single joint update.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the point of bundling multiple ConfChangeUpdateNode? I don't think there's ever any interest in doing that in etcd, and if you do want to do that in a new application you just encode a slice of payloads into your top-level Context. On the other hand, if you only have one Context (as CRDB will) you have to awkwardly stash it into the first single, which sucks.
If you prefer, I can make UpdateNode unsupported in joint configs, but it seems like shaving a yak that doesn't need shaving. An app can always emulate UpdateNode as a sequence of add-remove operations touching a NodeID the app knows doesn't exist.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If ConfChangeSingle doesn't have a context, what does an UpdateNode in a joint config even mean? I'm not sure how exactly etcd is using ConfChangeUpdateNode, but I think that anything but adding a context to ConfChangeSingle will be a partial move towards getting rid of it.
Anyway, this isn't a blocking comment. If anyone's going to fight for ConfChangeUpdateNode, it should be the etcd team :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but I think that anything but adding a context to ConfChangeSingle will be a partial move towards getting rid of it.
Oh yeah, that's the intention :-) But as I said, you can just use ConfChangeRemoveNode(99999)
instead if you know you're never going to actually use NodeID 9999.
etcd really doesn't have to use config changes for UpdateNode, it was just a lazy thing to do because it allows them to reuse all of their plumbing for config changes to get the payload where it needs to go. Similar to that ID field which as you'll have noticed is also not available with V2.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If ConfChangeSingle doesn't have a context, what does an UpdateNode in a joint config even mean?
Say you have a ConfChangeV2 with two UpdateNode singles. You just make the context something like []myproto.NodeInfo{{Host: "node1.mynewdns"}, {Host: "node2.mynewdns"}}.Marshal()
where the two legacy conf changes would've had one of each. This is similar to how in CRDB a joint conf change replacing a node will have a context encoding a change replicas trigger that has as its new desc one reflecting both a removal and addition of a voter.
This seems sort of obvious to me, but you've been asking about it twice, so I want to make sure I'm not missing something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I get that. But this means that at the same time you transition from v1 to v2 conf changes, you have to transition your context from a single-node value to a map from node id to value.
I guess my question/confusion is why not put the context on the ConfChangeSingle? For users of UpdateNode, it's an easier transition, and I can't see any downside to it for people who don't use it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess my question/confusion is why not put the context on the ConfChangeSingle? For users of UpdateNode
One very practical reason I haven't mentioned is that if you want to be able to leave a joint state while injecting a Context, you won't be able to if it sits on the singles. Leaving a joint state has zero singles.
I also generally believe that one top-level opaque payload is the right way to do it semantically because it's straightforward to use it to get per-single payloads (you just encode a slice of len(singles) as your top-level Context), it's easy to migrate into if you want that (match on EntryConfChangeV2
) and if you don't have per-single payloads, the alternative would force you into awkwardly choosing which of the multiple possible Contexts to populate (and also forces you to think about what if there's more than one, etc).
I'm also not understanding your points about UpdateNode. There's no strong reason to ever want to run more than one UpdateNode at once. I think there's nothing special about UpdateNode here, you could take the way etcd uses any other config change similarly (the payload is an encoded info blob about the affected member). But as mentioned above, the migration is easy and they need a migration already to use V2 if they ever want that. At that point they will also have to encode that "ID" field into the top-level Context (and this one is a singleton). Again, this is easy, but it just means switching to something like a struct{UniqueID uint64; Infos []NodeInfo}
as the Context
.
Happy to discuss this more if you feel that it's necessary.
I added new commits to ease the reviewing. Didn't touch the existing ones. I will squash them in before merging. |
Codecov Report
@@ Coverage Diff @@
## master #10914 +/- ##
==========================================
+ Coverage 62.93% 63.12% +0.18%
==========================================
Files 400 400
Lines 37646 37689 +43
==========================================
+ Hits 23694 23791 +97
+ Misses 12343 12295 -48
+ Partials 1609 1603 -6
Continue to review full report at Codecov.
|
This change introduces joint quorums by changing the Node and RawNode API to accept pb.ConfChangeV2 (on top of pb.ConfChange). pb.ConfChange continues to work as today: it allows carrying out a single configuration change. A pb.ConfChange proposal gets added to the Raft log as such and is thus also observed by the app during Ready handling, and fed back to ApplyConfChange. ConfChangeV2 allows joint configuration changes but will continue to carry out configuration changes in "one phase" (i.e. without ever entering a joint config) when this is possible.
This picks up Joint Consensus, see: etcd-io/etcd#10914 Release note: None
39046: vendor: bump go.etcd.io/etcd r=nvanbenschoten a=tbg This picks up Joint Consensus, see: etcd-io/etcd#10914 Release note: None Co-authored-by: Tobias Schottdorf <[email protected]>
This change introduces joint quorums by changing the Node and RawNode API
to accept pb.ConfChangeV2 (on top of pb.ConfChange).
pb.ConfChange continues to work as today: it allows carrying out a single
configuration change. A pb.ConfChange proposal gets added to the Raft log
as such and is thus also observed by the app during Ready handling, and fed
back to ApplyConfChange.
ConfChangeV2 allows joint configuration changes but will continue to carry
out configuration changes in "one phase" (i.e. without ever entering a
joint config) when this is possible.