-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
raft.LeaderCh() always deliver latest transition #384
Conversation
This makes leaderCh always deliver the latest leadership transition rather than dropping messages somewhat unexpectedly Managing leadership transition is quite fragile and easy to get wrong. Prior to this, `LeaderCh()` is unreliable and is prone to misuse. For example, both Consul[1] and Nomad[2] started with LeaderCh and ended up switching to manage their own channels after hitting issues. But managing reliable leadership channels is tricky; both Consul[3] and Nomad[4] had deadlock bugs related to leadership flagging! Here, we aim to improve the library ergonomics for future library applications as well as Nomad, by having stricter guarantees of `raft.LeaderCh`. Namely, receivers are guaranteed to receive a notification if leadership transition event occured, and slow receivers will be guaranteed to receive the latest view since their last receive*. This seems like a reasonable interface for practical purposes. If a leadership flapped many times but but ended as a non-leader ultimately, the receiver should just recored the final step down without attempting to handle the intermediary states events. One edge case is if a leader lost but then regained leadership quickly. The application may need to issue a Barrier call and reconcile its state. Thus, the library should still send notification events even if final leadership value matches last delivered value. * note that due to concurrency, last received value might still be stale, if raft just recognized transition event but didn't send yet. Receivers on a loop will get the event in subsequent receive.
9699196
to
9e800fe
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I started going through the code to understand your PR better and one thing that confused me is the difference between leaderCh
and NotifyCh
. I found a comment in consul code https://github.com/hashicorp/consul/blob/b3cf47c861f9a957e66011e7d466304f9bd93434/agent/consul/leader.go#L50-L53:
// We use the notify channel we configured Raft with, NOT Raft's
// leaderCh, which is only notified best-effort. Doing this ensures
// that we get all notifications in order, which is required for
// cleanup and to ensure we never run multiple leader loops.
So far I think your changes mitigate this problem because this channel no longer has the old value but always the latest one. Which means it would be preferable to use that instead of NotifyCh. Will keep thinking about it.
// transition has occured. | ||
// | ||
// If receivers aren't ready for the signal, signals may drop and only the | ||
// latest leadership transition. For example, if a receiver receives subsequent |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// latest leadership transition. For example, if a receiver receives subsequent | |
// latest leadership transition will be received. For example, if a receiver receives consecutive |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤔 Interesting one Mahmood.
I can see how this is more reliable than before but I still wonder if it's prone to misuse/errors?
The barrier example you gave is one case where this has a pretty big gotcha - if we see consecutive true
when already in leader state the app probably needs to still rescind leadership and take it up again in order to get back into a good state where we need to not only be sure FSM is in sync but also any other state the leader has setup based on the FSM and believes no one else can make writes without it knowing might need to be recreated from the FSM. In Consul for example, thinks like the Connect CA state are setup during establishLeadership
and only modified on updates to CA config. It wouldn't be safe to just keep running even if we issued a new barrier on the FSM, we'd explicitly need to exit the leader loop and re-enter it again to ensure everything was in correct state.
Maybe this change makes that possible at least but I think you'd need to explicitly treat every message delivered as "you need to change state now" even if that's transitioning to follower and back to leader again to ensure everything is correct. That's hinted at in the comment you added but I think if we keep this we should spell it out more concretely.
On the other hand, if Consul and Nomad have already given up on trying to use this method, perhaps we should instead just document it as Deprecated and impossible to use correctly and describe the better alternative instead?
@banks Thanks for the review - yeah, I can spill the need to handle transition more specifically. In Nomad land, I implemented this approach, just outside the library in https://github.com/hashicorp/nomad/pull/6977/files (would love your review there too :P). I was hoping that we'd simply nomad's logic by incorporating the leadership channel behavior here and rely on this instead. Based on our experience, requiring clients to configure their reliable channels properly is tricky. Another alternative that may relieve consumers more and be less prone to misuse is to common and "best practices" bits of Consul or Nomad's monitorLoop and leaderLoop into the library and expose a I'm motivated to address as much of this in raft library, because managing leadership transitions is very subtle now - and exposing a better api will hopefully avoid having other applications (Vault?!) rediscover Consul/Nomad bugs and easies fixing them globally. |
Got it,
I'm definitely in favour of making the raft lib easier for other
implementers to use correctly, I guess I wonder if this is enough better
considering the still hefty gotchas involved to use it correctly.
Perhaps encapsulating the way Consul and Nomad do this via notifyCh in the
lib like you suggest is a better improvement?
…On Fri, Jan 24, 2020 at 1:10 PM Mahmood Ali ***@***.***> wrote:
@banks <https://github.com/banks> Thanks for the review - yeah, I can
spill the need to handle transition more specifically. In Nomad land, I
implemented this approach, just outside the library in
https://github.com/hashicorp/nomad/pull/6977/files (would love your
review there too :P). I was hoping that we'd simply nomad's logic by
incorporating the leadership channel behavior here and rely on this
instead. Based on our experience, requiring clients to configure their
reliable channels properly is tricky.
Another alternative that may relieve consumers more and be less prone to
misuse is to common and "best practices" bits of Consul or Nomad's
monitorLoop and leaderLoop into the library and expose a
LeadershipUpdateDelegate for application specific logic. If this is
something worth checking, I can implement it.
I'm motivated to address as much of this in raft library, because managing
leadership transitions is very subtle now - and exposing a better api will
hopefully avoid having other applications (Vault?!) rediscover Consul/Nomad
bugs and easies fixing them globally.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#384?email_source=notifications&email_token=AAA5QU42HHCT7L6M4ZQHRRTQ7LSFVA5CNFSM4KKYFAUKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEJ2XQ3Y#issuecomment-578123887>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAA5QU72KDKW2U6KDTCUG5TQ7LSFVANCNFSM4KKYFAUA>
.
|
Maybe - Defining the right interface isn't obvious. Nomad and Consul monitor leadership loop have other concerns (reconciling Serf, tracking members, and enabling ACL progress). I'm unclear what the right abstraction is here. Let's not make the perfect be the enemy of the better. This change makes Alternatively, if we do want to punt, let's simply remove the current |
I was wondering if this change would be technically speaking backwards incompatible.
agreed.
I was thinking that if we would drain notifyCh before doing anything, we could get the latest state as well and act on it. Avoiding unnecessary transitions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed in Raft Force - approved with the intention to look into alternatives in the future.
This makes leaderCh always deliver the latest leadership transition
rather than dropping messages somewhat unexpectedly
Managing leadership transition is quite fragile and easy to get wrong.
Prior to this,
LeaderCh()
is unreliable and is prone to misuse.For example, both Consul[1] and Nomad[2] started with LeaderCh and
ended up switching to manage their own channels after hitting issues.
But managing reliable leadership channels is tricky; both Consul[3] and
Nomad[4] had deadlock bugs related to leadership flagging!
Here, we aim to improve the library ergonomics for future library
applications as well as Nomad, by having stricter guarantees of
raft.LeaderCh
. Namely, receivers are guaranteed to receive anotification if leadership transition event occured, and slow receivers
will be guaranteed to receive the latest view since their last receive*.
This seems like a reasonable interface for practical purposes. If a
leadership flapped many times but but ended as a non-leader ultimately,
the receiver should just recored the final step down without attempting
to handle the intermediary states events.
One edge case is if a leader lost but then regained leadership quickly. The
application may need to issue a Barrier call and reconcile its state.
Thus, the library should still send notification events even if final
leadership value matches last delivered value.
stale, if raft just recognized transition event but didn't send yet.
Receivers on a loop will get the event in subsequent receive.
[1] hashicorp/consul#2911
[2] hashicorp/nomad@c2b7ce9
[3] hashicorp/consul#6852
[4] hashicorp/nomad#4749