raft.LeaderCh() always deliver latest transition #384

notnoop · 2020-01-23T15:18:57Z

This makes leaderCh always deliver the latest leadership transition
rather than dropping messages somewhat unexpectedly

Managing leadership transition is quite fragile and easy to get wrong.
Prior to this, LeaderCh() is unreliable and is prone to misuse.
For example, both Consul[1] and Nomad[2] started with LeaderCh and
ended up switching to manage their own channels after hitting issues.
But managing reliable leadership channels is tricky; both Consul[3] and
Nomad[4] had deadlock bugs related to leadership flagging!

Here, we aim to improve the library ergonomics for future library
applications as well as Nomad, by having stricter guarantees of
raft.LeaderCh. Namely, receivers are guaranteed to receive a
notification if leadership transition event occured, and slow receivers
will be guaranteed to receive the latest view since their last receive*.

This seems like a reasonable interface for practical purposes. If a
leadership flapped many times but but ended as a non-leader ultimately,
the receiver should just recored the final step down without attempting
to handle the intermediary states events.

One edge case is if a leader lost but then regained leadership quickly. The
application may need to issue a Barrier call and reconcile its state.
Thus, the library should still send notification events even if final
leadership value matches last delivered value.

note that due to concurrency, last received value might still be
stale, if raft just recognized transition event but didn't send yet.
Receivers on a loop will get the event in subsequent receive.

[1] hashicorp/consul#2911
[2] hashicorp/nomad@c2b7ce9
[3] hashicorp/consul#6852
[4] hashicorp/nomad#4749

This makes leaderCh always deliver the latest leadership transition rather than dropping messages somewhat unexpectedly Managing leadership transition is quite fragile and easy to get wrong. Prior to this, `LeaderCh()` is unreliable and is prone to misuse. For example, both Consul[1] and Nomad[2] started with LeaderCh and ended up switching to manage their own channels after hitting issues. But managing reliable leadership channels is tricky; both Consul[3] and Nomad[4] had deadlock bugs related to leadership flagging! Here, we aim to improve the library ergonomics for future library applications as well as Nomad, by having stricter guarantees of `raft.LeaderCh`. Namely, receivers are guaranteed to receive a notification if leadership transition event occured, and slow receivers will be guaranteed to receive the latest view since their last receive*. This seems like a reasonable interface for practical purposes. If a leadership flapped many times but but ended as a non-leader ultimately, the receiver should just recored the final step down without attempting to handle the intermediary states events. One edge case is if a leader lost but then regained leadership quickly. The application may need to issue a Barrier call and reconcile its state. Thus, the library should still send notification events even if final leadership value matches last delivered value. * note that due to concurrency, last received value might still be stale, if raft just recognized transition event but didn't send yet. Receivers on a loop will get the event in subsequent receive.

hanshasselberg

I started going through the code to understand your PR better and one thing that confused me is the difference between leaderCh and NotifyCh. I found a comment in consul code https://github.com/hashicorp/consul/blob/b3cf47c861f9a957e66011e7d466304f9bd93434/agent/consul/leader.go#L50-L53:

        // We use the notify channel we configured Raft with, NOT Raft's
        // leaderCh, which is only notified best-effort. Doing this ensures
        // that we get all notifications in order, which is required for
        // cleanup and to ensure we never run multiple leader loops.

So far I think your changes mitigate this problem because this channel no longer has the old value but always the latest one. Which means it would be preferable to use that instead of NotifyCh. Will keep thinking about it.

banks · 2020-01-24T10:40:14Z

api.go

+// transition has occured.
+//
+// If receivers aren't ready for the signal, signals may drop and only the
+// latest leadership transition. For example, if a receiver receives subsequent


Suggested change

// latest leadership transition. For example, if a receiver receives subsequent

// latest leadership transition will be received. For example, if a receiver receives consecutive

banks

🤔 Interesting one Mahmood.

I can see how this is more reliable than before but I still wonder if it's prone to misuse/errors?

The barrier example you gave is one case where this has a pretty big gotcha - if we see consecutive true when already in leader state the app probably needs to still rescind leadership and take it up again in order to get back into a good state where we need to not only be sure FSM is in sync but also any other state the leader has setup based on the FSM and believes no one else can make writes without it knowing might need to be recreated from the FSM. In Consul for example, thinks like the Connect CA state are setup during establishLeadership and only modified on updates to CA config. It wouldn't be safe to just keep running even if we issued a new barrier on the FSM, we'd explicitly need to exit the leader loop and re-enter it again to ensure everything was in correct state.

Maybe this change makes that possible at least but I think you'd need to explicitly treat every message delivered as "you need to change state now" even if that's transitioning to follower and back to leader again to ensure everything is correct. That's hinted at in the comment you added but I think if we keep this we should spell it out more concretely.

On the other hand, if Consul and Nomad have already given up on trying to use this method, perhaps we should instead just document it as Deprecated and impossible to use correctly and describe the better alternative instead?

notnoop · 2020-01-24T13:10:49Z

@banks Thanks for the review - yeah, I can spill the need to handle transition more specifically. In Nomad land, I implemented this approach, just outside the library in https://github.com/hashicorp/nomad/pull/6977/files (would love your review there too :P). I was hoping that we'd simply nomad's logic by incorporating the leadership channel behavior here and rely on this instead. Based on our experience, requiring clients to configure their reliable channels properly is tricky.

Another alternative that may relieve consumers more and be less prone to misuse is to common and "best practices" bits of Consul or Nomad's monitorLoop and leaderLoop into the library and expose a LeadershipUpdateDelegate for application specific logic. If this is something worth checking, I can implement it.

I'm motivated to address as much of this in raft library, because managing leadership transitions is very subtle now - and exposing a better api will hopefully avoid having other applications (Vault?!) rediscover Consul/Nomad bugs and easies fixing them globally.

banks · 2020-01-24T14:58:17Z

Got it, I'm definitely in favour of making the raft lib easier for other implementers to use correctly, I guess I wonder if this is enough better considering the still hefty gotchas involved to use it correctly. Perhaps encapsulating the way Consul and Nomad do this via notifyCh in the lib like you suggest is a better improvement?

…

On Fri, Jan 24, 2020 at 1:10 PM Mahmood Ali ***@***.***> wrote: @banks <https://github.com/banks> Thanks for the review - yeah, I can spill the need to handle transition more specifically. In Nomad land, I implemented this approach, just outside the library in https://github.com/hashicorp/nomad/pull/6977/files (would love your review there too :P). I was hoping that we'd simply nomad's logic by incorporating the leadership channel behavior here and rely on this instead. Based on our experience, requiring clients to configure their reliable channels properly is tricky. Another alternative that may relieve consumers more and be less prone to misuse is to common and "best practices" bits of Consul or Nomad's monitorLoop and leaderLoop into the library and expose a LeadershipUpdateDelegate for application specific logic. If this is something worth checking, I can implement it. I'm motivated to address as much of this in raft library, because managing leadership transitions is very subtle now - and exposing a better api will hopefully avoid having other applications (Vault?!) rediscover Consul/Nomad bugs and easies fixing them globally. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#384?email_source=notifications&email_token=AAA5QU42HHCT7L6M4ZQHRRTQ7LSFVA5CNFSM4KKYFAUKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEJ2XQ3Y#issuecomment-578123887>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAA5QU72KDKW2U6KDTCUG5TQ7LSFVANCNFSM4KKYFAUA> .

notnoop · 2020-01-25T03:39:31Z

Perhaps encapsulating the way Consul and Nomad do this via notifyCh in the
lib like you suggest is a better improvement?

Maybe - Defining the right interface isn't obvious. Nomad and Consul monitor leadership loop have other concerns (reconciling Serf, tracking members, and enabling ACL progress). I'm unclear what the right abstraction is here.

Let's not make the perfect be the enemy of the better. This change makes leaderCh useful (previous version was unusable), simplify Nomad and Consul implementation, and doesn't preclude us from adding a delegate in future. This will be an improvement for Nomad; Consul would benefit too: its current approach of arbitrary increasing buffer limit is susceptible to two issues: first, it relies on a manually-set channel buffering value for safety without grantees of how fast channel is dequeued; also in extreme flapping, it does redundant unnecessary transitions (e.g. if a server gains and relinquished leadership few times and ultimately became a non-voter, it's safe to step down without ever calling establishLeadership).

Alternatively, if we do want to punt, let's simply remove the current leaderCh - it's a ticking bomb for bugs - and properly document how applications can monitor leadership properly.

hanshasselberg · 2020-01-27T20:28:13Z

I was wondering if this change would be technically speaking backwards incompatible. leaderCh was dropping messages before and still does, just other messages. From the outside the message dropping was unpredictable and now we are conscious about it. From an API perspective I don't see why this change would be a problem.

This change makes leaderCh useful (previous version was unusable)

agreed.

also in extreme flapping, it does redundant unnecessary transitions

I was thinking that if we would drain notifyCh before doing anything, we could get the latest state as well and act on it. Avoiding unnecessary transitions.

schristoff

Discussed in Raft Force - approved with the intention to look into alternatives in the future.

notnoop requested review from hanshasselberg and banks January 23, 2020 15:18

notnoop self-assigned this Jan 23, 2020

notnoop force-pushed the f-stricter-leaderch-semantics branch from 9699196 to 9e800fe Compare January 23, 2020 16:07

hanshasselberg reviewed Jan 24, 2020

View reviewed changes

banks reviewed Jan 24, 2020

View reviewed changes

schristoff approved these changes Feb 6, 2020

View reviewed changes

notnoop merged commit 365023d into master Feb 11, 2020

notnoop deleted the f-stricter-leaderch-semantics branch February 11, 2020 19:22

notnoop mentioned this pull request Feb 14, 2020

Update Raft dependency hashicorp/nomad#7150

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

raft.LeaderCh() always deliver latest transition #384

raft.LeaderCh() always deliver latest transition #384

notnoop commented Jan 23, 2020

hanshasselberg left a comment •

edited

Loading

banks Jan 24, 2020

banks left a comment

notnoop commented Jan 24, 2020

banks commented Jan 24, 2020 via email

notnoop commented Jan 25, 2020 •

edited

Loading

hanshasselberg commented Jan 27, 2020

schristoff left a comment

	// latest leadership transition. For example, if a receiver receives subsequent
	// latest leadership transition will be received. For example, if a receiver receives consecutive

raft.LeaderCh() always deliver latest transition #384

raft.LeaderCh() always deliver latest transition #384

Conversation

notnoop commented Jan 23, 2020

hanshasselberg left a comment • edited Loading

Choose a reason for hiding this comment

banks Jan 24, 2020

Choose a reason for hiding this comment

banks left a comment

Choose a reason for hiding this comment

notnoop commented Jan 24, 2020

banks commented Jan 24, 2020 via email

notnoop commented Jan 25, 2020 • edited Loading

hanshasselberg commented Jan 27, 2020

schristoff left a comment

Choose a reason for hiding this comment

hanshasselberg left a comment •

edited

Loading

notnoop commented Jan 25, 2020 •

edited

Loading