storage: Reinstate coalesced heartbeats #6107

Closed
bdarnell opened this issue Apr 18, 2016 · 6 comments

@bdarnell
Contributor

Coalesced heartbeats were removed when the multiraft package was folded into storage. We should bring them back. This may help with issues seen in #5970, but the available evidence suggests this is lower priority than #6106.
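For readers unfamiliar with the term, here is a minimal Go sketch of what "coalescing" means. Without it, a leader sends one heartbeat message per Raft group to each follower. With it, all heartbeats bound for the same node in a tick are batched into one message. `NodeID`, `RangeID`, `Heartbeat`, and `coalesce` are hypothetical names for illustration, not the removed multiraft API.

```go
package main

import (
	"fmt"
	"sort"
)

// NodeID and RangeID are hypothetical identifiers for this sketch.
type NodeID int
type RangeID int

// Heartbeat is a hypothetical per-range Raft heartbeat from a leader
// replica to a follower replica living on another node.
type Heartbeat struct {
	RangeID RangeID
	ToNode  NodeID
	Term    uint64
	Commit  uint64
}

// coalesce groups per-range heartbeats by destination node, so that one
// message per node (rather than one per range) is sent each tick.
func coalesce(hbs []Heartbeat) map[NodeID][]Heartbeat {
	batches := make(map[NodeID][]Heartbeat)
	for _, hb := range hbs {
		batches[hb.ToNode] = append(batches[hb.ToNode], hb)
	}
	return batches
}

func main() {
	hbs := []Heartbeat{
		{RangeID: 1, ToNode: 2, Term: 5, Commit: 10},
		{RangeID: 2, ToNode: 2, Term: 3, Commit: 7},
		{RangeID: 3, ToNode: 3, Term: 8, Commit: 1},
	}
	batches := coalesce(hbs)

	// Print in node order so the output is deterministic.
	nodes := make([]int, 0, len(batches))
	for n := range batches {
		nodes = append(nodes, int(n))
	}
	sort.Ints(nodes)
	for _, n := range nodes {
		fmt.Printf("node %d: %d heartbeats in one message\n", n, len(batches[NodeID(n)]))
	}
}
```

The receiving node would then fan each batch out to its local replicas. The message count scales with the number of peer nodes rather than the number of ranges.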

@bdarnell
Contributor Author

For reference, see #3528, which removed multiraft.

@tbg
Member

tbg commented May 31, 2016

@spencerkimball brought up the interesting idea of using Gossip as a first approximation of liveness. That is, a node would locally respond to heartbeats for all of its Raft groups as long as the recipient node had a signal in gossip (maybe the store status, or something smaller that can be gossiped more frequently). Historically we've had trouble with coalesced heartbeats and the semantics they carry in Raft, especially during asymmetric partitions. However we reinstate them, we should be clear on what we're guaranteeing.

@tbg
Member

tbg commented Jun 23, 2016

BTW, take a look at @d4l3k's recent PR #7399. Effectively, it adds a measure of node-to-node connection liveness to rpcContext. You could imagine using that to replace actual heartbeats (similar to the Gossip idea above, but tied more directly to node-to-node communication): a node would deliver heartbeats locally to those Raft groups that claim to follow a leader on the healthy node. It doesn't quite pan out that way, though. If the leader steps down for some reason, its followers may never find out, because they would never campaign. It's still worth considering, as standard coalesced heartbeats may have the same problem.

Pulling that thread further, you could imagine not ticking Raft groups for which regular outside heartbeats exist, and ticking them only in the absence of those heartbeats (so that elections would ensue). This would address the large number of Raft groups that would otherwise have to be touched on every tick. (That might be an unrealistic idea; I'm just throwing it out there.)
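A minimal Go sketch of "tick only in the absence of outside heartbeats" follows. Each interval, only replicas whose leader's node did *not* show up as healthy in the node-level heartbeat get their Raft clock advanced, so only those groups can time out and campaign. `replica`, `tickUnsupported`, and the `healthy` map are hypothetical stand-ins, not CockroachDB or etcd/raft APIs.

```go
package main

import "fmt"

// NodeID and RangeID are hypothetical identifiers for this sketch.
type NodeID int
type RangeID int

// replica is a hypothetical local Raft group: the node its current leader
// lives on, plus a logical tick counter standing in for the Raft clock.
type replica struct {
	RangeID    RangeID
	LeaderNode NodeID
	ticks      int
}

// tickUnsupported advances the Raft clock only for replicas whose leader's
// node was not reported healthy by the node-level heartbeat this interval.
// It returns how many replicas were ticked.
func tickUnsupported(replicas []*replica, healthy map[NodeID]bool) int {
	n := 0
	for _, r := range replicas {
		if healthy[r.LeaderNode] {
			continue // leader's node is heartbeating: skip the tick
		}
		r.ticks++
		n++
	}
	return n
}

func main() {
	replicas := []*replica{
		{RangeID: 1, LeaderNode: 2},
		{RangeID: 2, LeaderNode: 3},
		{RangeID: 3, LeaderNode: 2},
	}
	healthy := map[NodeID]bool{2: true} // node 3 missed its heartbeat
	fmt.Println(tickUnsupported(replicas, healthy)) // 1
}
```

The sketch also makes the caveat from the previous paragraph concrete. If a leader steps down while its node keeps heartbeating, its followers are never ticked, so they never reach an election timeout and never campaign.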

@petermattis
Collaborator

@tschottdorf I had a similar idea yesterday. It seems possible to leverage the rpc heartbeats to coalesce (or eliminate) the Raft level heartbeats. We'd have to increase the rpc heartbeat frequency, but that's better than having each range heartbeat every 100ms. As you and Arjun both point out there are complexities, but it seems doable.

@rjnn
Contributor

rjnn commented Oct 21, 2016

Closed by #9380.

@rjnn rjnn closed this as completed Oct 21, 2016
@d4l3k
Contributor

d4l3k commented Oct 21, 2016

🎉


5 participants