Externalize log entry flow control #64

pav-kv · 2023-05-30T18:54:28Z

This issue states a problem and introduces a high-level idea for solving this and similar problems. It is not a finished design, more of a conversation starter.

Currently, the raft package takes an active role in managing the flow of log entries in and out of Storage, and in pushing them from the leader to followers. There is an argument for shifting the flow control responsibility to the application layer, and reducing raft's responsibility to mostly managing correctness (i.e. making it more of a passive "library" than an active "framework").

For example, currently, once an entry is committed, raft:

- fetches it from Storage (+unstable)
- pushes it through the Ready struct to the application
- expects the application layer to do the job by the next Ready iteration (or a few iterations, in case of async storage writes)

Since the raft library is not fully aware of the application layer's resource allocation strategy or the semantics of the commands in the log, it may sometimes push too much work through the Ready struct. The static "max bytes" policy helps somewhat, but it does not suffice in more complex setups. For example, in CockroachDB one node may host and schedule tens or hundreds of thousands of raft instances, and if many instances push many entries simultaneously, an out-of-memory situation may occur.
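To make the scale concrete, here is a back-of-the-envelope sketch. The function name and the numbers are hypothetical, chosen only for illustration: with 100,000 instances and a 1 MiB per-Ready limit, the approximate worst case is about 98 GiB handed to the application at once.

```go
package main

// worstCaseReadyBytes estimates how many bytes can be handed to the
// application at once if every raft instance on a node emits a full Ready
// batch at the same moment. Both inputs are hypothetical illustration
// values, not measurements from a real deployment. The result is only an
// approximation: it assumes each instance stays within its static limit.
func worstCaseReadyBytes(instances, maxBytesPerReady uint64) uint64 {
	return instances * maxBytesPerReady
}
```

A per-instance static limit bounds each term of this product, but not the product itself, which is why a node-wide policy has to live in the layer that knows about all the instances.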

This necessitates introducing more ad-hoc back pressure / flow control mechanisms into raft to constrain this flow dynamically. The mechanisms should be flexible enough to be used by many applications (such as etcd and CockroachDB). Generally, these mechanisms are two-fold: a) introspection into raft state before it pushes more work to the application, so that the application can preallocate and/or signal raft to slow down before it's too late; b) the feedback mechanism that signals raft to slow down (e.g. see #60).

As another example, the MsgApp flow from leader to a follower is driven by raft too. There are (mostly) two states in which a flow can be: StateProbe and StateReplicate. In overflow situations, the application layer has no option other than dropping the messages, which eventually causes raft to retry sending the same entry appends. It is currently impossible to ask raft to gracefully slow down instead.
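As a rough illustration of the overflow case, consider a bounded per-follower send queue owned by the application. The `outbox` type below is hypothetical and not part of the raft package; it only shows that, with a push-driven flow, the sole reaction to a full queue is to drop.

```go
package main

// outbox is a hypothetical bounded send queue that the application keeps
// for one follower. It is an illustration only, not a raft API.
type outbox struct {
	capBytes  int // maximum bytes the queue may hold
	usedBytes int // bytes currently queued
	dropped   int // number of messages dropped because the queue was full
}

// offer tries to enqueue an append message of the given size. When the
// message does not fit, it is dropped and counted; the leader will later
// resend the same entries.
func (o *outbox) offer(size int) bool {
	if o.usedBytes+size > o.capBytes {
		o.dropped++
		return false
	}
	o.usedBytes += size
	return true
}
```

Every dropped message here is work the leader already did (reading and encoding the entries) and will have to repeat on retry.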

The above are examples of somewhat ad-hoc flow control mechanisms that we currently have, or could introduce, to work around the resource overuse issues. For holistic control, every link in the pipeline requires such a mechanism integrated into the raft library. This enlarges the API surface and implementation complexity, is error-prone, and does not necessarily solve the problems optimally for all raft users.

For best control, the responsibility could be shifted to the users. For example, instead of fetching entries and pushing them to the application (+providing introspection and back pressure knobs), raft could simply:

- indicate to the application a log index/term range of committed entries
- expect that the application fetches and applies the entries at its own pace
- react to a message from the application confirming that some entries were applied (making sure the overall raft algorithm stays correct)
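The three steps above could look roughly like the following sketch. All names here (`CommittedSpan`, `LogReader`, `Applier`, `memLog`) are hypothetical and do not exist in the raft package; the sketch also assumes committed spans arrive contiguously, which a real design would have to guarantee or handle.

```go
package main

// Entry is a simplified log entry used only in this sketch.
type Entry struct {
	Index uint64
	Term  uint64
	Data  []byte
}

// CommittedSpan is a hypothetical notification that raft would emit instead
// of the entries themselves: an inclusive index range plus the term.
type CommittedSpan struct {
	Term        uint64
	First, Last uint64
}

// LogReader is a hypothetical application-owned view of the log.
type LogReader interface {
	// Entries returns entries in [lo, hi], stopping once maxBytes would be
	// exceeded, but always returning at least one entry if any exists.
	Entries(lo, hi uint64, maxBytes int) []Entry
}

// memLog is an in-memory LogReader; the entry at position i has Index i+1.
type memLog []Entry

func (m memLog) Entries(lo, hi uint64, maxBytes int) []Entry {
	if lo == 0 {
		lo = 1
	}
	var out []Entry
	size := 0
	for i := lo; i <= hi && i <= uint64(len(m)); i++ {
		e := m[i-1]
		if len(out) > 0 && size+len(e.Data) > maxBytes {
			break
		}
		out = append(out, e)
		size += len(e.Data)
	}
	return out
}

// Applier pulls committed entries at the application's own pace.
type Applier struct {
	log     LogReader
	applied uint64
	apply   func(Entry)
}

// ApplyStep fetches and applies roughly budget bytes of the span and
// returns the applied index, which the application would report to raft.
func (a *Applier) ApplyStep(span CommittedSpan, budget int) uint64 {
	lo := a.applied + 1
	if span.First > lo {
		lo = span.First
	}
	if lo > span.Last {
		return a.applied // nothing left to apply in this span
	}
	for _, e := range a.log.Entries(lo, span.Last, budget) {
		a.apply(e)
		a.applied = e.Index
	}
	return a.applied
}
```

The point of the design is that `budget` is chosen by the application on each call, so it can reflect node-wide memory pressure rather than a static per-instance limit.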

Backwards compatibility should be taken into account: there are applications that already rely on the flow control mechanisms currently built into raft.


pav-kv added the enhancement label Jun 2, 2023

pav-kv mentioned this issue Jun 6, 2023

Questions about log probe #72

Closed

pav-kv mentioned this issue Jun 30, 2023

kv: make disk reads asynchronous with respect to Raft state machine cockroachdb/cockroach#105850

Open

pav-kv mentioned this issue Jul 11, 2023

Add Config.DisableConfChangeValidation #81

Merged

tbg mentioned this issue Jul 12, 2023

Configuration change validation has false positives #80

Open

This was referenced Jan 22, 2024

Flow control for MsgApp messages #130

Open

raftLog: decouple log data structure and flow control #142

Open
