raft: separate log and state storage logically #132030
Labels
A-kv-replication
Relating to Raft, consensus, and coordination.
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
In service of the separate raft log [#16624] and witness projects, it appears that the best results can be achieved with support from the `raft` package. While the log and state storage in CRDB physically sit in the same storage engine, it is possible to start separating them logically in `raft`, and to introduce the assumption that the two can work asynchronously.

When the two storages are separated, the model becomes:

- `RawNode` can report the durable applied state, which feeds into log compaction decisions and prevents long apply catch-ups upon restart.
- `RawNode` restarts. It must be possible to reconstruct a correct in-memory raft state from the initial state read from both storages.

To support the above, both storages have to provide logical clocks (`HardState`-style) that make it possible to compare the two states. Initially, both can be sourced from the unified `HardState` that we have today (which means they will always be in sync), and eventually they can become asynchronous at the physical level.

Doing the logical separation first gives the benefit of being able to test it extensively in `raft` datadriven tests long before the actual physical separation happens. The CRDB-specific aspects of the physical separation would be added on top.

Jira issue: CRDB-42792
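To make the idea of comparing the two storages concrete, here is a minimal Go sketch of how a restart could use the logical clocks described in this issue. Everything in it is an assumption made for illustration: `Mark`, `Plan`, and `Reconcile` are hypothetical names, not types or functions from the `raft` package, and the rule that the state storage must never be ahead of the log storage in either term or index is a simplification rather than the agreed design.

```go
package main

import "fmt"

// Mark is a hypothetical HardState-style logical clock reported by a storage.
// It is NOT a type from the raft package; it exists only for this sketch.
type Mark struct {
	Term  uint64 // term under which the storage was last written
	Index uint64 // last log index the storage covers
}

// Covers reports whether m is at least as advanced as o in both term and
// index. This component-wise rule is a simplifying assumption.
func (m Mark) Covers(o Mark) bool {
	return m.Term >= o.Term && m.Index >= o.Index
}

// Plan is what a restarting node would do after reading both storages.
type Plan struct {
	ReplayFrom  uint64 // first entry to re-apply (inclusive)
	ReplayTo    uint64 // last entry to re-apply (inclusive); empty if ReplayFrom > ReplayTo
	CompactUpTo uint64 // highest index that is safe to compact from the log
}

// Reconcile compares the log storage mark with the durable applied mark read
// from the state storage. Entries above the applied index must be re-applied,
// and nothing above the durable applied index may be compacted.
func Reconcile(logMark, applied Mark) (Plan, error) {
	if !logMark.Covers(applied) {
		return Plan{}, fmt.Errorf("state storage %+v is ahead of log storage %+v", applied, logMark)
	}
	return Plan{
		ReplayFrom:  applied.Index + 1,
		ReplayTo:    logMark.Index,
		CompactUpTo: applied.Index,
	}, nil
}

func main() {
	// With a unified HardState both marks are equal, so there is nothing to replay.
	plan, err := Reconcile(Mark{Term: 3, Index: 12}, Mark{Term: 3, Index: 12})
	fmt.Printf("in sync: %+v err=%v\n", plan, err)

	// With asynchronous storages the state storage may lag behind the log.
	plan, err = Reconcile(Mark{Term: 3, Index: 12}, Mark{Term: 3, Index: 9})
	fmt.Printf("lagging: %+v err=%v\n", plan, err)
}
```

In the unified case both marks come from the same `HardState`, so the replay range is always empty. Once the storages become asynchronous, the gap between the two marks bounds both the apply catch-up on restart and how far log compaction may proceed.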