kvserver: test-only API for proposal lifecycle intercepting #115759
Labels
A-kv-replication
C-enhancement
T-kv
Background
A write proposal in CRDB goes through a sequence of key events across multiple layers (see, e.g., the high-level overview), such as: "above-raft" evaluation, proposal to raft, append to the local raft log, commit in raft, "below-raft" application of the command, etc.
In addition, there are two kinds of reproposals: an exact-copy reproposal, and a reproposal with a bumped LAI. The first mechanism ensures at-least-once commit to raft, and the second mechanism (together with the LAI mechanism) ensures at-most-once "below-raft" command application. So a single proposal's lifecycle is not a linear sequence: it can be decomposed into multiple (re-)proposals that fan out and then join back in an exactly-once fashion at the end.
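To make the fan-out/join concrete, here is a minimal Go sketch of how the LAI check turns several raft entries for one proposal into at most one application. It is a simplified, hypothetical model, not the kvserver code: the `command` and `replica` types are illustrative, and it assumes the rule that an entry is rejected when its LAI is not above the replica's applied LAI, and that applying an entry advances the applied LAI to the entry's LAI.

```go
package main

import "fmt"

// command is a hypothetical, simplified model of a raft log entry carrying
// (a copy of) a proposal. It is not the real kvserver type.
type command struct {
	proposalID string
	lai        uint64 // LAI assigned at (re-)proposal time
}

// replica models only the state needed for the LAI check.
type replica struct {
	appliedLAI uint64
	applied    map[string]int // proposal ID -> number of applications
}

func newReplica(appliedLAI uint64) *replica {
	return &replica{appliedLAI: appliedLAI, applied: map[string]int{}}
}

// apply applies cmd only if its LAI is above the replica's applied LAI.
// Otherwise cmd is rejected: it is either a redundant exact copy, or a
// proposal that has to be reproposed with a bumped LAI.
func (r *replica) apply(cmd command) bool {
	if cmd.lai <= r.appliedLAI {
		return false
	}
	r.appliedLAI = cmd.lai
	r.applied[cmd.proposalID]++
	return true
}

func main() {
	// Assume LAI 5 was already consumed by another command (e.g. entries
	// were reordered), so p1's original proposal at LAI 5 is stale.
	r := newReplica(5)
	entries := []command{
		{"p1", 5}, // original proposal: rejected
		{"p1", 5}, // exact-copy reproposal: also rejected
		{"p1", 6}, // reproposal with a bumped LAI: applies
		{"p1", 6}, // exact copy of the bumped reproposal: rejected
	}
	for _, cmd := range entries {
		fmt.Printf("%s@LAI=%d applied=%t\n", cmd.proposalID, cmd.lai, r.apply(cmd))
	}
	fmt.Println("p1 applied", r.applied["p1"], "time(s)")
}
```

Four entries fan out from one proposal, but the LAI check lets exactly one of them apply.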
We have various testing knobs that allow intercepting, or injecting errors at, various points of a proposal's lifetime. However, they are neither comprehensive nor ergonomic. To use them properly, the test writer needs to understand the proposal lifecycle in detail. If they want to intercept multiple lifetime points of a single proposal, they have to manually match events across these callbacks. Whenever a new injection point needs to be added (e.g. in PR #114191), this ever-growing flat list of knobs is extended further, which makes it increasingly hard to maintain.
Proposal
We should consolidate the lifetime of a single proposal in a test-only type/interface. The type would allow the test writer to "register" proposals and "watch" them (intercept or inject into them) until they are "unregistered" or their lifetime ends. It would free the test writer from matching events to proposals.
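One possible shape for such an interface is sketched below in Go. Everything in it is hypothetical: the `Tracker` type, the `Event` values, the `Hook` signature, and the `Notify` call that instrumented code would make at each lifecycle point are illustrative assumptions, not an existing CRDB API. The point it shows is that the test registers one hook per proposal ID and receives every event for that proposal, with automatic unregistration when the lifetime ends.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Event is a hypothetical enumeration of proposal lifecycle points.
type Event int

const (
	Evaluated Event = iota // above-raft evaluation done
	Proposed               // handed to raft (including reproposals)
	Appended               // appended to the local raft log
	Committed              // committed in raft
	Applied                // applied below raft
	Finished               // result returned to the client; lifetime ends
)

// Hook observes an event and may return a non-nil error to inject it.
type Hook func(id string, e Event) error

var errInjected = errors.New("injected")

// Tracker is a hypothetical test-only registry of watched proposals.
type Tracker struct {
	mu    sync.Mutex
	hooks map[string]Hook
}

func NewTracker() *Tracker { return &Tracker{hooks: map[string]Hook{}} }

// Register starts watching the proposal with the given ID.
func (t *Tracker) Register(id string, h Hook) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.hooks[id] = h
}

// Unregister stops watching the proposal.
func (t *Tracker) Unregister(id string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	delete(t.hooks, id)
}

// Notify would be called by instrumented code at each lifecycle point. It
// returns the hook's injected error, if any, and drops the registration once
// the proposal's lifetime ends. The hook runs outside the lock so that it
// may call back into the Tracker.
func (t *Tracker) Notify(id string, e Event) error {
	t.mu.Lock()
	h, ok := t.hooks[id]
	if ok && e == Finished {
		delete(t.hooks, id)
	}
	t.mu.Unlock()
	if !ok {
		return nil
	}
	return h(id, e)
}

func main() {
	tr := NewTracker()
	var seen []Event
	tr.Register("p1", func(id string, e Event) error {
		seen = append(seen, e)
		if e == Committed {
			return errInjected
		}
		return nil
	})
	for _, e := range []Event{Evaluated, Proposed, Committed, Applied, Finished} {
		if err := tr.Notify("p1", e); err != nil {
			fmt.Println("event", e, "->", err)
		}
	}
	_ = tr.Notify("p1", Applied) // ignored: the lifetime already ended
	fmt.Println("events seen:", len(seen))
}
```

A real version would also need to attribute reproposals to their original proposal, which is exactly the matching work this design moves out of individual tests.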
The proposal lifecycle would be consolidated in a single type, which would make it more understandable and maintainable. It would also enable testing of key correctness properties, e.g. that a proposal applies at most once, and that a proposal never applies after it returns an "I will never be applied" promise.
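The two properties above could be checked mechanically once all lifecycle events for a proposal flow through one place. The Go sketch below is a hypothetical checker, not existing CRDB code; `Checker`, `OnApplied`, and `OnNeverApplyPromise` are illustrative names, and it assumes the lifecycle type would call them when a proposal applies below raft and when it signals that it will never apply.

```go
package main

import "fmt"

// Checker is a hypothetical test-only checker for two proposal invariants:
//  1. a proposal applies at most once;
//  2. a proposal never applies after promising it will never be applied.
type Checker struct {
	applied    map[string]int  // proposal ID -> number of applications
	promised   map[string]bool // proposal ID -> "will never apply" signalled
	Violations []string
}

func NewChecker() *Checker {
	return &Checker{applied: map[string]int{}, promised: map[string]bool{}}
}

// OnApplied records a below-raft application of the proposal.
func (c *Checker) OnApplied(id string) {
	if c.promised[id] {
		c.Violations = append(c.Violations,
			fmt.Sprintf("%s applied after a never-apply promise", id))
	}
	c.applied[id]++
	if c.applied[id] > 1 {
		c.Violations = append(c.Violations,
			fmt.Sprintf("%s applied %d times", id, c.applied[id]))
	}
}

// OnNeverApplyPromise records that the proposal signalled it will never
// apply. Making that promise after an application is also a violation.
func (c *Checker) OnNeverApplyPromise(id string) {
	c.promised[id] = true
	if c.applied[id] > 0 {
		c.Violations = append(c.Violations,
			fmt.Sprintf("%s promised never to apply but already applied", id))
	}
}

func main() {
	c := NewChecker()
	c.OnApplied("p1")
	c.OnApplied("p1") // violates invariant 1
	c.OnNeverApplyPromise("p2")
	c.OnApplied("p2") // violates invariant 2
	for _, v := range c.Violations {
		fmt.Println(v)
	}
}
```

Because events can arrive in either order, the checker flags both an application after the promise and a promise after an application.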
This would force us to rethink the proposal lifecycle and its invariants in the first place, and could lead to refactoring this bug-prone code into a more understandable state. With a powerful test-only interface and the corresponding tests, this can be done with confidence. The ideal end state is that the test-only lifecycle interface simply matches reality.
Jira issue: CRDB-34194
Epic: CRDB-25287