kvserver: attempt to explain ProposalData lifecycle #104658

Merged (1 commit, Jun 14, 2023)
64 changes: 64 additions & 0 deletions pkg/kv/kvserver/replica_proposal.go
@@ -45,6 +45,70 @@ import (
// ProposalData is data about a command which allows it to be
// evaluated, proposed to raft, and for the result of the command to
// be returned to the caller.
//
// A proposal (=ProposalData) is created during write request evaluation and
// then handed to the proposal buffer (Replica.mu.proposalBuf) via
// propBuf.Insert. In the common case, the proposal transfers from there into
// Replica.mu.proposals (and the raft log) via propBuf.FlushLockedWithRaftGroup
// (called from handleRaftReady). But if the proposal buffer is full, Insert may
// also flush the buffer to make room for a new proposal. ProposalData itself
// has no mutex and locking is provided by `Replica.mu` which must always be
// held when accessing a ProposalData (except before the first insertion into
// the proposal buffer). However, at the time of writing, this locking is
// possibly insufficient and the above rule is not adhered to at all times. In
// particular, during raft application, the ProposalData is mutated without the
// Replica.mu lock being held in all places. As log application is a
// well-trodden code path and no relevant data races are known, they are either
// absent or extremely rare. Still, an improvement seems advisable.
//
// The same caveat applies to the lifecycle of ProposalData in aggregate, not
// just to its locking. The lifecycle is extremely complex due to internal
// reproposals and the multiple, poorly synchronized, and circular hand-offs
// between the proposals map and the proposal buffer. The following diagram
// describes the status quo.
//
// created
// │ proposal map, apply
// │[1] goroutine, and propBuf
// ▼ ▲ │
// ┌─┬────►propBuf ┌────────────┐ │[6] │[2]
// │ │ │ │ │ │ ▼
// │ │ │[2] │ proposal map and
// │ │ ▼ ▼ apply goroutine
// │ │ proposal map────┐ ▲
// │ │ │ │ │ [4] │
// │ │[5] │[3] │[6] └───────┘
// │ │ │ │
// │ │ ▼ └──────►proposal map
// │ └─apply goroutine and propBuf
// │ │ │[7]
// │ ▼ ▼
// │[8] finished apply goroutine and propBuf
// │ │
// └────────────────────────────┘
//
// [1]: `(*Replica).propose` calls `(*proposalBuf).Insert`
// [2]: `(*proposalBuf).FlushLockedWithRaftGroup` on the `handleRaftReady`
// goroutine, or `(*proposalBuf).Insert` for another command on a full
// `proposalBuf`, or `FlushLockedWithoutProposing` (less interesting).
// [3]: picked up by `(*replicaDecoder).retrieveLocalProposals` when the log
// entry has the most recent MaxLeaseIndex (common path)
// [4]: picked up by `(*replicaDecoder).retrieveLocalProposals`, but the log
// entry is a past incarnation, i.e. the `MaxLeaseIndex` of the `ProposalData`
// is newer. In this case, the command is rejected (it applies as a no-op and
// does not signal a result) since a newer copy of it is in flight (unhappy
// path).
// [5]: log entry failed MaxLeaseIndex check, and we are reproposing with a
// newer MaxLeaseIndex in `(*Replica).tryReproposeWithNewLeaseIndex`.
// [6]: while being applied (rejected), the `*ProposalData` may be reinserted
// into the `propBuf` by `(*Replica).refreshProposalsLocked`, which finds it in
// the map.
// [7]: like [3] but the goroutine is now also in the propBuf.
// [8]: either [5], or the proposal applied successfully but is still in the
// propBuf since it was there initially. In the latter case we now have a
// finished proposal in the propBuf
// (https://github.com/cockroachdb/cockroach/issues/97605).
//
// See also the prose near the call to `tryReproposeWithNewLeaseIndex`.
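The hand-offs [1]–[5] above can be condensed into a toy model. This is a hypothetical sketch for illustration only, not CockroachDB's actual types or API: a proposal is inserted into a buffer, flushed into a map keyed by command ID, and at application time its `MaxLeaseIndex` is compared against the one carried by the log entry.

```go
package main

import "fmt"

// proposal is a toy stand-in for ProposalData (hypothetical, simplified).
type proposal struct {
	id            string
	maxLeaseIndex uint64
	finished      bool
}

// replica models the two containers a proposal moves between:
// the proposal buffer and the proposals map.
type replica struct {
	propBuf   []*proposal          // staged, not yet handed to raft
	proposals map[string]*proposal // keyed by command ID, awaiting application
	nextMLI   uint64
}

// insert corresponds to step [1]: staging the proposal in the buffer
// and assigning it the next MaxLeaseIndex.
func (r *replica) insert(p *proposal) {
	p.maxLeaseIndex = r.nextMLI
	r.nextMLI++
	r.propBuf = append(r.propBuf, p)
}

// flush corresponds to step [2]: buffered proposals move into the map
// (and, in the real code, into the raft log).
func (r *replica) flush() {
	for _, p := range r.propBuf {
		r.proposals[p.id] = p
	}
	r.propBuf = r.propBuf[:0]
}

// apply models steps [3]/[4]: a log entry carries the MaxLeaseIndex it was
// proposed with; it applies only if that is still the proposal's latest
// incarnation, otherwise it is rejected as a no-op.
func (r *replica) apply(id string, entryMLI uint64) string {
	p, ok := r.proposals[id]
	if !ok {
		return "not local"
	}
	if p.maxLeaseIndex != entryMLI {
		return "rejected" // a newer copy is in flight
	}
	delete(r.proposals, id)
	p.finished = true
	return "applied"
}

func main() {
	r := &replica{proposals: map[string]*proposal{}, nextMLI: 1}
	p := &proposal{id: "write-1"}
	r.insert(p) // [1]
	r.flush()   // [2]
	staleMLI := p.maxLeaseIndex
	r.insert(p) // [5]: repropose with a newer MaxLeaseIndex
	r.flush()
	fmt.Println(r.apply(p.id, staleMLI))        // [4]: prints "rejected"
	fmt.Println(r.apply(p.id, p.maxLeaseIndex)) // [3]: prints "applied"
}
```

The toy deliberately omits what makes the real lifecycle hard: the same proposal can simultaneously be referenced by the map, the apply goroutine, and the buffer (states [6]–[8]).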
type ProposalData struct {
// The caller's context, used for logging proposals, reproposals, message
// sends, but not command application. In order to enable safely tracing events
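The locking convention the comment describes, where ProposalData has no mutex of its own and every access is funneled through `Replica.mu`, can be sketched as follows. All names are hypothetical and much simplified; the embedded-mutex-in-a-struct shape only mirrors the `Replica.mu` pattern described above.

```go
package main

import (
	"fmt"
	"sync"
)

// proposalData is a toy stand-in for ProposalData: note that it carries
// no mutex of its own.
type proposalData struct {
	applied bool
}

// replica models the convention that a single mutex (Replica.mu) guards
// the proposals map and every proposalData reachable from it.
type replica struct {
	mu struct {
		sync.Mutex
		proposals map[string]*proposalData
	}
}

// markApplied mutates a proposal only while holding the replica-wide
// mutex, per the stated rule.
func (r *replica) markApplied(id string) bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	p, ok := r.mu.proposals[id]
	if !ok {
		return false
	}
	p.applied = true
	return true
}

func main() {
	r := &replica{}
	r.mu.proposals = map[string]*proposalData{"cmd-1": {}}
	fmt.Println(r.markApplied("cmd-1")) // prints "true"
}
```

The comment's concern is precisely that the real code does not uphold this pattern everywhere: during raft log application, ProposalData is sometimes mutated without `Replica.mu` held.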