storage: interface for ReplicasStorage
ReplicasStorage provides an interface to manage the persistent state that
includes the lifecycle of a range replica, its raft log, and the state
machine state. Implementations are expected to be stateless wrappers
around persistent state in the underlying engine(s): any state they
maintain in memory is purely a performance optimization and must always
be in sync with the persistent state.

We consider the following distinct kinds of persistent state:
- State machine state: It contains all replicated keys: replicated range-id
  local keys, range local keys, range lock keys, lock table keys, global
  keys. This includes the RangeAppliedState and the RangeDescriptor.

- Raft and replica life-cycle state: This includes all the unreplicated
  range-ID local key names prefixed by Raft, and the RangeTombstoneKey.
  We will loosely refer to all of these as "raft state".

The interface requires that any mutation (batch or sst) only touch one of
these kinds of state. This discipline will allow us to eventually separate
the engines containing these two kinds of state. This interface is not
relevant for store local keys, though they will live in the engine that
holds the raft state. The interface does not allow the caller to specify
whether to sync a mutation to the raft log or state machine state -- that
decision is left to the implementation of ReplicasStorage. The hope is
that even when the state machine and raft engines are not separated, this
abstraction will force us to reason more carefully about the effects of
crashes and about when to sync, and will allow us to test more thoroughly
(including "crash" testing using strict-mem FS).
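
As an illustration of the one-kind-per-mutation rule above, here is a
minimal sketch in Go. It is not the interface added by this commit: the
names StateKind, Write, CheckSingleKind and ErrMixedKinds, and the string
keys, are all hypothetical and exist only for this example.

```go
package main

import (
	"errors"
	"fmt"
)

// StateKind is a hypothetical label for the two kinds of persistent state.
type StateKind int

const (
	StateMachine StateKind = iota // replicated keys
	RaftState                     // unreplicated raft and life-cycle keys
)

// Write is a hypothetical single-key mutation tagged with the kind of
// state it touches. Real mutations are batches or ssts, not this struct.
type Write struct {
	Kind StateKind
	Key  string
}

// ErrMixedKinds is returned when a mutation touches both kinds of state.
var ErrMixedKinds = errors.New("mutation touches both raft state and state machine state")

// CheckSingleKind returns the single kind of state touched by the batch,
// or an error if the batch is empty or mixes the two kinds.
func CheckSingleKind(batch []Write) (StateKind, error) {
	if len(batch) == 0 {
		return 0, errors.New("empty mutation")
	}
	kind := batch[0].Kind
	for _, w := range batch[1:] {
		if w.Kind != kind {
			return 0, ErrMixedKinds
		}
	}
	return kind, nil
}

func main() {
	// Illustrative keys only; these are not real key encodings.
	ok := []Write{{Kind: RaftState, Key: "hard-state"}, {Kind: RaftState, Key: "log-entry-7"}}
	_, err := CheckSingleKind(ok)
	fmt.Println("pure raft-state batch rejected:", err != nil)

	mixed := []Write{{Kind: RaftState, Key: "hard-state"}, {Kind: StateMachine, Key: "range-descriptor"}}
	_, err = CheckSingleKind(mixed)
	fmt.Println("mixed batch rejected:", err != nil)
}
```

A check of this shape is what would let the two kinds of state later be
routed to separate engines without changing callers.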

ReplicasStorage does not interpret most of the data in the state machine.
It expects mutations to that state to be provided as an opaque batch, or a
set of files to be ingested. There are a few exceptions where it can read
state machine state, mainly when recovering from a crash, so as to make
changes to get to a consistent state.
- RangeAppliedStateKey: needs to read this in order to truncate the log,
  both as part of regular log truncation and on crash recovery.
- RangeDescriptorKey: needs to read this to discover ranges whose state
  machine state needs to be discarded on crash recovery.

A corollary to this lack of interpretation is that reads of the state
machine are not handled by this interface, though it does expose some
metadata in case the reader wants to be sure that the range it is trying to
read actually exists in storage. ReplicasStorage also does not offer an
interface to construct changes to the state machine state. It simply
applies changes, and requires the caller to obey some simple invariants to
not cause inconsistencies. It is aware of the keyspace occupied by a range
and the difference between rangeID keys and range keys -- it needs this
awareness to restore internal consistency when initializing (say after a
crash), by clearing the state machine state for replicas that should no
longer exist.
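
The initialization step described above could look roughly like the
following toy model in Go. Everything here is an assumption made for the
example: toyStore, DiscardOrphans and the liveReplicas set are
hypothetical, and the message above does not say how the real
implementation decides which replicas should no longer exist.

```go
package main

import (
	"fmt"
	"sort"
)

// RangeID is a hypothetical identifier for a range.
type RangeID int

// toyStore is a hypothetical in-memory stand-in for persistent state.
type toyStore struct {
	// descriptors models the ranges discovered by reading
	// RangeDescriptorKey from the state machine state.
	descriptors map[RangeID]bool
	// liveReplicas models the replicas that should exist according to
	// the raft state. This criterion is an assumption of the sketch.
	liveReplicas map[RangeID]bool
}

// DiscardOrphans clears the state machine state of every range that has
// a descriptor but should no longer exist, and returns the cleared
// range IDs in ascending order.
func (s *toyStore) DiscardOrphans() []RangeID {
	var cleared []RangeID
	for id := range s.descriptors {
		if !s.liveReplicas[id] {
			cleared = append(cleared, id)
		}
	}
	for _, id := range cleared {
		delete(s.descriptors, id)
	}
	sort.Slice(cleared, func(i, j int) bool { return cleared[i] < cleared[j] })
	return cleared
}

func main() {
	s := &toyStore{
		descriptors:  map[RangeID]bool{1: true, 2: true, 3: true},
		liveReplicas: map[RangeID]bool{1: true, 3: true},
	}
	fmt.Println("cleared:", s.DiscardOrphans())
	fmt.Println("remaining ranges:", len(s.descriptors))
}
```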

ReplicasStorage does interpret the raft state (all the unreplicated
range-ID local key names prefixed by Raft), and the RangeTombstoneKey. This
is necessary for it to be able to maintain invariants spanning the raft log
and the state machine (related to raft log truncation, replica lifetime
etc.), including reapplying raft log entries on restart to the state
machine. All accesses (read or write) to the raft log and RangeTombstoneKey
must happen via ReplicasStorage.
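
To make the restart-time reapplication concrete, here is a minimal Go
sketch under stated assumptions. The types Entry and toyReplica and the
method ReplayOnRestart are hypothetical. The sketch assumes that the
applied index recorded in the state machine (as in RangeAppliedState) and
a committed index kept in the raft state together determine which log
entries are reapplied; the message above does not spell out that rule.

```go
package main

import "fmt"

// Entry is a hypothetical raft log entry.
type Entry struct {
	Index uint64
	Cmd   string
}

// toyReplica is a hypothetical in-memory model of one replica.
type toyReplica struct {
	log          []Entry  // raft log, ordered by Index
	committed    uint64   // highest committed index (raft state)
	appliedIndex uint64   // models the index stored in RangeAppliedState
	applied      []string // commands already applied to the state machine
}

// ReplayOnRestart applies every committed entry that is not yet
// reflected in the state machine and returns how many were applied.
func (r *toyReplica) ReplayOnRestart() int {
	n := 0
	for _, e := range r.log {
		if e.Index <= r.appliedIndex || e.Index > r.committed {
			continue // already applied, or not yet committed
		}
		r.applied = append(r.applied, e.Cmd)
		r.appliedIndex = e.Index
		n++
	}
	return n
}

func main() {
	r := &toyReplica{
		log:          []Entry{{1, "a"}, {2, "b"}, {3, "c"}, {4, "d"}},
		committed:    3,
		appliedIndex: 1,
		applied:      []string{"a"},
	}
	fmt.Println("reapplied entries:", r.ReplayOnRestart())
	fmt.Println("applied index:", r.appliedIndex)
}
```

Because the replay reads both the raft log and the applied index, it is
an example of an invariant that spans the two kinds of state.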

Since this abstraction is mutating the same underlying engine state that
was previously mutated via lower-level interfaces, and is not a
data-structure in the usual sense, we should be able to migrate callers
incrementally to use this interface. That is, callers that use this
interface and callers that use the lower-level engine interfaces can
coexist correctly.

Informs #38322

Release note: None
sumeerbhola committed Dec 3, 2021
1 parent 46dcf39 commit 015f0b9
Showing 1 changed file with 693 additions and 0 deletions.