We currently have two ways in which a replica is created. This causes unnecessary complexity and is part of a conglomerate of organically grown (read: messy and hard to grok) code around replica lifecycle management.
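The first, and less important, way to create a replica is newReplica (cockroach/pkg/kv/kvserver/replica_init.go, lines 43 to 58 in 3d6fe9e), which, other than a few testing callers, has only one usage, during (*Store).Start (cockroach/pkg/kv/kvserver/store.go, lines 2016 to 2029 in 68aac6f):
// We can't lock s.mu across NewReplica due to the lock ordering
// constraint (*Replica).raftMu < (*Store).mu. See the comment on
// (Store).mu.
s.mu.Lock()
err = s.addReplicaInternalLocked(rep)
s.mu.Unlock()
if err != nil {
	return err
}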
In particular, note that newReplica does not register the *Replica with the *Store, that it only works to create an initialized Replica (i.e. one that knows its descriptor), and that it is very light on checks. For example, it doesn't check range tombstones or try to assert invariants we know should hold for initialized replicas, and it leaves updating all of the relevant store metrics to the caller. Notably, if the replica overlaps an existing one, it won't scream; after all, the caller has to register with the store. But by that time it will be too late. This method feels a lot like a testing helper that has slipped into production (though this is not how it went; rather, it's very old code).
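To make the overlap concern concrete, here is a minimal sketch of the shape described above. It is not the CockroachDB code: `span`, `replica`, `store`, `newReplicaUnchecked`, and `register` are hypothetical stand-ins, and key spans are modeled as half-open string intervals. The constructor never consults the store, so an overlapping replica is only noticed once the caller registers it.

```go
package main

import "fmt"

// span is a hypothetical half-open key interval [start, end).
type span struct{ start, end string }

// overlaps reports whether two half-open intervals share any key.
func (a span) overlaps(b span) bool { return a.start < b.end && b.start < a.end }

// replica and store are simplified stand-ins for *Replica and *Store.
type replica struct {
	rangeID int64
	span    span
}

type store struct{ replicas []*replica }

// newReplicaUnchecked mimics the shape described in the text: it builds
// a replica without looking at the store, so it cannot detect overlap.
func newReplicaUnchecked(rangeID int64, sp span) *replica {
	return &replica{rangeID: rangeID, span: sp}
}

// register is where overlap is finally detected in this sketch, which is
// after the replica object already exists.
func (s *store) register(r *replica) error {
	for _, existing := range s.replicas {
		if existing.span.overlaps(r.span) {
			return fmt.Errorf("r%d [%s,%s) overlaps r%d [%s,%s)",
				r.rangeID, r.span.start, r.span.end,
				existing.rangeID, existing.span.start, existing.span.end)
		}
	}
	s.replicas = append(s.replicas, r)
	return nil
}

func main() {
	s := &store{}
	fmt.Println(s.register(newReplicaUnchecked(1, span{"a", "m"})))
	// Construction succeeds silently; only registration complains.
	r2 := newReplicaUnchecked(2, span{"k", "z"})
	fmt.Println(s.register(r2))
}
```

A single entry point that performs the overlap check before constructing anything would remove the window in which an invalid replica object exists.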
On the other hand, we have tryGetOrCreateReplicaLocked. This is used everywhere else to create replicas. It checks tombstones and generally does everything right, but it will only ever create uninitialized replicas, and it also asserts that it won't ever be invoked for a state that looks initialized [1]. And yet, when invoked for a Replica that already exists and is initialized, it will happily return that *Replica, and this is also a hot path in CRDB. It's a weird dichotomy.
We should remove newReplica and teach tryGetOrCreateReplica to also allow instantiating replicas that are initialized. Doing this without creating more confusion requires some care: in the common case of a raft message being handed to an initialized replica, we need to be able to look up the replica (i.e. call tryGetOrCreateReplica) without also providing the descriptor. At the same time, should the replica need to be created, we need the descriptor to be present.
This may not be a perfect solution, but here is a first stab at how we could encode that via an options struct:
type GetOrCreateOptions struct {
	RangeID storage.FullReplicaID // required

	// AssertExists returns an assertion failure in case the Replica is not present
	// in the Store.
	AssertExists bool

	// AssertInitialized verifies that if the Replica exists, it is initialized, or
	// if it doesn't exist, that the Desc field is populated and thus an initialized
	// Replica is created.
	//
	// An initialized Replica has a RangeDescriptor with a nontrivial Span and a
	// nontrivial TruncatedState.
	AssertInitialized bool

	// AssertUninitialized verifies that if the Replica exists, it is uninitialized
	// (which includes verifying that the Desc field below is zero).
	//
	// An uninitialized Replica has a RangeDescriptor with a zero Span and a trivial
	// TruncatedState.
	AssertUninitialized bool

	// Desc is passed in for callers that expect to create an initialized Replica.
	// An error will result if Desc is not passed in but the storage indicates that
	// the Replica that is being created should be an initialized one.
	Desc *roachpb.RangeDescriptor

	// FromReplica optionally indicates a remote peer from which a message was
	// received, prompting the current Replica lookup. For initialized Replicas,
	// this allows checking whether FromReplica is still a member of the range.
	// Should this check fail, a ReplicaTooOldError will be returned instead of
	// returning or creating a Replica.
	FromReplica roachpb.ReplicaDescriptor
}
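As a rough illustration of how such options could drive a single get-or-create entry point, here is a self-contained sketch. Everything in it is an assumption made for illustration: the `Store`, `Replica`, and `RangeDescriptor` types are simplified stand-ins (a plain `int64` range ID, an in-memory map), `GetOrCreate` is a hypothetical name, and tombstone checks, the on-disk state check, and `FromReplica` handling are left out entirely.

```go
package main

import (
	"errors"
	"fmt"
)

// Simplified stand-ins; not the real roachpb or kvserver types.
type RangeDescriptor struct {
	RangeID          int64
	StartKey, EndKey string
}

type Replica struct {
	RangeID int64
	Desc    *RangeDescriptor // nil while the replica is uninitialized
}

func (r *Replica) IsInitialized() bool { return r.Desc != nil }

// GetOrCreateOptions is a reduced version of the proposed struct.
type GetOrCreateOptions struct {
	RangeID             int64
	AssertExists        bool
	AssertInitialized   bool
	AssertUninitialized bool
	Desc                *RangeDescriptor
}

type Store struct{ replicas map[int64]*Replica }

// GetOrCreate returns the replica, whether it was newly created, and an
// error if one of the requested assertions does not hold.
func (s *Store) GetOrCreate(o GetOrCreateOptions) (*Replica, bool, error) {
	if o.AssertInitialized && o.AssertUninitialized {
		return nil, false, errors.New("conflicting assertions")
	}
	if o.AssertUninitialized && o.Desc != nil {
		return nil, false, errors.New("AssertUninitialized requires a zero Desc")
	}
	// Lookup path: no descriptor is needed for an existing replica.
	if r, ok := s.replicas[o.RangeID]; ok {
		if o.AssertInitialized && !r.IsInitialized() {
			return nil, false, fmt.Errorf("r%d exists but is uninitialized", o.RangeID)
		}
		if o.AssertUninitialized && r.IsInitialized() {
			return nil, false, fmt.Errorf("r%d exists but is initialized", o.RangeID)
		}
		return r, false, nil
	}
	// Creation path.
	if o.AssertExists {
		return nil, false, fmt.Errorf("r%d not present in store", o.RangeID)
	}
	if o.AssertInitialized && o.Desc == nil {
		return nil, false, fmt.Errorf("r%d: Desc required to create initialized replica", o.RangeID)
	}
	r := &Replica{RangeID: o.RangeID, Desc: o.Desc}
	s.replicas[o.RangeID] = r
	return r, true, nil
}

func main() {
	s := &Store{replicas: map[int64]*Replica{}}
	desc := &RangeDescriptor{RangeID: 1, StartKey: "a", EndKey: "m"}
	// Startup-style call: create an initialized replica from its descriptor.
	_, created, err := s.GetOrCreate(GetOrCreateOptions{RangeID: 1, AssertInitialized: true, Desc: desc})
	fmt.Println(created, err)
	// Raft-message-style call: look up without providing a descriptor.
	_, created, err = s.GetOrCreate(GetOrCreateOptions{RangeID: 1})
	fmt.Println(created, err)
}
```

The second call in `main` corresponds to the hot path mentioned above: the caller looks up an initialized replica without supplying a descriptor, while the first call covers the case that newReplica handles today.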
Jira issue: CRDB-23224
Epic: CRDB-220
Footnotes
1. https://github.com/cockroachdb/cockroach/blob/e1c24edea342a83dbd250732e8d1de770cc47ae8/pkg/kv/kvserver/store_create_replica.go#L214-L221