Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kvserver: centralize all replica creation in tryGetOrCreateReplicaLocked #94911

Open
tbg opened this issue Jan 9, 2023 · 1 comment
Open
Labels
C-cleanup Tech debt, refactors, loose ends, etc. Solution not expected to significantly change behavior. T-kv KV Team

Comments

@tbg
Copy link
Member

tbg commented Jan 9, 2023

We currently have two ways in which a replica is created. This is causing unnecessary complexity is part of a conglomerate of organically grown (read: messy and hard to grok) code around replica lifecycle management.

The first, and less important, way to create a replica is newReplica,

// newReplica constructs a new Replica. If the desc is initialized, the store
// must be present in it and the corresponding replica descriptor must have
// replicaID as its ReplicaID.
func newReplica(
ctx context.Context, desc *roachpb.RangeDescriptor, store *Store, replicaID roachpb.ReplicaID,
) (*Replica, error) {
repl := newUnloadedReplica(ctx, desc, store, replicaID)
repl.raftMu.Lock()
defer repl.raftMu.Unlock()
repl.mu.Lock()
defer repl.mu.Unlock()
if err := repl.loadRaftMuLockedReplicaMuLocked(desc); err != nil {
return nil, err
}
return repl, nil
}

which, other than a few testing callers, has only one usage, during (*Store).Start:

rep, err := newReplica(ctx, &desc, s, replicaDesc.ReplicaID)
if err != nil {
return err
}
// We can't lock s.mu across NewReplica due to the lock ordering
// constraint (*Replica).raftMu < (*Store).mu. See the comment on
// (Store).mu.
s.mu.Lock()
err = s.addReplicaInternalLocked(rep)
s.mu.Unlock()
if err != nil {
return err
}

In particular, note that newReplica does not register the *Replica with the *Store, and that it only works to create an initialized (i.e. knows its descriptor) Replica, and it is very light on checks. For example, it doesn't check range tombstones or tries to assert invariants we know should hold for initialized replicas, and it leaves updating all of the relevant store metrics to the caller. In particular, if the replica overlaps an existing one, it won't scream; after all, the caller has to register with the store. But by that time it will be too late. This method feels a lot like a testing helper that has slipped into production (though this is not how it went, rather it's very old code).

On the other hand, we have tryGetOrCreateReplicaLocked. This is used all of the other time to create replicas; checks tombstones, generally does everything right, but it will only ever create uninitialized replicas and also asserts that it won't ever be invoked for a state that looks initialized1. And yet, when invoked for a Replica that does already exist and is initialized, it will happily return that *Replica and this is also a hot path in CRDB. It's a weird dichotomy.

We should remove newReplica and teach tryGetOrCreateReplica to also allow instantiating replicas that are initialized. Doing this without creating more confusion requires some care: in the common case of a raft message being handed to an initialized replica, we need to be able to look up the replica (i.e. call tryGetOrCreateReplica) without also providing the descriptor. At the same time, should the replica need to be created, we need the descriptor to be present.

Maybe not a perfect solution but here is a first stab at how we could encode that via an options struct:

type GetOrCreateOptions struct {
	RangeID storage.FullReplicaID // required

	// AssertExists returns an assertion failure in case the Replica is not present
	// in the Store.
	AssertExists bool
	// AssertInitialized verifies that if the Replica exists, it is initialized, or
	// if it doesn't exist, that the Desc field is populated and thus an initialized
	// Replica is created.
	//
	// An initialized Replica has a RangeDescriptor with a nontrivial Span and a
	// nontrivial TruncatedState.
	AssertInitialized bool
	// AssertUninitialized verifies that if the Replica exists, it is uninitialized
	// (which includes verifying that the Desc field below is zero)
	//
	// An uninitialized Replica has a RangeDescriptor with a zero Span and a trivial
	// TruncatedState.
	AssertUninitialized bool

	// Desc is passed in for callers that expect to create an initialized Replica.
	// An error will result if Desc is not passed in but the storage indicates that
	// the Replica that is being created should be an initialized one.
	Desc *roachpb.RangeDescriptor

	// FromReplica optionally indicates a remote peer from which a message was
	// received, prompting the current Replica lookup. For initialized Replicas,
	// this allows checking whether FromReplica is still a member of the range.
	// Should this check fail, a ReplicaTooOldError will be returned instead of
	// returning or creating a Replica.
	FromReplica roachpb.ReplicaDescriptor
}

Jira issue: CRDB-23224
Epic: CRDB-220

Footnotes

  1. https://github.com/cockroachdb/cockroach/blob/e1c24edea342a83dbd250732e8d1de770cc47ae8/pkg/kv/kvserver/store_create_replica.go#L214-L221

@tbg tbg added C-cleanup Tech debt, refactors, loose ends, etc. Solution not expected to significantly change behavior. T-kv-replication labels Jan 9, 2023
@blathers-crl
Copy link

blathers-crl bot commented Jan 9, 2023

cc @cockroachdb/replication

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-cleanup Tech debt, refactors, loose ends, etc. Solution not expected to significantly change behavior. T-kv KV Team
Projects
No open projects
Status: Incoming
Development

No branches or pull requests

1 participant