forked from cockroachdb/cockroach
Commit
kvserver: avoid unprotected keyspace during MergeRange
In cockroachdb#73721 we saw the following assertion fire:

> kv/kvserver/replica_raftstorage.go:932 [n4,s4,r46/3:{-}] 1 unable to
> remove placeholder: corrupted replicasByKey map: <nil> and [...]

This is because `MergeRange` removes from the Store's in-memory map the
right-hand side Replica before extending the left-hand side, leaving a
gap for a snapshot to sneak in. A similar problem exists when a snapshot
widens the existing range (i.e. the snapshot reflects the results of a
merge). This commit closes both gaps.

I verified the fix by inserting this code & calling it at the top of
`(*Store).MergeRange` as well as `applySnapshot`:

```go
func (s *Store) assertNoHole(ctx context.Context, from, to roachpb.RKey) func() {
	caller := string(debug.Stack())
	if from.Equal(roachpb.RKeyMax) {
		// There will be a hole to the right of RKeyMax but it's just the end of
		// the addressable keyspace.
		return func() {}
	}
	// Check that there's never a gap to the right of the pre-merge LHS in replicasByKey.
	ctx, stopAsserting := context.WithCancel(ctx)
	_ = s.stopper.RunAsyncTask(ctx, "force-assertion", func(ctx context.Context) {
		for ctx.Err() == nil {
			func() {
				s.mu.Lock()
				defer s.mu.Unlock()
				var it replicaOrPlaceholder
				if err := s.mu.replicasByKey.VisitKeyRange(
					context.Background(), from, to, AscendingKeyOrder,
					func(ctx context.Context, iit replicaOrPlaceholder) error {
						it = iit
						return iterutil.StopIteration()
					}); err != nil {
					log.Fatalf(ctx, "%v", err)
				}
				if it.item != nil {
					return
				}
				log.Fatalf(ctx, "found hole in keyspace [%s,%s), during:\n%s", from, to, caller)
			}()
		}
	})
	return stopAsserting
}
```

```go
// (*Store).applySnapshot
{
	var from, to roachpb.RKey
	if isInitialSnap {
		// For uninitialized replicas, there must be a placeholder that covers
		// the snapshot's bounds, so basically check that. A synchronous check
		// here would be simpler but this works well enough.
		d := inSnap.placeholder.Desc()
		from, to = d.StartKey, d.EndKey
	} else {
		// For snapshots to existing replicas, from and to usually match (i.e.
		// nothing is asserted) but if the snapshot spans a merge then we're
		// going to assert that we're transferring the keyspace from the subsumed
		// replicas to this replica seamlessly.
		d := r.Desc()
		from, to = d.EndKey, inSnap.Desc.EndKey
	}
	defer r.store.assertNoHole(ctx, from, to)()
}

// (*Store).MergeRange
defer s.assertNoHole(ctx, leftRepl.Desc().EndKey, newLeftDesc.EndKey)()
```

The bug reproduced, before this fix, in
`TestStoreRangeMergeTimestampCache` and
`TestChangeReplicasLeaveAtomicRacesWithMerge`, covering both the
snapshot and merge trigger cases. I'm not too happy to merge this
without the same kind of active test coverage, but the above has a
chance of false positives (if Replica gets removed while assertion loop
still running) and it's unclear when exactly we would enable it (behind
the `crdbtest` tag perhaps)?

I am dissatisfied with a few things I realized (or rather, rediscovered)
while working on this, but since this PR needs to be backported possibly
to all past versions, I am refraining from any refactors. Nevertheless,
here's what annoyed me:

- There is no unified API for managing the store's tracked replicas. As
  a result, there are lots of callers meddling with the `replicasByKey`,
  `uninitializedRanges`, etc, maps, adding complexity.
- the `replicasByKey` btree contains initialized `Replicas` and uses
  their key bounds. This makes for a fairly complex locking story, and
  in particular it's easy to deadlock when holding any replica's lock
  and accessing the btree. It could be easier, in conjunction with the
  above point, to make the btree not hold the `Replica` directly, and to
  mutate the btree in a critical section with calling
  `(*Replica).setDescLocked`.

Release note (bug fix): A bug was fixed that, in very rare cases, could result in a node terminating with fatal error "unable to remove placeholder: corrupted replicasByKey map". To avoid potential data corruption, users affected by this crash should not restart the node, but instead decommission it in absentia and have it rejoin the cluster under a new NodeID.
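
To make the window described in the commit message concrete, here is a minimal, self-contained sketch. It is not the code from this commit: `toyStore`, `span`, `mergeWithGap` and `mergeAtomic` are hypothetical names, and the store is reduced to a mutex-protected slice of key spans. It only contrasts removing the right-hand side and widening the left-hand side in two critical sections against doing both in one, which is one plausible way to avoid exposing a hole and not necessarily how the fix is implemented.

```go
package main

import (
	"fmt"
	"sync"
)

// span is a half-open key interval [start, end). Hypothetical stand-in for
// the key bounds of a replica.
type span struct{ start, end string }

// toyStore is a hypothetical, heavily simplified stand-in for the Store's
// map of replicas by key: a mutex-protected slice sorted by start key.
type toyStore struct {
	mu    sync.Mutex
	spans []span
}

// newToyStore returns a store tracking two adjacent spans, [a,m) and [m,z).
func newToyStore() *toyStore {
	return &toyStore{spans: []span{{"a", "m"}, {"m", "z"}}}
}

// covered reports whether every key in [from, to) lies inside a tracked span.
func (s *toyStore) covered(from, to string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	cur := from
	for _, sp := range s.spans {
		if cur >= to || sp.start > cur {
			break // done, or a gap starts at cur
		}
		if sp.end > cur {
			cur = sp.end
		}
	}
	return cur >= to
}

// mergeWithGap mimics the problematic ordering: the right-hand span is
// removed in one critical section and the left-hand span is widened in a
// second one. observe runs in between, where a snapshot could sneak in.
func (s *toyStore) mergeWithGap(i int, observe func()) {
	s.mu.Lock()
	rhs := s.spans[i+1]
	s.spans = append(s.spans[:i+1], s.spans[i+2:]...)
	s.mu.Unlock()

	if observe != nil {
		observe() // [rhs.start, rhs.end) is unprotected at this point
	}

	s.mu.Lock()
	s.spans[i].end = rhs.end
	s.mu.Unlock()
}

// mergeAtomic removes the right-hand span and widens the left-hand span in
// a single critical section, so no other lock holder can observe a hole.
func (s *toyStore) mergeAtomic(i int) {
	s.mu.Lock()
	defer s.mu.Unlock()
	rhs := s.spans[i+1]
	s.spans = append(s.spans[:i+1], s.spans[i+2:]...)
	s.spans[i].end = rhs.end
}

func main() {
	s := newToyStore()
	s.mergeWithGap(0, func() {
		fmt.Println("covered mid-merge (two critical sections):", s.covered("a", "z"))
	})
	fmt.Println("covered after merge:", s.covered("a", "z"))

	t := newToyStore()
	t.mergeAtomic(0)
	fmt.Println("covered after atomic merge:", t.covered("a", "z"))
}
```

In the toy version the observer callback plays the role of the concurrent snapshot: with two critical sections it can see the keyspace of the removed span uncovered, whereas with a single critical section any other goroutine that takes the lock sees either the pre-merge or the post-merge state.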
Showing 7 changed files with 165 additions and 31 deletions.