release-21.2: snapshot can sneak into RHS of split #73721

Closed
cockroach-teamcity opened this issue Dec 12, 2021 · 14 comments
Labels
A-kv-replication Relating to Raft, consensus, and coordination. C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.

Comments

@cockroach-teamcity
Member

cockroach-teamcity commented Dec 12, 2021

kv/kvserver.TestChangeReplicasLeaveAtomicRacesWithMerge failed with artifacts on master @ 22df0a6658a927e7b65dafbdfc9d790500791993:

F211212 08:12:51.323075 627789 kv/kvserver/replica_raftstorage.go:932  [n4,s4,r46/3:{-}] 1 !	/go/src/github.com/cockroachdb/cockroach/pkg/rpc/auth.go:97 +0x1c3
F211212 08:12:51.323075 627789 kv/kvserver/replica_raftstorage.go:932  [n4,s4,r46/3:{-}] 1 !google.golang.org/grpc.chainStreamInterceptors.func1.1({0x4055b40, 0xc007a01500}, {0x566e5b0, 0xc0035eab40})
F211212 08:12:51.323075 627789 kv/kvserver/replica_raftstorage.go:932  [n4,s4,r46/3:{-}] 1 !	/go/src/github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc/server.go:1419 +0x87
F211212 08:12:51.323075 627789 kv/kvserver/replica_raftstorage.go:932  [n4,s4,r46/3:{-}] 1 !github.com/cockroachdb/cockroach/pkg/rpc.NewServer.func2.1({0x0, 0x8})
F211212 08:12:51.323075 627789 kv/kvserver/replica_raftstorage.go:932  [n4,s4,r46/3:{-}] 1 !	/go/src/github.com/cockroachdb/cockroach/pkg/rpc/context.go:189 +0x2d
F211212 08:12:51.323075 627789 kv/kvserver/replica_raftstorage.go:932  [n4,s4,r46/3:{-}] 1 !github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunTaskWithErr(0xc005eae1b0, {0x563aeb8, 0xc007fe67e0}, {0x427f54, 0x203001}, 0xc00880fb58)
F211212 08:12:51.323075 627789 kv/kvserver/replica_raftstorage.go:932  [n4,s4,r46/3:{-}] 1 !	/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:347 +0xdd
F211212 08:12:51.323075 627789 kv/kvserver/replica_raftstorage.go:932  [n4,s4,r46/3:{-}] 1 !github.com/cockroachdb/cockroach/pkg/rpc.NewServer.func2({0x4055b40, 0xc007a01500}, {0x566e5b0, 0xc0035eab40}, 0xc014177170, 0xc006466380)
F211212 08:12:51.323075 627789 kv/kvserver/replica_raftstorage.go:932  [n4,s4,r46/3:{-}] 1 !	/go/src/github.com/cockroachdb/cockroach/pkg/rpc/context.go:188 +0xf3
F211212 08:12:51.323075 627789 kv/kvserver/replica_raftstorage.go:932  [n4,s4,r46/3:{-}] 1 !google.golang.org/grpc.chainStreamInterceptors.func1.1({0x4055b40, 0xc007a01500}, {0x566e5b0, 0xc0035eab40})
F211212 08:12:51.323075 627789 kv/kvserver/replica_raftstorage.go:932  [n4,s4,r46/3:{-}] 1 !	/go/src/github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc/server.go:1419 +0x87
F211212 08:12:51.323075 627789 kv/kvserver/replica_raftstorage.go:932  [n4,s4,r46/3:{-}] 1 !google.golang.org/grpc.chainStreamInterceptors.func1({0x4055b40, 0xc007a01500}, {0x566e5b0, 0xc0035eab40}, 0xc014177170, 0x4792910)
F211212 08:12:51.323075 627789 kv/kvserver/replica_raftstorage.go:932  [n4,s4,r46/3:{-}] 1 !	/go/src/github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc/server.go:1421 +0x159
F211212 08:12:51.323075 627789 kv/kvserver/replica_raftstorage.go:932  [n4,s4,r46/3:{-}] 1 !google.golang.org/grpc.(*Server).processStreamingRPC(0xc00e47a380, {0x56b4988, 0xc00274e900}, 0xc008658c60, 0xc0096ae660, 0x79c8d00, 0x0)
F211212 08:12:51.323075 627789 kv/kvserver/replica_raftstorage.go:932  [n4,s4,r46/3:{-}] 1 !	/go/src/github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc/server.go:1557 +0xe9a
F211212 08:12:51.323075 627789 kv/kvserver/replica_raftstorage.go:932  [n4,s4,r46/3:{-}] 1 !google.golang.org/grpc.(*Server).handleStream(0xc00e47a380, {0x56b4988, 0xc00274e900}, 0xc008658c60, 0x0)
F211212 08:12:51.323075 627789 kv/kvserver/replica_raftstorage.go:932  [n4,s4,r46/3:{-}] 1 !	/go/src/github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc/server.go:1630 +0x9e5
F211212 08:12:51.323075 627789 kv/kvserver/replica_raftstorage.go:932  [n4,s4,r46/3:{-}] 1 !google.golang.org/grpc.(*Server).serveStreams.func1.2()
F211212 08:12:51.323075 627789 kv/kvserver/replica_raftstorage.go:932  [n4,s4,r46/3:{-}] 1 !	/go/src/github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc/server.go:941 +0x98
F211212 08:12:51.323075 627789 kv/kvserver/replica_raftstorage.go:932  [n4,s4,r46/3:{-}] 1 !created by google.golang.org/grpc.(*Server).serveStreams.func1
F211212 08:12:51.323075 627789 kv/kvserver/replica_raftstorage.go:932  [n4,s4,r46/3:{-}] 1 !	/go/src/github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc/server.go:939 +0x294
F211212 08:12:51.323075 627789 kv/kvserver/replica_raftstorage.go:932  [n4,s4,r46/3:{-}] 1 !
F211212 08:12:51.323075 627789 kv/kvserver/replica_raftstorage.go:932  [n4,s4,r46/3:{-}] 1 !For more context, check log files in: /go/src/github.com/cockroachdb/cockroach/artifacts/logTestChangeReplicasLeaveAtomicRacesWithMerge2349777449
F211212 08:12:51.323075 627789 kv/kvserver/replica_raftstorage.go:932  [n4,s4,r46/3:{-}] 1 !
F211212 08:12:51.323075 627789 kv/kvserver/replica_raftstorage.go:932  [n4,s4,r46/3:{-}] 1 !
F211212 08:12:51.323075 627789 kv/kvserver/replica_raftstorage.go:932  [n4,s4,r46/3:{-}] 1 !****************************************************************************
F211212 08:12:51.323075 627789 kv/kvserver/replica_raftstorage.go:932  [n4,s4,r46/3:{-}] 1 !
F211212 08:12:51.323075 627789 kv/kvserver/replica_raftstorage.go:932  [n4,s4,r46/3:{-}] 1 !This node experienced a fatal error (printed above), and as a result the
F211212 08:12:51.323075 627789 kv/kvserver/replica_raftstorage.go:932  [n4,s4,r46/3:{-}] 1 !process is terminating.
F211212 08:12:51.323075 627789 kv/kvserver/replica_raftstorage.go:932  [n4,s4,r46/3:{-}] 1 !
F211212 08:12:51.323075 627789 kv/kvserver/replica_raftstorage.go:932  [n4,s4,r46/3:{-}] 1 !Fatal errors can occur due to faulty hardware (disks, memory, clocks) or a
F211212 08:12:51.323075 627789 kv/kvserver/replica_raftstorage.go:932  [n4,s4,r46/3:{-}] 1 !problem in CockroachDB. With your help, the support team at Cockroach Labs
F211212 08:12:51.323075 627789 kv/kvserver/replica_raftstorage.go:932  [n4,s4,r46/3:{-}] 1 !will try to determine the root cause, recommend next steps, and we can
F211212 08:12:51.323075 627789 kv/kvserver/replica_raftstorage.go:932  [n4,s4,r46/3:{-}] 1 !improve CockroachDB based on your report.
F211212 08:12:51.323075 627789 kv/kvserver/replica_raftstorage.go:932  [n4,s4,r46/3:{-}] 1 !
F211212 08:12:51.323075 627789 kv/kvserver/replica_raftstorage.go:932  [n4,s4,r46/3:{-}] 1 !Please submit a crash report by following the instructions here:
F211212 08:12:51.323075 627789 kv/kvserver/replica_raftstorage.go:932  [n4,s4,r46/3:{-}] 1 !
F211212 08:12:51.323075 627789 kv/kvserver/replica_raftstorage.go:932  [n4,s4,r46/3:{-}] 1 !    https://github.com/cockroachdb/cockroach/issues/new/choose
F211212 08:12:51.323075 627789 kv/kvserver/replica_raftstorage.go:932  [n4,s4,r46/3:{-}] 1 !
F211212 08:12:51.323075 627789 kv/kvserver/replica_raftstorage.go:932  [n4,s4,r46/3:{-}] 1 !If you would rather not post publicly, please contact us directly at:
F211212 08:12:51.323075 627789 kv/kvserver/replica_raftstorage.go:932  [n4,s4,r46/3:{-}] 1 !
F211212 08:12:51.323075 627789 kv/kvserver/replica_raftstorage.go:932  [n4,s4,r46/3:{-}] 1 !    [email protected]
F211212 08:12:51.323075 627789 kv/kvserver/replica_raftstorage.go:932  [n4,s4,r46/3:{-}] 1 !
F211212 08:12:51.323075 627789 kv/kvserver/replica_raftstorage.go:932  [n4,s4,r46/3:{-}] 1 !The Cockroach Labs team appreciates your feedback.


ERROR: exit status 7

1 runs completed, 1 failures, over 6m7s
context canceled
Help

See also: [How To Investigate a Go Test Failure (internal)](https://cockroachlabs.atlassian.net/l/c/HgfXfJgM)

Parameters in this failure:

  • GOFLAGS=-parallel=4

/cc @cockroachdb/kv

This test on roachdash | Improve this report!

Jira issue: CRDB-11707

@cockroach-teamcity cockroach-teamcity added branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. labels Dec 12, 2021
@tbg
Member

tbg commented Dec 13, 2021

F211212 08:12:51.323075 627789 kv/kvserver/replica_raftstorage.go:932  [n4,s4,r46/3:{-}] 1  unable to remove placeholder: corrupted replicasByKey map: <nil> and &{AmbientContext:{Tracer:0xc00b39baa0 eventLog:<nil> tags:0xc0077c8500 backgroundCtx:0xc007fe65a0} RangeID:44 startKey:[250] store:0xc00831d000 abortSpan:0xc00ef224d8 leaseholderStats:0xc0072c1dd0 writeStats:0xc0072c1e60 creatingReplica:0xc009b75b78 readOnlyCmdMu:{RWMutex:{w:{state:0 sema:0} writerSem:0 readerSem:0 readerCount:0 readerWait:0}} rangeStr:{strPtr:0xc00f34bfa0} connectionClass:0 raftCtx:0xc007fe65d0 raftMu:{Mutex:{Mutex:{state:1 sema:0}} stateLoader:{RangeIDPrefixBuf:[1 105 180]} sideloaded:0xc0064660c0 stateMachine:{r:0xc0034eca80 batch:{r:0xc0034eca80 sm:0xc0034ecb58 batch:0xc004197400 state:{RaftAppliedIndex:35 LeaseAppliedIndex:23 Desc:0xc00b47e2a0 Lease:0xc014b41740 TruncatedState:0xc00f80d3e0 GCThreshold:0xc00f80d370 Stats:0xc0034ecbf8 DeprecatedUsingAppliedStateKey:false Version:0xc00f80d420 RaftClosedTimestamp:{WallTime:1639296768312812135 Logical:0 Synthetic:false}} closedTimestampSetter:{lease:0xc014b41740 leaseIdx:23 leaseReq:<nil> split:false merge:false} stats:{ContainsEstimates:0 LastUpdateNanos:1639296771212240367 IntentAge:0 GCBytesAge:0 LiveBytes:0 LiveCount:0 KeyBytes:0 KeyCount:0 ValBytes:0 ValCount:0 IntentBytes:0 IntentCount:0 SeparatedIntentCount:0 SysBytes:5566 SysCount:20 AbortSpanBytes:0} maxTS:{WallTime:0 Logical:0 Synthetic:false} changeRemovesReplica:false entries:0 emptyEntries:0 mutations:0 start:{wall:320480044 ext:63774893571 loc:<nil>}} ephemeralBatch:{r:<nil> state:{RaftAppliedIndex:0 LeaseAppliedIndex:0 Desc:<nil> Lease:<nil> TruncatedState:<nil> GCThreshold:<nil> Stats:<nil> DeprecatedUsingAppliedStateKey:false Version:<nil> RaftClosedTimestamp:{WallTime:0 Logical:0 Synthetic:false}}} stats:{batchesProcessed:1 entriesProcessed:2 stateAssertions:0 numEmptyEntries:0 numConfChangeEntries:0}} decoder:{r:0xc0034eca80 cmdBuf:{len:3 head:0xc00345d000 tail:0xc00345d000 free:{iters:[{head:{idx:3 buf:0xc0034ecd50 node:0xc00345d000} tail:{idx:3 buf:0xc0034ecd50 node:0xc00345d000}} {head:{idx:0 buf:0xc0034ecd50 node:0xc00345d000} tail:{idx:1 buf:0xc0034ecd50 node:0xc00345d000}} {head:{idx:0 buf:0xc0034ecd50 node:0xc00345d000} tail:{idx:0 buf:0xc0034ecd50 node:0xc00345d000}}] inUse:[true true true]}}}} leaseHistory:0xc007fe6510 concMgr:0xc00a9d5180 tenantLimiter:<nil> tenantMetricsRef:0xc016dfd950 sideTransportClosedTimestamp:{rangeID:44 receiver:0xc008cdb880 mu:{RWMutex:{RWMutex:{w:{state:0 sema:0} writerSem:0 readerSem:0 readerCount:0 readerWait:0}} cur:{ts:{WallTime:0 Logical:0 Synthetic:false} lai:0} next:{ts:{WallTime:0 Logical:0 Synthetic:false} lai:0}}} mu:{RWMutex:{RWMutex:{w:{state:0 sema:0} writerSem:0 readerSem:0 readerCount:0 readerWait:0}} destroyStatus:{reason:0 err:<nil>} quiescent:false laggingFollowersOnQuiesce:[] mergeComplete:<nil> mergeTxnID:[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0] freezeStart:{WallTime:0 Logical:0 Synthetic:false} state:{RaftAppliedIndex:35 LeaseAppliedIndex:23 Desc:0xc00b47e2a0 Lease:0xc014b41740 TruncatedState:0xc00f80d3e0 GCThreshold:0xc00f80d370 Stats:0xc003d57600 DeprecatedUsingAppliedStateKey:false Version:0xc00f80d420 RaftClosedTimestamp:{WallTime:1639296768312812135 Logical:0 Synthetic:false}} lastIndex:36 lastTerm:6 snapshotLogTruncationConstraints:map[] raftLogSize:5571 raftLogSizeTrusted:false raftLogLastCheckSize:0 pendingLeaseRequest:{repl:0xc0034eca80 llHandles:map[] cancelLocked:<nil> nextLease:{Start:{WallTime:0 Logical:0 Synthetic:false} 
Expiration:<nil> Replica:{NodeID:0 StoreID:0 ReplicaID:0 Type:<nil>} DeprecatedStartStasis:<nil> ProposedTS:<nil> Epoch:0 Sequence:0 AcquisitionType:0}} minLeaseProposedTS:{WallTime:0 Logical:0 Synthetic:false} conf:{RangeMinBytes:134217728 RangeMaxBytes:536870912 GCPolicy:{TTLSeconds:90000} GlobalReads:false NumReplicas:3 NumVoters:0 Constraints:[] VoterConstraints:[] LeasePreferences:[]} proposalBuf:{p:0xc0034eca80 clock:0xc002d182d0 settings:0xc0009fe000 evalTracker:0xc007fe64e0 full:{noCopy:{} L:0xc0034ece90 notify:{wait:0 notify:0 lock:0 head:<nil> tail:<nil>} checker:0} arr:{small:[<nil> <nil> <nil> <nil>] large:[] shrink:0} allocatedIdx:0 assignedLAI:0 assignedClosedTimestamp:{WallTime:0 Logical:0 Synthetic:false} scratchFooter:{MaxLeaseIndex:0 ClosedTimestamp:{WallTime:0 Logical:0 Synthetic:false}} testing:{leaseIndexFilter:<nil> insertFilter:<nil> submitProposalFilter:<nil> allowLeaseProposalWhenNotLeader:false dontCloseTimestamps:false}} proposals:map[] applyingEntries:true internalRaftGroup:0xc007fe6c90 replicaID:4 tombstoneMinReplicaID:0 leaderID:1 lastReplicaAdded:4 lastReplicaAddedTime:{wall:245571740 ext:63774893571 loc:<nil>} lastUpdateTimes:map[] lastToReplica:{NodeID:4 StoreID:4 ReplicaID:4 Type:<nil>} lastFromReplica:{NodeID:1 StoreID:1 ReplicaID:1 Type:<nil>} checksums:map[] proposalQuota:<nil> proposalQuotaBaseIndex:0 quotaReleaseQueue:[] ticks:0 droppedMessages:0 stateLoader:{RangeIDPrefixBuf:[1 105 180]} cachedProtectedTS:{readAt:{WallTime:0 Logical:0 Synthetic:false} earliestRecord:<nil>} largestPreviousMaxRangeSizeBytes:0 spanConfigExplicitlySet:true failureToGossipSystemConfig:false tenantID:{id:1} closedTimestampSetter:{lease:0xc014b41740 leaseIdx:23 leaseReq:<nil> split:false merge:false}} rangefeedMu:{RWMutex:{RWMutex:{w:{state:0 sema:0} writerSem:0 readerSem:0 readerCount:0 readerWait:0}} proc:<nil> opFilter:<nil>} splitQueueThrottle:{N:5000000000 Mutex:{Mutex:{state:0 sema:0}} lastProcessed:{wall:0 ext:0 loc:<nil>}} mergeQueueThrottle:{N:5000000000 Mutex:{Mutex:{state:0 sema:0}} lastProcessed:{wall:247508751 ext:63774893571 loc:<nil>}} loadBasedSplitter:{intn:0x520bc0 qpsThreshold:0x19bddc0 qpsRetention:0x19bdd60 mu:{Mutex:{Mutex:{state:0 sema:0}} lastQPSRollover:{wall:0 ext:0 loc:<nil>} lastQPS:0 count:0 maxQPS:{windows:[0 0 0 0 0 0] curIdx:0 curStart:{wall:0 ext:0 loc:<nil>} lastReset:{wall:0 ext:0 loc:<nil>} minRetention:0} splitFinder:<nil> lastSplitSuggestion:{wall:0 ext:0 loc:<nil>}}} unreachablesMu:{Mutex:{Mutex:{state:0 sema:0}} remotes:map[]} protectedTimestampMu:{Mutex:{Mutex:{state:0 sema:0}} minStateReadTimestamp:{WallTime:0 Logical:0 Synthetic:false} pendingGCThreshold:{WallTime:0 Logical:0 Synthetic:false}}} overlapped

@tbg
Member

tbg commented Dec 13, 2021

This assertion unfortunately isn't logging the right thing:

if it := s.getOverlappingKeyRangeLocked(&placeholder.rangeDesc); it.item != nil {
	return false, errors.AssertionFailedf("corrupted replicasByKey map: %+v and %+v overlapped", it.ph, it.item)
}

It is meant to log placeholder. I'll see if I can get a repro.
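For reference, a minimal sketch of what the corrected message might look like (assuming `placeholder` is in scope at this call site, as in the snippet above); this is just the intended logging, not the actual fix:

```go
// Log the placeholder being removed rather than it.ph, which is nil here.
if it := s.getOverlappingKeyRangeLocked(&placeholder.rangeDesc); it.item != nil {
	return false, errors.AssertionFailedf(
		"corrupted replicasByKey map: %+v and %+v overlapped", placeholder, it.item)
}
```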

@tbg
Member

tbg commented Dec 13, 2021

Just going from the log messages, I think what is happening is an overlap between two different ranges:

  • r44 merges r45 (\xfaa-/Max)
  • r46 is created via a re-split at \xfaa of r44
  • on n4, we see a snapshot arrive for r46 (the splitTriggerHelper is supposed to suppress that; the left neighbor is there, but maybe there is a bug in the delay heuristic because r44 hasn't applied the preceding merge yet?)
  • we make an uninitialized replica r46/3 in response to the snapshot request (see the [n4,s4,r46/3:{-}] log tag on the fatal msg)
  • while applying the snapshot, when the placeholder r46:\xfaa-/Max gets removed, we find that it overlaps the replica r44 (see sideTransportClosedTimestamp:{rangeID:44 in the assertion)

I am confused as to why the snapshot is even getting received. The way it should work is that we check for overlap before accepting the snapshot:

if placeholder != nil {
	if err := s.addPlaceholderLocked(placeholder); err != nil {
		return roachpb.NewError(err)
	}
}

I can see in the logs that n4 removes the subsumed replica r45/4 just a hair before it receives the snapshot, which comes from this place in MergeRange:

// Note that we were called (indirectly) from raft processing so we must
// call removeInitializedReplicaRaftMuLocked directly to avoid deadlocking
// on the right-hand replica's raftMu.
if err := s.removeInitializedReplicaRaftMuLocked(ctx, rightRepl, rightDesc.NextReplicaID, RemoveOptions{
	// The replica was destroyed by the tombstones added to the batch in
	// runPreApplyTriggersAfterStagingWriteBatch.
	DestroyData: false,
}); err != nil {
	return errors.Errorf("cannot remove range: %s", err)
}

Interestingly, we update (widen) the left hand side's descriptor only at the very end of MergeRange:

	// Update the subsuming range's descriptor.
	leftRepl.setDescRaftMuLocked(ctx, &newLeftDesc)
	return nil
}

So the conjecture goes like this, from the point of view of n4:

  • a snapshot arrives for r46 on \xfaa-/Max
  • the merge trigger on r44 starts executing and removes the replica r45 (\xfaa-/Max)
  • the snapshot checks for overlap, but r44 at the time is something-\xfaa, so it doesn't overlap, and the replica for r46 is still uninitialized so it doesn't conflict either; we accept the snapshot
  • the merge trigger finishes, updating r44 to something-/Max
  • when the snapshot is applied, it wants to remove its placeholder r46: \xfaa-/Max but in the process finds that this placeholder overlaps something-/Max (r44).

If this checks out, the bug is in MergeRange: it leaves the keyspace belonging to the right-hand side unprotected for a brief period of time. This isn't benign: the snapshot is basically clobbering the right-hand side.
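To make this conjecture concrete, here is a self-contained toy model of the interleaving (not CockroachDB code; the key bounds and range IDs just mirror the log above), showing how the window between removing the RHS and widening the LHS lets the snapshot's overlap check pass:

```go
package main

import "fmt"

// span is a half-open key interval [start, end); a stand-in for a range
// descriptor's key bounds. "faa" and "max" stand in for \xfaa and /Max.
type span struct{ start, end string }

func (a span) overlaps(b span) bool { return a.start < b.end && b.start < a.end }

// toyStore stands in for the store's replicasByKey map: the key spans
// currently covered by initialized replicas and snapshot placeholders.
type toyStore struct{ byID map[string]span }

// overlapping returns the ID of any tracked span that overlaps sp.
func (s *toyStore) overlapping(sp span) (string, bool) {
	for id, existing := range s.byID {
		if existing.overlaps(sp) {
			return id, true
		}
	}
	return "", false
}

func main() {
	// On n4, before the merge trigger applies: r44 covers [a, faa), r45 covers [faa, max).
	s := &toyStore{byID: map[string]span{
		"r44": {"a", "faa"},
		"r45": {"faa", "max"},
	}}
	snap := span{"faa", "max"} // bounds of the incoming snapshot for r46

	// Step 1: MergeRange removes the subsumed RHS replica r45 ...
	delete(s.byID, "r45")

	// Step 2: ... but only widens the LHS descriptor at the very end. In the
	// window in between, the snapshot's overlap check finds nothing, so a
	// placeholder for r46 is accepted.
	if _, busy := s.overlapping(snap); !busy {
		s.byID["placeholder-r46"] = snap
		fmt.Println("snapshot accepted: keyspace momentarily uncovered")
	}

	// Step 3: MergeRange finishes and widens r44 across the merged keyspace.
	s.byID["r44"] = span{"a", "max"}

	// Step 4: applying the snapshot removes its placeholder and asserts that
	// nothing else covers those bounds; r44 now does, which is the
	// "corrupted replicasByKey map" condition.
	delete(s.byID, "placeholder-r46")
	if id, busy := s.overlapping(snap); busy {
		fmt.Printf("placeholder bounds now overlap %s: assertion fires\n", id)
	}
}
```

Running this prints both messages: the snapshot is accepted during the window, and the later check over the placeholder's bounds trips.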

I managed to reproduce the bug on my gceworker with vanilla

make stress PKG=./pkg/kv/kvserver TESTS=TestChangeReplicasLeaveAtomicRacesWithMerge

1297 runs completed, 1 failures, over 3m49s

@erikgrinaker
Contributor

I don't believe we've seen this failure before? Any thoughts on why it's cropping up now?

@tbg
Member

tbg commented Dec 13, 2021

I'll bisect it now. Without having checked, my guess would be that the new thing is the snapshot occurring. Maybe we accidentally disabled the snapshot delay heuristic. One of @nvanbenschoten's recent quiescing cleanups might have done it. Pure pure pure speculation. We'll find out. I don't think the bug is new.

@tbg
Member

tbg commented Dec 13, 2021

Still bisecting, but with this diff it fails pretty much every time (not surprising, but still):

diff --git a/pkg/kv/kvserver/store_merge.go b/pkg/kv/kvserver/store_merge.go
index e96737888f..c962e708ae 100644
--- a/pkg/kv/kvserver/store_merge.go
+++ b/pkg/kv/kvserver/store_merge.go
@@ -16,6 +16,8 @@ import (
 	"github.com/cockroachdb/cockroach/pkg/kv/kvserver/readsummary/rspb"
 	"github.com/cockroachdb/cockroach/pkg/roachpb"
 	"github.com/cockroachdb/cockroach/pkg/util/hlc"
+	"github.com/cockroachdb/cockroach/pkg/util/iterutil"
+	"github.com/cockroachdb/cockroach/pkg/util/log"
 	"github.com/cockroachdb/errors"
 )
 
@@ -38,6 +40,33 @@ func (s *Store) MergeRange(
 			newLeftDesc.EndKey, oldLeftDesc.EndKey)
 	}
 
+	// Check that there's never a gap to the right of the pre-merge LHS in replicasByKey.
+	assertCtx, stopAsserting := context.WithCancel(ctx)
+	assertStart := leftRepl.Desc().EndKey
+	assertEnd := assertStart.Next()
+	defer stopAsserting()
+	_ = s.stopper.RunAsyncTask(assertCtx, "force-assertion", func(ctx context.Context) {
+		for {
+			func() {
+				s.mu.Lock()
+				defer s.mu.Unlock()
+				var it replicaOrPlaceholder
+				if err := s.mu.replicasByKey.VisitKeyRange(
+					context.Background(), assertStart, assertEnd, AscendingKeyOrder,
+					func(ctx context.Context, iit replicaOrPlaceholder) error {
+						it = iit
+						return iterutil.StopIteration()
+					}); err != nil {
+					log.Fatalf(ctx, "%v", err)
+				}
+				if it.item != nil {
+					return
+				}
+				log.Fatalf(ctx, "found hole in keyspace despite pending merge")
+			}()
+		}
+	})
+
 	rightRepl, err := s.GetReplica(rightDesc.RangeID)
 	if err != nil {
 		return err

@tbg
Member

tbg commented Dec 13, 2021

It's hard to really bisect this down since the test manages to hit various other fatal errors, for example

F211213 15:08:54.680980 5224 kv/kvserver/store_raft.go:511 [n4,s4,r44/4:/{Table/Max-Max}] 1 unable to acquire split lock: %!v(PANIC=SafeFormatter method: runtime error: invalid memory address or nil pointer dereference)

As far as I can tell, the problem (leaving the keyspace unguarded) has existed for some time and definitely predates #72471, which I initially thought had introduced this. I'm going to focus my energy on fixing things up so that this test does not hit a fatal error on a 20-node cluster within hours.

@erikgrinaker
Contributor

Interesting, thanks for digging into this. Have you tried a repro on a release branch, i.e. 21.2?

@tbg
Member

tbg commented Dec 13, 2021

Good idea, will try that tomorrow.

@tbg
Member

tbg commented Dec 14, 2021

release-21.2:

make stress PKG=./pkg/kv/kvserver TESTS=TestChangeReplicasLeaveAtomicRacesWithMerge STRESSFLAGS='-timeout 8m' 2>&1 | tee stress.log
[...]
F211214 09:54:38.686702 4857 kv/kvserver/store_split.go:154  [n2,s2,r44/2:{/Table/Max-\xfaa}] 1  [n2,s2,r44/2:{/Table/Max-\xfaa}]: failed to update Store after split: unable to add replica [n2,s2,r46/2:{\xfaa-/Max}]: [n2,s2]: cannot addReplicaInternalLocked; range [n2,s2,r46/2:{\xfaa-/Max}] has overlapping range r46:{\xfaa-/Max} [(n1,s1):1, (n2,s2):2, (n4,s4):3, next=4, gen=12, sticky=9223372036.854775807,0]

1443 runs completed, 1 failures, over 4m21s

tbg added a commit to tbg/cockroach that referenced this issue Dec 17, 2021
In cockroachdb#73721 we saw the following assertion fire:

> kv/kvserver/replica_raftstorage.go:932  [n4,s4,r46/3:{-}] 1  unable to
> remove placeholder: corrupted replicasByKey map: <nil> and [...]

This is because `MergeRange` removes from the Store's in-memory map the
right-hand side Replica before extending the left-hand side, leaving a
gap for a snapshot to sneak in. A similar problem exists when a snapshot
widens the existing range (i.e. the snapshot reflects the results of a
merge). This commit closes both gaps.

I verified the fix by inserting this code & calling it at the top of
`(*Store).MergeRange` as well as `applySnapshot`:

```go
func (s *Store) assertNoHole(ctx context.Context, from, to roachpb.RKey) func() {
	caller := string(debug.Stack())
	if from.Equal(roachpb.RKeyMax) {
		// There will be a hole to the right of RKeyMax but it's just the end of
		// the addressable keyspace.
		return func() {}
	}
	// Check that there's never a gap to the right of the pre-merge LHS in replicasByKey.
	ctx, stopAsserting := context.WithCancel(ctx)
	_ = s.stopper.RunAsyncTask(ctx, "force-assertion", func(ctx context.Context) {
		for ctx.Err() == nil {
			func() {
				s.mu.Lock()
				defer s.mu.Unlock()
				var it replicaOrPlaceholder
				if err := s.mu.replicasByKey.VisitKeyRange(
					context.Background(), from, to, AscendingKeyOrder,
					func(ctx context.Context, iit replicaOrPlaceholder) error {
						it = iit
						return iterutil.StopIteration()
					}); err != nil {
					log.Fatalf(ctx, "%v", err)
				}
				if it.item != nil {
					return
				}
				log.Fatalf(ctx, "found hole in keyspace [%s,%s), during:\n%s", from, to, caller)
			}()
		}
	})
	return stopAsserting
}
```

```go
// (*Store).applySnapshot
	{
		var from, to roachpb.RKey
		if isInitialSnap {
			// For uninitialized replicas, there must be a placeholder that covers
			// the snapshot's bounds, so basically check that. A synchronous check
			// here would be simpler but this works well enough.
			d := inSnap.placeholder.Desc()
			from, to = d.StartKey, d.EndKey
		} else {
			// For snapshots to existing replicas, from and to usually match (i.e.
			// nothing is asserted) but if the snapshot spans a merge then we're
			// going to assert that we're transferring the keyspace from the subsumed
			// replicas to this replica seamlessly.
			d := r.Desc()
			from, to = d.EndKey, inSnap.Desc.EndKey
		}

		defer r.store.assertNoHole(ctx, from, to)()
	}

// (*Store).MergeRange
	defer s.assertNoHole(ctx, leftRepl.Desc().EndKey, newLeftDesc.EndKey)()
```

The bug reproduced, before this fix, in
`TestStoreRangeMergeTimestampCache` and
`TestChangeReplicasLeaveAtomicRacesWithMerge`, covering both the
snapshot and merge trigger cases. I'm not too happy to merge this
without the same kind of active test coverage, but the above has a
chance of false positives (if Replica gets removed while assertion loop
still running) and it's unclear when exactly we would enable it (behind
the `crdbtest` tag perhaps)?

I am dissatisfied with a few things I realized (or rather, rediscovered)
while working on this, but since this PR needs to be backported possibly
to all past versions, I am refraining from any refactors. Nevertheless,
here's what annoyed me:

- There is no unified API for managing the store's tracked replicas. As
  a result, there are lots of callers meddling with the `replicasByKey`,
  `uninitializedRanges`, etc, maps, adding complexity.
- the `replicasByKey` btree contains initialized `Replicas` and uses
  their key bounds. This makes for a fairly complex locking story, and
  in particular it's easy to deadlock when holding any replica's lock and
  accessing the btree. It could be easier, in conjunction with the above
  point, to make the btree not hold the `Replica` directly, and to mutate
  the btree in a critical section with calling `(*Replica).setDescLocked`.

Release note (bug fix): A bug was fixed that, in very rare cases, could
result in a node terminating with fatal error "unable to remove
placeholder: corrupted replicasByKey map". To avoid potential data
corruption, users affected by this crash should not restart the node,
but instead decommission it in absentia and have it rejoin the cluster
under a new NodeID.
tbg added a commit to tbg/cockroach that referenced this issue Jan 3, 2022
In cockroachdb#73721 we saw the following assertion fire:

> kv/kvserver/replica_raftstorage.go:932  [n4,s4,r46/3:{-}] 1  unable to
> remove placeholder: corrupted replicasByKey map: <nil> and [...]

This is because `MergeRange` removes from the Store's in-memory map the
right-hand side Replica before extending the left-hand side, leaving a
gap for a snapshot to sneak in. A similar problem exists when a snapshot
widens the existing range (i.e. the snapshot reflects the results of a
merge). This commit closes both gaps.

I verified the fix by calling the newly-introduced (but promptly
disabled, see comment within) `(*Store).maybeAssertNoHole` from
both `(*Store).MergeRange` and `applySnapshot`.

The bug reproduced via this assertion, before this fix, in
`TestStoreRangeMergeTimestampCache` and
`TestChangeReplicasLeaveAtomicRacesWithMerge`, covering both the
snapshot and merge trigger cases. I'm not too happy to merge this
without the same kind of active test coverage, but since this will
require a backport, it's better not to rattle the cage too much.
I filed cockroachdb#74384 to clean this up fully.

Release note (bug fix): A bug was fixed that, in very rare cases, could
result in a node terminating with fatal error "unable to remove
placeholder: corrupted replicasByKey map". To avoid potential data
corruption, users affected by this crash should not restart the node,
but instead decommission it in absentia and have it rejoin the cluster
under a new NodeID.
tbg added a commit to tbg/cockroach that referenced this issue Jan 4, 2022
craig bot pushed a commit that referenced this issue Jan 4, 2022
73734: kvserver: avoid unprotected keyspace during MergeRange r=erikgrinaker a=tbg

@tbg tbg changed the title from "kv/kvserver: TestChangeReplicasLeaveAtomicRacesWithMerge failed" to "release-21.2: snapshot can sneak into RHS of split" Jan 4, 2022
@tbg tbg removed the branch-master Failures and bugs on the master branch. label Jan 4, 2022
@tbg tbg added A-kv-replication Relating to Raft, consensus, and coordination. C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. and removed C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. labels Jan 4, 2022
@tbg
Member

tbg commented Jan 4, 2022

We need to backport #73734 to release-21.2. That PR is currently baking on master. 21.2.3 is just getting released. The plan is that, once 21.2.4's sha has been chosen, we backport to release-21.2 to maximize bake time for 21.2.5.

@tbg tbg self-assigned this Jan 4, 2022
@tbg
Member

tbg commented Jan 4, 2022

The backport skews with #72471, btw. Just a note for future self.

gustasva pushed a commit to gustasva/cockroach that referenced this issue Jan 4, 2022
@tbg tbg removed their assignment May 31, 2022
@tbg
Member

tbg commented Jun 21, 2022

To follow up on the above, we never backported to 21.2, and at this point it is unlikely that we will.

@erikgrinaker
Contributor

21.2 is no longer supported, closing this out.
