kvserver: split with uninitialized RHS can race with raft changes to Term and Vote #75918
Labels:
A-kv-replication: Relating to Raft, consensus, and coordination.
C-enhancement: Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
This came up in the review of #75761.
During a split, if the right-hand replica has been removed and added back (as an uninitialized replica), we load its HardState so that we can clear all of the RHS state (including the HardState) and then write back the HardState we read. This peculiar dance accommodates the fact that our clearing of key ranges is coarse: if we could avoid clearing the HardState, we would not need to read and rewrite it.
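A minimal sketch of the read, clear, write-back dance, assuming a hypothetical in-memory key-value map in place of the storage engine and made-up key names ("r2/hardstate", "r2/log/1", and so on). The point is only that a range-style clear cannot skip the HardState key, so the value has to be saved and restored around it.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// store is a hypothetical key-value map standing in for the engine.
type store map[string]string

// clearPrefix deletes every key with the given prefix. Like a range
// deletion, it cannot skip individual keys inside the span.
func (s store) clearPrefix(prefix string) {
	for k := range s {
		if strings.HasPrefix(k, prefix) {
			delete(s, k)
		}
	}
}

// clearRHSPreservingHardState reads the HardState, clears the whole RHS
// span (which also wipes the HardState), and writes the saved copy back.
func clearRHSPreservingHardState(s store, rhsPrefix, hardStateKey string) {
	saved, ok := s[hardStateKey]
	s.clearPrefix(rhsPrefix)
	if ok {
		s[hardStateKey] = saved
	}
}

func main() {
	s := store{
		"r2/hardstate":  "term=5,vote=2,commit=10",
		"r2/log/1":      "entry1",
		"r2/truncstate": "index=0",
		"r3/hardstate":  "term=1",
	}
	clearRHSPreservingHardState(s, "r2/", "r2/hardstate")
	keys := make([]string, 0, len(s))
	for k := range s {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	fmt.Println(keys) // [r2/hardstate r3/hardstate]
}
```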
Relevant code: cockroach/pkg/kv/kvserver/store_split.go, lines 84 to 106 at commit a950be5.
We know HardState.Commit cannot advance since the RHS cannot apply a snapshot yet. But there is nothing preventing a concurrent change to HardState.{Term,Vote} that we would accidentally undo here.
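To make the race concrete, here is a deterministic sketch, assuming a simplified HardState struct (not raftpb.HardState) and a hypothetical engine type. A callback injected between the load and the write-back stands in for a raft update to Term/Vote that no lock excludes; Commit is left alone, matching the observation that it cannot advance.

```go
package main

import "fmt"

// HardState is a simplified stand-in for raft's HardState.
type HardState struct {
	Term, Vote, Commit uint64
}

// engine is a hypothetical in-memory stand-in for the RHS replica's storage.
type engine struct {
	hs HardState
}

// splitWriteBack models the load, clear, write-back sequence. The
// concurrent callback runs between the load and the write-back, standing
// in for a raft update to Term/Vote that is not excluded by a lock.
func splitWriteBack(e *engine, concurrent func()) {
	saved := e.hs      // load the HardState
	concurrent()       // a raft update lands here
	e.hs = HardState{} // coarse clear of all RHS state
	e.hs = saved       // write back the stale copy
}

func main() {
	e := &engine{hs: HardState{Term: 5, Vote: 2, Commit: 10}}
	splitWriteBack(e, func() {
		e.hs.Term, e.hs.Vote = 6, 3 // the replica votes in a newer term
	})
	fmt.Printf("%+v\n", e.hs) // {Term:5 Vote:2 Commit:10}: the term-6 vote is undone
}
```

Losing a persisted vote matters because raft relies on a replica never voting twice in the same term; after the stale write-back, this replica could grant a second vote in term 6.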
Discussion:
[tbg] We hold rightRepl.raftMu, and you need to hold that lock to mutate the HardState. This is not obvious from this method alone (consider adding a comment), but note how we access rightRepl.raftMu.stateLoader above; I think the lock is acquired in maybeAcquireSplitLock. We loaded the HardState just above and are writing it back under the same lock, so nothing can have changed in between.
[sumeer] maybeAcquireSplitLock calls getOrCreateReplica with the replicaID that we are trying to create. My reading of tryGetOrCreateReplica is that if it finds a Replica that is newer, it returns nil (cockroach/pkg/kv/kvserver/store_create_replica.go, lines 118 to 124 at c4f15d6), and the code at https://github.com/cockroachdb/cockroach/blob/master/pkg/kv/kvserver/replica_raft.go#L1828-L1832 then returns without acquiring any lock.
The split code here then looks up the Replica by RangeID and finds this newer Replica, whose raftMu is not held.
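A toy model of the sequence sumeer describes, under heavy simplification: the store, replica, tryGetOrCreate, and splitRHS names are hypothetical, and the raftMuHeld flag exists only to make the lock state observable. The lock attempt returns nil for a newer replica, but the subsequent RangeID lookup still hands the split that replica, unlocked.

```go
package main

import (
	"fmt"
	"sync"
)

// replica is a hypothetical, heavily simplified stand-in for a Replica.
type replica struct {
	replicaID  int
	raftMu     sync.Mutex
	raftMuHeld bool // illustration only: records whether raftMu is held
}

// store maps RangeID to the replica currently registered for that range.
type store struct {
	replicas map[int]*replica
}

// tryGetOrCreate returns nil, locking nothing, if the store already has a
// newer replica for the range. Otherwise it locks and returns the existing
// or newly created replica (the real code handles an older replica
// differently; that case is not modeled here).
func (s *store) tryGetOrCreate(rangeID, replicaID int) *replica {
	r, ok := s.replicas[rangeID]
	if ok && r.replicaID > replicaID {
		return nil
	}
	if !ok {
		r = &replica{replicaID: replicaID}
		s.replicas[rangeID] = r
	}
	r.raftMu.Lock()
	r.raftMuHeld = true
	return r
}

// splitRHS models the split path: attempt to lock the RHS, then look the
// replica up by RangeID, ignoring whether the lock attempt succeeded.
func splitRHS(s *store, rangeID, rhsReplicaID int) (*replica, bool) {
	_ = s.tryGetOrCreate(rangeID, rhsReplicaID) // may return nil without locking
	rightRepl := s.replicas[rangeID]
	return rightRepl, rightRepl.raftMuHeld
}

func main() {
	s := &store{replicas: map[int]*replica{7: {replicaID: 4}}}
	r, held := splitRHS(s, 7, 3)
	fmt.Println(r.replicaID, held) // 4 false: newer replica, raftMu not held
}
```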
[tbg] Ouch, yes, you are right. Throw in a rightRepl.raftMu.AssertHeld() and hopefully some test will fail under race. Then hopefully fixes that failure?
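A sketch of what an AssertHeld guard buys, assuming a hypothetical assertMutex wrapper rather than CockroachDB's own mutex type. This version only checks that some goroutine holds the lock, not that the caller does, so it would catch the unlocked path in the scenario above but is weaker than a true ownership check.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// assertMutex is a hypothetical mutex wrapper with an AssertHeld check.
// It verifies only that the lock is held by someone, not by the caller.
type assertMutex struct {
	mu   sync.Mutex
	held atomic.Bool
}

func (m *assertMutex) Lock()   { m.mu.Lock(); m.held.Store(true) }
func (m *assertMutex) Unlock() { m.held.Store(false); m.mu.Unlock() }

// AssertHeld panics if the mutex is not currently locked.
func (m *assertMutex) AssertHeld() {
	if !m.held.Load() {
		panic("raftMu not held")
	}
}

// writeBackHardState models the split's write-back, guarded by the
// assertion; the panic is converted to an error here so it can be shown.
func writeBackHardState(raftMu *assertMutex) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("%v", r)
		}
	}()
	raftMu.AssertHeld()
	// Load HardState, clear the RHS span, write HardState back (elided).
	return nil
}

func main() {
	var raftMu assertMutex
	fmt.Println(writeBackHardState(&raftMu)) // raftMu not held
	raftMu.Lock()
	fmt.Println(writeBackHardState(&raftMu)) // <nil>
	raftMu.Unlock()
}
```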
cc: @tbg @erikgrinaker
Jira issue: CRDB-12877