Skip to content

Commit

Permalink
storage: stop-gap HardState clobbering on split
Browse files Browse the repository at this point in the history
Before this fix,

```
make stressrace PKG=./storage TESTS=TestStoreRangeDownReplicate \
  STRESSFLAGS='-p 16 -maxfails 1 -stderr'
```

failed in a few hundred iterations (and in <20 with some assertions and
shortcuts added in my WIP which I am going to publish as well).

See cockroachdb#7600.
  • Loading branch information
tbg committed Jul 7, 2016
1 parent ee996f9 commit cff9487
Showing 1 changed file with 29 additions and 0 deletions.
29 changes: 29 additions & 0 deletions storage/replica_command.go
Original file line number Diff line number Diff line change
Expand Up @@ -2260,6 +2260,35 @@ func (r *Replica) splitTrigger(
// Node that this needs to go into deltaMS (which is the total difference
// in bytes reported to the store in the end). We compute the RHS' stats
// from it below.
{
// TODO(tschottdorf): This is subtle. We are about to commit our batch
// and communicate the split to the Store, but we are in fact already
// writing to the "other" Range's key space (and in particular to the
// Raft state) in this batch. Between committing and telling the Store,
// we could race with an uninitialized version of our new Replica which
// might have been created by an incoming message from another node
// which already processed the split.
// This other version might be queued up (or currently in the process
// of) updating its state due to a vote cast or heartbeat responded
// to, and we can not have it do anything after this point (we are
// about to absorb its HardState).
//
// This is a stop-gap to achieve this: If there is an uninitialized
// Replica, disable it now (but leave it in the map). The split will
// then delete it after we committed.
// See #7600.
r.store.mu.Lock()
if uninitRHS := r.store.mu.uninitReplicas[split.NewDesc.RangeID]; uninitRHS != nil {
uninitRHS.mu.Lock()
uninitRHS.mu.internalRaftGroup = nil
uninitRHS.mu.destroyed = errors.Errorf(
"replica %d (range %d) disabled by split trigger",
uninitRHS.mu.replicaID, uninitRHS.RangeID,
)
uninitRHS.mu.Unlock()
}
r.store.mu.Unlock()
}
deltaMS, err = writeInitialState(batch, deltaMS, split.NewDesc)
if err != nil {
return errors.Errorf("unable to write initial state: %s", err)
Expand Down

0 comments on commit cff9487

Please sign in to comment.