Skip to content

Commit

Permalink
stop: break deadlock between Stopper.mu and Replica.mu
Browse files Browse the repository at this point in the history
Fixes cockroachdb#63761. As of cockroachdb#61279, it was possible for us to deadlock due to
inconsistent lock orderings between Stopper.mu and Replica.mu. We were
previously holding onto Stopper.mu while executing all closers,
including those that may acquire other locks. Because closers can be
defined anywhere (and may consequently grab any in-scope lock), we
should order Stopper.mu to come after all other locks in the system.

The closer added in cockroachdb#61279 iterated over all non-destroyed replicas,
locking Replica.mu to check for the replica's destroy status (in
Store.VisitReplicas). This deadlocked with the lease acquisition code
path that first grabs an exclusive lock over Replica.mu (see
InitOrJoinRequest), and uses the stopper to kick off an async task
acquiring the lease. The stopper internally locks Stopper.mu to check
whether or not it was already stopped.

Release note: None
  • Loading branch information
irfansharif committed Apr 19, 2021
1 parent 5f2c296 commit 48eedde
Show file tree
Hide file tree
Showing 2 changed files with 13 additions and 2 deletions.
3 changes: 2 additions & 1 deletion pkg/kv/kvserver/store.go
Original file line number Diff line number Diff line change
Expand Up @@ -470,7 +470,8 @@ type Store struct {

// Locking notes: To avoid deadlocks, the following lock order must be
// obeyed: baseQueue.mu < Replica.raftMu < Replica.readOnlyCmdMu < Store.mu
// < Replica.mu < Replica.unreachablesMu < Store.coalescedMu < Store.scheduler.mu.
// < Replica.mu < Replica.unreachablesMu < Store.coalescedMu
// < Store.scheduler.mu < Stopper.mu.
// (It is not required to acquire every lock in sequence, but when multiple
// locks are held at the same time, it is incorrect to acquire a lock with
// "lesser" value in this sequence after one with "greater" value).
Expand Down
12 changes: 11 additions & 1 deletion pkg/util/stop/stopper.go
Original file line number Diff line number Diff line change
Expand Up @@ -473,8 +473,18 @@ func (s *Stopper) Stop(ctx context.Context) {
s.Quiesce(ctx)

s.mu.Lock()
closers := s.mu.closers
defer s.mu.Unlock()
for _, c := range s.mu.closers {

// We can't hold on to Stopper.mu while executing closers. Closers are
// allowed to be defined anywhere, and are allowed to grab any lock
// available. As far as lock ordering is concerned, Stopper.mu comes after
// everything else to avoid any deadlocks (see comment on Store.mu).
//
// There's no concern around new closers being added; we've marked this
// stopper as `stopping` above, so any attempts to do so will be refused
// (see refuseRLocked).
for _, c := range closers {
c.Close()
}
}
Expand Down

0 comments on commit 48eedde

Please sign in to comment.