Skip to content

Commit

Permalink
stop: break deadlock between Stopper.mu and Replica.mu
Browse files Browse the repository at this point in the history
Fixes cockroachdb#63761. As of cockroachdb#61279, it was possible for us to deadlock due to
inconsistent lock orderings between Stopper.mu and Replica.mu. We were
previously holding onto Stopper.mu while executing all closers,
including those that may acquire other locks. Because closers can be
defined anywhere (and may consequently grab any in-scope lock), we
should order Stopper.mu to come after all other locks in the system.

The closer added in cockroachdb#61279 iterated over all non-destroyed replicas,
locking Replica.mu to check for the replica's destroy status (in
Store.VisitReplicas). This deadlocked with the lease acquisition code
path that first grabs an exclusive lock over Replica.mu (see
InitOrJoinRequest), and uses the stopper to kick off an async task
acquiring the lease. The stopper internally locks Stopper.mu to check
whether or not it was already stopped.

Release note: None
  • Loading branch information
irfansharif committed Apr 19, 2021
1 parent 244b2dc commit 5e3f8c6
Showing 1 changed file with 3 additions and 2 deletions.
5 changes: 3 additions & 2 deletions pkg/util/stop/stopper.go
Original file line number Diff line number Diff line change
Expand Up @@ -472,8 +472,9 @@ func (s *Stopper) Stop(ctx context.Context) {

s.Quiesce(ctx)

s.mu.Lock()
defer s.mu.Unlock()
// Run the closers without holding s.mu. There's no concern around new
// closers being added; we've marked this stopper as `stopping` above, so
// any attempts to do so will be refused.
for _, c := range s.mu.closers {
c.Close()
}
Expand Down

0 comments on commit 5e3f8c6

Please sign in to comment.