Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

release-21.1: stop: break deadlock between Stopper.mu and Replica.mu #63871

Merged

Conversation

irfansharif
Copy link
Contributor

@irfansharif irfansharif commented Apr 19, 2021

Backport 1/1 commits from #63867.

/cc @cockroachdb/release


Fixes #63761. As of #61279, it was possible for us to deadlock due to
inconsistent lock orderings between Stopper.mu and Replica.mu. We were
previously holding onto Stopper.mu while executing all closers,
including those that may acquire other locks. Because closers can be
defined anywhere (and may consequently grab any in-scope lock), we
should order Stopper.mu to come after all other locks in the system.

The closer added in #61279 iterated over all non-destroyed replicas,
locking Replica.mu to check for the replica's destroy status (in
Store.VisitReplicas). This deadlocked with the lease acquisition code
path that first grabs an exclusive lock over Replica.mu (see
InitOrJoinRequest), and uses the stopper to kick off an async task
acquiring the lease. The stopper internally locks Stopper.mu to check
whether or not it was already stopped.

Release note: None

@cockroach-teamcity
Copy link
Member

This change is Reviewable

Fixes cockroachdb#63761. As of cockroachdb#61279, it was possible for us to deadlock due to
inconsistent lock orderings between Stopper.mu and Replica.mu. We were
previously holding onto Stopper.mu while executing all closers,
including those that may acquire other locks. Because closers can be
defined anywhere (and may consequently grab any in-scope lock), we
should order Stopper.mu to come after all other locks in the system.

The closer added in cockroachdb#61279 iterated over all non-destroyed replicas,
locking Replica.mu to check for the replica's destroy status (in
Store.VisitReplicas). This deadlocked with the lease acquisition code
path that first grabs an exclusive lock over Replica.mu (see
InitOrJoinRequest), and uses the stopper to kick off an async task
acquiring the lease. The stopper internally locks Stopper.mu to check
whether or not it was already stopped.

Release note: None
@irfansharif
Copy link
Contributor Author

Flaked on #63844.

@irfansharif irfansharif requested review from tbg and andreimatei April 20, 2021 04:48
@tbg tbg changed the title stop: break deadlock between Stopper.mu and Replica.mu release-20.1: stop: break deadlock between Stopper.mu and Replica.mu Apr 20, 2021
@tbg tbg changed the title release-20.1: stop: break deadlock between Stopper.mu and Replica.mu release-21.1: stop: break deadlock between Stopper.mu and Replica.mu Apr 20, 2021
Copy link
Member

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I re-triggered CI for you. Thanks for the fix.

@irfansharif irfansharif merged commit 0eb5ba6 into cockroachdb:release-21.1 Apr 20, 2021
@irfansharif irfansharif deleted the backport21.1-63867 branch April 20, 2021 14:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants