Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(servstate): don't hold both servicesLock and state lock at once #359

Merged
merged 2 commits into from
Feb 15, 2024

Conversation

benhoyt
Copy link
Contributor

@benhoyt benhoyt commented Feb 13, 2024

This avoids the 3-lock deadlock described in
#314. Other goroutines
may be holding the state lock and waiting for the services lock, so
it's problematic to acquire both locks at once. Break that part of the cycle.

We could do this inside serviceForStart/serviceForStop by manually
calling Unlock() sooner, but that's error-prone, so continue using defer,
but have the caller write the task log (which needs the state lock) after
the services lock is released.

This is in preference to the more invasive change in #356

This avoids the 3-lock deadlock described in
canonical#314. Other goroutines may be
holding the state lock and waiting for the services lock, so it's
problematic to acquire both locks at once. Break that part of the
cycle.

We could do this inside serviceForStart/serviceForStop by manually
calling Unlock() sooner, but that's error-prone, so continue using
defer, but have the caller write the task log (which needs the state
lock) after the services lock is released.
Copy link
Member

@hpidcock hpidcock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels like a reasonable compromise, but I'd like to see something like #356 actually solve the problem. The nature of logging is that it can occur in a variety of places, and as you can see here, it is not the most convenient of things to pass around.

Copy link
Contributor

@flotter flotter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I cannot find any issues with the proposal.

@benhoyt benhoyt mentioned this pull request Feb 14, 2024
This test consistently FAILs without the fix in this PR, but
consistently PASSes with the fix in this PR. The repro is basically
as per the instructions at
canonical#314 (comment)
Copy link
Member

@hpidcock hpidcock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TestDeadlock looks reasonable too. Tested locally.

@benhoyt benhoyt merged commit eab0015 into canonical:master Feb 15, 2024
15 checks passed
@benhoyt benhoyt deleted the better-task-logging branch February 15, 2024 01:47
benhoyt added a commit that referenced this pull request Feb 15, 2024
…359)

* fix(servstate): don't hold both servicesLock and state lock at once

This avoids the 3-lock deadlock described in
#314. Other goroutines may be
holding the state lock and waiting for the services lock, so it's
problematic to acquire both locks at once. Break that part of the
cycle.

We could do this inside serviceForStart/serviceForStop by manually
calling Unlock() sooner, but that's error-prone, so continue using
defer, but have the caller write the task log (which needs the state
lock) after the services lock is released.

* Add regression test for the deadlock issue

This test consistently FAILs without the fix in this PR, but
consistently PASSes with the fix in this PR. The repro is basically
as per the instructions at
#314 (comment)
benhoyt added a commit to benhoyt/pebble that referenced this pull request Feb 20, 2024
…anonical#359)

* fix(servstate): don't hold both servicesLock and state lock at once

This avoids the 3-lock deadlock described in
canonical#314. Other goroutines may be
holding the state lock and waiting for the services lock, so it's
problematic to acquire both locks at once. Break that part of the
cycle.

We could do this inside serviceForStart/serviceForStop by manually
calling Unlock() sooner, but that's error-prone, so continue using
defer, but have the caller write the task log (which needs the state
lock) after the services lock is released.

* Add regression test for the deadlock issue

This test consistently FAILs without the fix in this PR, but
consistently PASSes with the fix in this PR. The repro is basically
as per the instructions at
canonical#314 (comment)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants