
volumewatcher: prevent panic on nil volume #15101

Merged: 1 commit, Nov 1, 2022


@tgross (Member) commented Nov 1, 2022

Fixes #15095

If a GC claim is written and then the volume is deleted before the `volumewatcher` enters its run loop, we panic on a nil-pointer dereference. Simply adding a nil check at the top of the loop exposes a race condition: the loop can be shut down just as a new update is coming in.
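
A minimal sketch of the nil guard described above. The names `CSIVolume`, `PastClaims` and `reapPastClaims` are hypothetical stand-ins, not Nomad's actual types or functions. The guard turns the panic into an early return, but on its own it does nothing about the shutdown race; that is what the next change addresses.

```go
package main

import "fmt"

// CSIVolume is a hypothetical stand-in with only the fields this sketch needs.
type CSIVolume struct {
	ID         string
	PastClaims map[string]string // alloc ID -> claim state (illustrative)
}

// reapPastClaims stands in for the work at the top of the run loop.
// Without the nil check, vol.PastClaims on a deleted (nil) volume is a
// nil-pointer dereference and panics.
func reapPastClaims(vol *CSIVolume) (reaped int, ok bool) {
	if vol == nil {
		return 0, false // volume was deleted; the caller should stop watching
	}
	for allocID := range vol.PastClaims {
		delete(vol.PastClaims, allocID) // deleting during range is allowed in Go
		reaped++
	}
	return reaped, true
}

func main() {
	vol := &CSIVolume{ID: "vol0", PastClaims: map[string]string{"alloc1": "gc"}}
	fmt.Println(reapPastClaims(vol)) // 1 true
	fmt.Println(reapPastClaims(nil)) // 0 false
}
```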

Have the parent `volumeswatcher` send an initial update on the channel before returning, so the send happens while it still holds the lock. Update the watcher's `Stop` method to set the running state, which lets us avoid a second context and makes stopping synchronous. This reduces the number of cases the run loop has to handle.

Updated the tests, now that we safely return from the goroutine and stop the runner in more cases. Ran the tests with the `-race` detection flag, fixed the problems it found, and tested this on my local democratic-csi cluster.

@tgross added the backport/1.2.x, backport/1.3.x, backport/1.4.x, type/bug, and theme/storage labels Nov 1, 2022
@tgross added this to the 1.4.3 milestone Nov 1, 2022
@tgross force-pushed the b-panic-volumewatcher branch from e87a3f3 to e32fbed November 1, 2022 19:21
@tgross marked this pull request as ready for review November 1, 2022 19:22
@tgross requested review from shoenig and jrasell November 1, 2022 19:22
@tgross force-pushed the b-panic-volumewatcher branch from e32fbed to 3474c27 November 1, 2022 20:32

@shoenig (Member) left a comment

LGTM!

@tgross merged commit ffbae78 into main Nov 1, 2022
@tgross deleted the b-panic-volumewatcher branch November 1, 2022 20:53
@tgross mentioned this pull request Nov 1, 2022
tgross added a commit that referenced this pull request Nov 1, 2022
….2.x

volumewatcher: prevent panic on nil volume (#15101)

tgross added a commit that referenced this pull request Nov 1, 2022
….2.x (#15104)

volumewatcher: prevent panic on nil volume (#15101)

Co-authored-by: Tim Gross <[email protected]>
@github-actions bot commented Mar 2, 2023

I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions bot locked as resolved and limited conversation to collaborators Mar 2, 2023

Development: successfully merging this pull request may close the issue "panic in volume watcher".
2 participants