Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bugfix] userspace convertor: Fix race condition on error #186

Merged
merged 1 commit into from
Apr 13, 2023

Conversation

estebanreyl
Copy link
Member

I recently discovered that there is a race condition on the user space convertor (during build) when an error occurs which can lead to a Panic. To Summarize:

  1. All the goroutines are scheduled
  2. Error occurs on one of the subroutines (To repro for example I returned an error 30% of the time in the Download function)
  3. The error is sent into the retErr channel.
  4. The retErr goroutine receives the retErr and cancels the context.
  5. The upload goroutines which are all waiting for either ctx.Done() or their respective conversion channel all end.
  6. The main build goroutine moves past uploaded.Wait() which results in ending the function as an error was returned. In turn this calls the deferred close on errCh, note that that at this point not all goroutines are guaranteed to be done.
  7. A goroutine for example one of the download ones reaches an error state (because context cancelled most likely)
  8. This goroutine will now try to sendToChannel errCh or check for ctx.Done and since selects have no order guarantee in Golang however either can happen, if its the send on errCh this will result in a panic as the channel is already closed.

To fix it one way was to add more values to the uploaded waitgroup and make sure all goroutines finish but instead I opted to use sync/errgroup https://pkg.go.dev/golang.org/x/sync/errgroup which seems built for this exact kind of scenario.

Any feedback is appreciated.

Copy link
Contributor

@WaberZhuang WaberZhuang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks a lot for this bugfix! sync/errgroup looks quite suitable for this scenario.

Copy link
Member

@liulanzheng liulanzheng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@liulanzheng liulanzheng merged commit 41d63a9 into containerd:main Apr 13, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants