Capture exception if image is missing on run #4621

Merged
pvizeli merged 3 commits into main from pull-image-on-run-if-missing
Oct 17, 2023

Conversation

mdegat01
Contributor

@mdegat01 mdegat01 commented Oct 11, 2023

Proposed change

Follow-up to #4619. Although we believe #4610 should be fixed now, we don't have explicit confirmation that the race condition is the source. In addition, there are edge cases where the image can go missing unexpectedly (a failure or outage in the middle of an ha supervisor repair, a Docker update clearing out images, a user pruning images with the docker CLI, etc.).

"{x} cannot be started because the image doesn't exist" is not an error we should ever really be showing users. There is nothing they can do about it except submit an issue here and be told to run ha supervisor repair. There is, however, more Supervisor can do to prevent them from ever seeing this error. Namely, we can do exactly what docker run does: if the image with the specified tag cannot be found locally, pull it.

This PR adds a retry step to run for all Docker objects to handle ImageNotFound. Since this should not happen, it first reports the error to Sentry so we know there is an issue to address. It then tries to handle the issue for users and only lets them know if we also fail to pull the image at that time.
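For readers skimming the thread, the retry step described above can be sketched roughly as follows. This is a minimal illustration, not the Supervisor code: `ImageNotFoundError`, `FakeDocker`, and `run_with_retry` are hypothetical names, and the `report` callback stands in for reporting to Sentry.

```python
import asyncio


class ImageNotFoundError(Exception):
    """Hypothetical stand-in for the Docker 'image not found' error."""


class FakeDocker:
    """Toy in-memory Docker stand-in, for illustration only."""

    def __init__(self, pullable=True):
        self.images = set()
        self.pullable = pullable

    async def run(self, image):
        # Fail the same way a real run would if the image is absent locally.
        if image not in self.images:
            raise ImageNotFoundError(image)
        return f"container:{image}"

    async def pull(self, image):
        if not self.pullable:
            raise ImageNotFoundError(image)
        self.images.add(image)


async def run_with_retry(docker, image, report):
    """Run image; on a missing image, report it, pull, and retry once."""
    try:
        return await docker.run(image)
    except ImageNotFoundError as err:
        report(err)  # stands in for reporting to Sentry
        await docker.pull(image)  # if the pull fails, the error propagates
        return await docker.run(image)
```

With this shape, the caller only sees a failure when the pull itself fails, which matches the intent stated in the paragraph above.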

EDIT: No longer re-pulling and retrying on run. For now we are just capturing the exception and reporting it to Sentry. The fixup for a missing image is still implemented and marked autofix, though, so while users will see the error, Supervisor will at least try to fix itself afterwards.
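The behavior after the edit might look roughly like the sketch below: the error is reported and still surfaced, and a fix-up is left for later. All names here (`ImageNotFoundError`, `StartError`, `run_or_report`, `pending_fixups`) are hypothetical, and the list used as a fix-up queue is an assumption made for illustration; it is not how Supervisor tracks fixups.

```python
class ImageNotFoundError(Exception):
    """Hypothetical stand-in for the Docker 'image not found' error."""


class StartError(Exception):
    """Hypothetical error surfaced to the user when a container cannot start."""


def run_or_report(run, image, report, pending_fixups):
    """Run image; on a missing image, report it, queue a fix-up, then fail."""
    try:
        return run(image)
    except ImageNotFoundError as err:
        report(err)  # stands in for reporting to Sentry
        # Assumption: a plain list models the "autofix" fix-up queue.
        pending_fixups.append(("pull_image", image))
        raise StartError(
            f"{image} cannot be started because the image doesn't exist"
        ) from err


def missing(image):
    """Toy runner that always behaves as if the image is absent."""
    raise ImageNotFoundError(image)
```

Compared with the retry variant, nothing is pulled inside the run path here; recovery is deferred to whatever processes the queued fix-up.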

Type of change

  • Dependency upgrade
  • Bugfix (non-breaking change which fixes an issue)
  • New feature (which adds functionality to the supervisor)
  • Breaking change (fix/feature causing existing functionality to break)
  • Code quality improvements to existing code or addition of tests

Additional information

  • This PR fixes or closes issue: fixes #
  • This PR is related to issue:
  • Link to documentation pull request:
  • Link to cli pull request:

Checklist

  • The code change is tested and works locally.
  • Local tests pass. Your PR cannot be merged unless tests pass
  • There is no commented out code in this PR.
  • I have followed the development checklist
  • The code has been formatted using Black (black --fast supervisor tests)
  • Tests have been added to verify that the new code works.

If API endpoints of add-on configuration are added/changed:

@mdegat01 mdegat01 added the refactor (A code change that neither fixes a bug nor adds a feature) label Oct 11, 2023
@mdegat01 mdegat01 requested a review from pvizeli October 11, 2023 18:44
@mdegat01 mdegat01 requested a review from agners October 11, 2023 18:45
@mdegat01 mdegat01 force-pushed the pull-image-on-run-if-missing branch from 313e68f to 6189105 Compare October 12, 2023 15:08
    docker_container = await self.sys_run_in_executor(
        self.sys_docker.run, self.image, **kwargs
    )
except DockerNotFound as err:
Member

Let's not do that, but the rest of the PR looks good

    # If image is missing, capture the exception as this shouldn't happen
    # Try to keep things working for user by pulling image and retrying once
    capture_exception(err)
    await self.install(self.version)
Member

@pvizeli what do you mean by "Let's not do that" exactly?

IMHO adding the Sentry capture is fine here, but maybe remove the automatic install for now, until we have a good understanding of the cases in which the image can be missing and whether they warrant installing it automatically?

Suggested change
await self.install(self.version)

Member

right, that is what I mean

Contributor Author

@mdegat01 mdegat01 Oct 16, 2023

I don't get it. Why would we want to show users this error?

If we never capture any sentry events from this point in the code then we can remove this code path. But if it ever goes down this code path I wouldn't want users to be inconvenienced...

Isn't the rule for Supervisor that we should make decisions and fix issues for users whenever possible, and only involve them if there's no alternative?

@mdegat01 mdegat01 force-pushed the pull-image-on-run-if-missing branch from 1217491 to a66e207 Compare October 16, 2023 20:20
@mdegat01 mdegat01 changed the title from "Retry run if image missing and handle fixup" to "Capture exception if image is missing on run" Oct 16, 2023
@pvizeli pvizeli merged commit 77fd1b4 into main Oct 17, 2023
22 checks passed
@pvizeli pvizeli deleted the pull-image-on-run-if-missing branch October 17, 2023 11:55
@github-actions github-actions bot locked and limited conversation to collaborators Oct 19, 2023
Labels
cla-signed, Hacktoberfest, refactor (A code change that neither fixes a bug nor adds a feature)
3 participants