-
Notifications
You must be signed in to change notification settings - Fork 659
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Capture exception if image is missing on run #4621
Conversation
313e68f
to
6189105
Compare
docker_container = await self.sys_run_in_executor( | ||
self.sys_docker.run, self.image, **kwargs | ||
) | ||
except DockerNotFound as err: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's not do that, but the rest of the PR looks good
supervisor/docker/interface.py
Outdated
# If image is missing, capture the exception as this shouldn't happen | ||
# Try to keep things working for user by pulling image and retrying once | ||
capture_exception(err) | ||
await self.install(self.version) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pvizeli what do you mean by "Let's not do that" exactly?
IMHO adding the sentry capture is fine here, but maybe remove the automatic install for now, until we have a good understand in which cases the image can be missing and they warrant adding it automatically?
await self.install(self.version) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right, that is what I mean
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't get it. Why would we want to show users this error?
If we never capture any sentry events from this point in the code then we can remove this code path. But if it ever goes down this code path I wouldn't want users to be inconvenienced...
Isn't the rule for supervisor we should make decisions and fix issues for users whenever possible and only involve them if there's no alternative?
1217491
to
a66e207
Compare
Proposed change
Follow-up to #4619 . Although we believe #4610 should be fixed now, we don't have explicit confirmation that the race condition is the source. In addition there are edge cases where the image can go missing unexpectedly (failure/outtage in the middle of an
ha supervisor repair
, docker update clearing out images, user pruning images with docker cli, etc.)"{x} cannot be started because the image doesn't exist" is not an error we should ever really be showing users. There is nothing they can do about this except submit an issue here and be told to runha supervisor repair
. There is however, more supervisor can do to prevent them from ever seeing this error. Namely we can do exactly whatdocker run
does - if the image with the specified tag cannot be found locally, pull it.This PR adds a retry step to therun
for all docker objects to handleImageNotFound
. Since this should not happen it first reports the error to sentry so we know there is an issue to address. But it tries to handle this issue for users and only let them know if we also fail to pull the image at that time.EDIT: No longer re-pulling and retrying on run. For now we are just capturing the exception and reporting it to sentry. Although the fixup for image missing is still implemented and marked autofix. So while users will see the error, it will at least try to fix itself afterwards.
Type of change
Additional information
Checklist
black --fast supervisor tests
)If API endpoints of add-on configuration are added/changed: