-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
✨ CAPD: set container restartPolicy to unless-stopped #5021
Conversation
If the docker engine is restarted, either through a service restart or a full system reboot, the containers created by CAPD are left in a stopped state. This adds the restartPolicy setting to the Docker container creation so that containers will be automatically restarted on service restart unless the user has explicitly stopped the container prior to the restart. Signed-off-by: Sean McGinnis <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@stmcginnis, thanks for this PR, getting CAPD cluster to survive restarts is super nice!
NetworkMode: dockercontainer.NetworkMode(runConfig.Network), | ||
Tmpfs: runConfig.Tmpfs, | ||
PortBindings: nat.PortMap{}, | ||
RestartPolicy: dockercontainer.RestartPolicy{Name: "unless-stopped"}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Q: what is the CAPD behaviour when a container is stopped? Is it going to fail (and possibly surface and error) or is it going to recreate a new container?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This may depend on when and how the stop is done.
If the entire service is stopped, then everything goes down and is happy when it starts back up.
Testing locally once everything is in a steady state, it looks like things are still happy with just stopping the MachineDeployment containers while leaving others running. I think this may be because we don't actually look at the Status of the container if it's been found, but we are setting the actual status here:
cluster-api/test/infrastructure/docker/docker/util.go
Lines 98 to 103 in c90f376
for _, cntr := range containers { | |
name := cntr.Name | |
cluster := clusterLabelKey | |
image := cntr.Image | |
status := cntr.Status | |
visit(cluster, types.NewNode(name, image, "undetermined").WithStatus(status)) |
Stopping the controlplane container does result in these errors in the logs:
capi-kubeadm… │ E0727 19:51:03.476496 8 reflector.go:138] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:167: Failed to watch *v1.ClusterRole: failed to list *v1.ClusterRole: context deadline exceeded
But again we just keep going.
I would imagine if a container is stopped at some point during initial deployment that it could cause an issue.
It's a really good question that we should probably think about. We may want to try to reconcile state if we find one of the containers has an "exited" status. Let me know if you think I should file an issue for that. I think it's a separate concern as it's a behavior we have today even without this change.
So at least with this, if someone has not manually stopped a container, a restart will get it all back. The other two restartPolicy options are always
and on-failure
. on-failure
would only restart if the container exited with a non-0 exit code, so that would not change the behavior if someone were to manually stop the container. And always
only would restart the container on Docker Engine restart if it was manually stopped.
Thanks for the detailed explanation Let's file an issue requiring to properly surface "a container already exists, but it is stopped"; I don't think we should automatically remediate, but let's make sure the user is not required to look at the controller logs to find out what is going on. |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: fabriziopandini The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
What this PR does / why we need it:
If the docker engine is restarted, either through a service restart or a
full system reboot, the containers created by CAPD are left in a stopped
state.
This adds the restartPolicy setting to the Docker container creation so
that containers will be automatically restarted on service restart
unless the user has explicitly stopped the container prior to the
restart.
Which issue(s) this PR fixes:
Fixes #5020