✨ CAPD: set container restartPolicy to unless-stopped #5021

Merged
merged 1 commit into from
Jul 28, 2021
11 changes: 6 additions & 5 deletions test/infrastructure/container/docker.go
@@ -383,11 +383,12 @@ func (d *docker) RunContainer(ctx context.Context, runConfig *RunContainerInput,
// privileges, but this flag also changes some mounts that are necessary
// including some ones docker would otherwise do by default.
// for now this is what we want. in the future we may revisit this.
-	Privileged:   true,
-	SecurityOpt:  []string{"seccomp=unconfined"}, // ignore seccomp
-	NetworkMode:  dockercontainer.NetworkMode(runConfig.Network),
-	Tmpfs:        runConfig.Tmpfs,
-	PortBindings: nat.PortMap{},
+	Privileged:    true,
+	SecurityOpt:   []string{"seccomp=unconfined"}, // ignore seccomp
+	NetworkMode:   dockercontainer.NetworkMode(runConfig.Network),
+	Tmpfs:         runConfig.Tmpfs,
+	PortBindings:  nat.PortMap{},
+	RestartPolicy: dockercontainer.RestartPolicy{Name: "unless-stopped"},
Member

Q: what is the CAPD behaviour when a container is stopped? Is it going to fail (and possibly surface an error), or is it going to recreate the container?

Contributor Author

This may depend on when and how the stop is done.

If the entire service is stopped, then everything goes down and is happy when it starts back up.

Testing locally once everything is in a steady state, things still look happy if I stop just the MachineDeployment containers while leaving the others running. I think this may be because we don't actually look at the container's Status once it's been found, though we do set the actual status here:

for _, cntr := range containers {
	name := cntr.Name
	cluster := clusterLabelKey
	image := cntr.Image
	status := cntr.Status
	visit(cluster, types.NewNode(name, image, "undetermined").WithStatus(status))
}

Stopping the controlplane container does result in these errors in the logs:

capi-kubeadm… │ E0727 19:51:03.476496       8 reflector.go:138] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:167: Failed to watch *v1.ClusterRole: failed to list *v1.ClusterRole: context deadline exceeded

But again we just keep going.

I would imagine that if a container is stopped at some point during the initial deployment, it could cause an issue.

It's a really good question that we should probably think about. We may want to try to reconcile state if we find one of the containers has an "exited" status. Let me know if you think I should file an issue for that. I think it's a separate concern as it's a behavior we have today even without this change.
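For reference, a minimal, self-contained sketch of what that check could look like. It uses the upstream Docker Go SDK directly rather than CAPD's container interface, so the client setup and calls here are the SDK's, not ours:

```go
package main

import (
	"context"
	"fmt"

	"github.com/docker/docker/api/types"
	"github.com/docker/docker/client"
)

func main() {
	cli, err := client.NewClientWithOpts(client.FromEnv, client.WithAPIVersionNegotiation())
	if err != nil {
		panic(err)
	}

	// All: true also returns stopped containers, which a default listing would hide.
	containers, err := cli.ContainerList(context.Background(), types.ContainerListOptions{All: true})
	if err != nil {
		panic(err)
	}

	for _, cntr := range containers {
		// State is the machine-readable form ("running", "exited", ...);
		// Status is the human-readable one ("Exited (0) 5 minutes ago").
		if cntr.State == "exited" {
			fmt.Printf("container %v is not running: %s\n", cntr.Names, cntr.Status)
			// A reconciler could requeue or surface an error here instead of
			// treating the machine as healthy.
		}
	}
}
```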

So at least with this, if someone has not manually stopped a container, a Docker Engine restart will bring everything back. The other two restartPolicy options are always and on-failure. on-failure would only restart if the container exited with a non-zero exit code, so it would not change the behavior when someone manually stops a container. And always would only restart a manually stopped container once the Docker Engine itself restarts.
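To make the distinction concrete, here is a sketch of the three policies expressed with the same SDK type the diff uses (dockercontainer is github.com/docker/docker/api/types/container; the retry count is only illustrative):

```go
package main

import (
	"fmt"

	dockercontainer "github.com/docker/docker/api/types/container"
)

func main() {
	// Chosen in this PR: restarts with the Docker Engine unless a user explicitly stopped the container.
	unlessStopped := dockercontainer.RestartPolicy{Name: "unless-stopped"}

	// Restarts even a manually stopped container, but only once the Docker Engine itself restarts.
	always := dockercontainer.RestartPolicy{Name: "always"}

	// Restarts only after a non-zero exit code, up to the given number of retries.
	onFailure := dockercontainer.RestartPolicy{Name: "on-failure", MaximumRetryCount: 3}

	fmt.Println(unlessStopped, always, onFailure)
}
```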

}
networkConfig := network.NetworkingConfig{}

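Not part of the change itself, but a quick way to confirm the policy on a running CAPD node container is to inspect it with the Docker Go SDK; the container name below is illustrative:

```go
package main

import (
	"context"
	"fmt"

	"github.com/docker/docker/client"
)

func main() {
	cli, err := client.NewClientWithOpts(client.FromEnv, client.WithAPIVersionNegotiation())
	if err != nil {
		panic(err)
	}

	// Replace with the name of an actual CAPD-created container.
	info, err := cli.ContainerInspect(context.Background(), "my-cluster-control-plane")
	if err != nil {
		panic(err)
	}

	// Containers created after this change should report "unless-stopped".
	fmt.Println(info.HostConfig.RestartPolicy.Name)
}
```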