restart=always not properly working with podman stop #18259
The weird thing about this one, IIRC (from memory; I haven't looked at all the log files), is that if it flakes once, it will also fail on all retries. Any explanation for that?
The problem seems to be caused by play kube defaulting to restart policy always. Thus the short-lived container is consistently restarted. If the container is started while the infra is stopped, this error happens.
Ohhhh.... I remember something like that. Could my fix in #18169 be used as inspiration for fixing this? By adding a magic restart-no string to the yaml?
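For context, a minimal sketch of that idea, assuming a hypothetical manifest named `t.yaml` like the one in the reproducer further down: Kubernetes pod specs default `restartPolicy` to `Always`, and `podman kube play` honors an explicit value, so setting it to `Never` avoids the restart loop for short-lived containers.

```
# Sketch (hypothetical manifest name): write a pod spec with an explicit
# restartPolicy so kube play does not fall back to the default of Always.
cat > t.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: test
spec:
  restartPolicy: Never   # the Kubernetes default would be Always
  containers:
  - name: short-lived
    image: alpine
    command: ["true"]
EOF

podman kube play t.yaml
```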
I could, but then again the above command is a simple reproducer with (for me) valid YAML. The fact that restart always is the default seems odd to me, but OK, I do not care about k8s defaults. Regardless of this, the above error clearly shows that there is a bug somewhere; we do not lock correctly. An explicit podman stop must always ignore the restart policy and stop the container until it is started by a user again.
I too find the restart-forever policy troublesome, but am not qualified to comment on it. However: what does [...]? Maybe a better solution would be for [...].
Sure, there are ways to make it work for me in the tests, but the real problem is that this is a valid bug any user can run into. If [...]
Commit 1ab833f improved the situation but it is still not enough. If you run short-lived containers with --restart=always, podman is basically permanently restarting them. The only way to stop this is podman stop. However, podman stop does not do anything when the container is already in a non-running state. While this makes sense, we should still mark the container as explicitly stopped by the user. Together with the change in shouldRestart(), which now checks for StoppedByUser, this makes sure the cleanup process is not going to start it back up again.

A simple reproducer is:

```
podman run --restart=always --name test -d alpine true
podman stop test
```

then check if the container is still running. The behavior is very flaky; it took me like 20 podman stop tries before I finally hit the correct window where it was stopped permanently. With this patch it worked on the first try.

Fixes containers#18259

[NO NEW TESTS NEEDED] This is super flaky and hard to correctly test in CI. My ginkgo v2 work seems to trigger this in play kube tests so that should catch at least some regressions. Also this may be something that should be tested at podman test days by users (containers#17912).

Signed-off-by: Paul Holzinger <[email protected]>
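As a hedged sketch (not part of the patch itself), the behavior described in the commit message can be exercised from the CLI by polling the container state after `podman stop`; the container name `test` matches the reproducer above.

```
# Sketch: run the reproducer and watch whether the container stays stopped.
podman run --restart=always --name test -d alpine true
podman stop test

# Poll the state a few times; before the fix the status can flip back to
# "running" because the cleanup process restarts the container.
for i in $(seq 5); do
    podman inspect --format '{{.State.Status}}' test
    sleep 1
done

podman rm -f test
```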
Issue Description
Another bug found in my ginkgov2 work.
`podman kube play` seems to configure the pod/container in a different way which causes `podman stop --all` to fail. It is a race condition I see quite a lot in the CI logs, but I can reproduce it locally in about 1 out of 3 tries.
Steps to reproduce the issue
`bin/podman kube play t.yaml && bin/podman stop --all && bin/podman rm -fa`
Note this is a flake, so you may need to run it several times until it fails.
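A hedged helper for hitting the race, assuming `t.yaml` is the (not shown) manifest from the command above: repeat the sequence until `podman stop --all` exits non-zero.

```
# Sketch: loop the reproducer until podman stop --all fails (non-zero exit).
for i in $(seq 20); do
    bin/podman kube play t.yaml
    if ! bin/podman stop --all; then
        echo "hit the race on attempt $i"
        bin/podman rm -fa
        break
    fi
    bin/podman rm -fa
done
```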
Describe the results you received
podman stop fails and exits with 125
Describe the results you expected
Podman stop should work.
podman info output
latest main branch
Podman in a container
No
Privileged Or Rootless
None
Upstream Latest Release
Yes
Additional environment details
No response
Additional information
CI log: https://api.cirrus-ci.com/v1/artifact/task/5261073197039616/html/int-podman-debian-12-root-host-boltdb.log.html#t--podman-generate-kube-privileged-container--1
Search for `cannot get namespace path unless container`.
Interestingly enough, I was not able to reproduce this with pod create and run; the CI logs also seem to only show it with play kube.