restart=always not properly working with podman stop #18259

Closed
Luap99 opened this issue Apr 18, 2023 · 6 comments · Fixed by #18267
Assignees: Luap99
Labels: flakes (Flakes from Continuous Integration), kind/bug (Categorizes issue or PR as related to a bug.), locked - please file new issue/PR (Assist humans wanting to comment on an old issue or PR with locked comments.)

Comments

@Luap99
Member

Luap99 commented Apr 18, 2023

Issue Description

Another bug found in my ginkgo v2 work. podman kube play seems to configure the pod/container in a different way, which causes podman stop --all to fail. It is a race condition I see quite a lot in the CI logs; however, I can reproduce it locally in about 1 out of 3 tries.

Steps to reproduce the issue


  1. Create the yaml file:
$ cat > t.yaml <<EOF
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: "2023-04-18T16:58:12Z"
  labels:
    app: test
  name: test
spec:
  containers:
  - command:
    - ip
    - a
    image: docker.io/library/alpine:latest
    name: laughinghaibt
EOF
  2. Run bin/podman kube play t.yaml && bin/podman stop --all && bin/podman rm -fa
    Note this is a flake, so you may need to run it several times until it fails.

Describe the results you received

podman stop fails and exits with 125

Pod:
2cde8e2fc327ebc2444339244ead6b7bb31693c99e8d4f45cb8fc4711b239822
Container:
01552f3338e7a0f9d04b6519d213983dccceefdb4ab00a5708cda081173b0311

4bc7eecf0816790bec3675f2f68950f8a49b0430a855f7a5706209b927f249c1
Error: cannot get namespace path unless container 4bc7eecf0816790bec3675f2f68950f8a49b0430a855f7a5706209b927f249c1 is running: container is stopped

Describe the results you expected

Podman stop should work.

podman info output

latest main branch

Podman in a container

No

Privileged Or Rootless

None

Upstream Latest Release

Yes

Additional environment details

No response

Additional information

CI log: https://api.cirrus-ci.com/v1/artifact/task/5261073197039616/html/int-podman-debian-12-root-host-boltdb.log.html#t--podman-generate-kube-privileged-container--1
Search for "cannot get namespace path unless container".

Interestingly enough, I was not able to reproduce this with pod create and run; the CI logs also seem to only show it with play kube.

@Luap99 Luap99 added the kind/bug (Categorizes issue or PR as related to a bug.) and flakes (Flakes from Continuous Integration) labels Apr 18, 2023
@edsantiago
Member

The weird thing about this one, IIRC (from memory; I haven't looked at all log files), is that if it flakes once, it will also fail on all retries. Any explanation for that?

@Luap99
Member Author

Luap99 commented Apr 18, 2023

The problem seems to be caused by play kube defaulting to the restart policy always. Thus the short-lived container is constantly restarted. If the container is started while the infra container is stopped, this error happens.
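
For what it's worth, one way to confirm which policy kube play applied is a quick inspect (a rough sketch; the container name test-laughinghaibt is an assumption based on the pod and container names in the yaml above):

```
# After `podman kube play t.yaml`, inspect the restart policy on the created
# container. kube play names containers <pod>-<container>, so "test-laughinghaibt"
# here (assumption based on the yaml above).
podman container inspect --format '{{.HostConfig.RestartPolicy.Name}}' test-laughinghaibt
# With the current kube play default this prints "always".
```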

@edsantiago
Member

Ohhhh.... I remember something like that. Could my fix in #18169 be used as inspiration for fixing this? By adding a magic restart-no string to the yaml?
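
For reference, a sketch of what an explicit restart policy could look like in the reproducer yaml (this uses the standard Kubernetes restartPolicy field rather than a magic string; Never is just one non-restarting choice):

```
# Hypothetical variant of t.yaml from the reproducer, with an explicit
# restartPolicy so the short-lived container is not restarted after it exits.
cat > t-norestart.yaml <<EOF
apiVersion: v1
kind: Pod
metadata:
  labels:
    app: test
  name: test
spec:
  restartPolicy: Never
  containers:
  - command:
    - ip
    - a
    image: docker.io/library/alpine:latest
    name: laughinghaibt
EOF
podman kube play t-norestart.yaml && podman stop --all && podman rm -fa
```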

@Luap99
Member Author

Luap99 commented Apr 18, 2023

I could, but then again the above command is a simple reproducer with (for me) a valid yaml. The fact that restart always is the default seems odd to me, but OK, I do not care about k8s defaults.

Regardless of this, the above error clearly shows that there is a bug somewhere; we do not lock correctly. An explicit podman stop must always ignore the restart policy and keep the container stopped until a user starts it again.

@edsantiago
Member

I too find the restart-forever policy troublesome, but am not qualified to comment on it. However: what does podman stop -a even mean on a forever-spinning container? If it happens when the container is stopped, stop -a will not see the container. If it happens when the container is up, "stop" will presumably stop it, but the container will start right back up. (This is trivial to reproduce with podman run -d --restart=always ...).
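
A rough sketch of that loop with plain shell commands (the container name spinner is arbitrary):

```
# A short-lived container with --restart=always oscillates between running and
# exited, because it is restarted as soon as it exits.
podman run -d --restart=always --name spinner alpine true
podman stop spinner                                        # may race with a restart
sleep 2
podman ps -a --filter name=spinner --format '{{.Status}}'  # frequently "Up ..." again
podman rm -f spinner                                       # rm -f reliably removes it
```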

Maybe a better solution would be for generate_kube_test.go:AfterEach() to do pod rm -f -a?

@Luap99
Member Author

Luap99 commented Apr 19, 2023

Sure, there are ways to make it work for me in the tests, but the real problem is that this is a valid bug any user can run into.

If podman stop is run on a container with restart=always, it must stop it and ignore the restart policy. If it does not do that, we have a big problem.

@Luap99 Luap99 changed the title pod started with kube play throws weird error on podman stop --all restart=always not properly working with podman stop --all Apr 19, 2023
@Luap99 Luap99 changed the title restart=always not properly working with podman stop --all restart=always not properly working with podman stop Apr 19, 2023
@Luap99 Luap99 self-assigned this Apr 19, 2023
Luap99 added a commit to Luap99/libpod that referenced this issue Apr 20, 2023
Commit 1ab833f improved the situation but it is still not enough.
If you run short-lived containers with --restart=always, podman is
basically permanently restarting them. The only way to stop this is
podman stop. However, podman stop does not do anything when the
container is already in a non-running state. While this makes sense, we
should still mark the container as explicitly stopped by the user.

Together with the change in shouldRestart(), which now checks for
StoppedByUser, this makes sure the cleanup process is not going to start
it back up again.

A simple reproducer is:
```
podman run --restart=always --name test -d alpine true
podman stop test
```
then check if the container is still running. The behavior is very
flaky; it took me about 20 podman stop tries before I finally hit the
correct window where it was stopped permanently.
With this patch it worked on the first try.

Fixes containers#18259

[NO NEW TESTS NEEDED] This is super flaky and hard to correctly test
in CI. My ginkgo v2 work seems to trigger this in the play kube tests, so
that should catch at least some regressions. Also, this may be something
that should be tested at podman test days by users (containers#17912).

Signed-off-by: Paul Holzinger <[email protected]>
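
One way to do the "check if the container is still running" step from the reproducer in the commit message above (a sketch using standard podman inspect state fields):

```
# Run the reproducer from the commit message, then watch the container state.
podman inspect --format '{{.State.Status}}' test
# Without the fix this frequently flips back to "running" shortly after the stop;
# with the fix it should stay "exited" until the user starts the container again.
```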
Luap99 added a commit to Luap99/libpod that referenced this issue May 23, 2023
@github-actions github-actions bot added the locked - please file new issue/PR label Aug 26, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 26, 2023