StopSignal SIGTERM failed to stop container in 10 seconds #20196

Closed

edsantiago opened this issue Sep 28, 2023 · 21 comments
Labels: flakes (Flakes from Continuous Integration), locked - please file new issue/PR

Comments

@edsantiago
Member

Weird new flake, where "new" means "since ExitCleanly() started checking stderr":

# podman [options] run --http-proxy=false -d quay.io/libpod/alpine:latest top
89a002f65e0abc37edb7dd74e7402aa5d3c8e44b348be9b8173b0b4b62c84e7e
# podman [options] stop --ignore foobar 89a002f65e0abc37edb7dd74e7402aa5d3c8e44b348be9b8173b0b4b62c84e7e
time="2023-09-28T10:44:05-05:00" level=warning msg="StopSignal SIGTERM failed to stop container lucid_dubinsky in 10 seconds, resorting to SIGKILL"

Happens almost exclusively in this one test. (The other one, the "pod something" test, probably flakes because its entrypoint defaults to sh. Probably.)

This shouldn't happen, because top is quick to exit upon signal. And okay, maybe not always, maybe 2-3 seconds, but ten??

I've tried reproducing, with no luck:

$ while :;do cid=$(bin/podman --events-backend=file  run --http-proxy=false  -d quay.io/libpod/alpine:latest top);bin/podman stop --ignore foobar $cid;bin/podman rm $cid;done

Anything obvious I've missed?

  • fedora-37 : int podman fedora-37 rootless host sqlite
    • 09-22 12:10 in Podman stop podman stop --ignore bogus container
  • fedora-38 : int podman fedora-38 root host sqlite
    • 09-28 12:00 in Podman stop podman stop --ignore bogus container
  • fedora-39 : int podman fedora-39 root host sqlite
    • 09-27 21:49 in Podman stop podman stop --ignore bogus container
  • fedora-39 : int podman fedora-39 rootless host boltdb
    • 09-27 15:37 in Podman stop podman stop --ignore bogus container
  • fedora-39 : int podman fedora-39 rootless host sqlite
    • 09-27 08:58 in Podman stop podman stop --ignore bogus container
  • fedora-39β : int podman fedora-39β root host boltdb
    • 09-28 13:13 in Podman stop podman stop --ignore bogus container
  • rawhide : int podman rawhide root host sqlite
    • 09-27 21:53 in Podman stop podman stop --ignore bogus container
    • 09-27 15:38 in Podman prune podman system prune with running, exited pod and volume prune set true

Seen in: fedora-37/fedora-38/fedora-39/fedora-39β/rawhide root/rootless boltdb/sqlite

@edsantiago edsantiago added the flakes Flakes from Continuous Integration label Sep 28, 2023
@rhatdan
Member

rhatdan commented Sep 29, 2023

Could it be that the system is so slow and overburdened that it is taking more than 10 seconds?

@edsantiago
Member Author

It could be, but by my count there are >20 other podman stop commands in e2e tests that do not have -t, and I would expect to see took-too-long errors in at least one of those. What I'm seeing is consistently in the --ignore test.


github-actions bot commented Nov 2, 2023

A friendly reminder that this issue had no activity for 30 days.

@edsantiago
Member Author

edsantiago commented Nov 2, 2023

  • fedora-37 : int podman fedora-37 rootless host sqlite
  • fedora-38 : int podman fedora-38 root host sqlite
    • 10-31 17:22 in Podman stop podman stop --ignore bogus container
    • 10-31 11:36 in Podman prune podman system prune pods
    • 10-26 19:05 in Podman stop podman stop --ignore bogus container
    • 10-26 19:05 in Podman stop podman stop container by id
    • 10-25 18:39 in Podman stop podman stop --ignore bogus container
    • 09-28-2023 12:00 in Podman stop podman stop --ignore bogus container
  • fedora-38 : int podman fedora-38 rootless host boltdb
  • fedora-38 : int podman fedora-38 rootless host sqlite
    • 10-11 10:52 in Podman stop podman stop --ignore bogus container
  • fedora-39 : int podman fedora-39 root host sqlite
  • fedora-39 : int podman fedora-39 rootless host boltdb
  • fedora-39 : int podman fedora-39 rootless host sqlite
  • fedora-39β : int podman fedora-39β root host boltdb
  • rawhide : int podman rawhide root host sqlite
    • 09-27-2023 21:53 in Podman stop podman stop --ignore bogus container
    • 09-27-2023 15:38 in Podman prune podman system prune with running, exited pod and volume prune set true
  • rawhide : int podman rawhide rootless host sqlite
    • 10-18 23:04 in Podman stop podman stop --ignore bogus container

Seen in: int podman fedora-37+fedora-38+fedora-39+fedora-39β+rawhide root+rootless host boltdb+sqlite

@edsantiago edsantiago changed the title podman stop --ignore: failed to stop container in 10 seconds StopSignal SIGTERM failed to stop container in 10 seconds Nov 28, 2023
@edsantiago
Member Author

Still happening

Seen in: int podman fedora-37+fedora-38+fedora-39+fedora-39β+rawhide root+rootless host boltdb+sqlite

@Luap99
Member

Luap99 commented Dec 6, 2023

[+1003s] not ok 298 [160] podman volume rm --force
         # (from function `bail-now' in file test/system/[helpers.bash, line 227](https://github.com/containers/podman/blob/3367fd6095bd2db5c0e60e72abeeae71f43b4e8f/test/system/helpers.bash#L227),
         #  from function `die' in file test/system/[helpers.bash, line 790](https://github.com/containers/podman/blob/3367fd6095bd2db5c0e60e72abeeae71f43b4e8f/test/system/helpers.bash#L790),
         #  from function `run_podman' in file test/system/[helpers.bash, line 435](https://github.com/containers/podman/blob/3367fd6095bd2db5c0e60e72abeeae71f43b4e8f/test/system/helpers.bash#L435),
         #  in test file test/system/[160-volumes.bats, line 158](https://github.com/containers/podman/blob/3367fd6095bd2db5c0e60e72abeeae71f43b4e8f/test/system/160-volumes.bats#L158))
         #   `run_podman volume rm myvol --force' failed
         #
<+     > # # podman rm -t 0 --all --force --ignore
         #
<+051ms> # # podman ps --all --external --format {{.ID}} {{.Names}}
         #
<+052ms> # # podman images --all --format {{.Repository}}:{{.Tag}} {{.ID}}
<+046ms> # quay.io/libpod/testimage:20221018 f5a99120db64
         #
<+026ms> # # podman volume rm -a
         #
<+050ms> # # podman run -d --volume myvol:/myvol quay.io/libpod/testimage:20221018 top
<+376ms> # 7e8d417666f6f695e575285be058534f41ec6f0f3858fdda7cd632da9bda66fe
         #
<+011ms> # # podman volume rm myvol
<+045ms> # Error: volume myvol is being used by the following container(s): 7e8d417666f6f695e575285be058534f41ec6f0f3858fdda7cd632da9bda66fe: volume is being used
<+005ms> # [ rc=2 (expected) ]
         #
<+014ms> # # podman volume rm myvol --force
<+0010s> # time="2023-12-05T11:50:40-06:00" level=warning msg="StopSignal SIGTERM failed to stop container frosty_hopper in 10 seconds, resorting to SIGKILL"
         # myvol
         # #/vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
         # #| FAIL: Command succeeded, but issued unexpected warnings
         # #\^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
         #
<+011ms> # # podman rm -a --volumes

Also seen in sys tests: https://api.cirrus-ci.com/v1/artifact/task/6106357493923840/html/sys-podman-fedora-38-root-host-boltdb.log.html

@edsantiago
Member Author

Thanks. I have one other system test failure, on November 22. Here is the catalog so far:

Seen in: int(29)+sys(1) podman(30) fedora-37(1)+fedora-38(20)+fedora-39(3)+fedora-39β(2)+rawhide(4) root(19)+rootless(11) host(30) boltdb(16)+sqlite(14)

That is: never (yet) seen in remote or containerized; no difference between boltdb/sqlite; and mostly VFS, but not all.

@Luap99
Member

Luap99 commented Dec 6, 2023

Keep in mind the logrus errors/warnings are on the server side (unless they are logged on the client, which most of them aren't), so it makes sense that you do not see these in remote CI logs.
Maybe we can teach CI to capture server-side logs and error on unexpected stderr there as well. At least in e2e that should be possible, as we have one server per test. System tests are unlikely to work, as the server is spawned outside of the testing setup.

@edsantiago
Member Author

Eek. Yes, e2e tests run one server per test, but most tests run a number of podman commands, some with ExitCleanly(), some with Exit(N). I could envision setting up a per-test tracker of those calls, checking it in teardown, and doing a warning check iff there are zero Exit() checks... but I'm not sure it's worth the effort or complexity? Worth discussing, though. Thanks for the idea.

@Luap99
Member

Luap99 commented Dec 6, 2023

I have an idea what could be wrong: as PID 1, the program must register a signal handler for SIGTERM, otherwise the signal is ignored by default. This is what top does, but because signal handlers are of course part of the program and can only be installed after top has started, it could mean that podman stop ran before top was given enough time to install said handlers.
It is a very small race, but given we have a bunch of weird CI flakes, I could imagine that this might be the cause.
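
A minimal way to observe that race, sketched here for illustration (not part of the original comment): PID 1's caught-signal mask in /proc/1/status only gains the SIGTERM bit once top has installed its handler.

$ cid=$(podman run -d quay.io/libpod/alpine:latest top)
$ podman exec $cid grep SigCgt /proc/1/status   # immediately after start, the SIGTERM bit (0x4000) may still be unset
$ sleep 1
$ podman exec $cid grep SigCgt /proc/1/status   # once top is up, its handler shows in the mask
$ podman rm -f -t0 $cid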

@edsantiago
Member Author

Good hypothesis. Oh how I hate podman run -d. Oh how I wish someone would take #18952 seriously.

@Luap99
Member

Luap99 commented Dec 6, 2023

A naive reproducer: podman run -d --name test alpine sh -c 'sleep 0.05; top' && podman stop test && podman rm test
Of course it's not the same thing, but the sleep 0.05 alone is enough to make it flaky on my laptop.

I guess we need the same podman logs fix to make sure top has already printed output before we run podman stop.

@Luap99
Member

Luap99 commented Dec 6, 2023

I guess we need the same podman logs fix to make sure top has already printed output before we run podman stop.

Or in cases where we really do not care about the stop behaviour, we could just create the container with --stop-signal KILL to always kill right away.
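
An illustrative sketch of that alternative (not from the original comment), using the same alpine/top image as above:

$ podman run -d --name test --stop-signal KILL quay.io/libpod/alpine:latest top
$ podman stop test   # sends SIGKILL right away; no 10-second SIGTERM grace period
$ podman rm test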

edsantiago added a commit to edsantiago/libpod that referenced this issue Dec 14, 2023
A number of tests start a container then immediately run podman stop.
This frequently flakes with:

   StopSignal SIGTERM failed to stop [...] in 10 seconds, resorting to SIGKILL

Likely reason: container is still initializing, and its process
has not yet set up its signal handlers.

Solution: if possible (containers running "top"), wait for "Mem:"
to indicate that top is running. If not possible (pods / catatonit),
sleep half a second.

Intended to fix some of the flakes cataloged in containers#20196 but I'm
leaving that open in case we see more. These are hard to identify
just by looking in the code.

Signed-off-by: Ed Santiago <[email protected]>
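
For reference, a rough shell equivalent of the wait-for-"Mem:" approach described in the commit message above (an illustrative sketch only; the actual change lives in the test helpers):

$ cid=$(podman run -d quay.io/libpod/alpine:latest top)
$ until podman logs $cid | grep -q Mem:; do sleep 0.2; done   # top prints a "Mem:" header once it is running
$ podman stop $cid
$ podman rm $cid
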
@edsantiago
Member Author

#21011 has had no effect. This is still one of the most prevalent flakes I'm seeing, and not just in my no-retry PR:

Seen in: int(13)+sys(4) podman(17) fedora-38(17) root(10)+rootless(7) host(17) boltdb(17)

Seems interesting that it's only f38.

I will start writing PRs to run stop -t0.

@Luap99
Member

Luap99 commented Feb 6, 2024

I wouldn't say #21011 has had no effect. I have not looked at all the logs, but the ones I have looked at all point to the mentioned race, so the fix was just not complete. If anything, I would argue the fix worked, as none of the old test names are mentioned in your new comment.

@mheon
Member

mheon commented Feb 14, 2024

Looking to expand/generalize the solution here - I think most of these are fairly obvious fixes, but podman run --replace is not. We're not directly invoking podman stop there; it's getting invoked from within podman run as part of removing the old container. I don't think the logic for doing that is incorrect - we want to obey the usual stop timeout - but that means we have a race we can't close. Maybe we should ignore that warning message just for run --replace?

@edsantiago
Member Author

You mean, ignore in tests? Yes, that's actually my plan. I have a PR in the works to clean up more of these flakes, but (sigh) other more urgent issues keep coming up. Maybe I'll just submit what I have now as a stepping stone.

@edsantiago
Member Author

#21661

edsantiago added a commit to edsantiago/libpod that referenced this issue Feb 15, 2024
Continuing to see CI failures of the form "StopSignal SIGTERM
failed to stop container in 10 seconds". Work around those,
either by adding "-t0" to podman stop, or by using Expect(Exit(0))
instead of ExitCleanly().

Addresses, but does not close, containers#20196

Signed-off-by: Ed Santiago <[email protected]>
@bonjour-py

I have the same thing here; podman version is 4.3.1.
It seems the program cannot receive any StopSignal.
[image attachment]

@rhatdan
Member

rhatdan commented Mar 8, 2024

This just means your script is ignoring the SIGTERM signal.
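
For anyone hitting this with a shell-script entrypoint, an illustrative sketch (not from the original comment) of handling SIGTERM explicitly:

#!/bin/sh
# Install a SIGTERM handler so "podman stop" can terminate the container
# cleanly instead of falling back to SIGKILL after the timeout.
trap 'echo "caught SIGTERM, shutting down"; exit 0' TERM
while :; do
    sleep 1 &
    wait $!    # wait is interrupted by the trapped signal, so shutdown is prompt
done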

@chris42

chris42 commented Apr 17, 2024

I am not sure if this helps and is correct here: I just had a similar issue, with a container "ignoring" SIGTERM. However, when checking, the developer figured out that without giving an explicit --stop-signal=SIGTERM, podman assigned "37" as the stop signal to the container, which did nothing for the included scripts.
You find the issue here (in German): jens-maus/RaspberryMatic#2717

Hence I would advise checking which stop signal is actually set in the container before trying to amend the running processes. I had this issue on Podman 4.9.3 in a rootful container. My rootless ones are all getting "15" by default correctly; no clue where the 37 comes from.
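
A quick way to do that check (illustrative; the template field comes from podman container inspect output and its exact form may vary between versions):

$ podman inspect --format '{{.Config.StopSignal}}' <container-name>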

@Luap99 Luap99 closed this as completed Jun 15, 2024
@stale-locking-app stale-locking-app bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 14, 2024
@stale-locking-app stale-locking-app bot locked as resolved and limited conversation to collaborators Sep 14, 2024