Container Timeout #6412

npmccallum · 2020-05-27T21:34:42Z

Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)

/kind feature

Description

I want to use podman run --rm ... to run a container that is removed on exit. I also want the container to be forcibly killed if it is still running after n seconds.

Yes, I could write a podman state manager that calls podman stop, but that is a lot of work, and it is also racy and bug-prone.

I could use the timeout command (from coreutils) to send a signal. But podman has only two modes for signal handling. The default mode forwards the signal to pid 1 in the container. However, pid 1 could just ignore the signal to bypass the timeout. If I use --sig-proxy=false then podman doesn't forward the signal to the container. But it also doesn't stop the container. Therefore, the container bypasses the timeout.

I tried looking at the --timeout option for conmon, but that doesn't do what we need.

I see two ways forward.

1. Add support for --sig-proxy=stop. This mode would not proxy the signal to the container and would instead terminate the container (and, implicitly, do the --rm). Timeout support could then be implemented using the timeout utility from coreutils.
2. Add a new --timeout=n option, so that podman run --rm --timeout=30 would forcibly shut down the container and remove it after 30 seconds. I think I would prefer this option since it doesn't require another process.
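A minimal Go sketch of how the first option could be modelled. It assumes a hypothetical `sigProxyMode` type whose `"true"` and `"false"` values stand for the two existing behaviours described above and whose `"stop"` value is the proposed one. It only decides what to do with a received signal; it does not talk to podman or an OCI runtime.

```go
package main

import "fmt"

// sigProxyMode models the --sig-proxy setting. "true" and "false" are the two
// existing behaviours; "stop" is the mode proposed in this issue. The type and
// its constants are hypothetical, not podman's code.
type sigProxyMode string

const (
	proxyForward sigProxyMode = "true"
	proxyIgnore  sigProxyMode = "false"
	proxyStop    sigProxyMode = "stop"
)

// action is what podman would do with a signal it receives.
type action int

const (
	forwardToPID1 action = iota // PID 1 may ignore it and outlive the timeout
	doNothing                   // container keeps running; the timeout is bypassed
	stopAndRemove               // container is stopped and, with --rm, removed
)

func (a action) String() string {
	switch a {
	case forwardToPID1:
		return "forward to PID 1"
	case doNothing:
		return "do nothing"
	case stopAndRemove:
		return "stop and remove the container"
	}
	return "unknown"
}

// decide maps a --sig-proxy mode to the action taken when, for example,
// coreutils' timeout sends SIGTERM to the podman process.
func decide(mode sigProxyMode) (action, error) {
	switch mode {
	case proxyForward:
		return forwardToPID1, nil
	case proxyIgnore:
		return doNothing, nil
	case proxyStop:
		return stopAndRemove, nil
	}
	return doNothing, fmt.Errorf("unknown --sig-proxy mode %q", string(mode))
}

func main() {
	for _, m := range []sigProxyMode{proxyForward, proxyIgnore, proxyStop} {
		a, _ := decide(m)
		fmt.Printf("--sig-proxy=%s: %s\n", m, a)
	}
}
```

Only the "stop" mode leaves nothing for an uncooperative PID 1 to swallow, which is why it would make the external timeout utility sufficient.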


rhatdan · 2020-05-28T09:50:58Z

Would the container get the SIGSTOP (actually the --stop-signal value) signal or SIGKILL?
Most likely I just answered my own question, since if you want SIGKILL you can just add --stop-signal=kill.

If I restart the container, does it run for another 30 seconds?

rhatdan · 2020-05-28T09:51:10Z

@mheon @vrothberg @baude WDYT?

rhatdan · 2020-05-28T09:52:14Z

We would need to wire this into conmon, since it is the only thing left running when you run with --detach.
@haircommander FYI

vrothberg · 2020-05-28T11:50:16Z

This sounds completely reasonable to me.
Naming nit: to prevent confusion, I suggest naming it --rm-timeout.

mheon · 2020-05-28T13:19:45Z

My concern would be the complexity of stopping the container exclusively from within Conmon. I don't really want to implement the full logic of podman stop again in C (SIGTERM, timeout, SIGKILL, timeout, plus handling for containers without a PID namespace through runtime kill --all). If we can keep it simple (single SIGKILL, or a SIGTERM + simple fixed timeout + SIGKILL, and no containers without a PID namespace) it sounds a lot more reasonable.

haircommander · 2020-05-28T13:23:05Z

Theoretically, we could also pass conmon a list of args for a podman call, like we do for the exit command.

haircommander · 2020-05-28T13:27:23Z

Another thing to note is that conmon doesn't technically know when a container starts. It knows when the container starts logging things and starts behaving like it has started, but this is not precise. We'd have to have podman send data down a pipe to tell conmon "hey, I started the container", and then we'd start the timeout.

rhatdan · 2020-05-28T13:50:43Z

I don't see this as an --rm-timeout. I don't think --rm is required.
If I want to run a container for one hour then I could do
podman create --timeout=3600 ...
podman start ...
podman start ...

rhatdan · 2020-05-28T13:51:51Z

Doesn't conmon get the contents of the --stop-signal?

npmccallum · 2020-05-28T14:01:06Z

> Would the container get the SIGSTOP (Actually --stop-signal value) signal or SIGKILL?
> Most likely I just answered my own question. Since if you want SIGKILL you can just add --stop-signal=kill

$ podman run --stop-signal=kill --rm -it fedora
# trap '' TERM
#

kill -TERM $PODMAN_PID

Podman forwards the SIGTERM to PID 1. PID 1 swallows the SIGTERM. If I send SIGKILL to podman, the --rm is never performed and the container is still running.

It is therefore my understanding that --stop-signal=kill only changes the signal sent to PID 1 during podman stop.

> If I restart the container, does it run for another 30 seconds?

This is a singleton container running untrusted code. We never want to restart it. That would allow the untrusted code to persist across the timeout.

rhatdan · 2020-06-09T20:02:28Z

@vrothberg Could you take this one on?

vrothberg · 2020-06-10T15:00:33Z

> @vrothberg Could you take this one on?

This looks like a larger chunk of work, as it spans libpod and conmon. I think I should rather work on the parallel-copy detection over in c/image. WDYT?

rhatdan · 2020-06-10T21:10:54Z

Sounds good, we can give this to @ashley-cui @QiWang19 @ParkerVR or @ryanchpowell Or anyone else who wants to grab it.

github-actions · 2020-07-23T00:11:57Z

A friendly reminder that this issue had no activity for 30 days.

rhatdan · 2020-07-23T11:56:24Z

@QiWang19 PTAL

github-actions · 2020-08-23T00:14:45Z

A friendly reminder that this issue had no activity for 30 days.

rhatdan · 2020-08-24T14:30:04Z

@QiWang19 Did you get a chance to look at this?

QiWang19 · 2020-08-24T14:39:55Z

> @QiWang19 Did you get a chance to look at this?

I haven't started working on this yet, but I can add it to my list.

github-actions · 2020-10-11T00:19:36Z

A friendly reminder that this issue had no activity for 30 days.

kblin · 2021-01-12T19:38:22Z

Hi folks,

I'd also love to have this feature for a system running long-running number-crunchy jobs.
For my use case, the number-crunchy jobs report status via stdout, and I have a process that forks and executes podman run --detach=false ... and then consumes the output from the container to forward the status to a database. I can of course set a timer and have that trigger podman kill after the timeout expires, but having a timeout built in would be so much nicer.

rhatdan · 2021-01-13T13:00:59Z

Interested in opening a PR for this feature?

kblin · 2021-01-13T13:26:07Z

I'm not sure I understand the architecture well enough to know where to start. Does this go into podman? Conmon?

rhatdan · 2021-01-13T16:39:12Z

Both. You would need a way to trigger the kill from conmon: basically, add a --timeout flag (and maybe --timeout-signal) that tells conmon to kill the container.
Then you would need an option in podman to activate it: podman run --timeout 20m ...

This would cause podman to run conmon with --timeout 20m, and after 20 minutes conmon would send the stop signal to PID 1 and, 10 seconds later, the kill signal.
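A Go sketch of the podman side of this plumbing, assuming the user-facing flag accepts a Go duration string such as `20m` and that the value handed to conmon is a whole number of seconds. The `--timeout=N` argument produced here is illustrative only and does not describe conmon's actual command line. Partial seconds are rounded up so a short positive timeout is never silently turned into "no timeout".

```go
package main

import (
	"fmt"
	"time"
)

// timeoutArgs converts a user-supplied duration such as "20m" into a
// hypothetical conmon argument carrying whole seconds. An empty value or a
// zero duration means "no timeout" and produces no arguments.
func timeoutArgs(value string) ([]string, error) {
	if value == "" {
		return nil, nil
	}
	d, err := time.ParseDuration(value)
	if err != nil {
		return nil, fmt.Errorf("invalid --timeout %q: %w", value, err)
	}
	if d < 0 {
		return nil, fmt.Errorf("invalid --timeout %q: must not be negative", value)
	}
	// Round partial seconds up so a short positive timeout is not dropped.
	secs := int64((d + time.Second - 1) / time.Second)
	if secs == 0 {
		return nil, nil
	}
	return []string{fmt.Sprintf("--timeout=%d", secs)}, nil
}

func main() {
	args, err := timeoutArgs("20m")
	fmt.Println(args, err)
}
```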

kblin · 2021-01-14T12:08:15Z

After staring at the code for a couple of hours, I'm still not quite sure where I'd pass a new --timeout flag added to podman run, not to mention that I can't figure out why my

flags.UintVar(&runOpts.Timeout, "timeout", 0, "Stop the container after [timeout] seconds, or 0 to not time out the container")

ends up generating a --time and not a --timeout parameter after running make.
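For reference, a Go sketch of registering the flag. Podman's CLI uses cobra/pflag; to keep the example dependency-free it swaps in the standard library's `flag` package, whose `FlagSet.UintVar` takes the same (pointer, name, default, usage) arguments as the call above. `runOptions` is a hypothetical stand-in for podman's options struct. This does not explain the `--time` result reported above; it only shows the expected registration.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// runOptions is a hypothetical stand-in for podman's run options struct.
type runOptions struct {
	Timeout uint
}

// newRunFlags registers --timeout; the flag name comes solely from the
// second argument to UintVar.
func newRunFlags(opts *runOptions) *flag.FlagSet {
	fs := flag.NewFlagSet("run", flag.ContinueOnError)
	fs.UintVar(&opts.Timeout, "timeout", 0,
		"Stop the container after [timeout] seconds, or 0 to not time out the container")
	return fs
}

func main() {
	var opts runOptions
	fs := newRunFlags(&opts)
	if err := fs.Parse(os.Args[1:]); err != nil {
		os.Exit(2)
	}
	fmt.Println("timeout:", opts.Timeout)
}
```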

I also still have no idea where in conmon I'd send the stop signal to the container's pid 1.

That's about all the time I had to spend on something I can script with a timed callback to run podman kill on my side. If this is a "good first issue", I don't think I want to see the other ones. 😉

rhatdan · 2021-01-14T14:13:07Z

Some first issues are meatier than others; thanks for trying.

kblin · 2021-01-14T18:12:21Z

If nobody has picked it up by the next time I have a day or two to spare, I might give it another shot. 🙂

rhatdan · 2021-01-14T19:55:32Z

Sounds good.

github-actions · 2021-02-14T00:16:45Z

A friendly reminder that this issue had no activity for 30 days.

rhatdan · 2021-02-15T15:42:59Z

@kblin Did you ever get a chance to look at this?

kblin · 2021-02-15T16:59:35Z

I didn't have the time to spare yet.

github-actions · 2021-03-18T00:17:32Z

A friendly reminder that this issue had no activity for 30 days.

github-actions · 2021-04-22T00:07:59Z

A friendly reminder that this issue had no activity for 30 days.

This option allows users to specify the maximum amount of time to run before conmon sends the kill signal to the container. Fixes: containers#6412 Signed-off-by: Daniel J Walsh

openshift-ci-robot added the kind/feature label May 27, 2020

rhatdan assigned vrothberg Jun 9, 2020

vrothberg removed their assignment Jun 22, 2020

github-actions bot added the stale-issue label Jul 23, 2020

rhatdan assigned QiWang19 Jul 23, 2020

rhatdan removed the stale-issue label Jul 23, 2020

github-actions bot added the stale-issue label Aug 23, 2020

vrothberg removed the stale-issue label Sep 10, 2020

github-actions bot added the stale-issue label Oct 11, 2020

vrothberg removed the stale-issue label Dec 18, 2020

rhatdan unassigned QiWang19 Jan 13, 2021

rhatdan added the Good First Issue label Jan 13, 2021

github-actions bot added the stale-issue label Feb 14, 2021

rhatdan removed the stale-issue label Feb 15, 2021

github-actions bot added the stale-issue label Mar 18, 2021

rhatdan removed the stale-issue label Mar 22, 2021

github-actions bot added the stale-issue label Apr 22, 2021

rhatdan removed the stale-issue label Apr 22, 2021

rhatdan mentioned this issue Apr 22, 2021

Add podman run --timeout option #10119

Merged

openshift-merge-robot closed this as completed in #10119 Apr 27, 2021

github-actions bot added the locked - please file new issue/PR label Sep 22, 2023

github-actions bot locked as resolved and limited conversation to collaborators Sep 22, 2023
