
podman run --rm : Error forwarding signal 23 to container, plus stacktrace #5034

Closed
edsantiago opened this issue Jan 31, 2020 · 16 comments · Fixed by #5036
Labels
kind/bug: Categorizes issue or PR as related to a bug.
locked - please file new issue/PR: Assist humans wanting to comment on an old issue or PR with locked comments.
stale-issue

Comments

@edsantiago
Member

On rawhide:

# podman run --rm alpine date
Fri Jan 31 12:34:45 UTC 2020
ERRO[0001] Error forwarding signal 23 to container 272087a3133469f8c302991b3b85327f56bd35d0a6536f6127b280238466b693: container has already been removed
ERRO[0001] Error forwarding signal 23 to container 272087a3133469f8c302991b3b85327f56bd35d0a6536f6127b280238466b693: container has already been removed
panic: close of closed channel

goroutine 72 [running]:
panic(0x5619d0bc2ec0, 0x5619d0e1e700)
        /usr/lib/golang/src/runtime/panic.go:1060 +0x424 fp=0xc0003e5ed0 sp=0xc0003e5e28 pc=0x5619cf4c8c24
runtime.closechan(0xc00007e9c0)
        /usr/lib/golang/src/runtime/chan.go:359 +0x253 fp=0xc0003e5f10 sp=0xc0003e5ed0 pc=0x5619cf49acf3
[100% reproducible, so I see no need to print the many hundreds of lines]

podman-1.7.1-0.66.dev.git36af283.fc32.x86_64
conmon-2.0.11-0.3.dev.gitccfdbb6.fc32.x86_64
5.5.0-0.rc6.git3.1.fc32.x86_64

@vrothberg
Member

Looking into it. Installing a rawhide VM atm.

@vrothberg
Member

Very curious to see a SIGURG flying around O.o

vrothberg added a commit to vrothberg/libpod that referenced this issue Jan 31, 2020
When stopping signal handling (e.g., to properly handle ^C) we are also
closing the signal channel.  We should really return from the go-routine
instead of continuing and risking double-closing the channel which leads
to a panic.

Fixes: containers#5034
Signed-off-by: Valentin Rothberg <[email protected]>
@edsantiago
Member Author

I'm seeing this again on rawhide, but (1) on podman run [without --rm] and podman rm (not just run --rm) and (2) not reliably:

   $ podman run --name lcdcF3pEMqYnCsuPJXcHQv7b6IEK2U quay.io/libpod/alpine_labels:latest sleep 30
   $ podman rm -f lcdcF3pEMqYnCsuPJXcHQv7b6IEK2U
   4dbac6142c5d57aa113a2bf5374f264a4ac18a7bf71ec4272959ce2d4a645b05
   time="2020-02-05T07:50:45-05:00" level=error msg="Error forwarding signal 23 to container 4dbac6142c5d57aa113a2bf5374f264a4ac18a7bf71ec4272959ce2d4a645b05: container 4dbac6142c5d57aa113a2bf5374f264a4ac18a7bf71ec4272959ce2d4a645b05 does not exist in database: no such container"
   #/vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
   #| FAIL: exit code is 0; expected 137
   #\^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
# $ podman run quay.io/libpod/alpine_labels:latest false
# time="2020-02-05T07:24:23-05:00" level=error msg="Error forwarding signal 23 to container b742ce2565d3744172745d7f0eceaf6af7aecf0f2bc68750f539830d594cb30f: can only kill running containers. b742ce2565d3744172745d7f0eceaf6af7aecf0f2bc68750f539830d594cb30f is in state stopped: container state improper"
# [ rc=1 (expected) ]
# #/vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
# #|     FAIL: podman run false - output
# #| expected: '[no output]'
# #|   actual: 'time="2020-02-05T07:24:23-05:00" level=error msg="Error forwarding signal 23 to container b742ce2565d3744172745d7f0eceaf6af7aecf0f2bc68750f539830d594cb30f: can only kill running containers. b742ce2565d3744172745d7f0eceaf6af7aecf0f2bc68750f539830d594cb30f is in state stopped: container state improper"'
# #\^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Somewhat reliable reproducer:

# (sleep 3;podman rm -f foo) &
[1] 99371
# podman run --name foo alpine sleep 30
[pause here for 13 seconds: the sleep-3, plus the 10-second "rm" delay]
7a173ef58c8eba71cb7609d21dae7703efac7e766f4a57c07349e01a6bbb887c
ERRO[0012] Error forwarding signal 23 to container 7a173ef58c8eba71cb7609d21dae7703efac7e766f4a57c07349e01a6bbb887c: container has already been removed
[1]+  Done                    ( sleep 3; podman rm -f foo )

podman-1.8.0-0.3.dev.git5092c07.fc32.x86_64
crun-0.12-1.fc32.x86_64

@edsantiago edsantiago reopened this Feb 5, 2020
@vrothberg
Member

FYI @mheon @baude

I am currently deep in the CVE backporting and won't find time to tackle this today.

@mheon
Member

mheon commented Feb 5, 2020

Are we sure what's sending these SIGURGs?

There is a narrow timing window when they could be coming in, after the container dies but before we retrieve the exit code and exit, which would cause this.

@edsantiago
Member Author

I'm not sure of anything. I have no idea what's going on here. What info can I provide?

@mheon
Member

mheon commented Feb 5, 2020

Running something in the container that catches signals and prints when they're received, then seeing if a container in steady-state is getting any, might help; I'm wondering if this is something caused by the container exiting, or we just have something hitting us with SIGURG

@edsantiago
Member Author

Oh, interesting:

# podman run alpine sh -c 'for i in $(seq 1 40|grep -v 17);do trap "echo got: $i" $i; done; echo READY; while ! test -e /stop; do sleep 0.05; done'
READY
[pause, then I decide to ^C]
^Cgot: 2
got: 23
got: 23
^Cgot: 2
^\got: 3
^\got: 3
got: 23

@edsantiago
Member Author

And:

# podman run alpine sh -c 'for i in $(seq 3 40|grep -v 17);do trap "echo got: $i" $i; done; echo READY; while ! test -e /stop; do sleep 0.05; done'
READY
[wait 2 seconds, then ^C]
^CERRO[0002] Error forwarding signal 23 to container 59e2607049414b43514651244cd5ceffb23cc8b0fdfb057089d78d4548d9969a: can only kill running containers. 59e2607049414b43514651244cd5ceffb23cc8b0fdfb057089d78d4548d9969a is in state stopped: container state improper

(Note that in the above I deliberately do not trap 2)

@mheon mheon added the kind/bug Categorizes issue or PR as related to a bug. label Feb 6, 2020
@edsantiago
Member Author

Well, here's an interesting difference:

# strace -ff -o /tmp/foo.strace podman run --rm alpine date

On f30:

# grep 'kill.*URG' /tmp/foo.strace.*
[nothing]

On rawhide:

# grep 'kill.*URG' /tmp/foo.strace.*
/tmp/foo.strace.57521:tgkill(57520, 57520, SIGURG)            = 0
/tmp/foo.strace.57521:tgkill(57520, 57520, SIGURG)            = 0
/tmp/foo.strace.57521:tgkill(57520, 57522, SIGURG)            = 0
/tmp/foo.strace.57521:tgkill(57520, 57618, SIGURG)            = 0
/tmp/foo.strace.57521:tgkill(57520, 57618, SIGURG)            = 0
/tmp/foo.strace.57521:tgkill(57520, 57618, SIGURG)            = 0
/tmp/foo.strace.57521:tgkill(57520, 57549, SIGURG)            = 0
/tmp/foo.strace.57521:tgkill(57520, 57549, SIGURG)            = 0
/tmp/foo.strace.57521:tgkill(57520, 57549, SIGURG)            = 0
/tmp/foo.strace.57521:tgkill(57520, 57549, SIGURG)            = 0
/tmp/foo.strace.57521:tgkill(57520, 57549, SIGURG)            = 0
/tmp/foo.strace.57521:tgkill(57520, 57549, SIGURG)            = 0
/tmp/foo.strace.57521:tgkill(57520, 57549, SIGURG)            = 0
/tmp/foo.strace.57549:kill(57520, SIGURG)                     = 0
/tmp/foo.strace.57557:tgkill(57556, 57556, SIGURG)            = 0
/tmp/foo.strace.57631:tgkill(57629, 57629, SIGURG)            = 0
/tmp/foo.strace.57631:tgkill(57629, 57629, SIGURG)            = 0

@rhatdan
Member

rhatdan commented Feb 18, 2020

Could this be golang-1.14-0.rc1.0.fc32.x86_64?

@vrothberg
Member

It could very well be golang. We had issues with the 1.13 alpha and beta at the time as well.

@rhatdan
Member

rhatdan commented Feb 18, 2020

Hopefully it gets updated in F32, which is built with the alpha version.

@edsantiago
Member Author

@baude, I built podman from master (2b2996d); f32 with golang-1.14-1.fc32.

Good news: cannot reproduce the problem using any of the above setups.

Bad news: I get a weird error on podman rm -f:

# ./bin/podman run alpine sh -c 'sleep 20'

Now, from another shell: podman rm -f -a. Back to the above shell, ten seconds later:

ERRO[0013] Cannot get exit code: died not found: unable to find event
# echo $?
127

The number in brackets is often different: 0013, 0012, 0017.

@github-actions

A friendly reminder that this issue had no activity for 30 days.

@vrothberg
Member

This has been fixed in the meanwhile, closing.

@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 23, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 23, 2023