
podman run --rm : Error forwarding signal 23 to container, plus stacktrace #5034

Closed
edsantiago opened this issue Jan 31, 2020 · 16 comments · Fixed by #5036
Labels
kind/bug: Categorizes issue or PR as related to a bug.
locked - please file new issue/PR: Assist humans wanting to comment on an old issue or PR with locked comments.
stale-issue

Comments

@edsantiago
Member

On rawhide:

# podman run --rm alpine date
Fri Jan 31 12:34:45 UTC 2020
ERRO[0001] Error forwarding signal 23 to container 272087a3133469f8c302991b3b85327f56bd35d0a6536f6127b280238466b693: container has already been removed
ERRO[0001] Error forwarding signal 23 to container 272087a3133469f8c302991b3b85327f56bd35d0a6536f6127b280238466b693: container has already been removed
panic: close of closed channel

goroutine 72 [running]:
panic(0x5619d0bc2ec0, 0x5619d0e1e700)
        /usr/lib/golang/src/runtime/panic.go:1060 +0x424 fp=0xc0003e5ed0 sp=0xc0003e5e28 pc=0x5619cf4c8c24
runtime.closechan(0xc00007e9c0)
        /usr/lib/golang/src/runtime/chan.go:359 +0x253 fp=0xc0003e5f10 sp=0xc0003e5ed0 pc=0x5619cf49acf3
[100% reproducible, so I see no need to print the many hundreds of lines]

podman-1.7.1-0.66.dev.git36af283.fc32.x86_64
conmon-2.0.11-0.3.dev.gitccfdbb6.fc32.x86_64
5.5.0-0.rc6.git3.1.fc32.x86_64

@vrothberg
Member

Looking into it. Installing a rawhide VM atm.

@vrothberg
Member

Very curious to see a SIGURG flying around O.o

vrothberg added a commit to vrothberg/libpod that referenced this issue Jan 31, 2020
When stopping signal handling (e.g., to properly handle ^C) we are also
closing the signal channel.  We should really return from the go-routine
instead of continuing and risking double-closing the channel which leads
to a panic.

Fixes: containers#5034
Signed-off-by: Valentin Rothberg <[email protected]>
@edsantiago
Member Author

I'm seeing this again on rawhide, but (1) on podman run [without --rm] and podman rm (not just run --rm) and (2) not reliably:

   $ podman run --name lcdcF3pEMqYnCsuPJXcHQv7b6IEK2U quay.io/libpod/alpine_labels:latest sleep 30
   $ podman rm -f lcdcF3pEMqYnCsuPJXcHQv7b6IEK2U
   4dbac6142c5d57aa113a2bf5374f264a4ac18a7bf71ec4272959ce2d4a645b05
   time="2020-02-05T07:50:45-05:00" level=error msg="Error forwarding signal 23 to container 4dbac6142c5d57aa113a2bf5374f264a4ac18a7bf71ec4272959ce2d4a645b05: container 4dbac6142c5d57aa113a2bf5374f264a4ac18a7bf71ec4272959ce2d4a645b05 does not exist in database: no such container"
   #/vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
   #| FAIL: exit code is 0; expected 137
   #\^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
# $ podman run quay.io/libpod/alpine_labels:latest false
# time="2020-02-05T07:24:23-05:00" level=error msg="Error forwarding signal 23 to container b742ce2565d3744172745d7f0eceaf6af7aecf0f2bc68750f539830d594cb30f: can only kill running containers. b742ce2565d3744172745d7f0eceaf6af7aecf0f2bc68750f539830d594cb30f is in state stopped: container state improper"
# [ rc=1 (expected) ]
# #/vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
# #|     FAIL: podman run false - output
# #| expected: '[no output]'
# #|   actual: 'time="2020-02-05T07:24:23-05:00" level=error msg="Error forwarding signal 23 to container b742ce2565d3744172745d7f0eceaf6af7aecf0f2bc68750f539830d594cb30f: can only kill running containers. b742ce2565d3744172745d7f0eceaf6af7aecf0f2bc68750f539830d594cb30f is in state stopped: container state improper"'
# #\^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Somewhat reliable reproducer:

# (sleep 3;podman rm -f foo) &
[1] 99371
# podman run --name foo alpine sleep 30
[pause here for 13 seconds: the sleep-3, plus the 10-second "rm" delay]
7a173ef58c8eba71cb7609d21dae7703efac7e766f4a57c07349e01a6bbb887c
ERRO[0012] Error forwarding signal 23 to container 7a173ef58c8eba71cb7609d21dae7703efac7e766f4a57c07349e01a6bbb887c: container has already been removed
[1]+  Done                    ( sleep 3; podman rm -f foo )

podman-1.8.0-0.3.dev.git5092c07.fc32.x86_64
crun-0.12-1.fc32.x86_64

@edsantiago edsantiago reopened this Feb 5, 2020
@vrothberg
Member

FYI @mheon @baude

I am currently deep in the CVE backporting and won't find time to tackle this today.

@mheon
Member

mheon commented Feb 5, 2020

Are we sure what's sending these SIGURGs?

There is a narrow timing window when they could be coming in, after the container dies but before we retrieve the exit code and exit, which would cause this.

@edsantiago
Member Author

I'm not sure of anything. I have no idea what's going on here. What info can I provide?

@mheon
Member

mheon commented Feb 5, 2020

Running something in the container that catches signals and prints when they're received, then seeing if a container in steady-state is getting any, might help; I'm wondering if this is something caused by the container exiting, or we just have something hitting us with SIGURG

@edsantiago
Member Author

Oh, interesting:

# podman run alpine sh -c 'for i in $(seq 1 40|grep -v 17);do trap "echo got: $i" $i; done; echo READY; while ! test -e /stop; do sleep 0.05; done'
READY
[pause, then I decide to ^C]
^Cgot: 2
got: 23
got: 23
^Cgot: 2
^\got: 3
^\got: 3
got: 23

@edsantiago
Member Author

And:

# podman run alpine sh -c 'for i in $(seq 3 40|grep -v 17);do trap "echo got: $i" $i; done; echo READY; while ! test -e /stop; do sleep 0.05; done'
READY
[wait 2 seconds, then ^C]
^CERRO[0002] Error forwarding signal 23 to container 59e2607049414b43514651244cd5ceffb23cc8b0fdfb057089d78d4548d9969a: can only kill running containers. 59e2607049414b43514651244cd5ceffb23cc8b0fdfb057089d78d4548d9969a is in state stopped: container state improper

(Note that in the above I deliberately do not trap 2)

@mheon mheon added the kind/bug Categorizes issue or PR as related to a bug. label Feb 6, 2020
@edsantiago
Member Author

Well, here's an interesting difference:

# strace -ff -o /tmp/foo.strace podman run --rm alpine date

On f30:

# grep 'kill.*URG' /tmp/foo.strace.*
[nothing]

On rawhide:

# grep 'kill.*URG' /tmp/foo.strace.*
/tmp/foo.strace.57521:tgkill(57520, 57520, SIGURG)            = 0
/tmp/foo.strace.57521:tgkill(57520, 57520, SIGURG)            = 0
/tmp/foo.strace.57521:tgkill(57520, 57522, SIGURG)            = 0
/tmp/foo.strace.57521:tgkill(57520, 57618, SIGURG)            = 0
/tmp/foo.strace.57521:tgkill(57520, 57618, SIGURG)            = 0
/tmp/foo.strace.57521:tgkill(57520, 57618, SIGURG)            = 0
/tmp/foo.strace.57521:tgkill(57520, 57549, SIGURG)            = 0
/tmp/foo.strace.57521:tgkill(57520, 57549, SIGURG)            = 0
/tmp/foo.strace.57521:tgkill(57520, 57549, SIGURG)            = 0
/tmp/foo.strace.57521:tgkill(57520, 57549, SIGURG)            = 0
/tmp/foo.strace.57521:tgkill(57520, 57549, SIGURG)            = 0
/tmp/foo.strace.57521:tgkill(57520, 57549, SIGURG)            = 0
/tmp/foo.strace.57521:tgkill(57520, 57549, SIGURG)            = 0
/tmp/foo.strace.57549:kill(57520, SIGURG)                     = 0
/tmp/foo.strace.57557:tgkill(57556, 57556, SIGURG)            = 0
/tmp/foo.strace.57631:tgkill(57629, 57629, SIGURG)            = 0
/tmp/foo.strace.57631:tgkill(57629, 57629, SIGURG)            = 0

@rhatdan
Member

rhatdan commented Feb 18, 2020

Could this be golang-1.14-0.rc1.0.fc32.x86_64?

@vrothberg
Member

It could very well be golang. We had issues with the 1.13 alpha and beta at the time as well.

@rhatdan
Member

rhatdan commented Feb 18, 2020

Hopefully it gets updated in F32, which is built with the alpha version.

@edsantiago
Member Author

@baude, I built podman from master (2b2996d); f32 with golang-1.14-1.fc32.

Good news: cannot reproduce the problem using any of the above setups.

Bad news: I get a weird error on podman rm -f:

# ./bin/podman run alpine sh -c 'sleep 20'

Now, from another shell: podman rm -f -a. Back to the above shell, ten seconds later:

ERRO[0013] Cannot get exit code: died not found: unable to find event
# echo $?
127

The number in brackets is often different: 0013, 0012, 0017.

@github-actions

A friendly reminder that this issue had no activity for 30 days.

@vrothberg
Member

This has been fixed in the meanwhile, closing.

@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 23, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 23, 2023