pkg/rootless: correctly handle proxy signals on reexec #18681
Conversation
@giuseppe @edsantiago PTAL
LGTM
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: giuseppe, Luap99. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
I haven't reviewed the code, but I tested it on a 1mt VM and it passes. Failing because of missing tests. Maybe you could tweak
Yeah, I am thinking about something like that. I would prefer not to change the existing test, but I think I can add something like this to 550-pause-process.
New pause-related flake filed, #18685
@edsantiago updated with tests for both code paths I changed. I confirmed that both fail on main right now and pass with this patch.
Might want to
(Also, much lower pri, might want to fix the duplication in that message; I'll file an issue for it later.) Other failures are flakes, including unlinkat-ebusy. I restarted them before realizing that the third flake (this one) was a hard failure.
Yes, I also saw that; I just waited so you can take a look at the tests before I repush.
LGTM, nice catch!
Am looking at tests, and confused, but nothing that should block this. (Confusion: I'm having trouble understanding the difference between the two new tests, and how the 2nd new test differs from the
LGTM
/lgtm
There are quite a lot of places in podman where we have signal
handlers, most notably libpod/shutdown/handler.go.
However, when we reexec we do not want any of that; we just want to
forward all signals we receive down to the child. So before we install
our signal handler we must first reset all others with signal.Reset().
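
A minimal sketch of that pattern, assuming a hypothetical signalproxy package (the helper name and the /proc/self/exe plumbing are illustrative, not podman's actual code): reset every previously installed handler, start the child, then relay each signal we receive to it.

```go
package signalproxy

import (
	"os"
	"os/exec"
	"os/signal"
	"syscall"
)

// reexecWithSignalProxy is a hypothetical helper: it clears any signal
// handlers installed elsewhere (e.g. by libpod/shutdown/handler.go),
// reexecs the current binary, and proxies every signal it receives to
// the child.
func reexecWithSignalProxy(args []string) error {
	// Reset all previously registered handlers so none of them can
	// swallow a signal that must reach the child.
	signal.Reset()

	cmd := exec.Command("/proc/self/exe", args...)
	cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
	if err := cmd.Start(); err != nil {
		return err
	}

	// signal.Notify with no signal arguments relays all incoming
	// signals, which is what a pure proxy wants.
	sigC := make(chan os.Signal, 32)
	signal.Notify(sigC)
	go func() {
		for s := range sigC {
			// Skip SIGCHLD (child-exit notification) and SIGURG
			// (used internally by the Go runtime for preemption).
			if s == syscall.SIGCHLD || s == syscall.SIGURG {
				continue
			}
			_ = cmd.Process.Signal(s)
		}
	}()

	return cmd.Wait()
}
```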
Also, while at it, fix a problem where the joinUserAndMountNS() code
path would not forward signals at all. This code path is used when you
have running containers but the pause process was killed.
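
Under the same caveat, forwarding in a path where only the child's pid is known (as described for joinUserAndMountNS()) might look roughly like this; forwardSignalsToPid is an invented name, not podman's actual function:

```go
package signalproxy

import (
	"os"
	"os/signal"
	"syscall"
)

// forwardSignalsToPid is a hypothetical sketch for the case where the
// parent only holds the child's pid: install a catch-all handler and
// relay each catchable signal with kill(2).
func forwardSignalsToPid(pid int) {
	sigC := make(chan os.Signal, 32)
	signal.Notify(sigC)
	go func() {
		for s := range sigC {
			sig, ok := s.(syscall.Signal)
			if !ok || sig == syscall.SIGCHLD || sig == syscall.SIGURG {
				continue
			}
			_ = syscall.Kill(pid, sig)
		}
	}()
}
```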
Fixes #16091
Given that signal handlers run in parallel in different goroutines,
this would explain why it flakes sometimes in CI. However, to my
understanding this flake can only happen when the pause process is
already dead before we run the podman command. So the question remains:
what kills the pause process?
Does this PR introduce a user-facing change?