pkg/rootless: correctly handle proxy signals on reexec #18681

Luap99 · 2023-05-25T10:06:33Z

There are quite a lot of places in podman where we have some signal
handlers, most notably libpod/shutdown/handler.go.

However, when we reexec we do not want any of that; we just want to send
all signals we get down to the child. So before we install our signal
handler we must first reset all others with signal.Reset().

Also, while at it, fix a problem where the joinUserAndMountNS() code path
would not forward signals at all. This code path is used when you have
running containers but the pause process was killed.

Fixes #16091

Given that signal handlers run in different goroutines in parallel, this
would explain why it flakes sometimes in CI. However, to my understanding
this flake can only happen when the pause process is dead before we run
the podman command. So the question still is: what kills the pause process?

Does this PR introduce a user-facing change?

Fixed a bug in the rootless podman reexec logic where signals were not forwarded correctly.

Luap99 · 2023-05-25T10:06:50Z

@giuseppe @edsantiago PTAL

giuseppe

LGTM

openshift-ci · 2023-05-25T10:41:16Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: giuseppe, Luap99

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [Luap99,giuseppe]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

edsantiago · 2023-05-25T10:55:22Z

I haven't reviewed the code, but I tested it on a 1mt VM and it passes \o/.

Failing because of missing tests. Maybe you could tweak 032.bats to kill the rootless pause process, if running?

Luap99 · 2023-05-25T11:07:14Z

Failing because of missing tests. Maybe you could tweak 032.bats to kill the rootless pause process, if running?

Yeah, I am thinking about something like that. I would prefer not to change the existing test, but I think I can add something like this to 550-pause-process.

edsantiago · 2023-05-25T11:52:43Z

So the question still is what kills the pause process?

New pause-related flake filed, #18685

Luap99 · 2023-05-25T13:14:35Z

@edsantiago updated with tests for both code paths I changed. I confirmed that both fail on main right now and pass with this patch.

edsantiago · 2023-05-25T14:37:54Z

Might want to skip_if_remote:

Error: cannot use command "podman-remote unshare" with the remote podman client

(Also, much lower pri, might want to fix the duplication in that message; I'll file an issue for it later)

Other failures are flakes, including unlinkat-ebusy. I restarted them before realizing that the third flake (this one) was a hard failure.

Luap99 · 2023-05-25T14:46:07Z

Might want to skip_if_remote:

Yes, I also saw that; I just waited so you could take a look at the tests before I repush.

vrothberg

LGTM, nice catch!

There are quite a lot of places in podman where we have some signal handlers, most notably libpod/shutdown/handler.go.

However, when we reexec we do not want any of that; we just want to send all signals we get down to the child. So before we install our signal handler we must first reset all others with signal.Reset().

Also, while at it, fix a problem where the joinUserAndMountNS() code path would not forward signals at all. This code path is used when you have running containers but the pause process was killed.

Fixes containers#16091

Given that signal handlers run in different goroutines in parallel, this would explain why it flakes sometimes in CI. However, to my understanding this flake can only happen when the pause process is dead before we run the podman command. So the question still is: what kills the pause process?

Signed-off-by: Paul Holzinger <[email protected]>

edsantiago · 2023-05-25T14:52:46Z

Am looking at tests, and confused, but nothing that should block this.

(Confusion: I'm having trouble understanding the difference between the two new tests, and how the 2nd new test differs from the single pause process test. But again, that is my lack of comprehension, and not a blocker nor a request to change anything)

TomSweeneyRedHat

LGTM

TomSweeneyRedHat · 2023-05-27T21:15:16Z

/lgtm

openshift-ci bot added the release-note and approved labels May 25, 2023

giuseppe approved these changes May 25, 2023

Luap99 force-pushed the reexec-signals branch from c39801b to e2c8b55 Compare May 25, 2023 13:13

vrothberg reviewed May 25, 2023

Luap99 force-pushed the reexec-signals branch from e2c8b55 to 6bc52c9 Compare May 25, 2023 14:49

edsantiago mentioned this pull request May 26, 2023

[systest] subnet is already used on the host or by another config #18693

Closed

TomSweeneyRedHat reviewed May 27, 2023

openshift-ci bot assigned TomSweeneyRedHat May 27, 2023

openshift-ci bot added the lgtm label May 27, 2023

openshift-merge-robot merged commit e7dc507 into containers:main May 27, 2023

Luap99 deleted the reexec-signals branch May 28, 2023 16:18

edsantiago mentioned this pull request Jun 7, 2023

unlinkat/EBUSY/hosed is back (Jan 2023) #17216

Closed

github-actions bot added the locked - please file new issue/PR label Aug 27, 2023

github-actions bot locked as resolved and limited conversation to collaborators Aug 27, 2023
