rootless: make sure we only use a single pause process #18083

Merged
merged 2 commits into from
Apr 11, 2023

Member

@Luap99 Luap99 commented Apr 6, 2023

Currently, --tmpdir changes the location of the pause.pid file. This causes issues because the C code in pkg/rootless does not know about that. I tried to fix this[1] by changing the C code to not use the shortcut. While that fix worked, it would result in many pause processes leaking in the integration tests.

Commit ab88632 added this behavior, but following the discussion it was never the intention that we end up with more than one pause process. The issue it was trying to fix was caused by something else AFAICT: the main problem seems to be that the parent directory of the pause.pid file may not exist when we try to create the pid file, so it failed with ENOENT. This patch fixes that by always creating the directory, and reverts the change so that the location no longer depends on the tmpdir value.

With this commit we now always use XDG_RUNTIME_DIR/libpod/tmp/pause.pid for all podman processes. This allows the C shortcut to work reliably and should therefore improve performance over my other approach.
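
A minimal shell sketch of the idea described above, assuming the runtime directory is already known: create the parent directory unconditionally before writing the pid file, so the write cannot fail with ENOENT. The helper name `write_pause_pid` is hypothetical and only illustrates the pattern; the actual change is in podman's own code, not in a shell script.

```shell
# Hypothetical helper: write a pid to <runtime_dir>/libpod/tmp/pause.pid.
# The parent directory is always created first, which avoids ENOENT when
# it does not exist yet.
write_pause_pid() {
    local runtime_dir="$1"
    local pid="$2"
    local pid_dir="$runtime_dir/libpod/tmp"

    # mkdir -p succeeds whether or not the directory already exists.
    mkdir -p "$pid_dir" || return 1
    printf '%s\n' "$pid" > "$pid_dir/pause.pid"
}
```

It could be called as `write_pause_pid "$XDG_RUNTIME_DIR" "$$"`; because the path does not depend on --tmpdir, every caller ends up at the same file.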

A system test is added to ensure we see the right behavior and that podman system migrate actually stops the pause process. I do not see a better way.

This should fix the issues with namespace mismatches that we see as flakes in CI.

[1] #18057

Fixes #17903

Does this PR introduce a user-facing change?

None

@openshift-ci openshift-ci bot added the release-note-none and approved labels Apr 6, 2023
Member

@giuseppe giuseppe left a comment

LGTM

Member

@vrothberg vrothberg left a comment

/lgtm
/hold

@openshift-ci openshift-ci bot added the do-not-merge/hold and lgtm labels Apr 6, 2023
Member

@edsantiago edsantiago left a comment

LGTM on first pass, but tests need work.

# Use podman system migrate to stop the currently running pause process
run_podman system migrate

run pgrep "podman pause"
Member

Should probably be pgrep -u $(id -u) ...

assert "$output" == "$tmpdir_userns" "podman should use the same userns"

run pgrep "podman pause"
assert "$output" == "$(cat $XDG_RUNTIME_DIR/libpod/tmp/pause.pid)" "pause pid written to correct location"
Member

This won't work: system tests do not guarantee having XDG set, and ISTR that fedora gating tests run using su which leaves it unset.

Member Author

The fact that the fedora tests are run in a different setup concerns me?! Using su to switch users without proper env vars and systemd login session is just asking for unnecessary trouble.

Could I just import the systemd helpers to define it?

# podman initializes this if unset, but systemctl doesn't
if [ -z "$XDG_RUNTIME_DIR" ]; then
    if is_rootless; then
        export XDG_RUNTIME_DIR=/run/user/$(id -u)
    fi
fi

Member

It concerns me too, but this is a lose-lose situation: we need to make sure podman works in cases where XDG is unset, because that's a valid real-world situation and we need to catch those bugs before customers do. So, we test XDG environments in CI, and do almost-last-minute non-XDG tests in gating. If you can think of a better way to catch all conditions, I'd love to hear ideas.

Member Author

So just wrap this check in if [ -n "$XDG_RUNTIME_DIR" ]; then ... so that I only test it when it is defined?

Member

[untested]

   local xdg_default="/run/user/$(id -u)"
   local pidfile="${XDG_RUNTIME_DIR:-$xdg_default}/libpod/tmp/pause.pid"
   assert "$output" = "$(<$pidfile)" ...

Could be done in one line but that makes my head hurt

Member Author

Well the thing here is that podman will also fall back to /tmp/podman-run-$UID if /run/user/$UID is not writeable, which is often the case when you do not have a systemd session.

It is likely better not to assume much about podman and to run the check only when we know XDG_RUNTIME_DIR is set. In the end the test does not need to care about where the pid file lives, just that it works, and we already verified that above.
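
The approach discussed here can be sketched roughly as follows: compare against the pid file only when XDG_RUNTIME_DIR is set, and otherwise make no assumption about where podman keeps it. `check_pause_pidfile` is a hypothetical helper written for illustration; it is not part of the podman test suite.

```shell
# Hypothetical helper: compare an expected pid against the pause.pid file,
# but only when XDG_RUNTIME_DIR is set. Returns 0 when the check passes or
# is skipped, and 1 on a mismatch or an unreadable pid file.
check_pause_pidfile() {
    local expected_pid="$1"
    local pidfile
    local actual_pid

    if [ -z "${XDG_RUNTIME_DIR:-}" ]; then
        # Location unknown (podman may fall back to another directory),
        # so skip the comparison instead of guessing a path.
        echo "skipped"
        return 0
    fi

    pidfile="$XDG_RUNTIME_DIR/libpod/tmp/pause.pid"
    actual_pid=$(cat "$pidfile" 2>/dev/null) || return 1
    [ "$actual_pid" = "$expected_pid" ] || return 1
    echo "ok"
}
```

The design choice is that an unset variable means "no opinion" rather than a failure, which keeps the test meaningful in CI while not breaking environments that run without a login session.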

Member

Good point.

@edsantiago
Member

...but don't push yet. Tests are more than halfway complete, I think we should see how they do.

@edsantiago
Member

...such as, Debian. Maybe Debian doesn't set XDG? I think your fix would address that.

@edsantiago
Member

edsantiago commented Apr 6, 2023

ok, tests complete. Timing results below, because I have a vague memory that you thought something-something might affect CI times but now I can't find that comment.

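| type | distro | user | DB | local | remote | container |
|---|---|---|---|---|---|---|
| int | debian-12 | root | | 26:34 | 36:55 | |
| int | fedora-36 | root | | 31:52 | 36:34 | 30:19 |
| int | fedora-37 | root | | 30:04 | 35:04 | 28:25 |
| int | fedora-37 | root | sqlite | 30:50 | 34:01 | |
| int | debian-12 | rootless | | 25:00 | | |
| int | fedora-36 | rootless | | 30:59 | | |
| int | fedora-37 | rootless | | 28:09 | | |
| int | fedora-37 | rootless | sqlite | 28:22 | | |
| sys | debian-12 | root | | 26:28 | 18:13 | |
| sys | fedora-36 | root | | 31:14 | 20:37 | |
| sys | fedora-37 | root | | 29:47 | 20:58 | |
| sys | fedora-37-aarch64 | root | | 32:35 | 22:03 | |
| sys | debian-12 | rootless | | !27:21 | | |
| sys | fedora-36 | rootless | | 29:23 | | |
| sys | fedora-37 | rootless | | 29:34 | 19:27 | |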

[EDIT: Debian was the only failure]

@Luap99
Member Author

Luap99 commented Apr 6, 2023

I asked for #18057, this PR should not be any slower than before

@Luap99 Luap99 force-pushed the pause-single-process branch from f2b0cd6 to 3dd19bb Compare April 6, 2023 14:22
@openshift-ci openshift-ci bot removed the lgtm label Apr 6, 2023
@edsantiago
Member

Weird: Debian is still failing.

Also, this is weird too: in my hammer-at-sqlite PR, I see four "no logs from conmon" failures (#10927). That's very unusual; the most I've ever seen before is two in one run. I'm commenting on this PR, but it could just as easily be #18085.

@edsantiago
Member

Logs downloaded and checked. Lots of flakes, but none of them especially worrisome. (Mostly "no logs from conmon", and one "failed to unmount NS" (presumably handled in #18085)).

@Luap99
Member Author

Luap99 commented Apr 6, 2023

Ok I guess I need to hack/get_ci_vm to see what the debian env looks like.

@Luap99
Member Author

Luap99 commented Apr 6, 2023

/hold
The test is not working correctly because the podman pause process can also be catatonit -P, so pgrep obviously did not find it.

The question is also why is fedora using podman pause but debian catatonit -P?

/* Attempt to execv catatonit to keep the pause process alive. */
execl (LIBEXECPODMAN "catatonit", "catatonit", "-P", NULL);
execl ("/usr/bin/catatonit", "catatonit", "-P", NULL);
/* and if the catatonit executable could not be found, fallback here... */
prctl (PR_SET_NAME, "podman pause", NULL, NULL, NULL);

On debian we have /usr/bin/catatonit while on fedora it is /usr/libexec/podman/catatonit.
The LIBEXECPODMAN "catatonit" is suspicious, I do not see an extra / added between the dir and catatonit.
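
To illustrate why the first execl would fail, here is a small shell sketch of the same string concatenation. It assumes LIBEXECPODMAN expands to /usr/libexec/podman without a trailing slash; that matches the Fedora path mentioned above but is an assumption about the build configuration.

```shell
# Assumed value of the LIBEXECPODMAN macro (no trailing slash).
libexec_podman="/usr/libexec/podman"

# What the C code effectively builds: directory and name glued together.
broken_path="${libexec_podman}catatonit"

# What was intended: an explicit separator between the two.
fixed_path="${libexec_podman}/catatonit"

echo "$broken_path"   # /usr/libexec/podmancatatonit
echo "$fixed_path"    # /usr/libexec/podman/catatonit
```

Under that assumption the first path does not exist, so the first execl fails and execution falls through to the /usr/bin/catatonit attempt and then to the built-in fallback.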

While this is a bug and can be fixed easily, I am not so sure my test would make much sense, given that I would need to grep for catatonit -P, which could also be started by other commands for whatever reason; at least we have no direct way of controlling it.
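
One way to sidestep matching on the process name, sketched under the assumption that the pid file location is known, is to read the pid from pause.pid and probe it with kill -0. `pause_process_alive` is a hypothetical helper, and this is not necessarily how the test that was eventually merged works.

```shell
# Hypothetical helper: succeed if the pid stored in the given pid file
# refers to a process that currently exists. This behaves the same whether
# the pause process is "catatonit -P" or the built-in "podman pause".
pause_process_alive() {
    local pidfile="$1"
    local pid

    [ -r "$pidfile" ] || return 1
    pid=$(cat "$pidfile") || return 1

    # Reject anything that is not a plain decimal number.
    case "$pid" in
        ''|*[!0-9]*) return 1 ;;
    esac
    # Pid 0 would address the caller's own process group, so reject it.
    [ "$pid" -gt 0 ] || return 1

    # Signal 0 sends nothing; it only checks that the process exists and
    # that we are allowed to signal it.
    kill -0 "$pid" 2>/dev/null
}
```

A test could call this before and after podman system migrate and expect success first and failure afterwards, without caring which binary provides the pause process.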

Anyway it is end of the day here so I will continue next week.

@edsantiago
Member

Friendly suggestion for an alternative 550-pause-process.bats

A little safer in some ways. Tested with podman pause and with catatonit. Tested with podman@main (required disabling the pidfile check), and the using-same-namespace test failed as expected.

@TomSweeneyRedHat
Member

LGTM
once the tests get happier

Luap99 added 2 commits April 11, 2023 10:57
Currently --tmpdir changes the location of the pause.pid file. This
causes issues because the C code in pkg/rootless does not know about
that. I tried to fix this[1] by fixing the C code to not use the
shortcut. While this fix worked, it will result in many pause processes
leaking in the integration tests.

Commit ab88632 added this behavior, but following the discussion it was
never the intention that we end up having more than one pause process.
The issue it was trying to fix was caused by something else AFAICT;
the main problem seems to be that the pause.pid file parent directory
may not be created when we try to create the pid file, so it failed with
ENOENT. This patch fixes it by always creating this directory and
reverting the change, so we no longer depend on the tmpdir value.

With this commit we now always use XDG_RUNTIME_DIR/libpod/tmp/pause.pid
for all podman processes. This allows the C shortcut to work reliably
and should therefore improve performance over my other approach.

A system test is added to ensure we see the right behavior and that
podman system migrate actually stops the pause process. Thanks to Ed
Santiago for the improved test to make it work for both `catatonit` and
`podman pause`.

This should fix the issues with namespace mismatches that we can see in
CI as flakes.

[1] containers#18057

Fixes containers#17903

Signed-off-by: Paul Holzinger <[email protected]>
The path was missing a slash between the libexec path and the binary
name. This was never noticed because the code already falls back to a
built-in pause process.

Fixes: 71f96c2 ("rootless: define LIBEXECPODMAN")

Signed-off-by: Paul Holzinger <[email protected]>
@Luap99 Luap99 force-pushed the pause-single-process branch from 3dd19bb to 38c217a Compare April 11, 2023 09:06
Member

@vrothberg vrothberg left a comment

LGTM

@openshift-ci
Contributor

openshift-ci bot commented Apr 11, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: giuseppe, Luap99, vrothberg

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [Luap99,giuseppe,vrothberg]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@vrothberg
Member

@giuseppe @edsantiago PTanotherL

@edsantiago
Member

LGTM

@edsantiago
Member

Also meant to say, I've had this running (cherry-picked) in #17831 and it doesn't cure the "stop" issue but I have seen a reduction in other flakes.

@Luap99
Member Author

Luap99 commented Apr 11, 2023

Yeah, for now all I expect is that #17903 is fixed by this; if it fixes other flakes too, that is welcome.

Member

@vrothberg vrothberg left a comment

/lgtm

@openshift-ci openshift-ci bot added the lgtm label Apr 11, 2023
@edsantiago
Member

/hold cancel

@openshift-ci openshift-ci bot removed the do-not-merge/hold label Apr 11, 2023
@openshift-merge-robot openshift-merge-robot merged commit 8c4838f into containers:main Apr 11, 2023
@Luap99 Luap99 deleted the pause-single-process branch April 11, 2023 15:08
@edsantiago
Member

Belated followup: I downloaded all CI logs, manually viewed all the int flakes, and saw no "FS magic" flakes.

In my no-retries PR, though, fully rebased against this, I hit this in f37 rootless:

$ podman [options] network disconnect aliasTest0d525e717e5b8341b3ec5d17a1ffd9884906485f7350245538d5e15c3aa161ab test
Error: getting rootless network namespace: unknown FS magic on "/run/user/1599/netns/rootless-netns-642818b009b6b2c8e81e": 1021994

Any suggestions on how to diagnose that?

@Luap99
Member Author

Luap99 commented Apr 11, 2023

OMG, sometimes I do not see the forest for the trees. While this fix is still correct, the major problem is that we are killing the pause process in podman system reset.

if err := r.stopPauseProcess(); err != nil {

We cannot do this when we run tests in parallel. #18057 may have fixed this as well, but there the problem would be that we leak hundreds or thousands of pause processes, which I would rather avoid.
I really see no sane way here; we can either
a) skip all reset tests for rootless or
b) make reset not kill the pause process

Regardless, I will reopen the issue; this is clearly not fixed.

@github-actions github-actions bot added the locked - please file new issue/PR label Sep 2, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 2, 2023
Successfully merging this pull request may close these issues.

pod stats: unknown FS magic on "/run/user/4902/netns/netns-etc-etc"
6 participants