Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement --sdnotify cmdline option to control sd-notify behavior #6693

Merged

Conversation

goochjj
Copy link
Contributor

@goochjj goochjj commented Jun 19, 2020

--sdnotify conmon-only|container|none

With "conmon-only", we send the MAINPID, and clear the NOTIFY_SOCKET so the OCI
runtime doesn't pass it into the container. We also advertise "ready" when the
OCI runtime finishes to advertise the service as ready.

With "container", we send the MAINPID, and leave the NOTIFY_SOCKET so the OCI
runtime passes it into the container for initialization, and let the container advertise further metadata.

The "none" option does what it's always done in the past - passes NOTIFY_SOCKET
to conmon's process and the OCI runtime processes, and does no manipulation or "help".

This removes the need for hardcoded CID and PID files in the command line, and
the PIDFile directive, as the pid is advertised directly through sd-notify.

Signed-off-by: Joseph Gooch [email protected]

Includes #6689
References #6688

@openshift-ci-robot
Copy link
Collaborator

Hi @goochjj. Thanks for your PR.

I'm waiting for a containers member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Jun 19, 2020
@goochjj
Copy link
Contributor Author

goochjj commented Jun 19, 2020

The more I think about this I'm not sure "none" is a valid option, though it was essentially requested... Since "none" returns a broken situation. (MAINPID never advertised properly)... Unless there's a case when runc or crun advertises the correct MAINPID.

@goochjj goochjj force-pushed the libpod-sd-notify-cmdline branch 2 times, most recently from da77ae1 to 44de538 Compare June 19, 2020 18:32
@mheon
Copy link
Member

mheon commented Jun 19, 2020

I think none is "I am not being managed by systemd, please ignore" - we've seen cases where Podman is run as a child of something actually managed by systemd, inherits NOTIFY_SOCKET from the parent (arguably a bug in the parent, but still), and then blocks forever as it tries to notify on an already-used socket.

@goochjj
Copy link
Contributor Author

goochjj commented Jun 19, 2020

I think none is "I am not being managed by systemd, please ignore" - we've seen cases where Podman is run as a child of something actually managed by systemd, inherits NOTIFY_SOCKET from the parent (arguably a bug in the parent, but still), and then blocks forever as it tries to notify on an already-used socket.

That's a reasonable interpretation. In that case I should probably block NOTIFY_SOCKET from being passed down to the OCI runtime and conmon... correct? That way everything will proceed as if the NOTIFY_SOCKET isn't there.

vs. what it does now, which means "libpod does nothing with NOTIFY_SOCKET other than pass it on", which is pretty similar to "container".

@goochjj
Copy link
Contributor Author

goochjj commented Jun 19, 2020

I think none is "I am not being managed by systemd, please ignore" - we've seen cases where Podman is run as a child of something actually managed by systemd, inherits NOTIFY_SOCKET from the parent (arguably a bug in the parent, but still), and then blocks forever as it tries to notify on an already-used socket.

That's a reasonable interpretation. In that case I should probably block NOTIFY_SOCKET from being passed down to the OCI runtime and conmon... correct? That way everything will proceed as if the NOTIFY_SOCKET isn't there.

vs. what it does now, which means "libpod does nothing with NOTIFY_SOCKET other than pass it on", which is pretty similar to "container".

In which case I think "ignore" is a better description.

@goochjj goochjj force-pushed the libpod-sd-notify-cmdline branch 2 times, most recently from 094ce6f to 0330cc7 Compare June 19, 2020 20:21
@mheon
Copy link
Member

mheon commented Jun 19, 2020

Ignore works for me - and yes, we should not forward the socket in that case

@goochjj goochjj force-pushed the libpod-sd-notify-cmdline branch from 0330cc7 to 86e2bad Compare June 19, 2020 21:36
@rhatdan
Copy link
Member

rhatdan commented Jun 20, 2020

Doesn't Podman have to respond to the ignore and set the ready state for systemd if it sees the flag and is set to ignore? IE Systemd will never mark the unit file as ready unless someone says it is ready.

@goochjj
Copy link
Contributor Author

goochjj commented Jun 20, 2020

Doesn't Podman have to respond to the ignore and set the ready state for systemd if it sees the flag and is set to ignore? IE Systemd will never mark the unit file as ready unless someone says it is ready.

That's true - however, currently Podman doesn't respond at all. Runc/crun do. Ignore is only relevant when some other service is the responsible party.

The default is "container", which is the current behavior, which would mean Runc/Crun need to respond. This PR just makes sure the correct MAINPID is set during the conversation.

conmon-only means Podman is the responsible party - the one responsible for sending READY=1, and keeps runc/crun out of the loop.

ignore would only be used if some other service is calling podman. I.e. some service that is sd-notify enabled, and therefore they are the responsible party. Contrived example:

#!/bin/bash

#I'm a service
systemd-notify --status="Starting up"

#Spawn some containers
podman run --sdnotify ignore --rm -d alpine sleep infinity
podman run --sdnotify ignore --rm -d alpine sleep infinity
podman run --sdnotify ignore --rm -d alpine sleep infinity

#Do other stuff
sleep 1

#And we're good
systemd-notify --ready

Right now, each podman would see the NOTIFY_SOCKET set and assume that it's responsible for telling systemd when it's ready... Which is incorrect, the overall service is the shell script.

The other alternative is people explicitly changing the podman lines to do:

env -u NOTIFY_SOCKET podman run --rm -d alpine sleep infinity

As that's really what ignore is doing.

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 21, 2020
@vrothberg
Copy link
Member

This removes the need for hardcoded CID and PID files in the command line, and
the PIDFile directive, as the pid is advertised directly through sd-notify.

Can you elaborate why this would remove the CID file? I think that's still needed to cleanly stop/remove a container in a unit.

@vrothberg
Copy link
Member

Reviewing now. @giuseppe PTAL as well

Copy link
Member

@vrothberg vrothberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh man, I really like this PR. Thanks a ton!

Just a minor comment regarding input validation. @edsantiago , could you have a look as well? I wonder if you have a suggestion how we could test this in CI.

libpod/container_internal.go Outdated Show resolved Hide resolved
libpod/options.go Show resolved Hide resolved
@mheon
Copy link
Member

mheon commented Jun 22, 2020

Once comments are addressed, this LGTM

@giuseppe giuseppe self-requested a review June 22, 2020 14:24
Copy link
Member

@giuseppe giuseppe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good change! LGTM once the comments above are addressed

@goochjj
Copy link
Contributor Author

goochjj commented Jun 22, 2020

This removes the need for hardcoded CID and PID files in the command line, and
the PIDFile directive, as the pid is advertised directly through sd-notify.

Can you elaborate why this would remove the CID file? I think that's still needed to cleanly stop/remove a container in a unit.

CID file can still be used, it's just SystemD doesn't need it so it doesn't need to be hanging out in /run/%n-cid, it'll revert to being in userdata/

PID file can still be used, it's just SystemD doesn't need it so it doesn't need to be hanging out in /run/%n.pid, it'll revert to being in userdata/.... And we don't need to specify it in PIDFile because systemd will receive the appropriate PID via MAINPID= w/ Type=notify.

If for whatever reason people still want to use Type=forking, they can, they'll need to keep the PIDFile directive and put it somewhere it can be found... Otherwise Type=notify just makes things more clean and less hardcoded.

@goochjj
Copy link
Contributor Author

goochjj commented Jun 22, 2020

Oh man, I really like this PR. Thanks a ton!

Just a minor comment regarding input validation. @edsantiago , could you have a look as well? I wonder if you have a suggestion how we could test this in CI.

I too was struggling with this. I can envision the tests - as I did the testing manually, but it's going to require forking which is currently beyond my ability to do in golang.

We'd need a thread that opens a UNIX dgram socket and captures input. Perhaps (tmpdir)/notify.

  1. Ensure podman ignores the socket
    Create notify socket
    Monitor notify socket
    set env NOTIFY_SOCKET
    Spawn podman run --rm --sd-notify=ignore alpine ls
    Verify podman returns exit 0
    Verify notify socket received absolutely no data.

  2. Verify sd-notify=conmon (heres where it starts to get tricky)
    Create notify socket
    set env NOTIFY_SOCKET
    Proc1) Monitor notify socket
    Proc2) Spawn podman run --rm --sd-notify=conmon alpine ls
    Proc1) wait for receipt of MAINPID=(pid) and verify it points to conmon, descendent of Proc2
    Proc1) Wait for receipt of second MAINPID= and READY=1
    Proc2) verify exit 0

  3. Verify sd-notify=container
    Create notify socket
    set env NOTIFY_SOCKET
    Proc1) Monitor notify socket
    Proc2) Spawn podman run --rm --sd-notify=container alpine ls
    Proc1) wait for receipt of MAINPID=(pid) and verify it points to conmon, descendent of Proc2
    Proc1) sleep 3 seconds
    Proc1) Find the socket - /run/runc/(CID)/notify/notify.sock or /run/crun/(CID)/notify/notify - and send READY=1 to it
    Proc1) Verify you get back MAINPID= and READY=1
    Proc2) verify exit 0

If podman exits in Proc2 before we send the READY in Proc1, then runc/crun didn't wait for a notification.

  1. Test that not specifying sd-notify does the same thing as 3
  2. Test that not setting NOTIFY_SOCKET does the same thing as 1

I'm not sure how to coordinate the test between two threads like that in Go.

@rhatdan
Copy link
Member

rhatdan commented Jul 6, 2020

@giuseppe PTAL

@openshift-ci-robot openshift-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 6, 2020
@goochjj goochjj force-pushed the libpod-sd-notify-cmdline branch from 0469532 to 3dbaae0 Compare July 6, 2020 17:38
@openshift-ci-robot openshift-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 6, 2020
@goochjj goochjj force-pushed the libpod-sd-notify-cmdline branch from 3dbaae0 to c68eb3f Compare July 6, 2020 17:42
goochjj and others added 2 commits July 6, 2020 17:47
--sdnotify container|conmon|ignore
With "conmon", we send the MAINPID, and clear the NOTIFY_SOCKET so the OCI
runtime doesn't pass it into the container. We also advertise "ready" when the
OCI runtime finishes to advertise the service as ready.

With "container", we send the MAINPID, and leave the NOTIFY_SOCKET so the OCI
runtime passes it into the container for initialization, and let the container advertise further metadata.
This is the default, which is closest to the behavior podman has done in the past.

The "ignore" option removes NOTIFY_SOCKET from the environment, so neither podman nor
any child processes will talk to systemd.

This removes the need for hardcoded CID and PID files in the command line, and
the PIDFile directive, as the pid is advertised directly through sd-notify.

Signed-off-by: Joseph Gooch <[email protected]>
@goochjj goochjj force-pushed the libpod-sd-notify-cmdline branch from c68eb3f to 10ad46e Compare July 6, 2020 17:47
@mheon
Copy link
Member

mheon commented Jul 6, 2020

Two flakes, both on the Fedora repos. I'm just going to merge once this goes green.

/lgtm
/hold

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 6, 2020
@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Jul 6, 2020
@rhatdan
Copy link
Member

rhatdan commented Jul 6, 2020

/hold cancel

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 6, 2020
@openshift-merge-robot openshift-merge-robot merged commit 1a93857 into containers:master Jul 6, 2020
@rhatdan
Copy link
Member

rhatdan commented Jul 6, 2020

Thanks @goochjj Nice contribution.

edsantiago added a commit to edsantiago/libpod that referenced this pull request Jul 8, 2020
This began as an empty commit intended solely to get CI to
rebuild the VMs using crun 1.14 (up from 1.13). Twin goals:

  1) Be able to test containers#6693 (--sdnotify option); and
  2) Get rid of 'cgroup.freeze' CI flakes.

These are fixed by crun PRs 419 and 423 respectively.

CI failed on the original PR submission, with errors on ginkgo
install. The change to Makefile is intended to address that.

The change to setup_environment.sh is intended to address
a flake we're frequently seeing with the Fedora dnf repos.
This adds a retry in case of a failing dnf command.

Signed-off-by: Ed Santiago <[email protected]>
edsantiago added a commit to edsantiago/libpod that referenced this pull request Aug 14, 2020
Oops. PR containers#6693 (sdnotify) added tests, but they were disabled
due to broken crun on f31. I tried for three weeks to get a
magic CI:IMG PR to update crun on the CI VMs ... but in that
time I forgot to actually enable those new tests.

This PR removes a 'skip', replacing it with a check that systemd
is running plus one more to make sure our runtime is crun. It
looks like sdnotify just doesn't work on Ubuntu (it hangs), and
my guess is that it's a crun/runc issue.

I also changed the test image from fedora:latest to :31, because,
sigh, fedora:latest removed the systemd-notify tool.

WARNING WARNING WARNING: the symptom of a missing systemd-notify
is that podman will hang forever, not even stopped by the timeout
command in podman_run! (Filed: containers#7316). This means that if the
sdnotify-in-container test ever fails, the symptom will be that
Cirrus itself will time out (2 hours?). This is horrible. I
don't know what to do about it other than push for a fix for 7316.

Signed-off-by: Ed Santiago <[email protected]>
@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 24, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 24, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. ok-to-test
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants