Implement --sdnotify cmdline option to control sd-notify behavior #6693
Hi @goochjj. Thanks for your PR. I'm waiting for a containers member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
The more I think about this, the less sure I am that "none" is a valid option, even though it was essentially requested, since "none" leaves us in a broken situation (MAINPID is never advertised properly), unless there's a case where runc or crun advertises the correct MAINPID.
Force-pushed from da77ae1 to 44de538.
I think |
That's a reasonable interpretation. In that case I should probably block NOTIFY_SOCKET from being passed down to the OCI runtime and conmon... correct? That way everything will proceed as if the NOTIFY_SOCKET isn't there, versus what it does now, which is "libpod does nothing with NOTIFY_SOCKET other than pass it on", and that is pretty similar to "container".
In which case I think "ignore" is a better description.
Force-pushed from 094ce6f to 0330cc7.
Ignore works for me - and yes, we should not forward the socket in that case.
Force-pushed from 0330cc7 to 86e2bad.
Doesn't Podman have to respond to the ignore and set the ready state for systemd if it sees the flag and is set to ignore? I.e., systemd will never mark the unit as ready unless someone tells it that it is ready.
That's true. However, currently Podman doesn't respond at all; runc/crun do. "ignore" is only relevant when some other service is the responsible party.
The default is "container", which is the current behavior, and which means runc/crun need to respond. This PR just makes sure the correct MAINPID is set during the conversation.
"conmon-only" means Podman is the responsible party, i.e. the one responsible for sending READY=1, and it keeps runc/crun out of the loop.
"ignore" would only be used if some other service is calling podman, i.e. some service that is itself sd-notify enabled and is therefore the responsible party. Contrived example:
Right now, each podman would see the NOTIFY_SOCKET set and assume that it's responsible for telling systemd when it's ready, which is incorrect: the overall service is the shell script. The other alternative is people explicitly changing the podman lines to do:
As that's really what ignore is doing.
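From the caller's side, "ignore" amounts to running the child with NOTIFY_SOCKET removed from its environment. A minimal Python sketch of that idea follows; the helper name run_without_notify_socket is hypothetical, the child is a stand-in Python process rather than podman, and none of this is code from the PR.

```python
import os
import subprocess
import sys


def run_without_notify_socket(argv, env=None):
    """Run argv with NOTIFY_SOCKET stripped from the child's environment.

    Hypothetical helper: it only illustrates the effect "ignore" is
    described as having, not how podman implements it.
    """
    child_env = dict(os.environ if env is None else env)
    # The child never learns about the socket, so it cannot talk to systemd.
    child_env.pop("NOTIFY_SOCKET", None)
    return subprocess.run(
        argv, env=child_env, capture_output=True, text=True, check=True
    )


def demo():
    # Pretend we were started by systemd with a notify socket set.
    parent_env = dict(os.environ)
    parent_env["NOTIFY_SOCKET"] = "/run/example/notify"  # made-up path
    probe = "import os; print('NOTIFY_SOCKET' in os.environ)"
    result = run_without_notify_socket([sys.executable, "-c", probe], parent_env)
    return result.stdout.strip()


if __name__ == "__main__":
    print(demo())
```

The parent's own environment is left untouched, so the outer service (the shell script in the contrived case) remains free to notify systemd itself.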
Can you elaborate why this would remove the CID file? I think that's still needed to cleanly stop/remove a container in a unit.
Reviewing now. @giuseppe PTAL as well
Oh man, I really like this PR. Thanks a ton!
Just a minor comment regarding input validation. @edsantiago, could you have a look as well? I wonder if you have a suggestion for how we could test this in CI.
Once comments are addressed, this LGTM.
good change! LGTM once the comments above are addressed
The CID file can still be used; it's just that systemd doesn't need it, so it doesn't need to be hanging out in /run/%n-cid. It'll revert to being in userdata/.
The PID file can still be used too; systemd doesn't need it, so it doesn't need to be hanging out in /run/%n.pid, and it'll revert to being in userdata/.... We also don't need to specify it in PIDFile, because systemd will receive the appropriate PID via MAINPID= with Type=notify.
If for whatever reason people still want to use Type=forking, they can; they'll need to keep the PIDFile directive and put the file somewhere it can be found. Otherwise Type=notify just makes things cleaner and less hardcoded.
I too was struggling with this. I can envision the tests, as I did the testing manually, but it's going to require forking, which is currently beyond my ability to do in golang. We'd need a thread that opens a UNIX dgram socket and captures input. Perhaps (tmpdir)/notify.
If podman exits in Proc2 before we send the READY in Proc1, then runc/crun didn't wait for a notification.
I'm not sure how to coordinate the test between two threads like that in Go.
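The listener half of such a test can be sketched fairly compactly. The following is a rough Python illustration, not the test that ended up in the PR: it binds a UNIX datagram socket at (tmpdir)/notify, has a stand-in client send the MAINPID= and READY=1 messages mentioned in this thread, and collects what arrived. The function names and the PID value are made up for the example; a real test would point NOTIFY_SOCKET at the path and launch podman instead of calling sd_notify directly.

```python
import os
import socket
import tempfile


def open_notify_socket(directory):
    """Bind a UNIX datagram socket at <directory>/notify; return (sock, path)."""
    path = os.path.join(directory, "notify")
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    sock.bind(path)
    sock.settimeout(5.0)  # fail the test instead of hanging forever
    return sock, path


def sd_notify(path, state):
    """Send one notification datagram, standing in for the process under test."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as client:
        client.sendto(state.encode(), path)


def collect(sock, count):
    """Receive `count` datagrams and merge their KEY=VALUE lines into a dict."""
    fields = {}
    for _ in range(count):
        data = sock.recv(4096).decode()
        for line in data.splitlines():
            key, _sep, value = line.partition("=")
            fields[key] = value
    return fields


def demo():
    with tempfile.TemporaryDirectory() as tmpdir:
        sock, path = open_notify_socket(tmpdir)
        try:
            sd_notify(path, "MAINPID=1234")  # 1234 is an arbitrary example PID
            sd_notify(path, "READY=1")
            return collect(sock, 2)
        finally:
            sock.close()


if __name__ == "__main__":
    print(demo())
```

Because datagrams queue in the bound socket until they are read, the listener does not need a dedicated thread for this simple case; the receive timeout is what keeps a missing notification from turning into an indefinite hang.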
@giuseppe PTAL
Force-pushed from 0469532 to 3dbaae0.
Force-pushed from 3dbaae0 to c68eb3f.
--sdnotify container|conmon|ignore
With "conmon", we send the MAINPID, and clear the NOTIFY_SOCKET so the OCI
runtime doesn't pass it into the container. We also advertise "ready" when the
OCI runtime finishes to advertise the service as ready.
With "container", we send the MAINPID, and leave the NOTIFY_SOCKET so the OCI
runtime passes it into the container for initialization, and let the container advertise further metadata.
This is the default, which is closest to the behavior podman has done in the past.
The "ignore" option removes NOTIFY_SOCKET from the environment, so neither podman nor
any child processes will talk to systemd.
This removes the need for hardcoded CID and PID files in the command line, and
the PIDFile directive, as the pid is advertised directly through sd-notify.
Signed-off-by: Joseph Gooch <[email protected]>
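As a reading aid, the three modes in the commit message above can be written down as a small decision function. This is a hedged Python model of the described behavior only, not podman's Go implementation; SdNotifyPlan, plan_sdnotify, and the field names are invented for illustration.

```python
from typing import Dict, NamedTuple

VALID_MODES = ("container", "conmon", "ignore")


class SdNotifyPlan(NamedTuple):
    runtime_env: Dict[str, str]  # environment handed on to the OCI runtime
    send_mainpid: bool           # MAINPID= is sent on the container's behalf
    send_ready: bool             # READY=1 is sent once the OCI runtime finishes


def plan_sdnotify(mode: str, env: Dict[str, str]) -> SdNotifyPlan:
    """Model what each --sdnotify mode does with NOTIFY_SOCKET."""
    if mode not in VALID_MODES:
        # Input validation: reject anything outside the documented values.
        raise ValueError(f"invalid sdnotify mode {mode!r}; want one of {VALID_MODES}")

    runtime_env = dict(env)
    have_socket = "NOTIFY_SOCKET" in env

    if mode == "container":
        # Default: MAINPID is sent, the socket is left in place, and the
        # container is expected to advertise readiness itself.
        return SdNotifyPlan(runtime_env, have_socket, False)

    # Both remaining modes hide the socket from the OCI runtime.
    runtime_env.pop("NOTIFY_SOCKET", None)
    if mode == "conmon":
        # MAINPID and READY are both sent without involving the container.
        return SdNotifyPlan(runtime_env, have_socket, have_socket)

    # "ignore": nobody in this process tree talks to systemd.
    return SdNotifyPlan(runtime_env, False, False)


if __name__ == "__main__":
    example_env = {"NOTIFY_SOCKET": "/run/example/notify", "PATH": "/usr/bin"}
    for mode in VALID_MODES:
        print(mode, plan_sdnotify(mode, example_env))
```

When NOTIFY_SOCKET is absent to begin with, all three modes collapse to "send nothing", which matches the idea that the option only matters under a notify-aware supervisor.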
Signed-off-by: Ed Santiago <[email protected]>
Force-pushed from c68eb3f to 10ad46e.
Two flakes, both on the Fedora repos. I'm just going to merge once this goes green. /lgtm
/hold cancel
Thanks @goochjj. Nice contribution.
This began as an empty commit intended solely to get CI to rebuild the VMs using crun 0.14 (up from 0.13). Twin goals:
1) Be able to test containers#6693 (--sdnotify option); and
2) Get rid of 'cgroup.freeze' CI flakes.
These are fixed by crun PRs 419 and 423 respectively.
CI failed on the original PR submission, with errors on ginkgo install. The change to Makefile is intended to address that.
The change to setup_environment.sh is intended to address a flake we're frequently seeing with the Fedora dnf repos. This adds a retry in case of a failing dnf command.
Signed-off-by: Ed Santiago <[email protected]>
Oops. PR containers#6693 (sdnotify) added tests, but they were disabled due to broken crun on f31. I tried for three weeks to get a magic CI:IMG PR to update crun on the CI VMs ... but in that time I forgot to actually enable those new tests.
This PR removes a 'skip', replacing it with a check that systemd is running plus one more to make sure our runtime is crun. It looks like sdnotify just doesn't work on Ubuntu (it hangs), and my guess is that it's a crun/runc issue.
I also changed the test image from fedora:latest to :31, because, sigh, fedora:latest removed the systemd-notify tool.
WARNING WARNING WARNING: the symptom of a missing systemd-notify is that podman will hang forever, not even stopped by the timeout command in podman_run! (Filed: containers#7316). This means that if the sdnotify-in-container test ever fails, the symptom will be that Cirrus itself will time out (2 hours?). This is horrible. I don't know what to do about it other than push for a fix for 7316.
Signed-off-by: Ed Santiago <[email protected]>
--sdnotify conmon-only|container|none
With "conmon-only", we send the MAINPID, and clear the NOTIFY_SOCKET so the OCI
runtime doesn't pass it into the container. We also advertise "ready" when the
OCI runtime finishes to advertise the service as ready.
With "container", we send the MAINPID, and leave the NOTIFY_SOCKET so the OCI
runtime passes it into the container for initialization, and let the container advertise further metadata.
The "none" option does what it's always done in the past - passes NOTIFY_SOCKET
to conmon's process and the OCI runtime processes, and does no manipulation or "help".
This removes the need for hardcoded CID and PID files in the command line, and
the PIDFile directive, as the pid is advertised directly through sd-notify.
Signed-off-by: Joseph Gooch [email protected]
Includes #6689
References #6688