[RFE] Integrate sd_notify with podman health checks #6160
@vrothberg @giuseppe WDYT?
Apologies for not replying earlier to the issue. We have been focused on getting the upcoming Podman v2.0.0 release into shape. First, I love your idea as it matches our vision of extending Podman with systemd. What's important to mention is that Podman will mount the sd-notify socket into each container if the NOTIFY_SOCKET environment variable is set. The idea behind that behaviour is to delegate the messaging to the container itself. This implies that the "healthchecks" logic has to be implemented inside the container. Let's assume we have a database container: when running inside a systemd service, a script in the container could wait until the database accepts connections and only then send READY=1 over the mounted socket. @rhatdan, that could be an interesting blog post.
I think there are two uses for notify. One, as @vrothberg pointed out, is managed by the container to signal when it is ready. Another one, which could be handled with healthchecks, is the systemd watchdog: signaling systemd that the service/container is still working and doesn't need to be restarted. This second use case should not be delegated to the container.
Thanks for your feedback @vrothberg and @giuseppe. I am aware that the sd_notify socket is exposed to the container, but IMHO it is not easy to make use of it if the containerized application (e.g. a spring-boot application) is not sd_notify aware. I was thinking about the bash script solution you mentioned, but I couldn't get my head around how to make it work if I am not in control of the container image build process. At the end of the day, I would love to have something like Kubernetes readiness/liveness probes tied into systemd, which controls the states of the containers. The health check mechanism is a great start and seems to work fine, but it would be great if systemd knew about the state of these health checks.
I think this would have to be a separate unit, as watchdogs are sd-notify based. Using that in the same unit would cause the container to mount the socket (and the runtime to wait for a ready message).
@jritter, I think we can achieve that with a second unit running `podman healthcheck run <container>`.
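A minimal sketch of what such a second unit could look like, assuming the container unit is called `mydb-container.service` and the container is named `mydb` (all names are assumptions):

```ini
[Unit]
Description=Readiness gate for the mydb container
Requires=mydb-container.service
After=mydb-container.service

[Service]
Type=oneshot
RemainAfterExit=yes
# Keep retrying until the container's health check passes
ExecStart=/bin/sh -c 'until podman healthcheck run mydb; do sleep 5; done'
```

Dependent services could then order themselves `After=` (and `Requires=`) this gate unit.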
Wouldn't that be a problem only if the runtime sends a healthcheck notification before the container payload sends READY=1 (and assuming systemd doesn't handle this case)? I don't think there is a problem if both use the same socket.
Friendly ping.
Hi, sorry for the delay. I was thinking about a separate unit B monitoring the podman container service unit A, and then modelling dependencies based on the systemd status of unit B. It just doesn't feel like a nice solution at the end of the day (lots of error-prone manual wiring and confusing dependencies). In my opinion, it should be possible to wire the service up in a way that systemd knows what's going on under the hood, and to me the mechanism by which this is achieved really doesn't matter; I just thought sd_notify might be an option. I had a chat with @mheon regarding this, and based on his reaction I thought this might be the way to go.
A friendly reminder that this issue had no activity for 30 days. |
@jritter @mheon @vrothberg Any further thoughts on this?
I think that with the recent changes from @goochjj we are a huge leap closer. I need more time (which I currently don't have) to think it through, though.
@vrothberg, I assume you are referring to this PR, right? #6693 That certainly looks interesting, I'll have a closer look at it. Based on this example in #6693 (comment), I would run the readiness check after starting the container. What I don't like about this approach is the fact that containers have to be started in detached mode, which makes log handling a bit trickier. When running in foreground mode, logs written to stdout are fed straight into the systemd journal, which makes them super easy to query. Any idea on that?
Couldn't the `ExecStartPost=` directive be used for that?
@duritong, according to the manpage, `ExecStartPost=` commands are only run after the `ExecStart=` commands have been invoked successfully, as determined by `Type=`, so that wouldn't gate on the actual health of the service.
By definition @jritter can't use ExecStartPost, because that wouldn't check service health; it just verifies that podman returned. With the sdnotify options now, you can use `--sdnotify=conmon` and it'll send READY=1 when podman exits. No opportunity for a health check. With `--sdnotify=container`, YOU send READY=1, so it's up to you to send it when appropriate, and to signal other states as appropriate. Either way, podman isn't "in charge" of sd-notify. I imagine health checking is an OCI runtime option, so it might be something that could be added in the OCI runtime, but right now it just passes READY through. You could do your own health check at the systemd level, i.e. script it, use a script with a coprocess for the health checking, or exec `podman -d` and have your "primary" process be one that polls health and returns ready and such. You could do your own health check at the container level: just make sure the process in your container that does the health checks chats with systemd, i.e.:
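A rough sketch of such an entrypoint (the script name is taken from the follow-up comment; all paths, the payload command, and the presence of `systemd-notify` in the image are assumptions):

```sh
#!/bin/sh
# myentrypoint.sh (sketch): fork the real payload, then speak sd_notify
# ourselves once a health probe succeeds. Assumes the notify socket is
# reachable from the container (and the cgroup resolves to the unit)
# and that /healthcheck.sh exits 0 when the app is up.

/opt/app/run-spring-boot.sh &        # the actual payload (assumed path)
APP_PID=$!

# Hold back readiness until the first successful probe
until /healthcheck.sh; do sleep 2; done
systemd-notify --ready

# Keep probing while the payload is alive; feed the watchdog on success
while kill -0 "$APP_PID" 2>/dev/null; do
    /healthcheck.sh && systemd-notify WATCHDOG=1
    sleep 10
done
wait "$APP_PID"
```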
I'm not sure if NOTIFY_SOCKET is available to health check processes when they're spawned, but I'm stabbing in the dark and assuming the health checks are spawned by the OCI runtime, not by podman. Making that smarter would be harder. That's why I'd really like podman to take sd-notify out of the hands of the runtime entirely, because the BIGGEST problem with what you suggest is that podman will block until READY=1 is received. `-d` isn't going to fix that. And by block, I mean everything: `podman ps`, `podman inspect`, `podman exec`, podman anything will block until READY=1 is received, because the OCI runtime blocks until the container is "READY". Since you want it tied to health checking, you're assuming there's an init process with a non-trivial startup time, which means you WILL have this issue. See #6688. Ultimately, IMHO this isn't behavior podman is responsible for, and personally it's behavior I wouldn't WANT the OCI runtime or anything else doing. If you want to have a healthcheck that can speak sd-notify, then go right ahead. But I think the use case is too complicated and specialized for automagic behavior. Can you define exactly how "health check response codes" would translate into sd-notify messages?
I don't think that is true anymore. Before, it was true because the service type needed it and we needed systemd to look for the pid file. Now, as long as Type=notify works with long-running processes as well as daemons, it should make no difference whether you use `-d` or not. Otherwise your complaint is that "systemd Type=notify services must fork", which isn't a podman problem. You'll probably end up with an extra pid, i.e. podman sticking around in addition to conmon.
`myentrypoint` forks off your spring boot process and execs the health check.
Hi @goochjj, thanks for chiming in on this. I think you are raising good points. The basis of my idea was to take two existing concepts (systemd-notify and podman health checks) and make them able to talk to each other. My ultimate goal, however, is to be able to model dependencies in systemd based on the service state; for instance, when a service A running in a container takes 3 minutes to start, service B, which depends on service A, should not start until service A is fully up. In an ideal world, all my services within the containers would understand the concept of SD_NOTIFY out of the box, but unfortunately this is not the case in real life. So I am looking for a solution that runs some sort of readiness and liveness checks and tells systemd about the results, similar to how Kubernetes does its readiness and liveness probes.
Maybe you are right, and health checking is not the responsibility of podman in this scenario. In the Kubernetes world, health checking is done by the kubelet, which is part of the orchestrator and not of the container runtime. I guess in the situation we are discussing, systemd takes the role of the orchestrator. Any thoughts on this @vrothberg @giuseppe @rhatdan @goochjj? Should this maybe be a discussion on the systemd mailing list?
According to man podman-healthcheck-run(1), there are currently 3 return codes defined: 0 (the health check succeeded, the container is healthy), 1 (the health check failed, the container is unhealthy), and 125 (an error occurred while running the health check).
So my approach would be to send READY=1 and WATCHDOG=1 if the health check returns 0, and READY=0 otherwise. Another approach that we might look into: podman health checks are triggered periodically by a systemd timer, which runs in a separate cgroup. Maybe those could send a systemd-notify notification; this would require NotifyAccess to be set to "all", I guess.
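A sketch of that mapping as a wrapper script (the container name is an assumption, and note that READY=0 does not actually exist in the protocol, as pointed out below):

```sh
#!/bin/sh
# Sketch: run periodically from inside the service's cgroup, with
# NotifyAccess=all set on the unit so the messages are accepted.
if podman healthcheck run mydb; then
    systemd-notify READY=1 WATCHDOG=1
else
    systemd-notify READY=0   # mirrors the proposal; see the correction below
fi
```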
I'm not sure if I am following here. Are you saying that logs also appear in the systemd journal when podman is forking? Here's an example where that was not the case, but maybe I am missing an option:
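For illustration, a unit along these lines (names and image are assumptions):

```ini
[Service]
Type=notify
# With -d, container stdout goes to conmon's container log file rather
# than the journal, so journalctl -u only shows podman's own output.
ExecStart=/usr/bin/podman run -d --sdnotify=conmon --name myapp myimage
```

Passing `--log-driver=journald` to `podman run` would route the container output back into the journal.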
I agree, this works in a perfect world where I am in charge of the container image build process as well as of operating it. In my current situation, I have to operate under the assumption that I cannot modify the container images (e.g. third-party software shipped in a container image). Of course, I could just slap a different configuration on top of the image, but this would cause raised eyebrows from a support perspective.
Nothing I suggested actually changes the image; you're just bind-mounting a new entrypoint in place. If that will raise eyebrows, then the people in charge of building the image should integrate operations concerns into the build. That's why it's DevOps, after all. :-D
READY=0 isn't a thing. https://www.freedesktop.org/software/systemd/man/sd_notify.html
You could DELAY sending READY=1, but you can't send READY=0. I'm against having the OCI runtime integrate health checks into sd-notify, mainly because the OCI runtime's behavior right now is to block until READY=1 is sent, which makes many other things worse (i.e. podman locks, inability to exec into the container). Delaying the INITIAL READY=1 will deal with startup, but consider the case where the health check never succeeds: it'll lock podman forever. After that, you could have the health check send WATCHDOG=1 and let systemd deal with timeouts, or do WATCHDOG=trigger if... one health check fails? Multiple health checks fail? IMHO if your intention is to better integrate your containers with systemd... you... should... do that. Bind-mount the notify socket through and implement your own service startup notifications/health checks; sd-notify was purpose-built to allow services to report their own status, and I think you should implement it that way.
Since you have a business need to not modify the build, that's another option, perhaps using ExecStartPre. Do be aware that systemd identifies which service it's speaking to by cgroup, so it WOULD have to be part of the same cgroup the unit uses; otherwise the messages you send to the notify socket would not be mapped back to the right unit. (The sd-notify protocol uses a single unix socket and has no field to identify which unit is speaking; systemd resolves the unit from the cgroup of the sending PID.)
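In unit-file terms, the cgroup caveat boils down to something like this sketch:

```ini
[Service]
Type=notify
# Accept sd_notify messages from any process in this unit's cgroup,
# not only the main PID; needed when a helper (e.g. an ExecStartPre
# script or a check running in this cgroup) sends the messages.
NotifyAccess=all
```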
IFTFY, try the unit below.
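A sketch of such a unit, along the lines described in the next paragraph (names and image are assumptions):

```ini
[Unit]
Description=myapp container

[Service]
Type=notify
# conmon relays MAINPID and sends READY=1 itself, so podman doesn't
# block and no PIDFile= is required.
ExecStart=/usr/bin/podman run --rm --sdnotify=conmon --name myapp myimage
ExecStop=/usr/bin/podman stop myapp

[Install]
WantedBy=multi-user.target
```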
You'll find that podman doesn't block. The sd-notify communications are received from conmon, so the MAINPID is passed up without needing a pidfile, and the notify socket is not passed into the container. Note that this doesn't prevent you from bind-mounting it into the container and continuing to speak through it, as long as the cgroup resolves properly; I'm a fan of that. If you don't want podman to send the READY=1, then you're really looking at `--sdnotify=container`, because you still need the MAINPID broadcast from podman even if it's not going to send the READY=1. Not needed: `PIDFile=` or a `Type=forking` setup.
Another option: add ExecStartPre= lines to your dependent services, as sketched below.
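A sketch of what such a dependent unit could look like (all names are assumptions):

```ini
# webapp.service: depends on the database container being healthy
[Unit]
Requires=database-container.service
After=database-container.service

[Service]
Type=notify
# No "-" prefix: if the health check fails, this unit fails to start.
ExecStartPre=/usr/bin/podman healthcheck run database
ExecStart=/usr/bin/podman run --rm --sdnotify=conmon --name webapp webimage
```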
If any of the pre's without a `-` prefix fail, then the service startup fails; no need for sd-notify or anything. It does break encapsulation a bit, having dependent services needing to know about parent containers. I'd get around that by creating a convention, i.e. `/healthcheck.sh` is always my health checker, so that I don't have to know how to call mysql for a mariadb container. It'll make both parent and dependent service units cleaner.
Further notes: I prefer […]; I guess you could just use […]. The other thing that supports less magic here is that the healthcheck just sets the container to unhealthy. For instance, even Docker wouldn't kill and restart the container, nor does podman; it's left up to the orchestrator to determine the remedial behavior. Podman and OCI runtimes are lower-level than container orchestration. So if you're using systemd as your orchestrator... yeah, not sure how podman can help. Docker sends an event when the healthcheck fails (actually, when a container transitions from healthy to unhealthy), so perhaps what you really need to do is tap into podman's events stream and trigger systemd actions to remediate.
Upon looking at the code, the health checks appear to leverage systemd to create a timer to run `podman healthcheck run` (roughly as sketched below). I'd consider it in-scope to provide "healthy" and "unhealthy" commands that the healthcheck unit could invoke. It still means scripting your "ops". Also, given the commands available to you, you could just schedule a systemd timer to do that for you, if that's all the healthchecks are. Additional notes: I did a contrived check here and I didn't see healthy or unhealthy events show up in the podman events stream... so either I'm doing something wrong or podman doesn't do that. Similarly, I don't see "healthy" and "unhealthy" in the podman ps output. These would be feature discrepancies between docker and podman that are probably undesirable... that may be actionable as a bug report. I don't think sd-notify is going to be the answer (which is probably why it isn't widely deployed).
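For reference, the transient timer podman sets up is roughly equivalent to this (a sketch; interval and unit name are assumptions, `<ctrID>` is a placeholder):

```sh
# Create a transient timer that re-runs the health check at the
# configured interval for container <ctrID>:
systemd-run --unit="<ctrID>" --on-unit-inactive=30s \
    podman healthcheck run "<ctrID>"
```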
@jritter Thoughts?
Thanks, this looks promising. I'll give this a shot once the new functionality has been released; 2.0.4 doesn't contain it yet as far as I can tell.
Well, running checks on the dependent service also occurred to me, and that's more or less how I do it at the moment. The goal behind this RFE is to be able to model these dependencies without the dependent service needing knowledge of the services it depends on, modelling everything through systemd dependencies instead.
Because you don't understand what health checks are, or what they're meant to do. :-D This is old but still relevant: https://developers.redhat.com/blog/2019/04/18/monitoring-container-vitality-and-availability-with-podman You have actually proven that podman AND the health check are working, because the container status moved to unhealthy. That's all the healthcheck options on `podman run` are meant to do. If the question is "how do I run related services after I'm sure my parent container is healthy", IMHO, add an ExecStartPre command that runs `podman healthcheck run <container>`, as sketched above. If the question is "what if the container is unhealthy and didn't get restarted"... correct, podman takes no action. But for that matter, what would you expect that action to be?
As long as podman run or podman healthcheck doesn't have options to define the above behavior, there can be no reasonable expectation that systemd and podman alone are going to restart your container. You could create your own systemd unit, with your own timer, which can do this. Take the transient unit in the link above as an example and expand from there, except define a reasonable OnFailure= action in the unit (and actually define it as a non-transient unit; see the sketch below). If there were ANY RFE to come out of this, I'd expect it to be something along the lines of "these config options allow me to add an OnFailure to the healthcheck systemd unit" (i.e. --healthcheck-onfailure=podman-recover@%N), or "we allow a template or more options to expand the healthcheck systemd unit", or "bake some other response into podman itself". I think as far as the tool goes, all the pieces are there to do whatever needs to be done... Expecting podman to just DO it seems beyond what I would expect of a container product that doesn't have any running daemons and makes no claims of being a job orchestrator. :-D Maybe some additional logic can be wrapped around containers in a pod, but unless it's configurable, I wouldn't even want that. sdnotify=healthcheck isn't really feasible because conmon is the one doing the SD_NOTIFY, and the healthcheck is happening in the transitory unit. This doesn't sound like a responsibility conmon should have (executing additional healthchecks and acting on the results), especially since a transitory unit is already doing it. So really, you'd want database.service to start database-container.service and database-ready.service: have database-ready require and be ordered after database-container, and have database.service require both (or make database-container and database-ready PartOf database). database-ready calls podman healthcheck run; set the RestartSec such that it's just for intervals between healthchecks at service start. It would be a lot simpler (given the above suggestion) to just add the healthcheck to the dependent service in the first place to handle startup sequencing. And finally, I feel this issue should be closed.
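A sketch of the non-transient variant with a remediation hook (all unit names, including the recover template, are assumptions):

```ini
# database-healthcheck.service (paired with a .timer firing periodically)
[Unit]
Description=Health check for the database container
# Remediation hook: activated when the check fails
OnFailure=podman-recover@database.service

[Service]
Type=oneshot
ExecStart=/usr/bin/podman healthcheck run database
```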
Also, there appears to be active development on a health-on-failure action, so these ideas likely belong elsewhere (and this should be closed and/or linked to those issues).
This just merged, so is this enough to satisfy the issue?
Have a look at the following PR: #15687. This is how we envision healthchecks being used in conjunction with systemd.
I think we have many pieces in place with the on-failure actions. There's also a blog on the topic: https://www.redhat.com/sysadmin/podman-edge-healthcheck One thing we can think of is adding a new `--sdnotify` policy that sends the READY message once the container has turned healthy.
I agree.
I just had another look. We need to get #13627 in before tackling this issue here. The idea: once `podman wait` supports waiting for a container to become healthy, the sd-notify logic can use that to delay sending READY.
There are two types of health checks: the ordinary health check and the startup health check.
Support two new wait conditions, "healthy" and "unhealthy". This further paves the way for integrating sdnotify with health checks which is currently being tracked in containers#6160. Fixes: containers#13627 Signed-off-by: Valentin Rothberg <[email protected]>
Add a new "healthy" sdnotify policy that instructs Podman to send the READY message once the container has turned healthy. Fixes: containers#6160 Signed-off-by: Valentin Rothberg <[email protected]>
/kind feature
Description
Recently I have played around multiple times with starting podman containers using systemd. If a service consists of multiple containers, e.g. a web application and a database server, some applications require the start of the service to be orchestrated, i.e. the web application needs to be started after the database container is started and ready.
Dependencies can be modeled in systemd, but I didn't find a way to model dependencies based on the readiness of a service running within a container.
It occurred to me that maybe it would be possible to do such an integration by using the sd_notify mechanism in combination with podman health checks, i.e. mark a systemd service of type "notify" as started only when the health check reports "healthy".
Any thoughts on that?
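For illustration, with the `--sdnotify=healthy` policy that eventually came out of this thread (see above), such a unit could look like this sketch (the image, container name, and health command are assumptions):

```ini
[Service]
Type=notify
# systemd only marks this unit active once the container's health
# check has passed, so units ordered After= it start against a ready DB.
ExecStart=/usr/bin/podman run --rm --name db --sdnotify=healthy \
    --health-cmd 'pg_isready -U postgres' registry.example.com/postgres
```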