systemd sd-notify freezes podman #6688

goochjj · 2020-06-19T13:28:49Z

Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)

/kind bug

Description

Creating a service using Type=notify freezes everything until the container is ready.

Setting a service as Type=notify sets the NOTIFY_SOCKET, which gets passed through podman properly. Runc and Crun then proxy that NOTIFY_SOCKET through to the container, indicating the container will signal when ready. The whole idea being that starting a container does not equal "a container is ready", this initialization could take seconds, 10's of seconds, or minutes. And it shouldn't matter how long it takes..

The problem is while it's in "starting" status, podman is frozen completely. podman ps doesn't return, podman exec doesn't work, even podman info won't return. One partially initialized container shouldn't freeze everything, and the lack of exec makes it hard to diagnose what's going on inside the container to resolve the sd-notify issue. podman stop and podman kill appear to work, but the container is still stuck.

In addition, the MAINPID isn't set right - but we'll come back to that.

Steps to reproduce the issue:

Create a systemd service. Change type from Type=forking to Type=notify, and remove the
PIDFile. Add NotifyAccess=all.
SSCE

[Unit]
Description=NotifyTesting
Wants=network.target
After=network-online.target

[Service]
Environment=PODMAN_SYSTEMD_UNIT=%n
SyslogIdentifier=%N
ExecStartPre=-/usr/bin/podman stop %N
ExecStartPre=-/usr/bin/podman rm %N
LogExtraFields=CONTAINER_NAME=%N
ExecStart=/usr/bin/podman --log-level=debug run \
  -d --log-driver=journald \
  --init \
  --cgroups no-conmon \
  --net=host \
  --name %N \
  alpine sleep infinity
ExecStop=/usr/bin/podman stop -t 20 %N
Type=notify
NotifyAccess=all
Restart=on-failure
#Restart=always
RestartSec=30s
TimeoutStartSec=20
TimeoutStopSec=25
#KillMode=none
#Type=forking
#PIDFile=/run/podman-pid-%n
Delegate=yes
Slice=machine.slice

[Install]
WantedBy=multi-user.target default.target

Run the service.
Note the status:

notifytest[8595]: time="2020-06-19T13:22:38Z" level=debug msg="Starting container e6043f58bcd610d1e448739f2120447f2880c9b498c65fc3c181e1f453a48ef7 with command

It never gets to started (as expected)

Status:

 Main PID: 12003 (podman)
    Tasks: 22 (limit: 4915)
   Memory: 27.9M
   CGroup: /machine.slice/notifytest.service
           ├─12003 /usr/share/gocode/src/github.com/containers/libpod/bin/podman --log-level=debug run -d --log-driver=journald --init --cgroups no-conmon --net=host --name notifytest alpine sleep infinity
           ├─12065 /usr/libexec/podman/conmon --api-version 1 -c a3e79ea772bdcca69020eca158f059718ff0f4b34dd1c8f8af5e77c6840e60f0 -u a3e79ea772bdcca69020eca158f059718ff0f4b34dd1c8f8af5e77c6840e60f0 -r /usr/bin/runc -b /va>
           └─12084 /usr/bin/runc start a3e79ea772bdcca69020eca158f059718ff0f4b34dd1c8f8af5e77c6840e60f0

Try podman ps, podman exec -it -l sh, or podman info
Freezes.
Release the container
for i in /run/crun/*/notify/notify /run/runc/*/notify/notify.sock; do env NOTIFY_SOCKET=$i systemd-notify --ready; done
Everything releases, but the service STILL fails, because it didn't find MAINPID=conmon's pid.

Output of podman version:

Version:      2.0.0-dev
API Version:  1
Go Version:   go1.13.3
Git Commit:   b27df834c18b08bb68172fa5bd5fd12a5cd54633
Built:        Thu Jun 18 12:19:01 2020
OS/Arch:      linux/amd64

Output of podman info --debug:

host:
  arch: amd64
  buildahVersion: 1.15.0
  cgroupVersion: v1
  conmon:
    package: Unknown
    path: /usr/libexec/podman/conmon
    version: 'conmon version 2.0.18-dev, commit: 954b05a7908c0aeeff007ebd19ff662e20e5f46f'
  cpus: 4
  distribution:
    distribution: photon
    version: "3.0"
  eventLogger: file
  hostname: photon-machine
  idMappings:
    gidmap: null
    uidmap: null
  kernel: 4.19.115-6.ph3-esx
  linkmode: dynamic
  memFree: 5536968704
  memTotal: 8359960576
  ociRuntime:
    name: runc
    package: runc-1.0.0.rc9-2.ph3.x86_64
    path: /usr/bin/runc
    version: |-
      runc version 1.0.0-rc10+dev
      commit: 2a0466958d9af23af2ad12bd79d06ed0af4091e2
      spec: 1.0.2-dev
  os: linux
  remoteSocket:
    path: /run/podman/podman.sock
  rootless: false
  slirp4netns:
    executable: ""
    package: ""
    version: ""
  swapFree: 0
  swapTotal: 0
  uptime: 25h 9m 27.25s (Approximately 1.04 days)
registries:
  search:
  - docker.io
  - registry.fedoraproject.org
  - registry.access.redhat.com
store:
  configFile: /etc/containers/storage.conf
  containerStore:
    number: 4
    paused: 0
    running: 0
    stopped: 4
  graphDriverName: overlay
  graphOptions:
    overlay.mountopt: nodev,metacopy=on
  graphRoot: /var/lib/containers/storage
  graphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "true"
  imageStore:
    number: 16
  runRoot: /var/run/containers/storage
  volumePath: /var/lib/containers/storage/volumes
version:
  APIVersion: 1
  Built: 1592482741
  BuiltTime: Thu Jun 18 12:19:01 2020
  GitCommit: b27df834c18b08bb68172fa5bd5fd12a5cd54633
  GoVersion: go1.13.3
  OsArch: linux/amd64
  Version: 2.0.0-dev

Package info (e.g. output of rpm -q podman or apt list podman):

Compiled from source.

Additional environment details (AWS, VirtualBox, physical, etc.):

The text was updated successfully, but these errors were encountered:

mheon · 2020-06-19T13:35:54Z

I'm fairly certain this is because runc/crun themselves basically freeze midway through container creation until they are notified by SD_NOTIFY. Podman is waiting for final confirmation from the runtime that the container has been successfully created, and it holding the container lock for the container in question while doing so. Changing that may be possible (it would unblock some things like podman ps) but the container is not considered running until the runtime finishes up, so we won't be able to podman exec into the container until after the notify issue has sorted itself.

goochjj · 2020-06-19T13:36:22Z

First fix #6689 - Advertise MAINPID via sd-notify, this eliminates the need for a PIDFile.

In addition, I think there should be a new option to podman run, something like:

--sdnotify conmon-only|container

With conmon-only, we send the MAINPID, and clear the NOTIFY_SOCKET so the OCI runtime doesn't pass it into the container. We could also advertise "ready" when the OCI runtime finishes to advertise the service as ready.

With container, we send the MAINPID, and leave the NOTIFY_SOCKET so the OCI runtime passes it into the container for initialization, and let the container advertise further metadata.

This removes the need for hardcoded CID and PID files in the command line, and the PIDFile directive, as the pid is advertised directly through sd-notify.

goochjj · 2020-06-19T13:37:42Z

I'm fairly certain this is because runc/crun themselves basically freeze midway through container creation until they are notified by SD_NOTIFY. Podman is waiting for final confirmation from the runtime that the container has been successfully created, and it holding the container lock for the container in question while doing so. Changing that may be possible (it would unblock some things like podman ps) but the container is not considered running until the runtime finishes up, so we won't be able to podman exec into the container until after the notify issue has sorted itself.

I'm less concerned with operations on the container in question (i.e. exec into the actual container) than I am things completely unrelated like podman info. Could always use nsenter to poke around the container and troubleshoot.

goochjj · 2020-06-19T13:44:36Z

@vrothberg @giuseppe see --sdnotify proposal above

I did MAINPID only since I thought that was least intrusive. With the above, "Container" would be the default as that's what it does now.

We could add "none" as well, to tell podman not to do anything with the NOTIFY_SOCKET. I just didn't see how systemd would get the correct PID from the container itself... Looks like runc attempts to send a MAINPID, and if it overrides us there's nothing we can do about it anyway... but with testing, whatever runc is sending doesn't work.

vrothberg · 2020-06-19T13:55:24Z

Thanks for the ping, @goochjj!

@giuseppe knows these details of the runtimes and I guess he will know what to do :)

giuseppe · 2020-06-19T14:56:01Z

I'm fairly certain this is because runc/crun themselves basically freeze midway through container creation until they are notified by SD_NOTIFY. Podman is waiting for final confirmation from the runtime that the container has been successfully created, and it holding the container lock for the container in question while doing so. Changing that may be possible (it would unblock some things like podman ps) but the container is not considered running until the runtime finishes up, so we won't be able to podman exec into the container until after the notify issue has sorted itself.

yes I confirm that. Both crun/runc stops on "$RUNTIME start" until the container notifies them.

Why does the container take so long to be ready? Does it actually send any READY=1 notification?

giuseppe · 2020-06-19T14:57:10Z

and also, any other info sent by the container is ignored, e.g. MAINPID. The only notification accepted by the runtimes is READY=1

goochjj · 2020-06-19T14:57:16Z

No, I'm purposely delaying it to test behavior. I don't think we can count on the container init being quick.

For instance, if I were to use this for my tomcat based web apps, it may take 30s-2min to initialize, and then I'd run health checks... I wouldn't want all of podman to be locked while that's happening.

goochjj · 2020-06-19T14:59:14Z

and also, any other info sent by the container is ignored, e.g. MAINPID. The only notification accepted by the runtimes is READY=1

@giuseppe Am I reading this wrong? It looks like at line 166 runc itself is sending the MAINPID.
https://github.com/opencontainers/runc/blob/9748b48742fad065666bcded00f2db5ded7e44c1/notify_socket.go#L140-L170

goochjj · 2020-06-19T15:01:44Z

If runc does send it, it's not making it to systemd, or systemd is throwing it out - at least in my testing.

giuseppe · 2020-06-19T15:14:41Z

``

@giuseppe Am I reading this wrong? It looks like at line 166 runc itself is sending the MAINPID.
https://github.com/opencontainers/runc/blob/9748b48742fad065666bcded00f2db5ded7e44c1/notify_socket.go#L140-L170

yes correct, runc is sending the MAINPID. The MAINPID sent by runc is the pid for the container

giuseppe · 2020-06-19T15:15:53Z

For instance, if I were to use this for my tomcat based web apps, it may take 30s-2min to initialize, and then I'd run health checks... I wouldn't want all of podman to be locked while that's happening.

the assumption is that the container is not ready until it notifies to be so. We should probably fix the healthchecks to raise an error immediately if the container is not running.

goochjj · 2020-06-19T15:49:09Z

``

@giuseppe Am I reading this wrong? It looks like at line 166 runc itself is sending the MAINPID.
https://github.com/opencontainers/runc/blob/9748b48742fad065666bcded00f2db5ded7e44c1/notify_socket.go#L140-L170

yes correct, runc is sending the MAINPID. The MAINPID sent by runc is the pid for the container

Except it doesn't seem to be....

runc --version
runc version 1.0.0-rc10+dev
commit: 2a0466958d9af23af2ad12bd79d06ed0af4091e2
spec: 1.0.2-dev

One screen: socat unix-recvfrom:/run/nottest,fork -
Another screen:
env NOTIFY_SOCKET=/run/nottest podman --log-level debug run --rm -it alpine sh

Another screen for i in /run/crun/*/notify/notify /run/runc/*/notify/notify.sock; do env NOTIFY_SOCKET=$i systemd-notify --ready; done

All that socat shows is "READY=1"

I don't know what the tinker.C part means, maybe it's a spec option or a command line option?

It appears that systemd DOES accept MAINPID multiple times. Each time MAINPID is sent it switches, throwing a cgroups warning if need be. (runc start is apparently forked off, not surprisingly)

I'm fine sending MAINPID at conmon launch, and again after runc start succeeds to reset it if the OCI runtime played tricks on us.

goochjj · 2020-06-19T15:50:38Z

For instance, if I were to use this for my tomcat based web apps, it may take 30s-2min to initialize, and then I'd run health checks... I wouldn't want all of podman to be locked while that's happening.

the assumption is that the container is not ready until it notifies to be so. We should probably fix the healthchecks to raise an error immediately if the container is not running.

I 100% agree with that. I also don't care if podman exec -it container_in_question blocks (except that ctrl-c doesn't work - that should probably be fixed). I DO have an issue with podman info blocking, and podman ps. Wasn't the point of this architecture vs. docker that containers were more independent and less susceptible to this sort of coupling between containers...

mheon · 2020-06-19T17:29:28Z

We take and hold a lock on the container in question for all major operations. This provides strong safety against races (two podman start on the same container launched at the same time will not be an issue). podman ps also grabs the lock (to get the current status of the container) and it looks like info does the same? Normally this isn't a problem, because every one of the locked operations takes less than a second, and then the lock releases and goes to another process. Unfortunately, the decision by the OCI runtime to not exit until SD_NOTIFY has succeeded means that podman start can hang for a very long time in that case. I'm still of the opinion that this is an OCI runtime issue, but it doesn't seem like one that runc has any intention of addressing.

goochjj · 2020-06-23T13:52:08Z

@mheon If you don't like how runc handles it, then it could be taken out of the OCI runtime's responsibility.

Have podman mask out NOTIFY_SOCKET from the OCI runtime, but pass it to conmon
Have conmon create a notify socket, intercept READY and pass it on to the host's notify socket, like runc/crun do now.
Have podman inject the bind mount to the socket conmon is proxying, and set NOTIFY_SOCKET properly in the container's env (not the runtime's env)

I imagine that would do the trick... But it seems hacky to implement behavior in podman because the OCI runtime doesn't do what we want.

It depends on what you think should happen - when you do a podman start, and the container isn't READY, has it started? I'd say yes. Because sd-notify is specifically meant to talk to SYSTEMD, not oci runtimes. The container started.

If you want to verify it's READY, you should use systemctl status to verify it's ready. Or, monitor the unix datagram yourself. IMHO the runtime and podman shouldn't block waiting for READY=1.

mheon · 2020-06-23T14:04:58Z

That is quite tempting, for more than one reason - we've had occasional problems with the implementation in runc having bugs and upstream not merging fixes for said bugs, so entirely removing this from the runtime lets us do this in a more sensible, nonblocking way and circumvent buggy runtimes.

I am a bit concerned about the complexity this would bring to conmon, and the difficulty of modifying the environment of the running container (going to be difficult doing that in a portable fashion).

goochjj · 2020-06-23T14:41:53Z

That is quite tempting, for more than one reason - we've had occasional problems with the implementation in runc having bugs and upstream not merging fixes for said bugs, so entirely removing this from the runtime lets us do this in a more sensible, nonblocking way and circumvent buggy runtimes.

I am a bit concerned about the complexity this would bring to conmon, and the difficulty of modifying the environment of the running container (going to be difficult doing that in a portable fashion).

I imagine it would end up in specgen somewhere - injecting into the container's ENV and Mounts. This is very similar to what I was playing with w.r.t. syslog proxying.

https://github.com/containers/conmon/pull/177/files

It was much longer than I thought it should be mainly because it's C and it's glib and it's nonblocking... And everything in conmon so far was based on stream sockets. If conmon gained the ability to do dgram sockets, it could do specific proxying for NOTIFY_SOCKET and generic proxying for things like /dev/log.

Ultimately I replaced it with an OCI hook shell script that was about 20 lines long, which seemed less complex.

I have less understanding of how best to inject mounts and env in libpod - perhaps in specgen somewhere... It would just be another bind mount that libpod handles (/etc/hosts, /etc/resolv.conf, /run/.containerenv, /run/notify) and it would bind to the exact same directory - userdata. It could even always be /run/notify in the container.

mheon · 2020-06-23T23:34:52Z

Huh. I was under the impression that the OCI runtime automatically hijacked SD_NOTIFY when it found it - if that's not the case, and we can do it ourselves, that's not bad at all. My main fear was that we'd have to trick the runtime into thinking there was no socket.

goochjj · 2020-06-24T00:16:03Z

Well, let's test. We all know what happens when runc and crun receives NOTIFY_SOCKET in its own environment. If runc and crun also intercepted the spec's environment... then this wouldn't happen:

podman run --rm -it -e NOTIFY_SOCKET=/tmp/notify alpine sh
/ # printenv NOTIFY_SOCKET
/tmp/notify
/ # ls -l /tmp
total 0

Which means I can do things myself with my own bind mount....

socat unix-recvfrom:/tmp/notify - &
# podman run --rm -it -e NOTIFY_SOCKET=/tmp/notify -v /tmp/notify:/tmp/notify fedora sh
sh-5.0# systemd-notify --ready
READY=1sh-5.0#

That's runc...

Crun:

# podman --runtime /opt/podman/bin/crun run --rm -it -e NOTIFY_SOCKET=/tmp/notify alpine sh
/ # ls -l /tmp
total 0
/ # printenv NOTIFY_SOCKET
/tmp/notify
/ #

# podman --runtime /opt/podman/bin/crun run --rm -it -e NOTIFY_SOCKET=/tmp/notify -v /tmp/notify:/tmp/notify fedora sh
sh-5.0# systemd-notify --ready
READY=1sh-5.0# exit

Also runc and crun only ever pass READY=1, but I don't see any reason why we couldn't also pass STATUS,RELOADING, STOPPING, ERRNO, or WATCHDOG items.. I assume they're not passed now just because they aren't frequently implemented. I think we'd just need to decide which variables conmon owns (MAINPID being the obvious one. conmon could block MAINPID from the container and only ever send its own). Does it make sense for conmon to set up the watchdog for itself? etc.

I can work up a PoC and we can go from there - probably the first task would be to just get the 3 modes I did in the PR working without the OCI runtime doing any proxying, and only dealing with MAINPID and READY=1. Future expansion could build off of that. Doing it that way should solve the podman locking "problem" introduced by the OCI runtime.

goochjj · 2020-06-24T18:49:39Z

@mheon Given containers/conmon#182, the only changes to libpod are goochjj#1 to handle SD-NOTIFY ourselves

goochjj · 2020-06-24T18:49:54Z

(assuming the other sd-notify work as a baseline)

github-actions · 2020-07-25T00:11:56Z

A friendly reminder that this issue had no activity for 30 days.

rhatdan · 2020-07-25T11:19:04Z

@goochjj @mheon @giuseppe Is this still an issue?

goochjj · 2020-07-25T15:10:13Z

Yes, it is still an issue. The OCI runtime blocks when using sd-notify (container), and podman freezes because the starting container holds the lock.

I've provided a solution.
containers/conmon#182
goochjj#1

rhatdan · 2020-10-07T17:19:08Z

Any progress on this @goochjj ?

goochjj · 2020-10-26T13:44:04Z

It's all in #7607 - which was pending conmon's changes making it into the test environments.

rhatdan · 2020-10-27T18:37:33Z

Any idea if the new conmon is there now?

rhatdan · 2020-12-24T16:29:42Z

Still waiting for this to get merged.

github-actions · 2021-01-25T00:49:07Z

A friendly reminder that this issue had no activity for 30 days.

rhatdan · 2021-01-25T15:04:22Z

Still waiting on this one.

github-actions · 2021-03-08T00:17:18Z

A friendly reminder that this issue had no activity for 30 days.

rhatdan · 2021-03-08T19:14:18Z

Since Conmon is merged, I am going to assume this is fixed. Reopen if I am mistaken.

flyingflo · 2021-07-16T11:55:34Z

Are you sure this is fixed with conmon, only? Don't we need #8508 too?
I'm asking, because I still see this behaviour on F34 with

podman 3.2.2
conmon 2.0.27
crun 0.20.1

I'm trying to get a sd_notify aware service in a podman container to play nicely as a Type=notify systemd unit, preferably with watchdog and status text. I would love this kind of "perfect" integration.

rhatdan · 2021-07-16T12:03:32Z

Do you have a easy reproducer? We need an example of a program that uses sd-notify that we could try.

flyingflo · 2021-07-16T12:11:59Z

I just threw my unpolished experiment into a repo: https://github.com/flyingflo/podman_sdnotify
It also works with a bash script calling systemd-notify, but I used python later, because the PID stays the same.

rhatdan · 2021-07-16T12:59:57Z

I am not sure how this is supposed to work, but the first example I tried got connection refused, with container, then I changed to conmon type and I get.

× sdnotify_py.service - Podman container-sdnotify_py_ctr.service
     Loaded: loaded (/usr/lib/systemd/system/sdnotify_py.service; disabled; vendor preset: disabled)
     Active: failed (Result: exit-code) since Fri 2021-07-16 08:57:12 EDT; 28s ago
       Docs: man:podman-generate-systemd(1)
    Process: 186726 ExecStartPre=/bin/rm -f /run/container-sdnotify_py_ctr.pid /run/container-sdnotify_py_ctr.ctr-id (code=exited, status=0/SUCCESS)
    Process: 186897 ExecStart=/usr/bin/podman run --conmon-pidfile /run/container-sdnotify_py_ctr.pid --cidfile /run/container-sdnotify_py_ctr.ctr-id --cgroups=split --sdnotify=conmon -d --replace -a stderr -a stdout --log-driver=journal>
    Process: 187055 ExecStopPost=/usr/bin/podman rm --ignore -f --cidfile /run/container-sdnotify_py_ctr.ctr-id (code=exited, status=219/CGROUP)
   Main PID: 186897 (code=exited, status=1/FAILURE)
        CPU: 566ms

Jul 16 08:57:12 localhost.localdomain conmon[186897]:   File "/app/sdnotify_py.py", line 15, in <module>
Jul 16 08:57:12 localhost.localdomain conmon[186897]:     n = sdnotify.SystemdNotifier(debug=True)
Jul 16 08:57:12 localhost.localdomain conmon[186897]:   File "/usr/local/lib/python3.9/site-packages/sdnotify/__init__.py", line 37, in __init__
Jul 16 08:57:12 localhost.localdomain conmon[186897]:     if addr[0] == '@':
Jul 16 08:57:12 localhost.localdomain conmon[186897]: TypeError: 'NoneType' object is not subscriptable
Jul 16 08:57:12 localhost.localdomain podman[186926]: 2021-07-16 08:57:12.67782836 -0400 EDT m=+0.058939985 container died ceee2ce16ccfd607d76ad7eecfcb487ed3cb84a990dbe5f2a04232b49ddf535c (image=localhost/sdnotify_py:latest, name=sdnotif>
Jul 16 08:57:12 localhost.localdomain podman[186926]: 2021-07-16 08:57:12.771301398 -0400 EDT m=+0.152413006 container cleanup ceee2ce16ccfd607d76ad7eecfcb487ed3cb84a990dbe5f2a04232b49ddf535c (image=localhost/sdnotify_py:latest, name=sdn>
Jul 16 08:57:12 localhost.localdomain systemd[1]: sdnotify_py.service: Main process exited, code=exited, status=1/FAILURE
Jul 16 08:57:12 localhost.localdomain systemd[1]: sdnotify_py.service: Control process exited, code=exited, status=219/CGROUP
Jul 16 08:57:12 localhost.localdomain systemd[1]: sdnotify_py.service: Failed with result 'exit-code'.

rhatdan · 2021-07-16T13:23:58Z

@haircommander Any idea what is going on here. It looks like conmon is not listening on the notify socket?

haircommander · 2021-07-16T14:09:07Z

gonna be honest, I do not understand the sd notify code haha

@flyingflo are you able to test using that PR? the phrasing "blocked by oci runtime" in the PR implies there is a missing link that we may need...

rhatdan · 2021-07-16T15:16:59Z

Actually @flyingflo your example is working for me. Although you python script is failing after the first sdnotify.

# Inform systemd that we've finished our startup sequence...
n = sdnotify.SystemdNotifier(debug=True)
n.notify("READY=1\nSTATUS=Started")

This looks successful, then systemd or conmon drops the connection

count = 1
while True:
    print("Running... {}".format(count))
    try:
        n.notify("STATUS=Count is {}".format(count))

This notify fails, because systemd or conmon has dropped the connection.

Does this test work differently if you don't run it in a container. Is conmon dropping the connection when it should not?

flyingflo · 2021-07-16T16:54:51Z

With my test, I can clearly observe the behaviour in this actual issue.
The crun process sticks around (observed with systemctl status) until the python script sends READY after 10s.
During this time podman is blocked completely (e.g. podman ps), which is what this issue is about, if I understood.

The second observation is that all writes to the socket fail, after the first write with READY=1, after which crun terminates.
I believe this is because what I see is crun proxying the socket, not conmon.
The intention of @goochjj in #8508 and containers/conmon#182 is that conmon proxies the socket and that the OCI runtime is circumvented.
I'm not sure if that is correct, because I'm looking into this the first time, but I think we need #8508 before containers/conmon#182 is in effect with podman, because it prepares the environments for conmon and the OCI runtime appropriately.

mheon · 2021-07-16T17:03:43Z

Wait - #8508 isn't merged yet?

@vrothberg Didn't we just swap generate systemd to use notify by default? If so, and #8508 is not merged, I think we need an immediate revert of that. The behavior of sdnotify when the OCI runtime is used is not sane.

flyingflo · 2021-07-16T17:15:05Z

I just assumed it is not merged because of the #8508 's state, no further investigation.

For me Type=notify works with podman's --sdnotify=container but only for the READY message, because all others don't get through after the runtime goes away, and --sdnotify=conmon works too, but there is no NOTIFY_SOCKET set in the container (I guess that is what it should do).

rhatdan · 2021-07-16T17:27:05Z

I just updated an pushed it. This PR has been floundering for a while, I guess I will need to pay more attention to it.

flyingflo · 2021-07-16T20:50:33Z

Jul 16 08:57:12 localhost.localdomain conmon[186897]: File "/app/sdnotify_py.py", line 15, in
Jul 16 08:57:12 localhost.localdomain conmon[186897]: n = sdnotify.SystemdNotifier(debug=True)
Jul 16 08:57:12 localhost.localdomain conmon[186897]: File "/usr/local/lib/python3.9/site-packages/sdnotify/init.py", line 37, in init
Jul 16 08:57:12 localhost.localdomain conmon[186897]: if addr[0] == '@':
Jul 16 08:57:12 localhost.localdomain conmon[186897]: TypeError: 'NoneType' object is not subscriptable

@rhatdan this is what you get if the env var NOTIFY_SOCKET is not set in the container. Which is the case with --sdnotify=conmon.

vrothberg · 2021-08-16T13:22:05Z

Wait - #8508 isn't merged yet?

@vrothberg Didn't we just swap generate systemd to use notify by default? If so, and #8508 is not merged, I think we need an immediate revert of that. The behavior of sdnotify when the OCI runtime is used is not sane.

By default, it's setting --sdnotify=conmon where everything's handled on the Podman-side of things.

mheon · 2021-08-16T13:31:36Z

OK Good. We're set then.

openshift-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Jun 19, 2020

goochjj mentioned this issue Jun 19, 2020

Implement --sdnotify cmdline option to control sd-notify behavior #6693

Merged

goochjj mentioned this issue Jun 24, 2020

Refactor I/O and add SD_NOTIFY proxy support containers/conmon#182

Merged

github-actions bot added the stale-issue label Jul 25, 2020

goochjj mentioned this issue Jul 27, 2020

[RFE] Integrate sd_notify with podman health checks #6160

Closed

rhatdan removed the stale-issue label Dec 24, 2020

github-actions bot added the stale-issue label Jan 25, 2021

rhatdan removed the stale-issue label Jan 25, 2021

umohnani8 assigned rhatdan Feb 5, 2021

github-actions bot added the stale-issue label Mar 8, 2021

rhatdan closed this as completed Mar 8, 2021

github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 21, 2023

github-actions bot locked as resolved and limited conversation to collaborators Sep 21, 2023

systemd sd-notify freezes podman #6688

systemd sd-notify freezes podman #6688

Comments

goochjj commented Jun 19, 2020

mheon commented Jun 19, 2020

goochjj commented Jun 19, 2020

goochjj commented Jun 19, 2020

goochjj commented Jun 19, 2020

vrothberg commented Jun 19, 2020

giuseppe commented Jun 19, 2020

giuseppe commented Jun 19, 2020

goochjj commented Jun 19, 2020

goochjj commented Jun 19, 2020

goochjj commented Jun 19, 2020

giuseppe commented Jun 19, 2020

giuseppe commented Jun 19, 2020

goochjj commented Jun 19, 2020

goochjj commented Jun 19, 2020 • edited Loading

mheon commented Jun 19, 2020

goochjj commented Jun 23, 2020

mheon commented Jun 23, 2020

goochjj commented Jun 23, 2020

mheon commented Jun 23, 2020

goochjj commented Jun 24, 2020 • edited Loading

goochjj commented Jun 24, 2020

goochjj commented Jun 24, 2020

github-actions bot commented Jul 25, 2020

rhatdan commented Jul 25, 2020

goochjj commented Jul 25, 2020

rhatdan commented Oct 7, 2020

goochjj commented Oct 26, 2020

rhatdan commented Oct 27, 2020

rhatdan commented Dec 24, 2020

github-actions bot commented Jan 25, 2021

rhatdan commented Jan 25, 2021

github-actions bot commented Mar 8, 2021

rhatdan commented Mar 8, 2021

flyingflo commented Jul 16, 2021

rhatdan commented Jul 16, 2021

flyingflo commented Jul 16, 2021

rhatdan commented Jul 16, 2021

rhatdan commented Jul 16, 2021

haircommander commented Jul 16, 2021

rhatdan commented Jul 16, 2021

flyingflo commented Jul 16, 2021

mheon commented Jul 16, 2021

flyingflo commented Jul 16, 2021

rhatdan commented Jul 16, 2021

flyingflo commented Jul 16, 2021

vrothberg commented Aug 16, 2021

mheon commented Aug 16, 2021

goochjj commented Jun 19, 2020 •

edited

Loading

goochjj commented Jun 24, 2020 •

edited

Loading