Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor I/O and add SD_NOTIFY proxy support #182

Merged
merged 1 commit into from
Sep 16, 2020

Conversation

goochjj
Copy link
Contributor

@goochjj goochjj commented Jun 24, 2020

Refactored all the conn_sock functionality to be more generic. It can deal
with different types of sockets, stream vs dgram, and reuses all the same
callbacks, shutdown and async functionality.

Conmon creates a notify socket which podman bind-mounts into the container,
and passes in via the spec's environment variables. Conmon relays the
READY=1 signal. This is similar to what runc and crun do, but doing it in
conmon and NOT passing NOTIFY_SOCKET to the OCI runtime allows us to start
up properly without runc and crun blocking on the "start" command.

It would also be trivial to add more proxied sockets, i.e. the /dev/log
proof of concept I did would now be super easy, if we wanted to revisit that.

Signed-off-by: Joseph Gooch [email protected]

@goochjj
Copy link
Contributor Author

goochjj commented Jun 24, 2020

@haircommander PTAL

This is another take on what I had done w/ the syslog proxy. You had requested the console sockets and udp sockets share more code - I've done that by moving some flags and state into the structures used to manage the sockets. In addition this implements a SD-NOTIFY proxy using the same constructs.

SD-NOTIFY conversation from here containers/podman#6688

@goochjj goochjj force-pushed the socket-handling-revamp branch from e8747ba to 4032e7c Compare June 24, 2020 18:47
@goochjj goochjj force-pushed the socket-handling-revamp branch 2 times, most recently from 22741d9 to 70ecb8c Compare June 24, 2020 19:02
src/conmon.c Outdated Show resolved Hide resolved
@rhatdan
Copy link
Member

rhatdan commented Jun 25, 2020

I am a little worried about SELinux here, making sure that the container processes are still allowed to write the SD_NOTIFY socket. The path needs to be writeable from the container, and the listener has to be ok.
Most likely this will work, but we should test this on an SELinux enforced system, with the container writing to the socket.

@goochjj
Copy link
Contributor Author

goochjj commented Jun 25, 2020

I am a little worried about SELinux here, making sure that the container processes are still allowed to write the SD_NOTIFY socket. The path needs to be writeable from the container, and the listener has to be ok.
Most likely this will work, but we should test this on an SELinux enforced system, with the container writing to the socket.

Agreed, I need to revisit podman doing the bind mount - specifically setting the mount label appropriately.

Also even though I'm creating the socket as 777, it's getting reset to 750. Haven't figured that out yet.

@goochjj
Copy link
Contributor Author

goochjj commented Jun 25, 2020

I am a little worried about SELinux here, making sure that the container processes are still allowed to write the SD_NOTIFY socket. The path needs to be writeable from the container, and the listener has to be ok.
Most likely this will work, but we should test this on an SELinux enforced system, with the container writing to the socket.

libpod now relabels /run/notify

@goochjj
Copy link
Contributor Author

goochjj commented Jun 25, 2020

I am a little worried about SELinux here, making sure that the container processes are still allowed to write the SD_NOTIFY socket. The path needs to be writeable from the container, and the listener has to be ok.
Most likely this will work, but we should test this on an SELinux enforced system, with the container writing to the socket.

libpod now relabels /run/notify

And I get this

[589620.624683] audit: type=1400 audit(1593097271.023:4559): avc:  denied  { sendto } for  pid=1644255 comm="systemd-notify" path="/var/lib/containers/storage/overlay-containers/7393144d563f1d3acb7a33003cbc43077d62748484963a5d9184406c4a7589e9/userdata/notify/notify.sock" scontext=system_u:system_r:container_init_t:s0:c355,c461 tcontext=unconfined_u:system_r:container_runtime_t:s0 tclass=unix_dgram_socket permissive=0

Do I need to change the tcontext somehow? Suggestions?

src/conn_sock.c Outdated Show resolved Hide resolved
@goochjj
Copy link
Contributor Author

goochjj commented Jul 1, 2020

Note @rhatdan this does work because in podman I'm setting an appropriate label on the bind mount. Tested with and without selinux, root and rootless.

I did hardcode in the NOTIFY_SOCKET inside the container, if we want that to be optional - maybe an additional env variable of something like NOTIFY_SOCKET_INSIDE or something can indicate the destination?

src/conmon.c Outdated
@@ -84,6 +84,16 @@ int main(int argc, char *argv[])
exit(0);
}

char *notify_socket_path = getenv("NOTIFY_SOCKET");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a reason to expose this via env vs cli?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just that it's how the notify socket is usually passed from systemd et al - but podman is always our parent, so I'm fine with whatever.

Maybe --notify-socket-host and --notify-socket-destdir or something - that can get rid of the hardcoded path of /run/notify/notify.sock in the container, and instead have it be (notify-socket-destdir)/notify.sock in the container.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes I like that idea. then it can be exposed via cli on podman's side if the user wants to change it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we include the commit 52546c919be261e74847981b4db1227537c47345 into the commit that first introduced the change?

Also, I think we need a way to disable the notify socket to be handled by conmon, there are cases when it is helpful to use the handling in the OCI runtime. For example when the user needs $RUNTIME start $CTR to wait for the container to be ready.

src/conn_sock.c Outdated Show resolved Hide resolved
src/conn_sock.c Outdated Show resolved Hide resolved
src/conn_sock.c Outdated Show resolved Hide resolved
src/conn_sock.c Outdated Show resolved Hide resolved
src/conn_sock.h Outdated Show resolved Hide resolved
@haircommander
Copy link
Collaborator

sorry it took so long to finally review. all together it looks great. thank you very much for taking this on.

I'd also like a review from @giuseppe

@goochjj
Copy link
Contributor Author

goochjj commented Jul 1, 2020

Also FYI, even if the notify socket cmdline/env are passed.. right now podman mkdir's the notify folder because I have it set the mount label, and pass the mount label as a bind mount to the spec. If the calling process doesn't create that socket, but a NOTIFY_SOCKET is passed, it'll error with a cryptic error.

It was just an order of operations thing - that the runtime spec and mounts get done before conmon is called.

We could, instead, throw a nice error , like "notify socket passed but no folder created" or something.

Copy link
Member

@giuseppe giuseppe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the patch. I think it is a great idea but I've not managed to get it to work:

$ NOTIFY_SOCKET=/run/user/1000/systemd/notify podman --conmon ~/src/conmon/bin/conmon  run --rm -ti fedora sh -c 'printenv NOTIFY_SOCKET; systemd-notify --ready'
Error: container create failed (no logs from conmon): EOF

src/conn_sock.c Outdated Show resolved Hide resolved
src/conn_sock.c Outdated Show resolved Hide resolved
src/conmon.c Outdated
@@ -84,6 +84,16 @@ int main(int argc, char *argv[])
exit(0);
}

char *notify_socket_path = getenv("NOTIFY_SOCKET");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we include the commit 52546c919be261e74847981b4db1227537c47345 into the commit that first introduced the change?

Also, I think we need a way to disable the notify socket to be handled by conmon, there are cases when it is helpful to use the handling in the OCI runtime. For example when the user needs $RUNTIME start $CTR to wait for the container to be ready.

@goochjj
Copy link
Contributor Author

goochjj commented Jul 2, 2020

thanks for the patch. I think it is a great idea but I've not managed to get it to work:

$ NOTIFY_SOCKET=/run/user/1000/systemd/notify podman --conmon ~/src/conmon/bin/conmon  run --rm -ti fedora sh -c 'printenv NOTIFY_SOCKET; systemd-notify --ready'
Error: container create failed (no logs from conmon): EOF

@giuseppe you need goochjj/podman#1

@goochjj
Copy link
Contributor Author

goochjj commented Jul 2, 2020

@giuseppe re: commit, yes, I would squash this whole PR before merging.

@goochjj
Copy link
Contributor Author

goochjj commented Jul 2, 2020

@giuseppe Re: letting OCI runtime do Notify - I'm ok with that, but I think it basically means extending the --sdnotify option in podman.

from containers/podman#6693:

--sdnotify conmon-only|container|none

With "conmon-only", we send the MAINPID, and clear the NOTIFY_SOCKET so the OCI
runtime doesn't pass it into the container. We also advertise "ready" when the
OCI runtime finishes to advertise the service as ready.

With "container", we send the MAINPID, and leave the NOTIFY_SOCKET so the OCI
runtime passes it into the container for initialization, and let the container advertise further metadata.

The "none" option does what it's always done in the past - passes NOTIFY_SOCKET
to conmon's process and the OCI runtime processes, and does no manipulation or "help".

This removes the need for hardcoded CID and PID files in the command line, and
the PIDFile directive, as the pid is advertised directly through sd-notify.

Perhaps "container" should be split. I'm not sure into what and what. Maybe "ociruntime" and "podman". Then we'd have the groundwork to tweak the options passed to conmon.

The advantage of having conmon do the proxy is exactly to prevent CRUN/RUNC from blocking on start, to fix containers/podman#6688.

src/conmon.c Outdated Show resolved Hide resolved
@mheon
Copy link
Member

mheon commented Aug 17, 2020

@goochjj Can you rebase?

Let's see about getting this moved forward and merged, I'd love to have this in Podman 2.1.x

@goochjj goochjj force-pushed the socket-handling-revamp branch 2 times, most recently from 6a4784b to 1fc2dbf Compare August 18, 2020 11:22
@goochjj
Copy link
Contributor Author

goochjj commented Aug 18, 2020

@haircommander @mheon What's the deal with the tests...

@haircommander
Copy link
Collaborator

f28 and f29 are EOL, and the images haven't been updated. It is on my list this week to update them. sorry for the inconvenience

@goochjj
Copy link
Contributor Author

goochjj commented Aug 18, 2020

Heh no worries, as long as it's not a blocker :-D

I did run make fmt locally

@haircommander
Copy link
Collaborator

hey @goochjj CI is (mostly) fixed, please rebase!

@goochjj goochjj force-pushed the socket-handling-revamp branch from 1fc2dbf to 6270ce7 Compare August 21, 2020 23:01
@rhatdan
Copy link
Member

rhatdan commented Sep 11, 2020

@haircommander @goochjj Other then the static build fail above is this ready to go in? Would like to get this in to close a Podman issue.

@goochjj
Copy link
Contributor Author

goochjj commented Sep 11, 2020

Yep, should be.

@haircommander
Copy link
Collaborator

I think there are some unresolved review comments from @giuseppe

@goochjj
Copy link
Contributor Author

goochjj commented Sep 11, 2020

I think there are some unresolved review comments from @giuseppe

If so you'll need to tell me what needs to happen. Last I can see, I responded to @giuseppe and he didn't respond back.

@haircommander
Copy link
Collaborator

I'm sorry about the nitpicking @goochjj if you are done with carrying this PR, we can have someone else take over it. A lot relies on conmon and the testing is really subpar, so please excuse my anxiety

@goochjj
Copy link
Contributor Author

goochjj commented Sep 11, 2020

Let me go through and see if I can verify the resolved things

@goochjj
Copy link
Contributor Author

goochjj commented Sep 11, 2020

Resolved just about everything except the open "Should we use ENV or a command-line argument"

@goochjj goochjj force-pushed the socket-handling-revamp branch from e202333 to 388b405 Compare September 11, 2020 15:04
@goochjj
Copy link
Contributor Author

goochjj commented Sep 11, 2020

@haircommander As far as I know I've completed all I need to - let me know if there's something I need to look at, or you want to further discuss ENV vs cmdline. I can squash the PR whenever you want.

src/conmon.c Outdated Show resolved Hide resolved
@goochjj goochjj force-pushed the socket-handling-revamp branch from 2eb7590 to 6628524 Compare September 11, 2020 17:40
@haircommander
Copy link
Collaborator

test for cri-o regressions: cri-o/cri-o#4187

@haircommander
Copy link
Collaborator

hold this, there are some failures I need to investigate

@rhatdan
Copy link
Member

rhatdan commented Sep 15, 2020

@haircommander Whats up with this one?

src/conn_sock.c Outdated Show resolved Hide resolved
Refactored all the conn_sock functionality to be more generic. It can deal
with different types of sockets, stream vs dgram, and reuses all the same
callbacks, shutdown and async functionality.

Conmon creates a notify socket which podman bind-mounts into the container,
and passes in via the spec's environment variables.  Conmon relays the
READY=1 signal.  This is similar to what runc and crun do, but doing it in
conmon and NOT passing NOTIFY_SOCKET to the OCI runtime allows us to start
up properly without runc and crun blocking on the "start" command.

It would also be trivial to add more proxied sockets, i.e. the /dev/log
proof of concept I did would now be super easy, if we wanted to revisit that.

Signed-off-by: Joseph Gooch <[email protected]>
@goochjj goochjj force-pushed the socket-handling-revamp branch from 0a36f41 to 28799f9 Compare September 16, 2020 13:54
@haircommander
Copy link
Collaborator

LGTM assuming cri-o/cri-o#4187 passes
@giuseppe PTAL

Copy link
Member

@giuseppe giuseppe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@haircommander haircommander merged commit 59c2817 into containers:master Sep 16, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants