Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

allow privileged IMP to linger during flux-imp run to support signal forwarding #188

Merged
merged 3 commits into from
Nov 4, 2024

Conversation

grondo
Copy link
Contributor

@grondo grondo commented Nov 1, 2024

This PR changes the flux-imp run implementation to have the privileged IMP linger and delegate signals to its child, which the calling user would not otherwise have privilege to do.

WIP while I test this out locally.

Problem: Unlike `exec`, `flux-imp run` doesn't linger and instead
directly executes the target command. This then requires the use of
`flux-imp kill` to kill the privlieged process(es) that are a result.

Have the IMP fork() and exec() the run command. The lingering IMP
process can then forward signals delivered to it, including the use
of SIGUSR1 as a surrogate for SIGKILL.
garlick
garlick previously approved these changes Nov 1, 2024
Copy link
Member

@garlick garlick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well jeez, this looks straightforward also! I'll approve assuming your testing pans out!

@grondo
Copy link
Contributor Author

grondo commented Nov 2, 2024

Actually there's a couple issues still remaining here:

  • test is broken because the NO_CHAIN_LINT prereq isn't set. When I fix that the test doesn't work 🤔
  • the imp run implementation calls setuid (geteuid ()); setgid (setegid ()); before calling the run target, which means the instance isn't actually able to deliver signals to the IMP process (so signal forwarding doesn't actually work)

Problem: The IMP run implementation sets the real user and group id
of the process to the effective user and group id in the privileged
parent, but this makes it so that the invoking user can no longer
deliver signals to the IMP parent process.

Move the setuid()/setgid() calls to the child process just before
execve(2) is called. The parent IMP thereby maintains the real uid/gid
of the invoking user and can handle forwarding of signals from that
user to the invoked run command.

Since the parent IMP process no longer has a real userid of 0/root,
update the call that obtains the userid for setting USER and HOME
to use the effective uid.
Problem: There is no test that ensures the privileged IMP lingers to
handle signal delivery with `flux-imp run`.

Add a test to t2002-imp-run.t.
@grondo
Copy link
Contributor Author

grondo commented Nov 2, 2024

I had forgotten that flux-imp run set the real uid/gid to the effective uid/gid before executing the target command. This prevented signals from the calling uid from being delivered. I've moved the reset of the real uid/gid to the child process just before execve(2) and now signaling the flux-imp run process seems to be working.

I also figured out some issues with the test and that appears to now be working as well.

This could definitely use some more real world testing.

@grondo grondo changed the title WIP: allow privileged IMP to linger during flux-imp run to support signal forwarding allow privileged IMP to linger during flux-imp run to support signal forwarding Nov 2, 2024
@grondo
Copy link
Contributor Author

grondo commented Nov 2, 2024

Removed WIP. This seems to be working as designed now, and is much more robust than the flux-imp kill method. Probably would be good if @garlick verifies on his cluster before this gets merged though.

@garlick
Copy link
Member

garlick commented Nov 3, 2024

Excellent! Will try to get that done tomorrow morning.

Copy link
Member

@garlick garlick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried some basic sanity testing on my cluster, sending SIGTERM and SIGKILL to housekeeping with this, and everything worked! LGTM!

@grondo
Copy link
Contributor Author

grondo commented Nov 4, 2024

Thanks! I've set mwp

@mergify mergify bot merged commit d1245cb into flux-framework:master Nov 4, 2024
17 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants