Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

local connector owner option may be incorrect under flux-proxy(1) #5530

Closed
Tracked by #5531
grondo opened this issue Nov 3, 2023 · 3 comments
Closed
Tracked by #5531

local connector owner option may be incorrect under flux-proxy(1) #5530

grondo opened this issue Nov 3, 2023 · 3 comments

Comments

@grondo
Copy link
Contributor

grondo commented Nov 3, 2023

This is the same issue as described in #4648, but I've decided to open a new issue instead of reopening that one.

The motivating use case here is the execution of a new Flux instance under the system instance which also emulates a system instance for use as a DAT job. That is, a batch or alloc job would be run as the flux user, configured as a multi-user capable instance, and access given to a subset of users.

Users would have to flux proxy into the job to interact with it, and this is not possible currently because flux proxy results in the owner attribute of the connector to be set to the current user (as noted in #4648) and thus jobs are submitted with sign-type=none.

Additionally, it does seem like a bug that the local connector thinks the instance owner it is connected to is the same uid.

@garlick
Copy link
Member

garlick commented Nov 3, 2023

The local connector is using the SO_PEERCRED peer identity (via usock_get_cred()) of the proxy command to get the owner. It may need to grow some smarts to detect when the other end of the socket is not the broker, and then return FLUX_USERID_UNKNOWN for the owner handle option.

Then the flux_job_submit() would see that getuid() != owner and use proper signing.

@grondo
Copy link
Contributor Author

grondo commented Nov 3, 2023

But then any time flux proxy is used the signatures would be required.

Is too onerous to fetch the security.owner attribute from the instance and cache the result in the handle (or are attributes already cached)? This could be done only if usock_get_cred() reports cred.uid == getuid(). I guess one issue here is that the instance could lie about the value of attributes, whereas SO_PEERCRED can be trusted since it is provided by the OS. But I'm struggling to think of what kind of vulnerability could be introduced by this.

Actually, flux-proxy is already doing an owner check, I wonder if there's some way to capitalize off that...

I guess the other thing to try is to see the impact of just always signing jobspec.

@garlick
Copy link
Member

garlick commented Nov 4, 2023

Hmm, maybe we could have the broker and flux-proxy both store the owner userid in an environment variable like FLUX_OWNER. Then the connector could use SO_PEERCRED to get the peer pid. If the peer uid matches getuid() then pull FLUX_OWNER out of the peer pid's environment and check that?

Edit: oh duh, or just check FLUX_OWNER in the inherited environment :-)

garlick added a commit to garlick/flux-core that referenced this issue Nov 6, 2023
Problem: the SO_PEERCRED method of determing the flux owner
to decide whether job requests must be signed using a strong
(but slower) signing mech doesn't work with flux-proxy(1).

Check an environment variable FLUX_OWNER instead.  This can be set
- in broker subprocesses (job, initial program, flux-exec(1))
- in a proxy shell by flux-proxy(1) after checking the security.owner
  broker attribute

As noted in the code comment, signing with mech=none enables single
user Flux instances that were compiled --with-security (as would likely
be the case for all packaged flux deployments) to function without
configuring munge.

In addition, each munge_encode(3) and munge_decode(3) call includes a
synchronous request-response round-trip to munged(8), which has a minor
impact on job throughput and thus should be avoided when unnecessary.

Fixes flux-framework#5530
garlick added a commit to garlick/flux-core that referenced this issue Nov 6, 2023
Problem: flux-proxy(1) checks security.owner and, without --force,
fails with an error if the owner does not match getuid().

Drop this code as the reason for it no longer exists.  flux_job_submit()
now checks security.owner instead of relying on the unix domain peer
credential, so it works as expected over flux-proxy.

Update sharness test.

Fixes flux-framework#5530
garlick added a commit to garlick/flux-core that referenced this issue Nov 6, 2023
Problem: flux-proxy(1) refuses to connect as a guest without --force.

Drop this code as the reason for it no longer exists.  flux_job_submit()
now checks security.owner instead of relying on the unix domain peer
credential, so it works as expected over flux-proxy.

Update sharness test.

Fixes flux-framework#5530
garlick added a commit to garlick/flux-core that referenced this issue Nov 6, 2023
Problem: flux-proxy(1) refuses to connect as a guest without --force.

Drop this code as the reason for it no longer exists.  flux_job_submit()
now checks security.owner instead of relying on the unix domain peer
credential, so it works as expected over flux-proxy.

Update sharness test.

Fixes flux-framework#5530
grondo pushed a commit to garlick/flux-core that referenced this issue Nov 7, 2023
Problem: flux-proxy(1) refuses to connect as a guest without --force.

Drop this code as the reason for it no longer exists.  flux_job_submit()
now checks security.owner instead of relying on the unix domain peer
credential, so it works as expected over flux-proxy.

Update sharness test.

Fixes flux-framework#5530
@mergify mergify bot closed this as completed in 543f834 Nov 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants