configure system instance to require every job be a new instance #4214
Yes, if I remember right, reducing the load at the system instance was the main motivation.
Can we add a key to
We could, though nothing prevents a user from setting this key themselves, e.g.
Or maybe the validator can look at the task section and check if that invokes `flux broker` or has
Ok, so a system instance validator plugin could employ the following checks. If any check is true, then the job is validated for submission to the instance:
I think you are right, this should cover 99% of cases. Possibly we should distribute the validator plugin outside of flux-core, so that if there are corner cases, the plugin could be updated on the fly. Here's a proof-of-concept plugin:

```python
"""Only allow jobs that launch a new instance of Flux
"""
import errno
from os.path import basename

from flux.job.validator import ValidatorPlugin


class Validator(ValidatorPlugin):
    def validate(self, args):
        # Batch jobs: the job shell starts a new instance of Flux.
        if "batch" in args.jobspec["attributes"]["system"]:
            return
        # Otherwise accept only "flux broker ..." or "flux start ...".
        command = args.jobspec["tasks"][0]["command"]
        arg0 = basename(command[0])
        if (
            arg0 == "flux"
            and len(command) > 1
            and command[1] in ["broker", "start"]
        ):
            return
        # Reject everything else with EINVAL and a hint for the user.
        return (
            errno.EINVAL,
            "Direct job submission is disabled for this instance."
            + " Please use the batch or alloc subcommands of flux-mini(1)",
        )
```

```console
$ flux mini run hostname
flux-mini: ERROR: [Errno 22] Direct job submission is disabled for this instance. Please use the batch or alloc subcommands of flux-mini(1)
$ flux mini alloc -n1 hostname
fluke108
[detached: session exiting]
```
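The plugin above needs a running Flux instance to exercise. As a hedged, standalone sketch, the same two checks can be pulled out into a plain function and tried against hand-built jobspec-like dictionaries. The names `check_jobspec` and `make_jobspec` are hypothetical, and the dictionaries are minimal stand-ins rather than complete Flux jobspecs.

```python
"""Standalone sketch of the require-instance checks (no Flux needed)."""

import errno
from os.path import basename


def check_jobspec(jobspec):
    """Return None if jobspec appears to start a new Flux instance,
    otherwise an (errno, message) tuple, mirroring the plugin's return."""
    # Batch jobs: the job shell starts a new instance itself.
    # Only the presence of the "batch" key is checked, not its value.
    if "batch" in jobspec["attributes"].get("system", {}):
        return None
    command = jobspec["tasks"][0]["command"]
    # Accept "flux broker ..." or "flux start ..." (arg0 may be a full path).
    if (
        len(command) > 1
        and basename(command[0]) == "flux"
        and command[1] in ("broker", "start")
    ):
        return None
    return (
        errno.EINVAL,
        "Direct job submission is disabled for this instance.",
    )


def make_jobspec(command, batch=False):
    """Build a minimal, hypothetical jobspec-like dict for testing."""
    system = {"duration": 0}
    if batch:
        system["batch"] = {"script": "#!/bin/sh\nhostname\n"}
    return {
        "attributes": {"system": system},
        "tasks": [{"command": command}],
    }


if __name__ == "__main__":
    cases = [
        (["hostname"], False),
        (["flux", "broker"], False),
        (["/usr/bin/flux", "start", "hostname"], False),
        (["hostname"], True),  # batch attribute alone is sufficient
        (["flux"], False),  # no subcommand: rejected, no IndexError
    ]
    for cmd, batch in cases:
        result = check_jobspec(make_jobspec(cmd, batch))
        print(cmd, "batch" if batch else "", "accepted" if result is None else "rejected")
```

Keeping the check as a pure function makes it easy to unit test corner cases (such as a bare `flux` command) before dropping the same logic into a `ValidatorPlugin.validate()` method.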
So fast! |
Problem: It might be useful for a system instance to reject jobs which are not themselves new instances of Flux, i.e. either batch jobs or jobs that run `flux start` or `flux broker`. However, Flux does not provide a way to do this. Add a very simple `require-instance` job validator which attempts to reject all jobs that do not create a new instance of Flux. It does this by looking for either the "batch" system attribute in jobspec (which will cause the job shell to start a new instance), or a command that starts with "flux start" or "flux broker". If necessary, this plugin could be copied and modified in environments that want to be slightly more permissive, or that want to allow other command arguments that might result in a new instance. Either way, Flux will at least provide an example on which to base new validators. Fixes flux-framework#4214
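As one hedged illustration of copying the plugin and making it "slightly more permissive", the sketch below (written as a plain function so it runs without Flux) takes the accepted `(command, subcommand)` pairs as a parameter and skips a leading `env NAME=value ...` wrapper. The names `starts_new_instance`, `strip_env_prefix`, and `DEFAULT_ALLOWED` are hypothetical, and tolerating `env` is only an example of local site policy; the `require-instance` plugin itself checks only the batch attribute and `flux start`/`flux broker`.

```python
"""Sketch of a more permissive, site-modified require-instance check."""

from os.path import basename

# Hypothetical site policy: (arg0 basename, subcommand) pairs to accept.
DEFAULT_ALLOWED = frozenset({("flux", "broker"), ("flux", "start")})


def strip_env_prefix(command):
    """Drop a leading 'env' and any NAME=value assignments after it.

    Options such as 'env -i' are not parsed; the scan stops at them,
    so such commands fall through to the (conservative) check below.
    """
    if command and basename(command[0]) == "env":
        command = command[1:]
        while command and "=" in command[0] and not command[0].startswith("-"):
            command = command[1:]
    return command


def starts_new_instance(jobspec, allowed=DEFAULT_ALLOWED):
    """Return True if jobspec looks like it will start a new Flux instance."""
    # Batch jobs are accepted: the job shell starts the new instance.
    if "batch" in jobspec.get("attributes", {}).get("system", {}):
        return True
    command = strip_env_prefix(jobspec["tasks"][0]["command"])
    if len(command) < 2:
        return False
    return (basename(command[0]), command[1]) in allowed
```

A site would wire the returned boolean into its own `validate()` method, returning an `(errno.EINVAL, message)` tuple when it is `False`, as the proof-of-concept plugin does.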
(I thought we had captured this requirement somewhere, but I'm unable to find it at the moment.)
We've talked about somehow enforcing that all jobs submitted to a system instance are new instances of Flux, i.e. that they are submitted with jobspecs equivalent to `flux mini batch` or `flux mini alloc`. Some of the reasons for this requirement include:
`srun` in the batch partition just fine if you are patient enough)
Given that the last two bullets are no longer really true, I wonder whether this is still a requirement. It does seem like the first bullet has enough benefits that we might want to at least consider implementing a solution.
On the other hand, I'm not sure there is a practical way to determine that a submitted jobspec definitely results in a new instance of Flux. So this needs a little more thought.