-
Notifications
You must be signed in to change notification settings - Fork 50
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
support node exclusive allocations #4245
Conversation
It occurs to me I should get @trws and @ryanday36's blessing on the end result here, i.e. that |
That sounds about like what I was hoping for. Thanks @grondo! |
Great! Thanks @ryanday36 and @trws! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just had one comment, not a show stopper either. Great work here!
I did give this a manual poke and LGTM.
Truth be known, I seem to type -N
without -n
a lot, despite knowing better. It's hard to unlearn that muscle memory. So at minimum this change is going reduce reduce my annoyed-with-self time.
src/common/librlist/rlist.c
Outdated
static struct rlist *rlist_alloc_nnodes (struct rlist *rl, | ||
struct rlist_alloc_info *ai) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice cleanup!
Not sure if this is feasible, but in functions where the struct rlist_alloc_info
parameter is not modified, can it be declared const
? Parameters that were obviously "in" before could now be sneakily "in/out", so the extra declaration could clarify that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that is a good suggestion!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I've made that change and force-pushed the result. Thanks!
Problem: Passing allocation parameters like nnodes, nslots, and slot size on as separate parameters to rlist allocation functions makes extensibility difficult. Add a new rlist_alloc_info structure which contains allocation parameters for rlist allocation functions. Refactor code internally to take advantage of the new structure.
Problem: librlist doesn't expose a public allocation function that takes the more convenient struct rlist_alloc_info structure. Add rlist_alloc_ex() which offers an alternative to rlist_alloc() (so that all callers do not need to be immediately updated.). Since job constraints are (optionally) provided in the alloc_info argument to rlist_alloc_ex(), handle constraints directly in this function, instead of requiring the caller to filter the rlist by constraint before calling rlist_alloc().
Problem: rlist_alloc_ex() is a preferred interface over rlist_alloc(), but there are still a lot of users of the older rlist_alloc() interface. Add rl_alloc() wrapper for rlist_alloc_ex() to test/rlist.c and switch all callers. This moves the unit test away from the older rlist_alloc() and begins to exercise the preferred rlist_alloc_ex() interface.
Problem: If rnode_match_validate() fails, rlist_copy_constraint() returns an error without errno set. This is inconvenient because functions like flux_respond_error() etc. expect a nonzero errno. Set errno to EINVAL if rnode_match_validate() fails in rlist_copy_constraint().
Problem: rlist_alloc() no longer has any callers Rename rlist_alloc_ex() to rlist_alloc() and remove the latter. Update all callers.
Problem: Jobspec v1 supports an exclusive flag for node exclusive allocations, but rlist_alloc() does not have any support for an exclusive flag. Add an exclusive flag to `struct rlist_alloc_info`. Support exclusive node allocation in rlist_alloc() via the new flag, but only when there is an explicit request for a number of nodes.
Problem: None of the librlist unit tests exercise the exclusive node allocation option of rlist_alloc(). Add a small set of exclusive node allocation tests to librlist/test/rlist.c
Problem: Jobspec v1 supports an exclusive flag, but the libjj jobspec reader doesn't support reading that flag from jobspec. Add an exclusive flags in the "jj_counts" structure returned by libjj_get_counts() and libjj_get_counts_json().
Problem: The sched-simple test program jj-reader doesn't emit the exclusive flags along with the other data gleaned from the provided test jobspec, and thus cannot be used to ensure the exclusive flag is detected. Print the value detected for the exclusive flag in the jj-reader output. Update existing test.
Problem: Jobspec from_*_command() constructors do not have a way to specify the exclusive flag to request nodes exclusively. Add an exclusive flag to all constructors, which defaults to False. Since the exclusive flag must be specified on a node resource vertex, raise an error if the exclusive flag is requested without also specifying an explicit node count.
Problem: The jobspec reader in sched-simple supports reading an exclusive node flag from jobspec, but sched-simple doesn't pass this flag to the librlist allocator, so effectively the exclusive flag is not supported. Pass the exclusive flag from jobspec to rlist_alloc() in sched-simple so that exclusive node requests are supported.
Problem: Jobspec v1 supports an exclusive flag to request exclusive allocation of nodes, but flux-mini commands have interface to set this flag in generated jobspec. Add an --exclusive flag to flux-mini which simply passses the flag down to the various Jobspec from_command() constructors.
Problem: A test in the testsuite ensure that flux mini run --nodes=2 fails due to a "node count cannot be greater than task count" error, but soon this usage will be allowed. (ntasks will default to nnodes with exclusive flag set) Remove the test to avoid breaking the testsuite.
Problem: It is inconvenient to require the specification of both ntasks and nnodes when a user wants one task/slot per node. Until recently, it was not possible to handle this in a coherent manner, though, so the Python Jobspec class and flux-mini commands throw an error whenever the node count is greater than the number of requested tasks/slots. Now that node exclusivity can be set in the jobspec, though, it is possible to set ntasks/slots to the number of nodes (when ntasks is not explicitly set), by also defaulting the node exclusive flag to True for this case. This allows `flux mini run -N4 command` to work consistently regardless of whether or not the enclosing instance defaults to node exclusive allocation. Fixes flux-framework#4228
Problem: No tests in the testsuite exercise the flux-mini --exclusive flag or implicit exclusive node requests when --nodes is used without --ntasks or --nslots. Add a set of tests to the flux-mini sharness scripts that exercise expected behavior for flux-mini --exclusive, both explicit and implicit use.
Problem: The flux-mini(1) --exclusive option is not documented in the man page. Add documentation for the --exclusive option. Also mention that the exclusive option will default to true when --nodes is used without --ntasks or --nslots.
Codecov Report
@@ Coverage Diff @@
## master #4245 +/- ##
==========================================
+ Coverage 83.51% 83.55% +0.03%
==========================================
Files 386 386
Lines 64659 64719 +60
==========================================
+ Hits 54003 54077 +74
+ Misses 10656 10642 -14
|
MWP? |
Yes, setting MWP now. |
This PR adds an
--exclusive
flag in theflux-mini
set of commands, which only works when-N, --nodes
is also specified, and sets theexclusive
key totrue
on the node vertex in jobspec. Support for node exclusive requests is added to sched-simple (mainly for testing), and finally the number of tasks/slots influx mini
commands is defaulted to the number of nodes, and theexclusive
flag enabled, when number of tasks is not otherwise specified on the command line (fixes #4228).As noted above, the new
--exclusive
flag only works if-N, --nodes
is also specified. I was having trouble deciding if Jobspec V1 could even support setting the exclusive flag on nodes without also specifying the number of required nodes, though I admittedly didn't spend too much time on it. The current approach addresses the main concern here, which was defaulting ntasks/slots to nnodes in a coherent manner, so that seemed good enough for now.There is a difference in behavior between Fluxion and sched-simple when allocating nodes exclusively. Fluxion will emit R with only the minimum resources required by the jobspec on each node (but all resources on the node will show as allocated), whereas sched-simple currently emits all exclusive node resources in the emitted R. Thus, for sched-simple:
whereas with Fluxion:
I'm not sure, but this seems like something we'll want to fix. Though I may be missing some subtlety here, it seems like it would be surprising behavior if users ask for nodes exclusively but only get one core from each node in their child instance. @dongahn?