Add support for broker rank-based partial release #1163
@zekemorton, @trws, @jameshcorbett: please take a look at the current state when you get a chance. Getting your feedback soon will help me make sure this PR is on the right track.
Problem: resource-query doesn't have partial cancel functionality. Add the necessary functions, help information, and input parsing to enable sharness tests.
Problem: there are no sharness CI tests for partial cancel/release. Add tests for correct behavior.
@trws: I've addressed your concern in |
Works for me, if it ever comes up as a perf concern we can deal with it then. Great job @milroy, love the cleanup you did along the way.
Feel free to land this one, I'll re-up the reformat and then fix the check requirements so we don't have to rerun all the checks here.
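Since this was also asked in Slack, here's a small script to wait for the `start` event of any of the given jobs:

```python
import sys

import flux
from flux.job import JobID, event_watch_async

found = False
handle = flux.Flux()


def event_cb(future, jobid):
    global found  # without this, the assignment below only creates a local
    event = future.get_event()
    if event is None:
        # End of this job's eventlog: nothing more to watch.
        return
    print(f"{jobid.f58}: {event}")
    if event.name == "start":
        found = True
        handle.reactor_stop()


# Watch every jobid given on the command line; the first "start" event wins.
for jobid in sys.argv[1:]:
    jobid = JobID(jobid)
    event_watch_async(handle, jobid).then(event_cb, jobid)

handle.reactor_run()
if not found:
    print("start event for job never appeared", file=sys.stderr)
    sys.exit(1)
```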
Oh cool, thanks @grondo!
@milroy, ready for MWP?
@trws, @milroy, @jameshcorbett, what do you think about tagging another flux-sched release once this goes in? Then we can get a working housekeeping setup on fluke for testing asap.
I'd be happy to see that happen. Just need @milroy's sign-off to push this in.
I'm going to amend one commit to add REAPI CLI support for partial cancel as well. It should be ready within the hour.
Problem: partial cancel functionality isn't available in the C REAPI. Add the interface functions for module and cli.
Problem: there is no partial cancel functionality available in the C++ REAPI for the CLI and module. Add the interface functions and the implementations for C++.
Problem: the resource module doesn't have support for partial cancellation. Add the callback for partial cancellation, and the logic to distinguish between a single partial cancellation and a sequence of partial cancellations that results in a full cancellation.
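The distinction this commit message draws can be illustrated with a small sketch. It is written in Python for brevity and is not the resource module's actual code: the class name `AllocationTracker` and its method are hypothetical. It assumes a job's allocation can be summarized as per-type counts, and that each partial cancel subtracts counts until nothing remains, at which point the sequence amounts to a full cancellation.

```python
class AllocationTracker:
    """Hypothetical bookkeeping for one job's allocated resource counts."""

    def __init__(self, counts):
        # counts: resource type -> number allocated, e.g. {"core": 8}
        self.remaining = dict(counts)

    def partial_cancel(self, released):
        """Subtract released counts; return True once nothing remains."""
        # Validate everything first so a bad request changes nothing.
        for rtype, n in released.items():
            if n == 0:
                continue  # zero entries are valid and change nothing
            have = self.remaining.get(rtype, 0)
            if n < 0 or n > have:
                raise ValueError(f"cannot release {n} of {rtype}; {have} remain")
        for rtype, n in released.items():
            if n:
                self.remaining[rtype] -= n
        return all(v == 0 for v in self.remaining.values())


job = AllocationTracker({"node": 2, "core": 8})
first = job.partial_cancel({"node": 1, "core": 4, "gpu": 0})
second = job.partial_cancel({"node": 1, "core": 4})
print(first, second)
```

The first call leaves resources behind and reports a partial cancel; the second empties the allocation and reports a full one. A single call that releases everything at once would report a full cancel as well.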
Problem: the callback that handles the `.free` RPC does not unpack the R string payload. Add the capability to unpack the R string and call the new partial remove function.
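As a rough illustration of what unpacking the R payload involves, the sketch below pulls the broker ranks out of an R string in Python. It assumes the Rv1 layout (`execution.R_lite` entries carrying a `rank` idset string) and decodes only simple, unbracketed idsets such as `0-3,7`. The helper names `decode_idset` and `ranks_in_R` and the sample payload are invented for the example; the real callback is C++ and is not shown here.

```python
import json


def decode_idset(text):
    """Expand a simple idset string such as "0-3,7" into a sorted list."""
    ids = set()
    for part in text.split(","):
        lo, _, hi = part.partition("-")
        ids.update(range(int(lo), int(hi or lo) + 1))
    return sorted(ids)


def ranks_in_R(r_string):
    """Return the broker ranks named in an Rv1 string (assumed layout)."""
    R = json.loads(r_string)
    ranks = set()
    for entry in R["execution"]["R_lite"]:
        ranks.update(decode_idset(entry["rank"]))
    return sorted(ranks)


# Sample payload invented for illustration.
sample = json.dumps(
    {
        "version": 1,
        "execution": {
            "R_lite": [{"rank": "2-3", "children": {"core": "0-3"}}],
        },
    }
)
print(ranks_in_R(sample))
```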
Problem: the qmanager base policy and derived classes do not have partial cancellation functionality. Add a virtual remove function overload in base, and overrides in FCFS and backfill policies. In each case the policy needs to call the REAPI module partial cancel function, which means it can't reside in the base class. Consolidate logic in the `remove` function to call partial cancel and check if it fully removed the job's resources. If so, remove the job from the allocated and running maps and set the state to `COMPLETE`. Do not enter the job into the completed map, because that will get popped and cancelled again in `cancel_completed_jobs`. Note that the sched loop needs to be cancelled and blocked jobs need to be reconsidered. Finally, resume the sched loop to continue scheduling jobs.
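The control flow described in this commit message can be sketched as follows. This is a Python paraphrase for illustration only, not the qmanager C++ code: `QueuePolicy`, `partial_cancel`, and the attribute names are hypothetical stand-ins, and the REAPI call is replaced by an injected function that returns whether the job's resources were fully removed. Flagging a rescheduling pass after every release, partial or full, is an assumption of the sketch.

```python
class QueuePolicy:
    """Hypothetical stand-in for a qmanager policy (FCFS or backfill)."""

    def __init__(self, partial_cancel):
        # partial_cancel(jobid, R) -> True if the job is now fully removed.
        # It stands in for the REAPI module call made by derived policies.
        self.partial_cancel = partial_cancel
        self.alloced = {}   # jobid -> job record
        self.running = {}   # jobid -> job record
        self.complete = {}  # deliberately left untouched by remove()
        self.reconsider_blocked = False

    def remove(self, jobid, R):
        job = self.running.get(jobid, self.alloced.get(jobid))
        if job is None:
            raise KeyError(jobid)
        fully_removed = self.partial_cancel(jobid, R)
        if fully_removed:
            self.alloced.pop(jobid, None)
            self.running.pop(jobid, None)
            job["state"] = "COMPLETE"
            # Not added to self.complete: entries there would be popped
            # and cancelled a second time.
        # Assumption: any release can unblock pending jobs, so request
        # that blocked jobs be reconsidered on the next scheduling pass.
        self.reconsider_blocked = True
        return fully_removed


releases = iter([False, True])  # first release is partial, second completes
policy = QueuePolicy(lambda jobid, R: next(releases))
job = {"state": "RUNNING"}
policy.running[42] = job
results = (policy.remove(42, "R for rank 0"), policy.remove(42, "R for rank 1"))
print(results, job["state"])
```

After the second call the job is gone from the running map, its state is `COMPLETE`, and the completed map is still empty.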
Problem: std::strings are currently constructed from char * input parameters and used to find resources and sub-planners, creating unnecessary overhead. Remove the extraneous constructions.
Problem: flux ion-resource does not support partial cancellation. Add support for sending partial cancel RPCs.
Problem: flux ion-resource does not have testsuite tests. Add them.
Problem: many range-based loops in Fluxion result in calling an object copy constructor. Avoid the copy constructor cost and add const type qualifier where appropriate.
Problem: partial cancel functionality changes the order of jobid3 and jobid4 start after cancellation of jobid2 in t1009-recovery-multiqueue. Add an OR condition to wait on jobid3 or jobid4 to start upon cancelling jobid2.
Fluxion issue #1151 and flux-core issue 4312 describe the need for partially releasing broker resources. The first implementation of the functionality is scoped to releasing all resources managed by a single broker rank per RPC. Elasticity considerations will require arbitrary release of resources, but this capability may only be needed in the scheduler.
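To make the rank-scoped release concrete, here is a minimal Python sketch of collecting everything managed by one broker rank and summarizing it as a map from resource type to count, the shape this PR's description attributes to the RV1 reader. It also shows one way such a map could be expanded into a count vector in a fixed type order, with `0` for types the map does not mention. The `Vertex` record and both helpers are hypothetical; Fluxion works on boost graph vertices in C++.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Vertex:
    """Hypothetical stand-in for a resource graph vertex."""
    name: str
    rtype: str
    rank: int


def counts_for_rank(vertices, rank):
    """Map resource type -> number of vertices managed by the given rank."""
    return dict(Counter(v.rtype for v in vertices if v.rank == rank))


def to_visit_order(counts, type_order):
    """Expand the map into a count vector; absent types become 0."""
    return [counts.get(rtype, 0) for rtype in type_order]


graph = [
    Vertex("node0", "node", 0),
    Vertex("core0", "core", 0),
    Vertex("core1", "core", 0),
    Vertex("node1", "node", 1),
    Vertex("core2", "core", 1),
]
counts = counts_for_rank(graph, 0)
print(counts)
print(to_visit_order(counts, ["node", "socket", "core"]))
```

The map can be filled in whatever order the vertices are found, while the vector form fixes an order and makes the zero entries explicit.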
In its current WIP state, this PR adds the capability to identify all `boost` graph vertices managed by a broker rank in the RV1 reader. The reader builds and returns a map keyed by resource type, with values corresponding to the number of resources of that type to be removed. The cancellation of the vertices occurs in the RV1 reader, and the partial cancellation and pruning filter updates occur in the traverser. Note that the map is under-specified relative to the number of resources in the resource graph. The traverser uses vectors of types and counts that are ordered by their visit order. Since the reader may unpack and find the boost vertices in any order (and lookups need to be fast), a map is the better choice of container. The complexity of translating between map and vector containers was added to the planners in PR #1061.

The `planner` and `planner_multi` need modifications to handle span reduction based on the assembled reduction counts and types. They must deal with `0` entries for reduction and correctly treat a sequence of span reductions as a single span removal. A single partial cancellation that contains all `planner` or `planner_multi` resources must behave like a full span removal. The planners must also return to the client whether the span was totally removed.

The traverser begins a depth-first visit that stops as soon as it encounters a vertex untagged and cancelled by the RV1 reader. The traverser needs a new `enum class` to designate job modification traversal types. If the type is `PARTIAL_CANCEL`, the traverser must only untag and erase the `jobid` if the planner indicates that the removal constitutes a full removal.

The PR adds support in `qmanager` for unpacking `R` and calling the `cancel` REAPI module function in policy derived classes. It avoids locking up the queues upon partial or full cancellation.

Items remaining to be completed to remove the WIP tag: