Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for broker rank-based partial release #1163

Merged
merged 24 commits into from
Jul 10, 2024

Conversation

milroy
Copy link
Member

@milroy milroy commented Apr 8, 2024

Fluxion issue #1151 and flux-core issue 4312 describe the need for partially releasing broker resources. The first implementation of the functionality is scoped to releasing all resources managed by a single broker rank per RPC. Elasticity considerations will require arbitrary release of resources, but this capability may only be needed in the scheduler.

In its current WIP state, this PR adds capability to identify all boost graph vertices managed by a broker rank in the RV1 reader. The reader builds and returns a map keyed by the resource types, with values corresponding to the number of resources of that type to be removed. The cancellation of the vertices occurs in the RV1 reader, and the partial cancellation an pruning filter updates occurs in the traverser. Note that the map is under-specified relative to number of resources in the resource graph. The traverser uses vectors of types and counts that are ordered by their visit order. Since the reader may unpack and find the boost vertices in any order (and lookups need to be fast) a map is the better choice for container. The complexity of translating between map and vector containers was added to the planners in PR #1061.

The planner and planner_multi need modifications to handle span reduction based on the assembled reduction counts and types. They must deal with 0 entries for reduction, correctly treat a sequence of span reductions as a single span removal. A single partial cancellation that contains all planner or planner_multi resources must behave like a full span removal. The planners must also return whether the span was totally removed to the client.

The traverser begins a depth-first visit that stops as soon as it encounters a vertex untagged and cancelled by the RV1 reader. The traverser needs a new enum class to designate job modification traversal types. If the type is PARTIAL_CANCEL, the traverser must only untag and erase the jobid if the planner indicates that the removal constitutes a full removal.

The PR adds support in qmanager for unpacking R and calling the cancel REAPI module function in policy derived classes. It avoids locking up the queues upon partial or full cancellation.

Items remaining to be completed to remove the WIP tag:

  • Add RPC handling
  • Add REAPI support
  • Add resource module implementation
  • Add support in qmanager
  • Add tests in testsuite

@milroy milroy force-pushed the partial-cancel branch 3 times, most recently from 274f631 to dc1b519 Compare May 22, 2024 17:30
@milroy milroy force-pushed the partial-cancel branch 10 times, most recently from d51891a to 5ceb23f Compare June 21, 2024 07:13
@milroy milroy force-pushed the partial-cancel branch 6 times, most recently from 87600df to e05db43 Compare June 25, 2024 04:25
@milroy milroy force-pushed the partial-cancel branch 4 times, most recently from 791c2c8 to f4e5900 Compare June 27, 2024 05:33
@milroy
Copy link
Member Author

milroy commented Jun 27, 2024

@zekemorton, @trws, @jameshcorbett: please take a look at the current state when you get a chance. Getting your feedback soon will help me make sure this PR is on the right track.

@milroy milroy force-pushed the partial-cancel branch 3 times, most recently from 318281d to dfec30c Compare June 27, 2024 16:42
milroy added 2 commits July 8, 2024 22:26
Problem: resource-query doesn't have partial cancel functionality.

Add the necessary functions, help information, and input parsing to
enable sharness tests.
Problem: there are no sharness CI tests for partial cancel/release.

Add tests for correct behavior.
@milroy
Copy link
Member Author

milroy commented Jul 9, 2024

Anyway, as I read it, take the run_sched_loop out of the remove method, and make t1009 accept either job3 or job4 starting within the timeout. Just one of them has to, don't care which.

@trws: I've addressed your concern in qmanager, but let me know if you want me to treat t1009 differently.

@trws
Copy link
Member

trws commented Jul 9, 2024

Works for me, if it ever comes up as a perf concern we can deal with it then. Great job @milroy, love the cleanup you did along the way.

@trws
Copy link
Member

trws commented Jul 9, 2024

Feel free to land this one, I'll re-up the reformat and then fix the check requirements so we don't have to rerun all the checks here.

@grondo
Copy link
Contributor

grondo commented Jul 9, 2024

Since this was also asked in slack, here's a small script to wait for the start event for any number of jobs and exit with success on the first:

import sys
import flux
from flux.job import JobID, event_watch_async

found = False
handle = flux.Flux()

def event_cb(future, jobid):
    event = future.get_event()
    print(f"{jobid.f58}: {event}")
    if event.name == "start":
        found = True
        handle.reactor_stop()

for jobid in sys.argv[1:]:
    jobid = JobID(jobid)
    event_watch_async(handle, jobid).then(event_cb, jobid)

handle.reactor_run()
if not found:
    print("start event for job never appeared", file=sys.stderr)
    sys.exit(1)

@trws
Copy link
Member

trws commented Jul 9, 2024

Oh cool, thanks @grondo!

@trws
Copy link
Member

trws commented Jul 9, 2024

@milroy, ready for MWP?

@grondo
Copy link
Contributor

grondo commented Jul 9, 2024

@trws, @milroy, @jameshcorbett, what do you think about tagging another flux-sched release once this goes in? Then we can get a working housekeeping setup on fluke for testing asap..

@trws
Copy link
Member

trws commented Jul 9, 2024

I'd be happy to see that happen. Just need @milroy's sign-off to push this in.

@milroy
Copy link
Member Author

milroy commented Jul 9, 2024

I'm going to amend one commit to add REAPI CLI support for partial cancel as well. It should be ready within the hour.

milroy added 10 commits July 9, 2024 17:32
Problem: partial cancel functionality isn't available in the C REAPI.

Add the interface functions for module and cli.
Problem: there is no partial cancel functionality available in the C++
REAPI for the CLI and module.

Add the interface functions and the implementations for C++.
Problem: the resource module doesn't have support for partial
cancellation.

Add the callback for partial cancellation, and the logic to distinguish
between a single partial cancellation and a sequence of partial
cancellations that results in a full cancellation.
Problem: the callback that handles the `.free` RPC does not unpack the R
string payload.

Add the capability to unpack the R string and call the new partial
remove function.
Problem: the qmanager base policy and derived-classes do not have
partial cancellation functionality.

Add a virtual remove function overload in base, and overrides in FCFS
and backfill policies. In each case the policy needs to call the REAPI
module partial cancel function, which means it can't reside in the base
class.

Consolidate logic in the `remove` function to call partial cancel and
check if it fully removed the job's resources. If so, remove the job
from the allocated, running maps and set the state to be `COMPLETE`. Do
not enter the job into the completed map, because that will get popped
and cancelled again in `cancel_completed_jobs`. Note that the sched
loop needs to be cancelled and blocked jobs need to be reconsidered.
Finally, resume the sched loop to continue scheduling jobs.
Problem: std::strings are currently constructed from char * input
parameters and used to find resources and sub-planners, creating
unnecessary overhead.

Remove the extraneous constructions.
Problem: flux ion-resource does not support partial cancellation.

Add support for sending partial cancel RPCs.
Problem: flux ion-resource does not have testsuite tests.

Add them.
Problem: many range-based loops in Fluxion result in calling an object
copy constructor.

Avoid the copy constructor cost and add const type qualifier where
appropriate.
Problem: partial cancel functionality changes the order of jobid3 and
jobid4 start after cancellation of jobid2 in
t1009-recovery-multiqueue.

Add an OR condition to wait on jobid3 or jobid4 to start upon
cancelling jobid2.
@milroy
Copy link
Member Author

milroy commented Jul 10, 2024

Thanks for the feedback and testing @trws, @garlick, and @grondo!

Setting MWP.

@milroy milroy added the merge-when-passing mergify.io - merge PR automatically once CI passes label Jul 10, 2024
@mergify mergify bot merged commit 7234b37 into flux-framework:master Jul 10, 2024
23 checks passed
Copy link

codecov bot commented Oct 8, 2024

Codecov Report

Attention: Patch coverage is 76.91310% with 178 lines in your changes missing coverage. Please review.

Project coverage is 74.3%. Comparing base (5e54a68) to head (a20ad21).
Report is 170 commits behind head on master.

Files with missing lines Patch % Lines
resource/traversers/dfu_impl_update.cpp 73.9% 42 Missing ⚠️
resource/reapi/bindings/c++/reapi_cli_impl.hpp 0.0% 38 Missing ⚠️
resource/readers/resource_reader_jgf.cpp 80.4% 24 Missing ⚠️
resource/modules/resource_match.cpp 69.0% 13 Missing ⚠️
resource/readers/resource_reader_rv1exec.cpp 76.5% 11 Missing ⚠️
resource/utilities/command.cpp 82.3% 9 Missing ⚠️
resource/planner/c/planner_multi_c_interface.cpp 84.0% 8 Missing ⚠️
qmanager/policies/base/queue_policy_base.hpp 74.0% 7 Missing ⚠️
resource/reapi/bindings/c++/reapi_module_impl.hpp 66.6% 7 Missing ⚠️
qmanager/modules/qmanager_callbacks.cpp 60.0% 6 Missing ⚠️
... and 5 more
Additional details and impacted files
@@           Coverage Diff            @@
##           master   #1163     +/-   ##
========================================
+ Coverage    74.1%   74.3%   +0.2%     
========================================
  Files         103     104      +1     
  Lines       14606   15206    +600     
========================================
+ Hits        10828   11311    +483     
- Misses       3778    3895    +117     
Files with missing lines Coverage Δ
qmanager/modules/qmanager.cpp 73.5% <ø> (+<0.1%) ⬆️
qmanager/policies/queue_policy_bf_base_impl.hpp 81.1% <100.0%> (-1.1%) ⬇️
qmanager/policies/queue_policy_fcfs.hpp 100.0% <ø> (ø)
qmanager/policies/queue_policy_fcfs_impl.hpp 72.6% <100.0%> (-1.6%) ⬇️
resource/planner/c++/planner_multi.hpp 100.0% <ø> (ø)
resource/planner/test/planner_test01.cpp 100.0% <100.0%> (ø)
resource/planner/test/planner_test02.cpp 100.0% <100.0%> (ø)
resource/readers/resource_reader_base.hpp 100.0% <100.0%> (ø)
resource/readers/resource_reader_hwloc.hpp 100.0% <ø> (ø)
resource/readers/resource_reader_jgf.hpp 100.0% <ø> (ø)
... and 18 more

... and 2 files with indirect coverage changes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merge-when-passing mergify.io - merge PR automatically once CI passes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants