Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed up final constraint enforcement #1286

Merged
merged 8 commits into from
Sep 3, 2024

Conversation

trws
Copy link
Member

@trws trws commented Aug 31, 2024

This includes fixes for various performance issues in the scoring api type, and resource data types (it should even speed up graph creation by making the resource types movable). The main item though is below. It takes the performance of the benchmark from #1171 from 10 jobs/s for constrained jobs and 15 jobs/s for unconstrained to 100-150 jobs/s for constrained and 150-200 jobs/s for unconstrained. For some reason this patch also consistently hits the issue in the progress bar where it will reach past the end of style, so if that comes up make sure to be testing with flux-core master.

problem: In commit b96899b, as part of adding node exclusive,
enforce_dfv was added to precisely mark unconstrained resources between
the root of the containment tree and the constrained resources. This
works fine, but it's a full traversal of the graph that gets run even
when selecting a single core. Usually this isn't terrible performance
because if the Node type is constrained then all the node vertices get
skipped after the constrained one is found. When the job is only
constrained on a core or socket however, or if the node isn't found
quickly, it can be as bad as a full traversal. Literally 90% of the
time spent matching a small single core job in the perf test case in
issue #1171.

solution: The short version is: Remove enforce_dfv completely. The
longer version is that instead of traversing the entire graph, we now
re-use the iteration over constrained resources in dfu_impl_t::enforce
and walk upward from each constrained vertex in the dominant subsystem
until we find a vertex that we've already set best_k (since any above
that must already have been set). That way we walk the minimum amount
of the tree necessary. Honestly it's still doing more work than it
should, because we're still iterating over resources that are children
of exclusive resources unnecessarily, and we're having to iterate over
in_edges rather than directly knowing the parent or subsetting on
subsystem, but fixing either of those would be a meaningful refactor so
I'm stopping here for now with the 15-20x speedup.

fixes #1171

@trws trws requested review from grondo and milroy August 31, 2024 17:45
trws added 2 commits August 31, 2024 10:45
problem: resource pools and relations don't have move constructors or
move assignment

solution: add them, defaulted, and remove the unnecessary explicit
constructors
problem: we're accidentally hashing the unique_id twice in some cases,
and using a very slow and not great hash for the pointer in the
shared_ptr case

solution: specialize the rc storage to more efficiently hash shared_ptr,
and only compute the unique_id hash once
@trws trws force-pushed the refactor-subsystem-identity branch from 2eec4cd to dad5626 Compare August 31, 2024 17:45
trws added 4 commits August 31, 2024 10:48
problem: scoring_api does a lot of extra std::map lookups that aren't
really necessary, and uses a map where an interned_key_vec would do at
least as well

solution: swap the std::map for the interned_key_vec, and remove a whole
lot of calls to handle new keys, which actually hadn't been necessary
for a bit
problem: we have typedefs for common maps of subsystem to string or set
of string, but we're using std::map

solution: stop doing that
problem: we store a mapping of subsystems in every edge, but each edge
is only ever in one subsystem, they're effectively an edge type, so it's
a waste of time and space both in memory and in all our serialization
formats that carry it like JGF

solution: swap the std::map<string,subsystem_t> for a single
subsystem_t, fix the breakage in all the readers and writers and fix all
the JGFs in the repository
problem: the files _start_ with json but then have other output after
that, so putting them through jq causes errors

solution: use test_cmp instead
@trws trws force-pushed the refactor-subsystem-identity branch from dad5626 to cd63a6a Compare August 31, 2024 17:48
problem: In commit b96899b, as part of adding node exclusive,
enforce_dfv was added to precisely mark unconstrained resources between
the root of the containment tree and the constrained resources.  This
works fine, but it's a _full traversal of the graph_ that gets run even
when selecting a single core.  Usually this isn't terrible performance
because if the Node type is constrained then all the node vertices get
skipped after the constrained one is found. When the job is only
constrained on a core or socket however, or if the node isn't found
quickly, it can be as bad as a full traversal.  Literally 90% of the
time spent matching a small single core job in the perf test case in
issue flux-framework#1171.

solution: The short version is: Remove enforce_dfv completely.  The
longer version is that instead of traversing the entire graph, we now
re-use the iteration over constrained resources in dfu_impl_t::enforce
and walk upward from each constrained vertex in the dominant subsystem
until we find a vertex that we've already set best_k (since any above
that must already have been set).  That way we walk the minimum amount
of the tree necessary.  Honestly it's still doing more work than it
should, because we're still iterating over resources that are children
of exclusive resources unnecessarily, and we're having to iterate over
`in_edges` rather than directly knowing the parent or subsetting on
subsystem, but fixing either of those would be a meaningful refactor so
I'm stopping here for now with the 15-20x speedup.

fixes flux-framework#1171
@trws trws force-pushed the refactor-subsystem-identity branch from 0d5065a to 85d61d8 Compare September 1, 2024 23:40
problem: several files have extra _seconds_ of compile time due to
over-including headers in headers

solution: include less things we do not need
Copy link
Contributor

@grondo grondo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow! Huge improvement here:

Submitting 100 jobs: │███████████████████████████████████████│100.0% 150.5 job/s

real	0m4.229s
user	0m0.063s
sys	0m0.008s
{
 "V": 1654785,
 "E": 1654784,
 "by_rank": {
  "[0-16383]": 101
 },
 "load-time": 75.918168234999996,
 "graph-uptime": 40,
 "time-since-reset": 116,
 "match": {
  "succeeded": {
   "njobs": 100,
   "njobs-reset": 100,
   "max-match-jobid": 1950452023297,
   "max-match-iters": 1,
   "stats": {
    "min": 0.022669928999999998,
    "max": 0.028772064,
    "avg": 0.027917891049999994,
    "variance": 1.0197218638951509e-6
   }
  },
  "failed": {
   "njobs": 0,
   "njobs-reset": 0,
   "max-match-jobid": 0,
   "max-match-iters": 0,
   "stats": {
    "min": 0.0,
    "max": 0.0,
    "avg": 0.0,
    "variance": 0.0
   }
  }
 }
}

One question, will this break existing JGF on clusters that have it, or if Fluxion is reloaded from an older version on some system?

Oh, and also does flux ion-R need an update? (Sorry if I missed something obvious)

@trws
Copy link
Member Author

trws commented Sep 3, 2024

It's subtle, but the change for flux ion-R is handled by these two lines: https://github.com/flux-framework/flux-sched/pull/1286/files#diff-5f60472bc026d952db8a60e3a2ccbc3f838188190d39ca9cd1f9ea05170ee4cdL94-R94

All the JGF handling in there is done by that other file, come to find out. It actually took me a shockingly long time to find that.

As to the breaking existing JGF, it's possible if we're using JGF in something on those systems, which we might be in some cases. I don't see where though, it looks like we're just storing R for the R key on tioga for example, so I don't think there's anything to break there. This is part of why I asked about evolving JGF at the last meeting. If we need I can make it so the reader can ingest either the new or old format and just always turn it into this internal representation, but this will not be able to emit the old format without some loss.

@grondo
Copy link
Contributor

grondo commented Sep 3, 2024

Ok, sounds good. Perhaps @jameshcorbett could comment if there are any systems actively using JGF (which I think he mentioned it shouldn't be an issue to upgrade)

@trws trws added the merge-when-passing mergify.io - merge PR automatically once CI passes label Sep 3, 2024
@mergify mergify bot merged commit 758258e into flux-framework:master Sep 3, 2024
20 of 21 checks passed
jameshcorbett added a commit to jameshcorbett/flux-coral2 that referenced this pull request Sep 4, 2024
Problem: flux-framework/flux-sched/pull/1286 changed the way edges
are represented in JGF.

Update the expected JGF output of flux-dws2jgf.
jameshcorbett added a commit to jameshcorbett/flux-coral2 that referenced this pull request Sep 5, 2024
Problem: flux-framework/flux-sched/pull/1286 changed the way edges
are represented in JGF.

Update the expected JGF output of flux-dws2jgf.
Copy link

codecov bot commented Dec 2, 2024

Codecov Report

Attention: Patch coverage is 70.83333% with 35 lines in your changes missing coverage. Please review.

Project coverage is 75.3%. Comparing base (4d9fc6f) to head (8b63d37).
Report is 67 commits behind head on master.

Files with missing lines Patch % Lines
resource/utilities/resource-query.cpp 19.0% 17 Missing ⚠️
resource/evaluators/scoring_api.cpp 76.0% 6 Missing ⚠️
resource/schema/resource_data.cpp 50.0% 6 Missing ⚠️
resource/evaluators/scoring_api.hpp 77.7% 2 Missing ⚠️
resource/writers/match_writers.cpp 60.0% 2 Missing ⚠️
resource/readers/resource_reader_jgf.cpp 90.9% 1 Missing ⚠️
resource/traversers/dfu_impl.cpp 95.6% 1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##           master   #1286    +/-   ##
=======================================
  Coverage    75.3%   75.3%            
=======================================
  Files         111     110     -1     
  Lines       15361   15236   -125     
=======================================
- Hits        11575   11482    -93     
+ Misses       3786    3754    -32     
Files with missing lines Coverage Δ
resource/modules/resource_match.cpp 68.9% <ø> (ø)
resource/readers/resource_reader_grug.cpp 81.3% <100.0%> (ø)
resource/readers/resource_reader_hwloc.cpp 65.3% <100.0%> (ø)
resource/readers/resource_reader_jgf.hpp 100.0% <ø> (ø)
resource/readers/resource_reader_rv1exec.cpp 71.0% <100.0%> (ø)
resource/readers/resource_spec_grug.cpp 47.2% <ø> (ø)
resource/readers/resource_spec_grug.hpp 100.0% <ø> (ø)
resource/traversers/dfu_impl.hpp 94.5% <ø> (ø)
resource/utilities/command.cpp 78.5% <ø> (ø)
src/common/libintern/interner.hpp 97.5% <100.0%> (+<0.1%) ⬆️
... and 8 more

... and 2 files with indirect coverage changes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merge-when-passing mergify.io - merge PR automatically once CI passes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

sharness based scheduler performance study
2 participants