flux-resource: improve performance of `flux resource list` #5823

grondo · 2024-03-23T17:53:26Z

This PR improves the performance of flux resource list by modifying some really inefficient code:

The ResourceSetExtra wrapper class unnecessarily reinitializes the ResourceSet argument. Just stash the ResourceSet instead and forward method calls to it to avoid this wasted work.
resources_uniq_lines() iterates over each individual rank in each resource set to collect common lines of output together. Instead, split a resource set into all combinations of properties and iterate each of these sets. This should be the minimum number of sets required to create possibly unique lines (at least at this point). This reduces the iteration count significantly.

Timing before with a very large resource set (~10K ranks):

 time flux resource list --from-stdin < ../../large-list.json >/dev/null

real	0m6.878s
user	0m6.762s
sys	0m0.089s

Timing after these changes:

$ time src/cmd/flux resource list --from-stdin < ../../large-list.json >/dev/null

real	0m1.790s
user	0m1.662s
sys	0m0.102s

Further improvements will probably have to be made in librlist itself.

Problem: The ResourceSetExtra class is supposed to be just a wrapper around a ResourceSet class that adds the convenience `propertiesx` and `queue` properties. However, using this class is slow because it reinvokes the underlying ResourceSet initializer, which ends up creating a unnecessary copy of the resource set from scratch. Make the class a wrapper by just stashing the original ResourceSet object and forwarding unknown getattrs to the wrapped object. This avoids the wasted time recreating the coped resource set.

Problem: The `flux resource list` command spends the majority of its time in resource_uniq_lines() on large clusters because it iterates over every rank in each state resource set in order to create the set of unique lines. Instead split the resource into smaller subsets based on all combinations of properties contained within that set. For the purposes of the current incarnation of `flux resource list` this should be the minimum number of distinct resource sets required to generate possibly unique lines of output.

Problem: lines.values() does not guarantee a sorting order, but this is passed to formatter.print_items() in `flux resource list`, which could lead to arbitrary output order. Sort output lines on (resource state, first rank) to create reproducible output.

garlick

LGTM! Nice results!

grondo · 2024-03-24T14:22:35Z

Thanks! I'll set mwp

codecov · 2024-03-24T14:23:18Z

Codecov Report

Merging #5823 (5693383) into master (a1070a3) will increase coverage by 0.00%.
The diff coverage is 100.00%.

Additional details and impacted files

@@           Coverage Diff           @@
##           master    #5823   +/-   ##
=======================================
  Coverage   83.35%   83.35%           
=======================================
  Files         509      509           
  Lines       82494    82515   +21     
=======================================
+ Hits        68759    68782   +23     
+ Misses      13735    13733    -2

Files	Coverage Δ
src/cmd/flux-resource.py	`95.44% <100.00%> (+0.27%)`	⬆️

... and 15 files with indirect coverage changes

grondo added 3 commits March 24, 2024 02:02

grondo force-pushed the issue#5819 branch from 402a962 to 5693383 Compare March 24, 2024 02:02

grondo mentioned this pull request Mar 24, 2024

optimize key librlist functions to improve flux resource list performance #5824

Merged

garlick approved these changes Mar 24, 2024

View reviewed changes

grondo added the merge-when-passing label Mar 24, 2024

mergify bot merged commit 794be81 into flux-framework:master Mar 24, 2024
34 of 35 checks passed

grondo deleted the issue#5819 branch March 24, 2024 14:28

This was referenced Mar 25, 2024

flux-resource list processing is very slow #5819

Closed

job submissions are serialized and not interactively performant flux-framework/flux-sched#1159

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

flux-resource: improve performance of `flux resource list` #5823

flux-resource: improve performance of `flux resource list` #5823

grondo commented Mar 23, 2024

garlick left a comment

grondo commented Mar 24, 2024

codecov bot commented Mar 24, 2024

flux-resource: improve performance of flux resource list #5823

flux-resource: improve performance of flux resource list #5823

Conversation

grondo commented Mar 23, 2024

garlick left a comment

Choose a reason for hiding this comment

grondo commented Mar 24, 2024

codecov bot commented Mar 24, 2024

Codecov Report

flux-resource: improve performance of `flux resource list` #5823

flux-resource: improve performance of `flux resource list` #5823