Flexible Scheduling with or_slot [WIP] #1296

zekemorton · 2024-09-16T22:35:01Z

This PR introduces flexible scheduling to the traverser. Flexible scheduling lets the user define several acceptable resource configurations, any of which the traverser can select while walking the resource graph. These configurations are specified as or_slots, where a single or_slot is one acceptable compute unit. At any given point in the traversal, the traverser selects equally among all available or_slot options. Any combination of the or_slots may be selected, so they should be thought of as interchangeable configurations.

Implementation:
or_slot:
or_slots are a new slot-like resource type. They are searched for and treated similarly to slots, except that sibling or_slots may appear at the same level. or_slots also add the complexity of selecting the best configuration of or_slots for the available resources. As of this WIP PR, this is done by taking the union of all resources specified across all the or_slots and completing the traversal with all of those resources to get a count for each. With those resource counts, the best configurations can be selected and then scheduled the same way as normal slots.

slot configuration selection:
After getting resource counts for all possible resources in all of the or_slots, the slot configuration is determined. I was not able to find an efficient way to find the best configuration in terms of score. A first attempt produced a greedy algorithm that did well in terms of score but occasionally failed to find a match when one existed. This PR shows an example of a DP algorithm that maximizes the number of or_slots that can be scheduled with the given resources. How the configuration is selected could be made configurable, similar to how match policies are selected, and considerable effort could go into making this selection process flexible and feature-rich. For example, the same DP algorithm could weight some configurations more than others.

Caveats:

Because we take the union of all possible resources by type, using or_slots at levels above the leaves leads to errors. If we have two configurations with sockets, one with 10 cores and the other with 20, we can only get counts for sockets with one of those core counts. This issue stems from how we color the graph during a traversal, which may not be necessary. It may be possible to remove the coloring and then traverse the graph once per or_slot. This may require some adaptation to the DP algorithm, but would be an overall improvement.
Slots are scheduled exclusively, which results in unexpected behavior when or_slots are under slots or when or_slots contain non-leaf nodes. This behavior needs further testing.
The configuration choice is computed during the traversal, using only the resource counts available at that particular stage. With this method, it is therefore impossible to find a configuration that is optimal over the entire resource graph rather than just the local vertex.

To-do items:

code cleanup, adding comments as needed
write a full suite of tests, including tests with non-leaf nodes
investigate changes to traversal without graph coloring
modularize the or config selection policy
handle or_slot counts appropriately

Example Job Spec:

version: 9999
resources:
    - type: cluster
      count: 1
      with:
        - type: rack
          count: 1
          with:
            - type: node
              count: 1
              with:
                - type: socket
                  count: 1
                  with:
                    - type: or_slot
                      count: 1
                      label: small
                      with:
                        - type: core
                          count: 8
                        - type: gpu
                          count: 1
                    - type: or_slot
                      count: 1
                      label: big
                      with:
                        - type: core
                          count: 10
# a comment
attributes:
  system:
    duration: 3600
tasks:
  - command: [ "app" ]
    slot: default
    count:
      per_slot: 1

zekemorton · 2024-10-15T22:58:31Z

Capturing the original greedy solution here, since I am going to squash that commit into the DP solution:

int dfu_impl_t::dom_or_slot (const jobmeta_t &meta,
                             vtx_t u,
                             const std::vector<Resource> &slots,
                             unsigned int nslots,
                             bool pristine,
                             bool *excl,
                             scoring_api_t &dfu)
{
    int rc;
    bool x_inout = true;
    unsigned int qual_num_slots = 0;
    std::vector<eval_egroup_t> edg_group_vector;
    const subsystem_t &dom = m_match->dom_subsystem ();
    std::unordered_set<edg_t *> edges_used;
    scoring_api_t dfu_slot;

    // collect a set of all resource types in the or_slots to get resource
    // counts. This does not work well with non leaf vertex resources because
    // it cannot distinguish beyond type. This may be resolveable if graph
    // coloring is removed during the selection process.
    std::vector<Resource> slot_resource_union;
    std::set<resource_type_t> resource_types;
    for (auto &slot : slots) {
        for (const auto &r : slot.with) {
            // insert ().second is true only the first time a type is seen
            if (resource_types.insert (r.type).second)
                slot_resource_union.push_back (r);
        }
    }

    if ((rc = explore (meta,
                       u,
                       dom,
                       slot_resource_union,
                       pristine,
                       &x_inout,
                       visit_t::DFV,
                       dfu_slot,
                       nslots))
        != 0)
        goto done;
    if ((rc = m_match->dom_finish_slot (dom, dfu_slot)) != 0)
        goto done;
    for (unsigned int i = 0; i < nslots; ++i) {
        std::unordered_set<edg_t *> remove_edges;
        eval_egroup_t edg_group;
        bool slot_found = false;
        // Greedily pick the highest-scoring or_slot configuration that can
        // still be satisfied by edges not claimed in earlier iterations.
        for (const auto &slot : slots) {
            const auto &slot_shape = slot.with;
            int64_t score = MATCH_MET;
            eval_egroup_t test_edg_group;
            std::unordered_set<edg_t *> test_edges;
            bool found = true;
            for (const auto &slot_elem : slot_shape) {
                unsigned int j = 0;
                unsigned int qc = dfu_slot.qualified_count (dom, slot_elem.type);
                unsigned int count = m_match->calc_count (slot_elem, qc);
                dfu_slot.eval_egroups_iter_reset (dom, slot_elem.type);
                while (j < count) {
                    auto egroup_i = dfu_slot.eval_egroups_iter_next (dom, slot_elem.type);
                    if (egroup_i == dfu_slot.eval_egroups_end (dom, slot_elem.type)) {
                        found = false;
                        break;
                    }
                    // Skip edges already claimed by a previously selected or_slot.
                    if (edges_used.find (&(*egroup_i).edges[0].edge) != edges_used.end ())
                        continue;
                    test_edges.insert (&(*egroup_i).edges[0].edge);
                    eval_edg_t ev_edg ((*egroup_i).edges[0].count,
                                       (*egroup_i).edges[0].count,
                                       1,
                                       (*egroup_i).edges[0].edge);
                    score += (*egroup_i).score;
                    test_edg_group.edges.push_back (ev_edg);
                    j += (*egroup_i).edges[0].count;
                }
                if (!found)
                    break;
            }
            if (!found)
                continue;
            if (!slot_found || edg_group.score < score) {
                remove_edges = test_edges;
                test_edg_group.score = score;
                test_edg_group.count = 1;
                test_edg_group.exclusive = 1;
                edg_group = test_edg_group;
                slot_found = true;
            }
        }
        // Stop once no configuration fits the remaining resources; otherwise
        // claim the chosen edges and count the slot as qualified.
        if (!slot_found)
            break;
        edges_used.insert (remove_edges.begin (), remove_edges.end ());
        edg_group_vector.push_back (edg_group);
        ++qual_num_slots;
    }
    for (auto &edg_group : edg_group_vector)
        dfu.add (dom, or_slot_rt, edg_group);

done:
    return (qual_num_slots) ? 0 : -1;
}

Problem: the traverser does not support options for flexible scheduling. Add support for a logical or type of resource group, or_slots. or_slots are options for resource configurations that the traverser considers when selecting resources.

Problem: The traverser primes the jobspec with counts of resources that are specified as pruning filter types. This additive accumulation can produce counts much higher than the available counts in the planner when using flexible scheduling with or_slots. These high counts cause subplanner pruning to stop the traversal, so matches are not found even when they are available. Add a new accumulation option, min_if, which takes the lowest count instead of the sum of all resource counts. Use it when the parent type is or_slot_rt.

Problem: there are no tests for or_slots. Add tests.

zekemorton · 2024-10-29T23:48:32Z

After a discussion with @milroy, we decided that it does not make sense to modularize the policy and selection algorithm for or_slots yet; we can save that for when we have more than one policy or decide how it can be expressed in the jobspec. The same applies to how we want to deal with or_slot counts, for which we do not have a great solution yet.

Additionally, I have done some performance testing on or_slots to see how this algorithm affects scheduling time, and I uploaded the CSV with this comment. The columns are Graph, Policy, Jobs, Slots, Or-Slots, Resources, and Time: Graph is the resource graph used; Policy is the match policy, either first or high; Jobs is the number of jobs scheduled or reserved in a particular run; Slots is the number of regular slots (0 or 1); Or-Slots is the number of or_slots; Resources is the number of resources in a slot; and Time is the time it took to schedule a single job using resource-query. I plan to evaluate the data more closely and make some graphs, but I figured I'd upload the CSV now in case anyone else wants to take a look.
or_slot_results.csv

Problem: the or_config map uses a string as an index. Fluxion is moving away from using strings wherever possible. Create a struct and custom hashing function to use the map of resource counts directly as an index.

codecov · 2024-11-20T18:35:49Z

Codecov Report

Attention: Patch coverage is 95.94595% with 6 lines in your changes missing coverage. Please review.

Project coverage is 75.5%. Comparing base (996f999) to head (a35f8a8).
Report is 18 commits behind head on master.

Files with missing lines	Patch %	Lines
resource/traversers/dfu_impl.cpp	96.2%	5 Missing ⚠️
resource/traversers/dfu_impl.hpp	93.3%	1 Missing ⚠️

@@           Coverage Diff            @@
##           master   #1296     +/-   ##
========================================
+ Coverage    75.3%   75.5%   +0.2%     
========================================
  Files         111     111             
  Lines       15300   16119    +819     
========================================
+ Hits        11531   12183    +652     
- Misses       3769    3936    +167

Files with missing lines	Coverage Δ
resource/traversers/dfu_impl.hpp	`94.2% <93.3%> (-0.4%)`	⬇️
resource/traversers/dfu_impl.cpp	`85.8% <96.2%> (+1.7%)`	⬆️

... and 69 files with indirect coverage changes

zekemorton force-pushed the resource-or branch 13 times, most recently from 69c8c91 to 29bb462 September 19, 2024 21:32

zekemorton mentioned this pull request Sep 23, 2024

Jobspecs with flexible resource types #1259

Open

zekemorton force-pushed the resource-or branch from 29bb462 to b64c82b October 15, 2024 22:59

zekemorton added 2 commits October 15, 2024 15:59

traverser: add support for or_slots

ac65bd0

zekemorton force-pushed the resource-or branch from b64c82b to c4fbdb4 October 15, 2024 22:59

tests: add or_slot flexible tests

5095b76

zekemorton force-pushed the resource-or branch from c4fbdb4 to 5095b76 October 17, 2024 21:27

zekemorton force-pushed the resource-or branch from f011ad7 to 6465c9d November 20, 2024 18:17

traverser: use map as index for or_config

a35f8a8

zekemorton force-pushed the resource-or branch from 6465c9d to a35f8a8 November 20, 2024 18:19