
Flexible Scheduling with or_slot [WIP] #1296

Open: wants to merge 4 commits into master

@zekemorton (Collaborator) commented Sep 16, 2024

This PR introduces flexible scheduling to the traverser. Flexible scheduling lets the user define several acceptable resource configurations that the traverser can choose from while walking the resource graph. These configurations are specified as or_slots, where a single or_slot is one acceptable compute unit. At any point in the graph traversal, the traverser selects among the available or_slot options with equal preference. Any combination of the or_slots may be selected, so they should be thought of as interchangeable configurations.

Implementation:
or_slot:
or_slots are a new slot-like resource type. They are searched for and treated similarly to slots, except that sibling or_slots may appear at the same level. or_slots also add the complexity of selecting the best configuration of or_slots for the available resources. At the time of submitting this WIP PR, this is done by taking the union of all resources specified across all the or_slots and completing the traversal with all of those resources to get counts for them. With those resource counts, the best configurations can be selected and then scheduled the same way as normal slots.

slot configuration selection:
After getting resource counts for all possible resources in all of the or_slots, the slot configuration is determined. I was not able to find an efficient way to find the best configurations in terms of score. A first attempt landed me on a greedy algorithm that would do well in terms of score, but that would occasionally fail to find a match when a match could be found. This PR shows an example of a DP algorithm that maximizes the number of or_slots it can schedule with the given resources. How the configuration is selected could be made configurable, similar to how match policies are selected, and much more effort could go into making this selection process flexible and feature-rich. One example might be to use the same DP algorithm but weight some configurations more than others.
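The DP idea can be sketched in isolation as follows. This is a minimal standalone sketch, not the code in this PR: it assumes the traversal has already reduced the available resources to a count per type (a hypothetical `Counts` map keyed by type name) and that each or_slot configuration is a map of the same shape. It memoizes on the remaining counts plus the number of slots still wanted, and returns the largest number of or_slots (capped at the request) that fit.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical representation: resource type name -> count.
using Counts = std::map<std::string, long>;

// True if `avail` holds at least `need` of every type listed in `need`.
static bool fits (const Counts &avail, const Counts &need)
{
    for (const auto &[type, n] : need) {
        auto it = avail.find (type);
        if (it == avail.end () || it->second < n)
            return false;
    }
    return true;
}

// Remove `need` from a copy of `avail` (caller has checked fits()).
static Counts take (Counts avail, const Counts &need)
{
    for (const auto &[type, n] : need)
        avail[type] -= n;
    return avail;
}

// Largest number of or_slots (at most `nslots`) that can be carved out of
// `avail`, where each slot may use any configuration in `configs`.
// The memo is keyed on (remaining counts, slots wanted), so it is only
// valid for a single `configs` set.
static int max_or_slots (const Counts &avail,
                         const std::vector<Counts> &configs,
                         int nslots,
                         std::map<std::pair<Counts, int>, int> &memo)
{
    assert (nslots >= 0);
    if (nslots == 0)
        return 0;
    auto key = std::make_pair (avail, nslots);
    if (auto it = memo.find (key); it != memo.end ())
        return it->second;
    int best = 0;
    for (const auto &cfg : configs) {
        if (fits (avail, cfg))
            best = std::max (best,
                             1 + max_or_slots (take (avail, cfg), configs, nslots - 1, memo));
        if (best == nslots)
            break;  // the full request is satisfied; no need to look further
    }
    memo[key] = best;
    return best;
}
```

Using the or_slot shapes from the example job spec in this description (small: 8 cores and 1 gpu; big: 10 cores) and a hypothetical socket with 18 cores and 1 gpu, a request for 2 or_slots yields 2 (one of each); with 16 cores it yields 1, so a request for two would not match. Use a fresh memo whenever `configs` changes.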

Caveats:

  • Because we take the union of all possible resources based on type, this leads to errors when using or_slots at levels above the leaves. If we have two configurations that contain sockets, one with 10 cores and the other with 20, we can only get counts for sockets with one of those core counts. This issue stems from how we color the graph during a traversal, which may not be necessary. It may be possible to remove the coloring and then traverse the graph for each or_slot. This may require some adaptation to the DP algorithm, but would be an overall improvement.
  • Slots are scheduled exclusively, which results in some unexpected behavior if or_slots are under slots or if or_slots contain non-leaf nodes. This behavior needs further testing.
  • The configuration to use is calculated during the traversal, using only the resource counts available at that particular stage of the traversal. This means that with this method it is impossible to find a configuration that is optimal over the entire resource graph rather than just at that local vertex.
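The first caveat comes from deduplicating or_slot children by type alone. The sketch below mirrors the union loop in the greedy code captured later in this thread, using a simplified, hypothetical `Res` struct in place of Fluxion's `Resource`: whichever child of a given type appears first wins, and every later shape of that type is dropped.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a jobspec resource: type, count, children.
struct Res {
    std::string type;
    long count;
    std::vector<Res> with;
};

// Union of all or_slot children keyed only on type: the first child of
// each type is kept and later children of the same type are discarded,
// even if their counts or nested shapes differ.
std::vector<Res> union_by_type (const std::vector<Res> &or_slots)
{
    std::vector<Res> out;
    std::set<std::string> seen;
    for (const auto &slot : or_slots)
        for (const auto &r : slot.with)
            if (seen.insert (r.type).second)  // true only for a new type
                out.push_back (r);
    return out;
}
```

For two or_slots whose sockets hold 10 and 20 cores, the union contains a single socket entry with 10 cores, so only that socket shape is counted during the traversal.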

To do Items:

  • code cleanup and add comments as needed
  • write full suite of tests, including tests with non-leaf nodes
  • investigate changes to traversal without graph coloring
  • modularize the or config selection policy
  • handle or_slot counts appropriately

Example Job Spec:

```yaml
version: 9999
resources:
    - type: cluster
      count: 1
      with:
        - type: rack
          count: 1
          with:
            - type: node
              count: 1
              with:
                - type: socket
                  count: 1
                  with:
                    - type: or_slot
                      count: 1
                      label: small
                      with:
                        - type: core
                          count: 8
                        - type: gpu
                          count: 1
                    - type: or_slot
                      count: 1
                      label: big
                      with:
                        - type: core
                          count: 10
# a comment
attributes:
  system:
    duration: 3600
tasks:
  - command: [ "app" ]
    slot: default
    count:
      per_slot: 1
```

@zekemorton force-pushed the resource-or branch 13 times, most recently from 69c8c91 to 29bb462 on September 19, 2024 21:32

@zekemorton (Collaborator, Author)

Capturing the original greedy solution here, since I am going to squash that commit into the DP solution:

```cpp
int dfu_impl_t::dom_or_slot (const jobmeta_t &meta,
                             vtx_t u,
                             const std::vector<Resource> &slots,
                             unsigned int nslots,
                             bool pristine,
                             bool *excl,
                             scoring_api_t &dfu)
{
    int rc;
    bool x_inout = true;
    unsigned int qual_num_slots = 0;
    std::vector<eval_egroup_t> edg_group_vector;
    const subsystem_t &dom = m_match->dom_subsystem ();
    std::unordered_set<edg_t *> edges_used;
    scoring_api_t dfu_slot;

    // collect a set of all resource types in the or_slots to get resource
    // counts. This does not work well with non leaf vertex resources because
    // it cannot distinguish beyond type. This may be resolvable if graph
    // coloring is removed during the selection process.
    std::vector<Resource> slot_resource_union;
    std::set<resource_type_t> resource_types;
    for (const auto &slot : slots) {
        for (const auto &r : slot.with) {
            if (resource_types.find (r.type) == resource_types.end ()) {
                resource_types.insert (r.type);
                slot_resource_union.push_back (r);
            }
        }
    }

    if ((rc = explore (meta,
                       u,
                       dom,
                       slot_resource_union,
                       pristine,
                       &x_inout,
                       visit_t::DFV,
                       dfu_slot,
                       nslots))
        != 0)
        goto done;
    if ((rc = m_match->dom_finish_slot (dom, dfu_slot)) != 0)
        goto done;
    // For each requested slot, greedily take the highest-scoring or_slot
    // configuration that can still be built from edges not used so far.
    for (unsigned int i = 0; i < nslots; ++i) {
        std::unordered_set<edg_t *> remove_edges;
        eval_egroup_t edg_group;
        bool matched = false;
        for (const auto &slot : slots) {
            const auto &slot_shape = slot.with;
            int64_t score = MATCH_MET;
            eval_egroup_t test_edg_group;
            std::unordered_set<edg_t *> test_edges;
            bool found = true;
            for (auto &slot_elem : slot_shape) {
                unsigned int j = 0;
                unsigned int qc = dfu_slot.qualified_count (dom, slot_elem.type);
                unsigned int count = m_match->calc_count (slot_elem, qc);
                dfu_slot.eval_egroups_iter_reset (dom, slot_elem.type);
                while (j < count) {
                    auto egroup_i = dfu_slot.eval_egroups_iter_next (dom, slot_elem.type);
                    if (egroup_i == dfu_slot.eval_egroups_end (dom, slot_elem.type)) {
                        found = false;
                        break;
                    }
                    // skip edges already claimed by an earlier slot
                    if (edges_used.find (&(*egroup_i).edges[0].edge) == edges_used.end ())
                        test_edges.insert (&(*egroup_i).edges[0].edge);
                    else
                        continue;
                    eval_edg_t ev_edg ((*egroup_i).edges[0].count,
                                       (*egroup_i).edges[0].count,
                                       1,
                                       (*egroup_i).edges[0].edge);
                    score += (*egroup_i).score;
                    test_edg_group.edges.push_back (ev_edg);
                    j += (*egroup_i).edges[0].count;
                }
            }
            if (!found) {
                continue;
            }
            if (!matched || edg_group.score < score) {
                remove_edges = test_edges;
                test_edg_group.score = score;
                test_edg_group.count = 1;
                test_edg_group.exclusive = 1;
                edg_group = test_edg_group;
                matched = true;
            }
        }
        // No configuration fits the remaining resources: stop instead of
        // recording an empty edge group.
        if (!matched)
            break;
        edges_used.insert (remove_edges.begin (), remove_edges.end ());
        edg_group_vector.push_back (edg_group);
        ++qual_num_slots;
    }
    for (auto &edg_group : edg_group_vector)
        dfu.add (dom, or_slot_rt, edg_group);

done:
    // succeed if at least one or_slot was placed
    return (qual_num_slots) ? 0 : -1;
}
```
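The failure mode mentioned in the description can be reproduced with a much simpler model than the code above. In this hypothetical sketch, each configuration is only a per-type count map (`Counts`), the total number of resources stands in for the score, and each requested slot takes the largest configuration that still fits, with no lookahead. It is an illustration of greedy selection in general, not a reimplementation of `dom_or_slot`.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical representation: resource type name -> count.
using Counts = std::map<std::string, long>;

// Stand-in for a configuration's score: its total number of resources.
static long size_of (const Counts &c)
{
    long s = 0;
    for (const auto &[type, n] : c)
        s += n;
    return s;
}

static bool fits (const Counts &avail, const Counts &need)
{
    for (const auto &[type, n] : need) {
        auto it = avail.find (type);
        if (it == avail.end () || it->second < n)
            return false;
    }
    return true;
}

// Greedy: for each requested slot, commit to the best-scoring configuration
// that fits right now. Returns how many slots were placed.
static int greedy_or_slots (Counts avail, const std::vector<Counts> &configs, int nslots)
{
    int placed = 0;
    for (int i = 0; i < nslots; ++i) {
        const Counts *best = nullptr;
        for (const auto &cfg : configs)
            if (fits (avail, cfg) && (!best || size_of (cfg) > size_of (*best)))
                best = &cfg;
        if (!best)
            break;  // nothing fits the remaining resources
        for (const auto &[type, n] : *best)
            avail[type] -= n;
        ++placed;
    }
    return placed;
}
```

With 6 cores, configurations of 4 and 3 cores, and 2 requested slots, the sketch commits to the 4-core configuration first and then cannot place a second slot, so it returns 1 even though two 3-core slots would fit. An exhaustive or DP search avoids this by considering the 3-core choice for the first slot as well.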

Problem: the traverser does not support options for flexible
scheduling.

Add support for a logical-or type of resource group, or_slots.
or_slots are options for resource configurations that the traverser
considers when selecting resources.

Problem: The traverser primes the jobspec with counts of resources
that are specified as pruning filter types. This additive
accumulation results in counts that can be much higher than the
available counts in the planner when using flexible scheduling with
or_slots. These high counts cause pruning by the subplanner to stop
the traversal, so matches are not found even when matches are
available.

Add a new accumulation option, min_if. It takes the lowest count
instead of the sum of all resource counts. Use it when the parent
type is or_slot_rt.
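The difference between additive accumulation and taking the lowest count can be sketched for a single resource type. The names here (`accumulate_counts`, `Accum`) are hypothetical and this is not Fluxion's pruning-filter code: it only shows what each rule produces from the counts that sibling or_slot options request.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical accumulation modes for one resource type.
enum class Accum { SUM, MIN };

// Count used to prime a pruning filter, given the count that each sibling
// or_slot option requests for the same type. SUM adds them all; MIN keeps
// the smallest, i.e. what the cheapest single option needs.
static long accumulate_counts (const std::vector<long> &per_option, Accum how)
{
    if (per_option.empty ())
        return 0;
    if (how == Accum::MIN)
        return *std::min_element (per_option.begin (), per_option.end ());
    long total = 0;
    for (long c : per_option)
        total += c;
    return total;
}
```

For the core counts of the two or_slots in the example job spec (8 and 10), summing primes the filter with 18 cores even though either option alone needs at most 10, while taking the minimum gives 8. If, say, only 12 cores were free under a socket, a filter expecting 18 would prune it even though the 10-core option fits.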

Problem: there are no tests for or_slots.

Add tests.
@zekemorton (Collaborator, Author)

After a discussion with @milroy, we decided that it does not make sense to modularize the policy and selection algorithm for or_slots yet, and that we can save it for when we have more than one policy or have decided how a policy can be expressed in the jobspec. This is similar to the question of how to handle or_slot counts, for which we do not have a great solution yet.

Additionally, I have done some performance testing on or_slots to see how this algorithm affects scheduling time, and I uploaded the CSV with this comment. The columns in the CSV are Graph,Policy,Jobs,Slots,Or-Slots,Resources,Time, where Graph is the resource graph used; Policy is the match policy used, either first or high; Jobs is the number of jobs scheduled or reserved in a particular run; Slots is the number of regular slots, either 0 or 1; Or-Slots is the number of or_slots; Resources is the number of resources in a slot; and Time is the time it took to schedule a single job using resource-query. I plan to evaluate the data more closely and make some graphs, but I figured I'd upload the CSV now in case anyone else wants to take a look.
or_slot_results.csv
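For a first look at the data, a minimal sketch that averages the Time column per Or-Slots value is below. It assumes a single header row followed by plain comma-separated rows in the column order listed above, with no quoted fields; the function name is hypothetical and the units of Time are whatever the CSV uses.

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Mean Time grouped by Or-Slots for rows shaped as
// Graph,Policy,Jobs,Slots,Or-Slots,Resources,Time (header line first).
// Rows without exactly 7 fields are skipped; std::stol / std::stod throw
// if a numeric field cannot be parsed.
std::map<long, double> mean_time_by_or_slots (std::istream &in)
{
    std::map<long, std::pair<double, long>> acc;  // or_slots -> (sum, rows)
    std::string line;
    std::getline (in, line);  // skip the header row
    while (std::getline (in, line)) {
        std::vector<std::string> f;
        std::stringstream ss (line);
        std::string field;
        while (std::getline (ss, field, ','))
            f.push_back (field);
        if (f.size () != 7)
            continue;
        long or_slots = std::stol (f[4]);
        double t = std::stod (f[6]);
        acc[or_slots].first += t;
        acc[or_slots].second += 1;
    }
    std::map<long, double> out;
    for (const auto &[k, v] : acc)
        out[k] = v.first / v.second;
    return out;
}
```

Grouping by Or-Slots first (and then splitting by Policy or Resources) should make it easy to see whether scheduling time grows with the number of or_slot options.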

Problem: the or_config map uses a string as an index. Fluxion is
moving away from using strings wherever possible.

Create a struct and custom hashing function to use the map of
resource counts directly as an index.
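A minimal sketch of that idea follows, assuming the counts live in a `std::map` from an integer resource-type id to a count. The `CountKey`, `CountKeyHash`, and `OrConfigMap` names and the integer type id are hypothetical, not Fluxion's. The struct supplies `operator==`, and the hash folds each (type, count) pair in the map's sorted order using boost::hash_combine-style mixing, so equal maps always hash equally.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <unordered_map>

// Hypothetical key: resource type id -> count, e.g. {core: 8, gpu: 1}.
struct CountKey {
    std::map<int, int64_t> counts;
    bool operator== (const CountKey &o) const { return counts == o.counts; }
};

// std::map iterates in sorted key order, so equal maps visit the same
// (type, count) sequence and produce the same hash value.
struct CountKeyHash {
    std::size_t operator() (const CountKey &k) const noexcept
    {
        std::size_t h = 0;
        for (const auto &[type, count] : k.counts) {
            // boost::hash_combine-style mixing of each element into h
            h ^= std::hash<int>{}(type) + 0x9e3779b9 + (h << 6) + (h >> 2);
            h ^= std::hash<int64_t>{}(count) + 0x9e3779b9 + (h << 6) + (h >> 2);
        }
        return h;
    }
};

// Hypothetical use: configuration counts -> number of times it was chosen.
using OrConfigMap = std::unordered_map<CountKey, unsigned, CountKeyHash>;
```

Because the key compares the maps directly, two configurations inserted with their types in a different order still land on the same entry, which a hand-built string key would only guarantee if the string were always produced in sorted order.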

codecov bot commented Nov 20, 2024

Codecov Report

Attention: Patch coverage is 95.94595% with 6 lines in your changes missing coverage. Please review.

Project coverage is 75.5%. Comparing base (996f999) to head (a35f8a8).
Report is 18 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| resource/traversers/dfu_impl.cpp | 96.2% | 5 Missing ⚠️ |
| resource/traversers/dfu_impl.hpp | 93.3% | 1 Missing ⚠️ |
```diff
@@           Coverage Diff            @@
##           master   #1296     +/-   ##
========================================
+ Coverage    75.3%   75.5%   +0.2%
========================================
  Files         111     111
  Lines       15300   16119    +819
========================================
+ Hits        11531   12183    +652
- Misses       3769    3936    +167
```
| Files with missing lines | Coverage Δ |
|---|---|
| resource/traversers/dfu_impl.hpp | 94.2% <93.3%> (-0.4%) ⬇️ |
| resource/traversers/dfu_impl.cpp | 85.8% <96.2%> (+1.7%) ⬆️ |

... and 69 files with indirect coverage changes
