plugin: add queue priority to priority calculation, plugin #207

cmoussa1 · 2022-02-14T22:26:27Z

Background

The cleanup work described in #204 has now been completed with the merge of #205 and now allows for the start of integrating the use of queues into the priority plugin. I think there are two prongs to adding queue support into the plugin:

the use of a queue's associated priority to further increase or decrease the priority of a job submitted from a user/bank (which is the focus of this PR), and
the enforcement of per-job limits using a queue's associated limits (the work of a future PR)

This PR looks to integrate queue information into the priority calculation of a job in the multi-factor priority plugin. At a high level, the goal is to use queue information to further calculate a job priority using the information from the queue_table in the flux-accounting database, which for each queue, contains an associated priority and a number of limits. A queue can be specified on the command line like so (I looked at this flux-sched file for reference):

$ flux mini submit --setattr=system.queue=expedite -n1 hostname

The plugin will both 1) verify that the queue exists and is a valid queue for the user/bank to run jobs in, and 2) use the associated priority of the queue to help calculate its priority. In the above example, an expedite queue will most likely have a high integer priority, so it will increase the priority of the submitted job.

The bulk update script now generates a new payload containing information on all of the queues in the queue_table and sends it to the plugin. A new callback is also added in the plugin to unpack this information and place it in a map with the queue name as the key and a queue_info struct as the value, which has the following members:

struct queue_info {
    int min_nodes_per_job;
    int max_nodes_per_job;
    int max_time_per_job;
    int priority;
};
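To illustrate the map described above, here is a minimal sketch of storing and looking up queue records by name. The helper names store_queue and lookup_queue_priority are hypothetical and are not taken from the plugin; the real callback unpacks the payload sent by the bulk update script before filling the map.

```cpp
#include <map>
#include <string>

// Mirrors the struct described in the PR.
struct queue_info {
    int min_nodes_per_job;
    int max_nodes_per_job;
    int max_time_per_job;
    int priority;
};

// Queue name -> queue_info, filled in as each queue record is unpacked.
static std::map<std::string, queue_info> queues;

// Hypothetical helper: store (or overwrite) one queue record.
void store_queue (const std::string &name, const queue_info &info)
{
    queues[name] = info;
}

// Hypothetical helper: fetch a queue's priority. Returns false when the
// queue is unknown so that the caller can reject the job.
bool lookup_queue_priority (const std::string &name, int &priority)
{
    auto it = queues.find (name);
    if (it == queues.end ())
        return false;
    priority = it->second.priority;
    return true;
}
```

Keying the map on the queue name keeps the lookup at submit time to a single find () call.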

Two new members are also added to the bank_info struct, so for every user/bank, information pertaining to what queues they are allowed to submit jobs in, as well as the integer priority associated with the queue they passed in when submitting a job are included in the struct.

I made a change to the final result of the priority calculation to no longer return the absolute value of an integer priority, but to instead check to see if the result is < 0, and if so, just return FLUX_JOB_PRIORITY_MIN, since if a low priority queue decreases the priority of a job so much so that the resulting priority is negative, then I think the right behavior would be to just return the minimum priority.
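A rough sketch of that clamping behavior follows. The weights and the shape of the formula are illustrative assumptions, not the values used in the plugin, and ASSUMED_PRIORITY_MIN stands in for flux-core's FLUX_JOB_PRIORITY_MIN (assumed here to be 0).

```cpp
#include <cmath>
#include <cstdint>

// Stand-in for FLUX_JOB_PRIORITY_MIN; the value 0 is an assumption.
constexpr int64_t ASSUMED_PRIORITY_MIN = 0;

// Illustrative weights only; not taken from the PR.
constexpr double FAIRSHARE_WEIGHT = 100000.0;
constexpr double QUEUE_WEIGHT = 10000.0;

// Combine the factors, then floor the result at the minimum priority
// instead of taking its absolute value.
int64_t calc_priority (double fairshare, int queue_factor, int urgency)
{
    double priority = FAIRSHARE_WEIGHT * fairshare
                      + QUEUE_WEIGHT * queue_factor
                      + (urgency - 16);
    if (priority < 0)
        return ASSUMED_PRIORITY_MIN;
    return static_cast<int64_t> (std::llround (priority));
}
```

With an absolute value, a strongly negative queue factor would turn into a large positive priority; flooring at the minimum avoids that inversion.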

I'm posting this now and would be interested to see if @ryanday36 has any high-level feedback on how the queue priority calculation is implemented. Do the additions proposed in this PR sound reasonable at a high level and in-line to how queue priority affects a single job's priority on our current systems? Let me know.

Fixes #165
Fixes #204

ryanday36 · 2022-03-07T17:01:16Z

This looks good @cmoussa1! If the priority of a job can't be < 0, it's probably worth adding checks to things like the queue priority factor to ensure that they aren't <0, or at least a warning that total priority will never be <0. If someone defines a queue with a priority of -1000, it sounds like it might not do what they expect. Incidentally, are you planning to have (or do you already have) a way to define weights on the various factors in the multi factor priority plugin? i.e. if I want the queue priority to have 100x the effect of the fairshare priority, would I just use values for the queue priority that are 100x the fairshare range, or is there another way to achieve that? My other main question is, if a user asks for a queue that they don't have access to, does the priority plugin reject that job, or just assign it a priority of 0 (hold it)? I lean toward rejecting the job, but there could be an argument for holding the job.

I have some other, more general questions about how queues are expected to work in flux. Mostly around what happens if a user doesn't specify a queue or if they specify a queue that doesn't exist. I'm not sure how much these fall into this PR though, so feel free to send me off to other places to ask them. Is there a default queue that we can assign a priority to, or do priorities have to be assigned relative to 0 (or whatever a user gets if they don't ask for a queue with --setattr=system.queue=...)? At the other end of the spectrum, what happens if a user submits a job with --setattr=system.queue=foo and 'foo' is not a defined queue? Does flux core reject that job, or is that something that you would be picking up as part of seeing if a user has access to the queue?

cmoussa1 · 2022-03-07T18:04:42Z

Thanks for the review @ryanday36! I'll try to answer your questions as best I can but feel free to ask any follow-ups if I didn't do a good job answering them.

If the priority of a job can't be < 0, it's probably worth adding checks to things like the queue priority factor to ensure that they aren't <0, or at least a warning that total priority will never be <0. If someone defines a queue with a priority of -1000, it sounds like it might not do what they expect.

Oh, interesting. Thanks for clarifying this. I don't know why I thought you could define a queue with a negative priority; perhaps I should just remove that capability then and check that queue priorities can't be < 0. I'm assuming that it's more common to define a queue with a priority == 0 rather than < 0 then?

Incidentally, are you planning to have (or do you already have) a way to define weights on the various factors in the multi factor priority plugin? i.e. if I want the queue priority to have 100x the effect of the fairshare priority, would I just use values for the queue priority that are 100x the fairshare range, or is there another way to achieve that?

Yeah, this is actually something I wanted to work on this month. Right now, the weights for the various factors are defined in the plugin itself and cannot be changed. Perhaps I should create a plugin_factor_table in the flux-accounting DB where you can edit the weights for the factors used in the plugin, which can then be periodically sent to the plugin during the bulk update.

My other main question is, if a user asks for a queue that they don't have access to, does the priority plugin reject that job, or just assign it a priority of 0 (hold it)? I lean toward rejecting the job, but there could be an argument for holding the job.

In this PR, my proposal for handling the case where a user asks for a queue that they don't have access to is to just reject the job with a message saying that the user doesn't have permission to submit jobs in said queue.

Is there a default queue that we can assign a priority to, or do priorities have to be assigned relative to 0 (or whatever a user gets if they don't ask for a queue with --setattr=system.queue=...)?

I'm not super knowledgeable about this either. If the plan is to have a default queue, I know that from a flux-accounting standpoint it shouldn't be too hard to add, since I can add default as a valid queue for all users to submit jobs in and allow a scheduler operator or sys admin to define a priority for that default queue, similar to any other queue. Then, when users want to submit a job in a default queue (either explicitly setting it with --setattr=system.queue=default or not specifying a queue at all), the plugin can use default and whatever priority is associated with it.

At the other end of the spectrum, what happens if a user submits a job with --setattr=system.queue=foo and 'foo' is not a defined queue? Does flux core reject that job, or is that something that you would be picking up as part of seeing if a user has access to the queue?

This PR proposes rejecting a job when a user submits a job with a queue name that doesn't exist. It will output a message saying Queue does not exist.

ryanday36 · 2022-03-07T18:34:16Z

That all sounds good @cmoussa1. Regarding this:

Oh, interesting. Thanks for clarifying this. I don't know why I thought you could define a queue with a negative priority; perhaps I should just remove that capability then and check that queue priorities can't be < 0. I'm assuming that it's more common to define a queue with a priority == 0 rather than < 0 then?

and default queues, I see two potential ways to implement queues. What we've done in Slurm and LSF is generally we have a default queue / qos with some large priority, an expedite queue/qos with an even higher priority, and a standby queue/qos with essentially zero priority. So, in that sort of an arrangement none of the priorities are < 0, but it requires the ability to have a default queue with a large priority factor. Another alternative would be for users who don't specify a queue to not get anything added to the overall priority (effectively an implied default queue with a priority of 0) and have something like an expedite queue with a large positive priority factor and a standby queue with a large negative priority factor. In that case, you don't need to define a default queue, but you do need to allow negative priorities to allow you to prioritize jobs in that standby queue. Either of those approaches will work, as it's basically just adding a constant to job priority, but I wanted to be more clear on which one we're going to be using.
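The two arrangements can be compared with a small sketch. The queue names come from the discussion, but the numeric offsets, the job_priority helper, and the floor at 0 are assumptions made for illustration.

```cpp
#include <map>
#include <string>

using scheme = std::map<std::string, int>;

// Arrangement 1: every queue has a non-negative priority, including "default".
const scheme all_positive = {{"standby", 0}, {"default", 1000}, {"expedite", 2000}};

// Arrangement 2: no default queue is defined; standby uses a negative offset.
const scheme signed_offsets = {{"standby", -1000}, {"expedite", 1000}};

// Hypothetical helper: add the queue's offset to a base priority and floor
// the result at 0. A queue missing from the scheme contributes 0, which is
// the "implied default queue" case.
int job_priority (const scheme &s, const std::string &queue, int base)
{
    auto it = s.find (queue);
    int offset = (it == s.end ()) ? 0 : it->second;
    int p = base + offset;
    return p < 0 ? 0 : p;
}
```

Under these assumptions, two standby jobs with base priorities of 200 and 500 both land on 0 in the second arrangement, while the first arrangement keeps them ordered. That is the interaction between negative queue priorities and a non-negative total raised earlier in this thread.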

garlick · 2022-03-07T19:48:45Z

Keep in mind there is flux job urgency which lets a user or sys admin set the job urgency, one input to the multi-factor calculation of priority, and the only one besides submit time if flux accounting is not part of the config. It has a range of 0-31, with special values for hold and expedite, and the upper range only settable by sys admins. See https://flux-framework.readthedocs.io/projects/flux-rfc/en/latest/spec_30.html

Also the calculated priority is used to order jobs in the one queue that flux-core supports, that is shown by flux jobs. If you're using the priority also to order the job within multiple queues supported by fluxion, then the main queue order is going to look weird I guess.

Apologies if I'm stating the obvious as I haven't been following this closely. OK. This non-sequitur was brought to you by...tight deadlines!

ryanday36 · 2022-03-07T21:23:30Z

Keep in mind there is flux job urgency which lets a user or sys admin set the job urgency

I'd forgotten about the expedite functionality of urgency. That is a good 'big hammer' to have.

If you're using the priority also to order the job within multiple queues supported by fluxion, then the main queue order is going to look weird I guess.

This is something that we should probably track elsewhere, but we are going to want flux jobs to have options to filter jobs by queue once we start using multiple queues.

cmoussa1 · 2022-04-12T23:26:14Z

OK, now that I've gotten some feedback from @ryanday36, I'll take this PR out of [WIP]. I've proposed the following changes based on the feedback I've received so far:

the tests that involve defining a queue with a negative priority have been adjusted to use a queue with a smaller, positive integer priority
if a user/bank combo submits a job without specifying a queue to run in, the plugin will look for a default queue and use its associated priority to calculate the priority of a job. This allows a sys admin or a scheduler operator to assign some base priority for the default queue, and customize associated priorities for all other queues. The default queue is not created in the flux-accounting database by default, however, and it is the responsibility of the admin or scheduler operator to create a default queue and assign a base priority. If no default queue is defined and a user/bank combo tries to submit a job in this queue, the job will be rejected (or have an exception raised in job.new and job.state.priority) with a message saying that the default queue does not exist. I could definitely see the argument for automatically creating this default queue (and assigning a priority of 0 or something) when the database is first created, so if that is preferred, I don't mind adding that functionality as well. Then, the sys admin or scheduler operator doesn't have to remember to create the default queue themselves.

grondo

This LGTM! Just a few comments inline.

One unrelated thing that comes to mind -- and I'm sure you already have this covered somewhere, but I'm curious -- is the job age or submit time taken into account as one of the factors in the priority calculation? If not, should we think about getting that in soon, or else could newly submitted jobs always overtake jobs that have been sitting in the queue a very long time? (Sorry for the unrelated question!)

grondo · 2022-04-13T22:31:52Z

src/plugins/mf_priority.cpp

                            "userid", &uid,
                            "bank", &bank,
                            "def_bank", &def_bank,
                            "fairshare", &fshare,
                            "max_running_jobs", &max_running_jobs,
-                            "max_active_jobs", &max_active_jobs) < 0)
+                            "max_active_jobs", &max_active_jobs,
+                            "queues", &queues) < 0)


Just a question: Would it be simpler to require queues to be a JSON array in the protocol here? Rather than a comma separated string which is unpacked by hand below?

(Just a question for discussion, not a real comment)

grondo · 2022-04-13T22:34:34Z

src/plugins/mf_priority.cpp

+        s_stream << queues; // create string stream from the string
+        while (s_stream.good ()) {
+            std::string substr;
+            getline (s_stream, substr, ','); // get string delimited by comma
+            b->queues.push_back (substr);
+        }
+


If you keep this method of splitting queues into a vector from comma separated string, it might be better to put this block into its own function, e.g. string_split () or something, in case it can be reused.

cmoussa1 · 2022-04-14

I agree, this should be placed into a separate function. I just squashed and pushed a change that moves this to a helper function.
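For illustration, one possible shape for such a helper is sketched below. The name split_string and its signature are hypothetical and are not necessarily what was pushed.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split a delimited list (e.g. "default,expedite")
// into a vector of its non-empty items.
std::vector<std::string> split_string (const std::string &list, char delim = ',')
{
    std::vector<std::string> items;
    std::stringstream s_stream (list);
    std::string item;

    // Loop on the result of getline () so nothing is pushed once the
    // stream is exhausted.
    while (std::getline (s_stream, item, delim)) {
        if (!item.empty ())
            items.push_back (item);
    }
    return items;
}
```

Looping on the result of getline () rather than on s_stream.good () means that an empty input yields an empty vector instead of a vector holding one empty string.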

grondo · 2022-04-13T22:42:58Z

src/plugins/mf_priority.cpp

+                                             "mf_priority", 0,
+                                             "Queue does not exist");


User friendly addition, consider adding the queue name to the error message, e.g.

"Queue '%s' does not exist", queue,

grondo · 2022-04-13T22:43:30Z

src/plugins/mf_priority.cpp

+            } else if (bank_it->second.queue_factor == INVALID_QUEUE) {
+                flux_jobtap_raise_exception (p, FLUX_JOBTAP_CURRENT_JOB,
+                                             "mf_priority", 0,
+                                             "Queue not valid for user");


Similar to above, consider adding the queue name to the exception text

grondo · 2022-04-13T22:44:48Z

src/plugins/mf_priority.cpp

+    if (bank_it->second.queue_factor == NO_SUCH_QUEUE)
+        return flux_jobtap_reject_job (p, args, "Queue does not exist");
+    else if (bank_it->second.queue_factor == INVALID_QUEUE)
+        return flux_jobtap_reject_job (p, args, "Queue not valid for user");
+    else if (bank_it->second.queue_factor == NO_DEFAULT_QUEUE)
+        return flux_jobtap_reject_job (p, args, "No default queue exists");


See above for suggestions on adding queue names to error output (more user friendly)

cmoussa1 · 2022-04-14

Agreed. I squashed and pushed a change to include the queue name in the error message where appropriate throughout the callbacks where it is used. Thanks for this suggestion!

grondo · 2022-04-13T22:47:23Z

src/plugins/mf_priority.cpp

+        // fetch priority associated with passed-in queue (or default queue)
+        bank_it->second.queue_factor = get_queue_info (queue, bank_it);
+
+        if (bank_it->second.queue_factor == NO_SUCH_QUEUE) {
+            flux_jobtap_raise_exception (p, FLUX_JOBTAP_CURRENT_JOB,
+                                         "mf_priority", 0,
+                                         "job.new: Queue does not exist");
+            return -1;
+        } else if (bank_it->second.queue_factor == INVALID_QUEUE) {
+            flux_jobtap_raise_exception (p, FLUX_JOBTAP_CURRENT_JOB,
+                                         "mf_priority", 0,
+                                         "job.new: Queue not valid for user");
+            return -1;
+        }
+        else if (bank_it->second.queue_factor == NO_DEFAULT_QUEUE) {
+            flux_jobtap_raise_exception (p, FLUX_JOBTAP_CURRENT_JOB,
+                                         "mf_priority", 0,
+                                         "job.new: No default queue exists");
+            return -1;
+        }
+


I'm skimming, but this block of code looks similar enough to the other blocks that I wonder if this could be refactored into a function. A prefix argument could be passed to the function to optionally add job.new to the generated job exceptions, or the error message could be passed back in flux_error_t or similar container.

At this point I think this refactoring would be optional, but I thought I'd point it out.

cmoussa1 · 2022-04-14

I think you're absolutely right, and upon playing around with it for a bit, I think I was able to successfully convert this into its own helper function that can be called with an optional prefix to denote the state that the job was in when the error message was generated; I squashed and pushed that change up.
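A sketch of what such a helper might look like is shown below, also folding in the earlier suggestion to include the queue name in the message. The function name check_queue_factor, its signature, and the numeric sentinel values are assumptions; the PR only names the constants. In the plugin, the resulting message would be handed to flux_jobtap_raise_exception () or flux_jobtap_reject_job (), both of which accept a printf-style format string.

```cpp
#include <string>

// Sentinel values are hypothetical; they only need to stay distinct from
// any legitimate queue priority.
constexpr int NO_SUCH_QUEUE = -5;
constexpr int INVALID_QUEUE = -6;
constexpr int NO_DEFAULT_QUEUE = -7;

// Returns 0 when queue_factor is usable. Otherwise returns -1 and fills
// msg with an error that names the queue, optionally prefixed with the
// job state (e.g. "job.new: ").
int check_queue_factor (int queue_factor,
                        const std::string &queue,
                        std::string &msg,
                        const std::string &prefix = "")
{
    switch (queue_factor) {
    case NO_SUCH_QUEUE:
        msg = prefix + "Queue '" + queue + "' does not exist";
        return -1;
    case INVALID_QUEUE:
        msg = prefix + "Queue '" + queue + "' not valid for user";
        return -1;
    case NO_DEFAULT_QUEUE:
        msg = prefix + "No default queue exists";
        return -1;
    default:
        msg.clear ();
        return 0;
    }
}
```

Returning the message through an output parameter lets each callback decide whether to raise an exception or reject the job, which is close in spirit to the flux_error_t option mentioned in the review.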

Add two new members to the bank_info struct:
- queues, a vector of strings which will hold all available queues a particular user/bank row can run jobs in. The queues are passed in as a comma-delimited string, and then parsed and pushed one-by-one into the "queues" member in the bank_info struct.
- queue_factor: an integer to hold the associated priority of a queue passed in from a user/bank job.

Add the values from the queues column in the association_table to the RPC that is sent from the database to the priority plugin. This column represents the available queues to each user/bank row in the association_table. Add "queues" key-value pairs to the sample payloads in existing sharness tests. Add a "default" queue that won't affect the results of calculating priorities in the existing sharness tests.

Add another section to bulk_update() which grabs queue information from the queue_table and sends it to the priority plugin.

Add a new callback function to the plugin which will receive and store queue information from the flux-accounting database to a map with the name of a queue as the key, and a struct of information about that queue as the value. These queue values will be used to further calculate job priorities if one is passed in.

Add validation for an optional queue argument when a job is submitted. The queue is first checked to exist in the queues map. It is then checked to determine if it is a valid queue for a user/bank to specify when submitting their job. If no queue is specified, the plugin will look for a "default" queue and use its associated priority. If no default queue is added, jobs trying to use this default queue will be rejected with a message saying that no default queue exists. It is up to the sys admin or scheduler operator to ensure that at least a default queue exists in the queue_table of the flux-accounting DB. If all checks pass, the queue's associated integer priority is added to the bank_info struct for the user/bank job.
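The validation order in this commit message can be sketched roughly as follows. The function get_queue_factor, the trimmed-down bank_info, and the sentinel values are hypothetical, and the sketch assumes that the "default" queue is open to every user/bank, as suggested earlier in the discussion.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <vector>

// Hypothetical sentinel values; the PR only names the constants.
constexpr int NO_SUCH_QUEUE = -5;
constexpr int INVALID_QUEUE = -6;
constexpr int NO_DEFAULT_QUEUE = -7;

struct queue_info {
    int min_nodes_per_job;
    int max_nodes_per_job;
    int max_time_per_job;
    int priority;
};

// Only the two members added in this PR are shown.
struct bank_info {
    std::vector<std::string> queues;   // queues this user/bank may use
    int queue_factor;                  // priority of the queue in use
};

// Returns the queue's priority, or one of the sentinels above.
int get_queue_factor (const std::map<std::string, queue_info> &queues,
                      const bank_info &b,
                      const std::string &requested)
{
    // An empty request means "no queue specified": fall back to "default".
    bool use_default = requested.empty ();
    std::string name = use_default ? "default" : requested;

    // 1) the queue must exist in the queues map
    auto q = queues.find (name);
    if (q == queues.end ())
        return use_default ? NO_DEFAULT_QUEUE : NO_SUCH_QUEUE;

    // 2) the queue must be valid for this user/bank
    //    (assumption: "default" is always allowed)
    if (name != "default"
        && std::find (b.queues.begin (), b.queues.end (), name) == b.queues.end ())
        return INVALID_QUEUE;

    // 3) all checks passed: use the queue's associated priority
    return q->second.priority;
}
```

The caller would store the returned value in bank_info's queue_factor member and reject the job when it matches one of the sentinels.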

codecov · 2022-04-14T19:15:31Z

Codecov Report

Merging #207 (d0bc52a) into master (d659500) will increase coverage by 0.10%.
The diff coverage is 85.18%.

❗ Current head d0bc52a differs from pull request most recent head 76a897e. Consider uploading reports for the commit 76a897e to get more accurate results

@@            Coverage Diff             @@
##           master     #207      +/-   ##
==========================================
+ Coverage   83.94%   84.04%   +0.10%     
==========================================
  Files          23       23              
  Lines        1090     1166      +76     
==========================================
+ Hits          915      980      +65     
- Misses        175      186      +11

Impacted Files	Coverage Δ
src/plugins/mf_priority.cpp	`86.39% <85.18%> (-0.34%)`	⬇️

cmoussa1 · 2022-04-14T19:33:01Z

Thanks for reviewing and approving this @grondo! I took a look at your great suggestions and pushed up some changes per your feedback. I can set MWP shortly.

One unrelated thing that comes to mind -- and I'm sure you already have this covered somewhere, but I'm curious -- is the job age or submit time taken into account as one of the factors in the priority calculation? If not, should we think about getting that in soon, or else could newly submitted jobs always overtake jobs that have been sitting in the queue a very long time? (Sorry for the unrelated question!)

Yeah, job age is mentioned in the gap analysis issue in #8, but I haven't circled back to that yet because of the work on user/queue-limits. Sorry about that! And thanks for pointing it out. I totally agree, I should work on that perhaps this month and see if I can make some progress on adding it to the priority plugin.

cmoussa1 added the new feature and high priority labels Feb 14, 2022

cmoussa1 force-pushed the add.queue.priority branch 2 times, most recently from b55eb4c to 02fac6f Compare February 25, 2022 23:06

cmoussa1 mentioned this pull request Mar 4, 2022

support reading data from TOML files #216

Open

cmoussa1 force-pushed the add.queue.priority branch from 02fac6f to 42100bf Compare March 7, 2022 16:29

cmoussa1 force-pushed the add.queue.priority branch from 42100bf to 9bce842 Compare March 7, 2022 18:08

cmoussa1 force-pushed the add.queue.priority branch 2 times, most recently from aa3a10c to 601190e Compare March 7, 2022 22:35

garlick mentioned this pull request Mar 7, 2022

use case: heterogeneous clusters flux-framework/flux-core#4143

Closed

cmoussa1 force-pushed the add.queue.priority branch 2 times, most recently from ae281c3 to fd093f4 Compare March 8, 2022 18:40

cmoussa1 changed the title ~~[WIP] plugin: add queue priority to priority calculation, plugin~~ plugin: add queue priority to priority calculation, plugin Mar 8, 2022

cmoussa1 marked this pull request as ready for review March 8, 2022 20:47

cmoussa1 requested a review from grondo March 8, 2022 20:47

cmoussa1 mentioned this pull request Mar 16, 2022

[WIP] priority plugin: add configurable plugin factor weights for priority calculation #220

Closed

cmoussa1 removed the request for review from grondo March 21, 2022 15:06

cmoussa1 changed the title ~~plugin: add queue priority to priority calculation, plugin~~ [WIP] plugin: add queue priority to priority calculation, plugin Mar 21, 2022

cmoussa1 marked this pull request as draft March 21, 2022 15:07

cmoussa1 force-pushed the add.queue.priority branch 3 times, most recently from 76e9da9 to 1be8b6d Compare April 12, 2022 22:53

cmoussa1 changed the title ~~[WIP] plugin: add queue priority to priority calculation, plugin~~ plugin: add queue priority to priority calculation, plugin Apr 12, 2022

cmoussa1 marked this pull request as ready for review April 12, 2022 23:23

cmoussa1 requested a review from grondo April 13, 2022 18:43

grondo approved these changes Apr 14, 2022


cmoussa1 added 4 commits April 14, 2022 11:25

bulk_update: add queue payload section

9161d21

Add another section to bulk_update() which grabs queue information from the queue_table and sends it to the priority plugin.

cmoussa1 force-pushed the add.queue.priority branch from 1be8b6d to fcfac06 Compare April 14, 2022 19:00

cmoussa1 added 3 commits April 14, 2022 12:09

plugin: add queue_factor to priority calculation

7255493

t: add queue priority sharness tests

76a897e

cmoussa1 force-pushed the add.queue.priority branch from fcfac06 to 76a897e Compare April 14, 2022 19:15

cmoussa1 added the merge-when-passing label Apr 14, 2022

mergify bot merged commit 9c2ad6c into flux-framework:master Apr 14, 2022
