Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

plugin: add queue priority to priority calculation, plugin #207

Merged
merged 7 commits into from
Apr 14, 2022

Conversation

cmoussa1
Copy link
Member

@cmoussa1 cmoussa1 commented Feb 14, 2022

Background

The cleanup work described in #204 has now been completed with the merge of #205 and now allows for the start of integrating the use of queues into the priority plugin. I think there are two prongs to adding queue support into the plugin:

  1. the use of a queue's associated priority to further increase or decrease the priority of a job submitted from a user/bank (which is the focus of this PR), and
  2. the enforcement of per-job limits using a queue's associated limits (the work of a future PR)

This PR looks to integrate queue information into the priority calculation of a job in the multi-factor priority plugin. At a high level, the goal is to use queue information to further calculate a job priority using the information from the queue_table in the flux-accounting database, which for each queue, contains an associated priority and a number of limits. A queue can be specified on the command line like so (I looked at this flux-sched file for reference):

$ flux mini submit --setattr=system.queue=expedite -n1 hostname

The plugin will both 1) verify that the queue exists and is a valid queue for the user/bank to run jobs in, and 2) use the associated priority of the queue to help calculate its priority. In the above example, an expedite queue will most likely have a high integer priority, so it will increase the priority of the submitted job.

In the bulk update script, a new payload is generated containing information of all the queues in the queue_table and sends it to the plugin. A new callback is also added in the plugin to unpack this information and place it in a map with the queue name as the key, and a queue_info struct as the value, which has the following members:

struct queue_info {
    int min_nodes_per_job;
    int max_nodes_per_job;
    int max_time_per_job;
    int priority;
};

Two new members are also added to the bank_info struct, so for every user/bank, information pertaining to what queues they are allowed to submit jobs in, as well as the integer priority associated with the queue they passed in when submitting a job are included in the struct.

I made a change to the final result of the priority calculation to no longer return the absolute value of an integer priority, but to instead check to see if the result is < 0, and if so, just return FLUX_JOB_PRIORITY_MIN, since if a low priority queue decreases the priority of a job so much so that the resulting priority is negative, then I think the right behavior would be to just return the minimum priority.

I'm posting this now and would be interested to see if @ryanday36 has any high-level feedback on how the queue priority calculation is implemented. Do the additions proposed in this PR sound reasonable at a high level and in-line to how queue priority affects a single job's priority on our current systems? Let me know.

Fixes #165
Fixes #204

@cmoussa1 cmoussa1 added new feature new feature high priority items that must be worked on for major milestones labels Feb 14, 2022
@cmoussa1 cmoussa1 force-pushed the add.queue.priority branch 2 times, most recently from b55eb4c to 02fac6f Compare February 25, 2022 23:06
@cmoussa1 cmoussa1 force-pushed the add.queue.priority branch from 02fac6f to 42100bf Compare March 7, 2022 16:29
@ryanday36
Copy link
Contributor

This looks good @cmoussa1! If the priority of a job can't be < 0, it's probably worth adding checks to things like the queue priority factor to ensure that they aren't <0, or at least a warning that total priority will never be <0. If someone defines a queue with a priority of -1000, it sounds like it might not do what they expect. Incidentally, are you planning to have (or do you already have) a way to define weights on the various factors in the multi factor priority plugin? i.e. if I want the queue priority to have 100x the effect of the fairshare priority, would I just use values for the queue priority that are 100x the fairshare range, or is there another way to achieve that? My other main question is, if a user asks for a queue that they don't have access to, does the priority plugin reject that job, or just assign it a priority of 0 (hold it)? I lean toward rejecting the job, but there could be an argument for holding the job.

I have some other, more general questions about how queues are expected to work in flux. Mostly around what happens if a user doesn't specify a queue or if they specify a queue that doesn't exist. I'm not sure how much these fall into this PR though, so feel free to send me off to other places to ask them. Is there a default queue that we can assign a priority to, or do priorities have to be assigned relative to 0 (or whatever a user gets if they don't ask for a queue with --setattr=system.queue=...)? At the other end of the spectrum, what happens if a user submits a job with --setattr=system.queue=foo and 'foo' is not a defined queue? Does flux core reject that job, or is that something that you would be picking up as part of seeing if a user has access to the queue?

@cmoussa1
Copy link
Member Author

cmoussa1 commented Mar 7, 2022

Thanks for the review @ryanday36! I'll try to answer your questions as best I can but feel free to ask any follow-ups if I didn't do a good job answering them.

If the priority of a job can't be < 0, it's probably worth adding checks to things like the queue priority factor to ensure that they aren't <0, or at least a warning that total priority will never be <0. If someone defines a queue with a priority of -1000, it sounds like it might not do what they expect.

Oh, interesting. Thanks for clarifying this. I don't know why I thought you could define a queue with a negative priority; perhaps I should just remove that capability then and check that queue priorities can't be < 0. I'm assuming that it's more common to define a queue with a priority == 0 rather than < 0 then?

Incidentally, are you planning to have (or do you already have) a way to define weights on the various factors in the multi factor priority plugin? i.e. if I want the queue priority to have 100x the effect of the fairshare priority, would I just use values for the queue priority that are 100x the fairshare range, or is there another way to achieve that?

Yeah, this is actually something I wanted to work on this month. Right now, the weights for the various factors are defined in the plugin itself and cannot be changed. Perhaps I should create a plugin_factor_table in the flux-accounting DB where you can edit the weights for the factors used in the plugin, which can then be periodically sent to the plugin during the bulk update.

My other main question is, if a user asks for a queue that they don't have access to, does the priority plugin reject that job, or just assign it a priority of 0 (hold it)? I lean toward rejecting the job, but there could be an argument for holding the job.

In this PR, my proposal for handling the case where a user asks for a queue that they don't have access to is to just reject the job with a message saying that the user doesn't have permission to submit jobs in said queue.

Is there a default queue that we can assign a priority to, or do priorities have to be assigned relative to 0 (or whatever a user gets if they don't ask for a queue with --setattr=system.queue=...)?

I'm not super knowledgeable about this either. If the plan is to have a default queue, I know from a flux-accounting standpoint, that shouldn't be too hard to add, for I can add default as a valid queue for all users to submit jobs in and allow a scheduler operator or sys admin define a priority for that default queue, similar to any other queue. Then, when users want to submit a job in a default queue (either explicitly setting it with --setattr=system.queue=default or not specifying a queue at all), the plugin can use default and whatever priority is associated with it.

At the other end of the spectrum, what happens if a user submits a job with --setattr=system.queue=foo and 'foo' is not a defined queue? Does flux core reject that job, or is that something that you would be picking up as part of seeing if a user has access to the queue?

This PR proposes rejecting a job when a user submits a job with a queue name that doesn't exist. It will output a message saying Queue does not exist.

@cmoussa1 cmoussa1 force-pushed the add.queue.priority branch from 42100bf to 9bce842 Compare March 7, 2022 18:08
@ryanday36
Copy link
Contributor

That all sounds good @cmoussa1. Regarding this:

Oh, interesting. Thanks for clarifying this. I don't know why I thought you could define a queue with a negative priority; perhaps I should just remove that capability then and check that queue priorities can't be < 0. I'm assuming that it's more common to define a queue with a priority == 0 rather than < 0 then?

and default queues, I see two potential ways to implement queues. What we've done in Slurm and LSF is generally we have a default queue / qos with some large priority, an expedite queue/qos with an even higher priority, and a standby queue/qos with essentially zero priority. So, in that sort of an arrangement none of the priorities are < 0, but it requires the ability to have a default queue with a large priority factor. Another alternative would be for users who don't specify a queue to not get anything added to the overall priority (effectively an implied default queue with a priority of 0) and have something like an expedite queue with a large positive priority factor and a standby queue with a large negative priority factor. In that case, you don't need to define a default queue, but you do need to allow negative priorities to allow you to prioritize jobs in that standby queue. Either of those approaches will work a it's basically just adding a constant to job priority, but I wanted to be more clear on which one we're going to be using.

@garlick
Copy link
Member

garlick commented Mar 7, 2022

Keep in mind there is flux job urgency which lets a user or sys admin set the job urgency, one input to the multi-factor calculation of priority, and the only one besides submit time if flux accounting is not part of the config. It has a range of 0-31, with special values for hold and expedite, and the upper range only settable by sys admins. See https://flux-framework.readthedocs.io/projects/flux-rfc/en/latest/spec_30.html

Also the calculated priority is used to order jobs in the one queue that flux-core supports, that is shown by flux jobs. If you're using the priority also to order the job within multiple queues supported by fluxion, then the main queue order is going to look weird I guess.

Apologies if I'm stating the obvious as I haven't been following this closely. OK. This non-sequitur was brought to you by...tight deadlines!

@ryanday36
Copy link
Contributor

Keep in mind there is flux job urgency which lets a user or sys admin set the job urgency

I'd forgotten about the expedite functionality of urgency. That is a good 'big hammer' to have.

If you're using the priority also to order the job within multiple queues supported by fluxion, then the main queue order is going to look weird I guess.

This is something that we should probably track elsewhere, but we are going to want flux jobs to have options to filter jobs by queue once we start using multiple queues.

@cmoussa1 cmoussa1 force-pushed the add.queue.priority branch 2 times, most recently from aa3a10c to 601190e Compare March 7, 2022 22:35
@cmoussa1 cmoussa1 force-pushed the add.queue.priority branch 2 times, most recently from ae281c3 to fd093f4 Compare March 8, 2022 18:40
@cmoussa1 cmoussa1 changed the title [WIP] plugin: add queue priority to priority calculation, plugin plugin: add queue priority to priority calculation, plugin Mar 8, 2022
@cmoussa1 cmoussa1 marked this pull request as ready for review March 8, 2022 20:47
@cmoussa1 cmoussa1 requested a review from grondo March 8, 2022 20:47
@cmoussa1 cmoussa1 removed the request for review from grondo March 21, 2022 15:06
@cmoussa1 cmoussa1 changed the title plugin: add queue priority to priority calculation, plugin [WIP] plugin: add queue priority to priority calculation, plugin Mar 21, 2022
@cmoussa1 cmoussa1 marked this pull request as draft March 21, 2022 15:07
@cmoussa1 cmoussa1 force-pushed the add.queue.priority branch 3 times, most recently from 76e9da9 to 1be8b6d Compare April 12, 2022 22:53
@cmoussa1 cmoussa1 changed the title [WIP] plugin: add queue priority to priority calculation, plugin plugin: add queue priority to priority calculation, plugin Apr 12, 2022
@cmoussa1 cmoussa1 marked this pull request as ready for review April 12, 2022 23:23
@cmoussa1
Copy link
Member Author

OK, now that I've gotten some feedback from @ryanday36, I'll take this PR out of [WIP]. I've proposed the following changes based on the feedback I've received so far:

  • the tests that involve defining a queue with a negative priority have been adjusted to use a queue with a smaller, positive integer priority
  • if a user/bank combo submits a job without specifying a queue to run in, the plugin will look for a default queue and use its associated priority to calculate the priority of a job. This allows a sys admin or a scheduler operator to assign some base priority for the default queue, and customize associated priorities for all other queues. The default queue is not created in the flux-accounting database by default, however, and it is the responsibility of the admin or scheduler operator to create a default queue and assign a base priority. If no default queue is defined and a user/bank combo tries to submit a job in this queue, the job will be rejected (or have an exception raised in job.new and job.state.priority) with a message saying the the default queue does not exist. I could definitely see the argument for automatically creating this default queue (and assigning a priority of 0 or something) when the database is first created, so if that is preferred, I don't mind adding that functionality as well. Then, the sys admin or scheduler operator doesn't have to remember to create the default queue themself.

@cmoussa1 cmoussa1 requested a review from grondo April 13, 2022 18:43
Copy link
Contributor

@grondo grondo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LGTM! Just a few comments inline.

One unrelated thing that comes to mind -- and I'm sure you already have this covered somewhere, but I'm curious -- is the job age or submit time taken into account as one of the factors in the priority calculation? If not, should we think about getting that in soon, or else could newly submitted jobs always overtake jobs that have been sitting in the queue a very long time? (Sorry for the unrelated question!)

"userid", &uid,
"bank", &bank,
"def_bank", &def_bank,
"fairshare", &fshare,
"max_running_jobs", &max_running_jobs,
"max_active_jobs", &max_active_jobs) < 0)
"max_active_jobs", &max_active_jobs,
"queues", &queues) < 0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a question: Would it be simpler to require queues to be a JSON array in the protocol here? Rather than a comma separated string which is unpacked by hand below?

(Just a question for discussion, not a real comment)

Comment on lines 157 to 260
s_stream << queues; // create string stream from the string
while (s_stream.good ()) {
std::string substr;
getline (s_stream, substr, ','); // get string delimited by comma
b->queues.push_back (substr);
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you keep this method of splitting queues into a vector from comma separated string, it might be better to put this block into its own function, e.g. string_split () or something, in case it can be reused.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, this should be placed into a separate function. I just squashed and pushed a change that moves this to a helper function.

Comment on lines 375 to 376
"mf_priority", 0,
"Queue does not exist");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

User friendly addition, consider adding the queue name to the error message, e.g.

   "Queue '%s' does not exist", queue,

} else if (bank_it->second.queue_factor == INVALID_QUEUE) {
flux_jobtap_raise_exception (p, FLUX_JOBTAP_CURRENT_JOB,
"mf_priority", 0,
"Queue not valid for user");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to above, consider adding the queue name to the exception text

Comment on lines 507 to 550
if (bank_it->second.queue_factor == NO_SUCH_QUEUE)
return flux_jobtap_reject_job (p, args, "Queue does not exist");
else if (bank_it->second.queue_factor == INVALID_QUEUE)
return flux_jobtap_reject_job (p, args, "Queue not valid for user");
else if (bank_it->second.queue_factor == NO_DEFAULT_QUEUE)
return flux_jobtap_reject_job (p, args, "No default queue exists");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above for suggestions on adding queue names to error output (more user friendly)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. I squashed and pushed a change to include the queue name in the error message where appropriate throughout the callbacks where it is used. Thanks for this suggestion!

Comment on lines 601 to 646
// fetch priority associated with passed-in queue (or default queue)
bank_it->second.queue_factor = get_queue_info (queue, bank_it);

if (bank_it->second.queue_factor == NO_SUCH_QUEUE) {
flux_jobtap_raise_exception (p, FLUX_JOBTAP_CURRENT_JOB,
"mf_priority", 0,
"job.new: Queue does not exist");
return -1;
} else if (bank_it->second.queue_factor == INVALID_QUEUE) {
flux_jobtap_raise_exception (p, FLUX_JOBTAP_CURRENT_JOB,
"mf_priority", 0,
"job.new: Queue not valid for user");
return -1;
}
else if (bank_it->second.queue_factor == NO_DEFAULT_QUEUE) {
flux_jobtap_raise_exception (p, FLUX_JOBTAP_CURRENT_JOB,
"mf_priority", 0,
"job.new: No default queue exists");
return -1;
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm skimming, but this block of code looks similar enough to the other blocks that I wonder if this could be refactored into a function. A prefix argument could be passed to the function to optionally add job.new to the generated job exceptions, or the error message could be passed back in flux_error_t or similar container.

At this point I think this refactoring would be optional, but I thought I'd point it out.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you're absolutely right, and upon playing around with it for a bit, I think I was able to successfully convert this into its own helper function that can be called with an optional prefix to denote what the state that the job was in when the error message was generated; I squashed and pushed that change up.

Add two new members to the bank_info struct:

- queues, a vector of strings which will hold all available
queues a particular user/bank row can run jobs in.
The queues are passed in as a comma-delimited string,
and then parsed and pushed one-by-one into the "queues"
member in the bank_info struct.

- queue_factor: an integer to hold the associated priority
of a queue passed in from a user/bank job.
Add the values from the queues column in the
association_table to the RPC that is sent from the database
to the priority plugin. This column represents the
available queues to each user/bank row in the
association_table.

Add "queues" key-value pairs to the sample payloads in
existing sharness tests.

Add a "default" queue that won't affect the results of
calculating priorities in the existing sharness tests.
Add another section to bulk_update() which grabs
queue information from the queue_table and sends it
to the priority plugin.
Add a new callback function to the plugin which
will receive and store queue information from the
flux-accounting database to a map with the name of a
queue as the key, and a struct of information about
that queue as the value.

These queue values will be used to further calculate job
priorities if one is passed in.
@cmoussa1 cmoussa1 force-pushed the add.queue.priority branch from 1be8b6d to fcfac06 Compare April 14, 2022 19:00
Add validation for an optional queue argument
when a job is submitted. The queue is first checked
to exist in the queues map. It is then checked to
determine if it is a valid queue for a user/bank to specify
when submitting their job.

If no queue is specified, the plugin will look for a "default"
queue and use its associated priority. If no default queue is
added, jobs trying to use this default queue will be rejected
with a message saying that no default queue exists. It is up
to the sys admin or scheduler operator to ensure that at least
a default queue exists in the queue_table of the flux-accounting
DB.

If all checks pass, the queue's associated integer priority
is added to the bank_info struct for the user/bank job.
@cmoussa1 cmoussa1 force-pushed the add.queue.priority branch from fcfac06 to 76a897e Compare April 14, 2022 19:15
@codecov
Copy link

codecov bot commented Apr 14, 2022

Codecov Report

Merging #207 (d0bc52a) into master (d659500) will increase coverage by 0.10%.
The diff coverage is 85.18%.

❗ Current head d0bc52a differs from pull request most recent head 76a897e. Consider uploading reports for the commit 76a897e to get more accurate results

@@            Coverage Diff             @@
##           master     #207      +/-   ##
==========================================
+ Coverage   83.94%   84.04%   +0.10%     
==========================================
  Files          23       23              
  Lines        1090     1166      +76     
==========================================
+ Hits          915      980      +65     
- Misses        175      186      +11     
Impacted Files Coverage Δ
src/plugins/mf_priority.cpp 86.39% <85.18%> (-0.34%) ⬇️

@cmoussa1
Copy link
Member Author

Thanks for reviewing and approving this @grondo! I took a look at your great suggestions and pushed up some changes per your feedback. I can set MWP shortly.

One unrelated thing that comes to mind -- and I'm sure you already have this covered somewhere, but I'm curious -- is the job age or submit time taken into account as one of the factors in the priority calculation? If not, should we think about getting that in soon, or else could newly submitted jobs always overtake jobs that have been sitting in the queue a very long time? (Sorry for the unrelated question!)

Yeah, job age is mentioned in the gap analysis issue in #8, but I haven't circled back to that yet because of the work on user/queue-limits. Sorry about that! And thanks for pointing it out. I totally agree, I should work on that perhaps this month and see if I can make some progress on adding it to the priority plugin.

@mergify mergify bot merged commit 9c2ad6c into flux-framework:master Apr 14, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
high priority items that must be worked on for major milestones merge-when-passing new feature new feature
Projects
None yet
4 participants