-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
plugin: add multifactor priority plugin #122
Conversation
I would definitely start with a sharness test. If you have not already, pull in sharness.sh and There are a few existing priority-order tests you can look to as examples in flux-core:
You could also easily create your fake database, and/or have bulk_update.py push contrived values into your plugin, etc. Basically anything you can script can be tested. |
08856f5
to
60954cf
Compare
d31f360
to
48f940d
Compare
OK, I've updated this PR with the following changes:
So I'll go ahead and bring this PR out of [WIP] and post it for official review shortly. I handled the multiple-bank case in the following way: When a user submits a job, the validate callback will try to unpack a bank from the jobspec and store it in a Eventually, I think the plugin should handle this case by not rejecting the job and just using a default bank (e.g setting the default to the first bank the user was added to in the DB), and I'll open an issue to track this so I can add it in the near future. [EDIT] Oh, one other thing I wanted to make sure I was doing correctly was compiling the plugin into a |
This sounds fine for now. I would think eventually administrators would want to be able to tweak the default bank for a user outside of the order in which the user is added to banks, so I would consider making the default bank a property of each user which could be separately set in the db.
You got most of the way there! You'll want to also add You can just copy the
And then use that on your plugin's LDFLAGS. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cmoussa1, I made a first pass with some comments. I mainly reviewed the multifactor plugin, since I'm not the right person to review the remainder of the code.
I don't understand enough of how the user+bank is used to get the fairshare factor to know if the use of a multimap is the right data structure here, it seems like it requires a little extra work currently.
It may be easier if you had two maps -- one that maps banks to fairshare value, and one that maps userid to a list of banks in which the user is a member. Then you can easily test bank membership, and get the bank fairshare value in a single lookup call.
I'd also suggest caching the bank along with the job, perhaps with flux_jobtap_job_aux_set()
so that you don't have to continually lookup the bank name for each job when you've already done it in the job.validate
callback.
I could be misunderstanding the role of the users map though, so just correct me if I'm wrong.
src/plugins/mf_priority.cpp
Outdated
char v[] = "job.validate"; | ||
char p[] = "job.state.priority"; | ||
char g[] = "job.priority.get"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't necessary (or should not be), just use these strings directly in tab[]
below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for pointing this out - the reason I defined them is because C++ gave me a warning about converting strings to char *
if I just use the strings directly in tab[]
:
warning: ISO C++ forbids converting a string constant to ‘char*’
Is it OK to ignore this warning?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, this is because the topic string in struct flux_plugin_handler
is char *topic
instead of const char *topic
. This is a bug in the flux_plugin_t
interface. I'll open a bug in flux-core and fix it, and then those warnings should be resolved.
Sorry about that!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No worries! Thanks for helping me walk through that. 🙂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A fix is up for this at flux-framework/flux-core#3720
src/plugins/mf_priority.cpp
Outdated
/* | ||
* Clear the multimap datastructure to prepare it for the next bulk update. | ||
*/ | ||
static void clear_map_cb (flux_t *h, | ||
flux_msg_handler_t *mh, | ||
const flux_msg_t *msg, | ||
void *arg) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you are going to run into trouble having a separate clear_map
RPC from update_database
. What happens if a job is submitted between the two? It will be better if clear+update are the same callback to avoid raciness like this. If you want the clear to be optional, that can be a flag in the RPC.
src/plugins/mf_priority.cpp
Outdated
return flux_jobtap_reject_job (p, args, | ||
"user not found in flux-accounting DB"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
flux_jobtap_reject_job()
can only be used from job.validate
(or job.dependency.*
), but priority_calculation()
looks like it can be called from job.priority.get
. In this case, you can return flux_jobtap_priority_unavail()
if the job is still in PRIORITY state. It looks like this can't happen though -- if the user does not exist in the db then the job would have been rejected by validate_cb
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, yes I think you're right - validate_cb ()
would have rejected the job first if the user does not exist in the DB, so I don't think there is a need for the flux_jobtap_reject_job ()
in priority_calculation ()
; I can remove that.
Thanks for your feedback so far @grondo - maybe it makes sense to continue talking out my design decision regarding the data structure used in the plugin before I make more changes.
So the idea of using a multimap data structure was to make use of the ability to have multiple entries with the same key in the data structure, i.e there could potentially be two entries in the multimap that have a key of
Using a multimap is the reason I needed to implement a Maybe it would be better to use your suggestion and use two maps, although I am trying to think of how to look up an individual user's fairshare value just using the bank they belong to. |
Ah, does each user,bank pair have a different fairshare value? Sorry, I'm not familiar with the overall architecture here so I probably shouldn't comment on specific design decisions!
Yes, however, I was mainly pointing out that having separate "clear" and "update" RPCs makes the plugin racy, since a number of things could happen in between the two RPCs (e.g. a job submission), whereas if you perform the clear in the update rpc, then you are safe from races since the job manager module/thread cannot do any asynchronous work while in your callback. Does that make sense? I would just combine the separate clear and update service callbacks into a single clear+update callback. You could make the clear optional in the future if necessary by adding a |
No, I appreciate you asking these questions and helping me think through these things! Yes, user-bank pairs should each have their own fairshare values.
And yeah, I think this makes sense. I should combine the two into just one clear-and-update RPC. I believe I'll need to think about how the |
It will be much better if all updates are sent in a single RPC if we can figure out how to do that. The mass of RPCs for a large system will be inefficient, plus the race condition you've noted with clearing the multimap first. One thing you could do is instead of using a multimap, use a map of maps (or hash of hashes). The outer map is would be by userid, and the inner map would be by bank name. This would allow you to update existing bank entries instead of clearing and re-adding them. Then the bulk-update tool could send a single RPC with a JSON object which contained all updated user/bank pairs, and the update RPC would just iterate over those and do the updates. |
Yup, absolutely. I don't think a The one drawback that I believe comes with using a map of maps arises when trying to access a "default" bank for a user. I think C++ maps are internally ordered by the key, so defaulting to the first bank found under the I wonder if adding a field in the user table denoting which bank should be treated as the default bank would be a good way to handle this. I could then send that data in the external update service, and in the internal map where the key is |
Or, you could reserve the bank name "default" and store a copy or reference of the default bank under |
a224336
to
aebafa2
Compare
Thanks for the helpful review so far - I've just pushed up a round of
|
0c90fb8
to
a2424a5
Compare
Now that the userid field exists in the job_usage_factor_table, edit the index of values to look for in the dataframe to be moved one to the right.
Problem: The current default value for the fairshare field in the association table is 0.0, which in terms of fairshare, indicates that the user has used way more resources than allocated, which is an inaccurate representation when a user is first added to the flux-accounting database. Solution: Change the default value for fairshare to 0.5, which means that the user has used exactly the amount of resources allocated to them.
Add a new plugin to flux-accounting: a multi-factor priority plugin which takes factors from a flux-accounting DB to generate an integer priority value for a submitte job.
Add an external service for the multi-factor priority plugin which is responsible for querying data from a flux-accounting DB and pushing it to the plugin.
Add an rc directory to load modules for running tests under a flux instance.
a2424a5
to
b69f252
Compare
Codecov Report
@@ Coverage Diff @@
## master #122 +/- ##
==========================================
+ Coverage 85.00% 85.53% +0.53%
==========================================
Files 17 18 +1
Lines 840 892 +52
==========================================
+ Hits 714 763 +49
- Misses 126 129 +3
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This LGTM! Feel free to set MWP when ready as far as I'm concerned. Thanks for all the work!
Thanks @grondo for all of your feedback and help! |
Oh, I think this might have to be manually merged since I made a change to |
Ah yes, sorry I missed that. I'll go ahead and press the merge button. |
Thank you! |
This is a very early first-cut, work-in-progress PR that adds multi-factor priority calculation to flux-accounting. It is still missing a couple of integral pieces, but I thought it might be best to post it early for the following reasons:
Background
A multifactor priority plugin is responsible for calculating a job's priority using a number of different factors that may help or hurt a job's overall priority. This PR starts with adding just two factors: fairshare and urgency. The intent is that once this PR becomes mergeable, we can start to add additional factors in follow-up PR's to create a more complete and solidified priority calculator.
The fairshare factor represents the ratio between the amount of allotted resources and the amount of resources actually consumed. It should range between 0.0 and 1.0, where 0.0 represents the lowest possible value (indicating a user who is using way more resources than allocated) and 1.0 represents the highest. When a user first gets added to the database, their fairshare value should be 0.5, which should mean the user is using exactly the amount of resources they were allocated.
The urgency factor is similar to the nice factor in Slurm in that it allows users to prioritize their own jobs with an integer in the range [1,16], where 16 is the highest priority and 1 is the lowest.
Together, these two factors make up a priority calculation using the following formula:
where:
As additional factors get added (job size, age, queue, etc.), they will be added to this equation to generate a more representative multi-factor priority.
Implementation
This PR adds a new folder to
flux-accounting/src/
calledplugins/
which stores both the priority plugin itself and the external service that pushes data to the plugin.bulk_update.py is the external service that pushes user accounting data from a flux-accounting DB to the priority plugin, where it is stored in a multimap data structure to be looked up when a job priority needs to be calculated.
It is a Python script that can be run from the command line:
It has one optional argument,
--path
, a path to a flux-accounting DB. If no--path
is specified, it will look for a default location of the flux-accounting DB, similar to the front-end Python command line tool flux-account.py.bulk_update.py runs one query to the flux-accounting DB to fetch every user row. Specifically, it queries the username and their respective fairshare value. It packs this data in a dictionary object and sends an RPC request to the plugin.
Just before it begins sending user data to the plugin, it sends an empty message to the plugin, telling it to clear its multimap's contents to prepare for an update.
mf_priority.cpp is the priority plugin file. It contains a number of callbacks and a helper function that is responsible for calculating the priority itself.
clear_map_cb ()
andrec_update_cb ()
are the functions responsible for clearing and updating the multimap data structure in the plugin and interacting with the external service, bulk_update.py.When a job gets submitted,
validate_cb ()
runs and performs a lookup in the multimap to look for the userid that submitted the job. If the userid cannot be found, the job is rejected with an error message explaining that the user couldn't be found in the flux-accounting DB.If a user is indeed in the flux-accounting DB, the
userid
andurgency
are extracted usingflux_plugin_arg_unpack ()
and sent to a helper functionpriority_calculation ()
, where its priority is calculated and returned, to be packed usingflux_plugin_arg_pack ()
.priority_calculation ()
is responsible for calculating a job's priority using the formula described in the Background. It currently covers two corner cases:If a user submits a job with an urgency of
0
, the priority returned isFLUX_JOB_PRIORITY_MIN
= 0. A job will remain as pending.If a job is submitted (or updated) with a max urgency of
31
, the priority returned isFLUX_JOB_PRIORITY_MAX
= 4294967295. This is the highest priority a job can have.If the submitted job is neither of these, then the calculation proceeds normally with the formula stated above. I think a note should be made that the calculated priority is an absolute value. I chose this in case the priority calculated turns out to be a negative value (I think the scenario in which this happens is if a user's fairshare value is so low that, when subtracted from the urgency, it results in a small negative integer; taking an absolute value should result in a very low integer priority).
Examples
Since this PR does not have testing yet, I thought it might be beneficial to provide at least a couple examples to show how the plugin works in its current state.
Here is how I compile the plugin, load it, and push data to it:
--urgency=15
--urgency=0
and updating it later with--urgency=31
Assumptions/Gaps/Testing
I'll use this section to document the gaps for this PR I can think of so far, the assumptions made, and any other questions or feedback I think I could use.
This is one of the larger gaps I can think of for this PR. I looked at the tests for the plugins in flux-core and am wondering how I should approach testing in the flux-accounting repo. Are sharness tests a good approach for my plugin?Maybe I could populate a fake database with some fake users and fairshare values and have them submit similar jobs and make sure their calculated priorities are as expected, but I guess I wasn't sure how I would do this if I am not in a flux instance.One of the assumptions I've made so far in this PR is that a user only belongs to one bank; with this assumption, bank information doesn't need to be passed from the DB to the plugin. However, users can belong to multiple banks, so I need to handle this case.In its current structure, a job could be submitted at the time just after the map data structure gets cleared and just before an update is pushed to the plugin. In this case, the job would be rejected saying that the user wasn't found in the flux-accounting DB. Would the correct way to handle this would be to instead
return flux_jobtap_priority_unavail ()
so that the submitted job will remain in thePRIORITY
state until a priority is assigned usingflux_jobtap_reprioritize_job ()
?The plugin declares the map as a global variable. I declared it as global because multiple functions and callbacks interact with and update this map. Is this okay behavior? Should it instead be declared and initialized in one of the callbacks and passed to other functions when it is needed?
These are the
TODOS
and clarifications I can think of so far, but maybe there are things I missed.PR Structure
I think this PR can be broken into the following parts:
userid
field to the flux-accounting DB.rc
directory)