
plugin: add a per-user/bank max jobs limit #131

Merged
merged 7 commits into from
Aug 31, 2021

Conversation

cmoussa1
Member

@cmoussa1 cmoussa1 commented Jul 20, 2021

note: this PR is built off of the changes proposed in #130, so I will leave this one as [WIP] until that one lands.

#130 has landed, but there are a couple of questions that might alter the direction this PR takes, so I think I'll leave this PR as [WIP] until those questions are sorted out.

Background

The multi-factor priority plugin added in #122 currently just calculates a user's job priority and does not enforce any kind of per-user limits like the ones laid out in #121. Some of the limits that flux-accounting has a direct impact on enforcing are the following:

max_jobs: max number of submitted jobs per user

max_nodes: max number of nodes used across all of a user's running jobs

This PR looks to add enforcement of the max_jobs per-user/bank combination limit to the priority plugin.

Implementation

This PR adds limit support to the priority plugin by enforcing a limit on the maximum number of submitted jobs per user/bank combination.

In the flux-accounting DB, a new column is added to the association_table called max_jobs, which represents the maximum number of submitted jobs per user:

max_jobs  int(11)     DEFAULT 5   NOT NULL

In the plugin, this new max_jobs value is unpacked and stored in a new map-of-maps data structure, very similar to the one that stores user/bank/fairshare information. This value is then compared to the current number of submitted jobs, represented by another integer in the data structure. When a job is submitted and enters job.validate, the current number of submitted jobs is compared to the user's limit and is incremented if the user is under their limit. Once a job finishes running and enters job.state.inactive, the user's active count of submitted jobs is decremented. The decrement happens in a new callback added to the plugin called inactive_cb ().

If a user has reached their limit and tries to submit another job, the job will be rejected with an error message saying that their limit has been reached.

@cmoussa1 cmoussa1 added the new feature new feature label Jul 20, 2021
@cmoussa1 cmoussa1 force-pushed the add.max-jobs.limit branch from d99cdaf to 96d1435 Compare July 26, 2021 15:35
@grondo
Contributor

grondo commented Jul 27, 2021

@cmoussa1, before I start a code review, what is the requirement being solved by this PR? (Sorry I haven't paid attention so need some help catching up)

This appears (to me) to apply a limit of active jobs per user,bank pair. Will we also have need of a global per-user limit? (This could be done in a different plugin.) Will banks be hierarchical like in Slurm, and will we need hierarchical bank limits?

Sorry for being behind on these concepts! Thanks.

@cmoussa1 cmoussa1 force-pushed the add.max-jobs.limit branch from 96d1435 to 6882d48 Compare July 27, 2021 23:35
@cmoussa1
Member Author

cmoussa1 commented Jul 27, 2021

No need to apologize @grondo! Please let me know if there's anything else I can help clarify along the way.

This PR aims to add support for a max jobs limit per user/bank pair (e.g. I could have a max-jobs limit of 5 active jobs under bank A, but a max-jobs limit of 3 active jobs under bank B). I'll need to follow up on whether there is a need to enforce a global per-user max jobs limit; that is a good question. Banks are currently structured in a hierarchical fashion like Slurm, but I haven't yet been informed of a need for a per-bank limit (I should follow up on that as well).

EDIT: @grondo Let me do some digging and double check that this implementation that I have proposed is indeed the right direction to be heading - I'd hate to have you review something twice.

@cmoussa1 cmoussa1 marked this pull request as ready for review July 27, 2021 23:40
@cmoussa1
Member Author

OK, I've confirmed that the proposed PR does align with the intended goal of enforcing a per-user max jobs limit on just a user/bank combination, so I'll go ahead and take this out of [WIP].

@cmoussa1 cmoussa1 changed the title [WIP] plugin: add a per-user max jobs limit plugin: add a per-user max jobs limit Jul 28, 2021
@cmoussa1 cmoussa1 requested a review from grondo July 28, 2021 16:34
@grondo
Contributor

grondo commented Jul 28, 2021

OK, I've confirmed that the proposed PR does align with the intended goal of enforcing a per-user max jobs limit on just a user/bank combination, so I'll go ahead and take this out of [WIP].

Suggestion, then: retitle the PR to "plugin: add a per user,bank max jobs limit"

Edit: same with commit descriptions, be sure to be clear that this is a per user,bank pair max-jobs limit, not a per-user limit.

@cmoussa1 cmoussa1 changed the title plugin: add a per-user max jobs limit plugin: add a per-user/bank max jobs limit Jul 28, 2021
@cmoussa1 cmoussa1 force-pushed the add.max-jobs.limit branch from 6882d48 to 909d3dd Compare July 28, 2021 17:08
Contributor

@grondo grondo left a comment

Haven't made a full review yet, however I ran across one question below.

Comment on lines 215 to 247
mj_it = users_mj.find (userid);

mj_inner_it = (bank == NULL) ?
mj_it->second.find ("default") :
mj_it->second.find (bank);

if (mj_inner_it->second[1] >= mj_inner_it->second[0])
return flux_jobtap_reject_job (p, args,
"user has max number of jobs submitted");

// increment user's max_jobs count
mj_inner_it->second[1]++;
Contributor

It looks like there may be a problem, though I could be misunderstanding the code since I'm not familiar with C++ std::map:

It appears that current job counts are tracked separately for a default bank "name" versus literally the "default" bank. E.g. if a user has a default bank of "A" with a limit of 4 jobs, and submits 4 jobs to bank "A", then submits a job with no bank, then the "default" bank max-jobs will be checked which is currently zero, so the job will be accepted.

Sorry if I misunderstood the code.

In any event, I'd suggest a test to ensure that jobs submitted with a default bank definitely have the default bank limit applied when jobs have a mix of explicit bank versus no bank.

Member Author

Ah, yes. I think you are right. Two separate counts would be tracked in both the "default" key-value entry and the bank "name" key-value entry. So I'll need to re-evaluate my approach, and probably keep just one count of active jobs, regardless of whether the user is submitting under a default bank or explicitly setting their default bank with the bank "name."

Contributor

What I would suggest is to have a map of userid->default_bank. When a bank is not provided, then you look up the default bank name and use that name in place of a provided bank name. Everything else then works the same, except there is never a userid,"default" user/bank pair.

One gotcha will be if the user's default bank changes between when a job starts and when it ends. You can't look up the default bank name at the end of the job from default_banks[userid] since it will have changed, and you will decrement the wrong entry. So, the default bank for a job should be somehow set in the job.validate callback, and used for that job from that point on.

You could do this at first by stashing the bank name in the job "aux" hash, i.e. flux_jobtap_job_aux_set(). If you do this in all cases (i.e. whether a bank is set or you are looking up the default bank), then in later callbacks you can fetch the bank name directly with flux_jobtap_job_aux_get(). Or, you could instead stash a "bank_struct" with information like:

struct bank_info {
    const char *name;
    double fairshare;
    int max_jobs;
    int current_jobs;
};

Then your inactive_cb could be simplified to something like:

static int inactive_cb (flux_plugin_t *p,
                        const char *topic,
                        flux_plugin_arg_t *args,
                        void *data)
{
    struct bank_info *bank;

    if ((bank = static_cast<bank_info *> (
             flux_jobtap_job_aux_get (p,
                                      FLUX_JOBTAP_CURRENT_JOB,
                                      "mf_priority:bank"))))
        bank->current_jobs--;

    return 0;
}

Member Author

What I would suggest is to have a map of userid->default_bank. When a bank is not provided, then you look up the default bank name and use that name in place of a provided bank name. Everything else then works the same, except there is never a userid,"default" user/bank pair.

Thanks for this suggestion. I think I understand what you are suggesting. With this change, I'm able to test the use case of submitting a mix of both "explicitly-set bank" and "default bank" jobs that both contribute to one active job count with the following:

test_expect_success 'submit max number of jobs with a mix of default bank and explicitly set bank' '
	jobid1=$(flux python ${SUBMIT_AS} 5011 sleep 60) &&
	jobid2=$(flux python ${SUBMIT_AS} 5011 --setattr=system.bank=account3 sleep 60) &&
	jobid3=$(flux python ${SUBMIT_AS} 5011 sleep 60)
'

test_expect_success 'submit a job while already having max number of active jobs' '
	test_must_fail flux python ${SUBMIT_AS} 5011 sleep 60 > max_jobs.out 2>&1 &&
	test_debug "cat max_jobs.out" &&
	grep "user has max number of jobs submitted" max_jobs.out
'

I can push up a fix that contains this change.

You could do this at first by stashing the bank name in the job "aux" hash, i.e. flux_jobtap_job_aux_set(). If you do this in all cases (i.e. whether a bank is set or you are looking up the default bank), then in later callbacks you can fetch the bank name directly with flux_jobtap_job_aux_get().

I'm sorry that I don't understand this enough, so this is probably a dumb question. The concept makes sense to me; in the case where a user's default bank changes during the course of an active job, we'll need to correctly decrement the appropriate bank. Is it possible to only pass the bank name with flux_jobtap_job_aux_set () and get ()? Is there a reason for passing other information like fairshare and max jobs through a struct?

Contributor

@grondo grondo Jul 28, 2021

Is there a reason for passing other information like fairshare and max jobs through a struct?

The only reason is that this approach avoids the extra lookup into the user/bank map to find the current job count. If you kept a bank_info structure in the map instead of raw counts, then you could attach a reference to that job's bank information with job_aux_set(). That may help you down the line (e.g. it will avoid the extra lookup when reprioritizing jobs as well).

Storing the bank name will work just as well and this was only a suggestion for another way to do it.

Member Author

Ah, okay. I could see this also reducing the extra lookup when fetching the bank values for the job priority calculation.

Would you be okay with solely passing in the default bank information through job_aux_set () and get () in this PR, and posting a follow-up refactoring PR to use a bank_info structure to improve the information passing for both the max jobs limit and the priority calculation?

Contributor

Sure! I was just suggesting another way to do it.

@cmoussa1 cmoussa1 force-pushed the add.max-jobs.limit branch 2 times, most recently from 43bfedc to 3ed16ce Compare July 30, 2021 15:02
Contributor

@grondo grondo left a comment

Nice job on the testing!

Just a few more comments. It might be useful to have someone more familiar with C++ review this as well, as I think I may be getting confused by the std::map interface (which seems awfully difficult to follow).

I made some suggestions inline, most of which should be considered optional, and will have another pass later today.


std::map<int, std::map<std::string, double>> users;
std::map<int, std::map<std::string, std::vector<int>>> users_mj;
Contributor

Suggestion: consider renaming users_mj to something more descriptive, since it is a global variable. Perhaps just user_max_jobs. Add a comment describing the global variable (e.g. "Mapping of user,bank to max jobs for that user,bank pair")

(You could also separately add a comment for the other globals here)

Comment on lines 222 to 223
def_bank_it = users_def_bank.find (userid);
mj_inner_it = mj_it->second.find (def_bank_it->second);
Contributor

Should you check for failure from std::map::find here? If these functions are guaranteed to succeed due to checks in job.validate then a comment to that effect might help. (Though I still wonder if the error condition should be checked here to avoid a broker crash. Someone with more C++ experience might be able to tell you)

NULL) < 0)
flux_log_error (h, "flux_jobtap_job_aux_set");
} else {
mj_inner_it = mj_it->second.find (bank);
Contributor

Again, do you need to check for failure here, or add a comment to say why bank will always be found in mj_it->second?

&bank[0],
NULL) < 0)
flux_log_error (h, "flux_jobtap_job_aux_set");
}
Contributor

Suggestion: It might be cleaner to refactor the bank lookup, if necessary, into a separate function. Then you can do something like:

if (!bank
    && !(bank = lookup_default_bank (user)))
    return flux_jobtap_reject_job (p, args, "unable to find default bank");

// proceed with "bank" as usual...

if (flux_jobtap_job_aux_set (p,
FLUX_JOBTAP_CURRENT_JOB,
"mf_priority:bank",
&bank[0],
Contributor

I can't think of a good reason to use &bank[0] here, I think that is the same address as just bank.

Comment on lines 275 to 278
mj_it = users_mj.find (userid);
mj_inner_it = mj_it->second.find (static_cast<char *> (bank));

mj_inner_it->second[1]--;
Contributor

This is just a general observation: this code is hard to read. It is difficult to follow what mj_it and mj_inner_it point to, and (not your fault) the first and second members of the iterator make things even more obtuse. Then, since the max and current job counts are stored as a tuple/array(?), the end result is a little confusing. (Disregard this comment if you still plan to collect user/bank information into a bank_info struct, because I think that will resolve the issue.)

Since this is a map type structure, can you not just index into the map, e.g. users_mj[userid][bank][1]--? (Sorry if that is a dumb question) You can see it would be even nicer if users[userid][bank] contained a struct bank_info because you could just reference members by name:

 users[userid][bank]->current_jobs--;

Or, if you stored the address of the bank_info struct as the aux item for the job, just:

bankinfo->current_jobs--;

Member Author

Yeah, I would agree with you. I took the weekend to step back and re-evaluate adding the max jobs limit support. I think using a bank_info struct would make the most sense, so I just pushed one fixup commit that attempts to use it. I don't expect it to be perfect, but the commit itself contains a few significant changes.

  • the main proposed change is the use of the map data structure containing the user information - instead of adding one new map for every component of user information (fairshare, max jobs, any limits we decide to add in the future), a new map is created that maps a userid and bank combination to a bank_info struct:
struct bank_info {
    double fairshare;
    int max_jobs;
    int current_jobs;
};

I kept the default bank map for now because I wasn't sure how else to handle the default bank case.

  • the validate callback now fetches current_jobs and max_jobs information from the bank_info struct of the user/bank combination using the C++ map's at () function. This throws an out_of_range exception if the value cannot be found.
  • the use of the bank_info struct I think also simplifies the syntax of incrementing/decrementing the current job count, which no longer makes use of a map iterator:
users[userid][bank].current_jobs++;
  • using the struct also removes the need of using an iterator in the priority calculation as well, as we can fetch the fairshare value in a similar way:
fshare_factor = users[userid][bank].fairshare;

Let me know if these changes make sense and help improve the readability at all (I hope they do).

Contributor

This sounds great! I'll try to go through another review and testing today.

@cmoussa1 cmoussa1 force-pushed the add.max-jobs.limit branch from 60a801b to fd38c6e Compare August 2, 2021 17:08
@cmoussa1 cmoussa1 added the high priority items that must be worked on for major milestones label Aug 3, 2021
Contributor

@grondo grondo left a comment

The changes look great @cmoussa1! Much easier to follow the code now.

I just have some comments inline about error and exception handling.

Comment on lines 72 to 75
bank = static_cast<char *> (flux_jobtap_job_aux_get (
p,
FLUX_JOBTAP_CURRENT_JOB,
"mf_priority:bank"));
Contributor

Be sure to check for failure here. It is unlikely, since all jobs have mf_priority:bank set before they pass validation. However, if bank == NULL, my guess is that will lead to a later crash, which will take out the flux broker, so better to catch "unlikely" errors.

My guess is if mf_priority:bank isn't found, then the job should have an exception raised.

Member Author

Sounds good. I just pushed up a fixup commit that checks if bank == NULL, and if it is, it raises an exception on the job saying that the bank could not be found, and sets the priority to 0. The same exception is raised in inactive_cb ().

inner_it = it->second.find ("default");
fshare_factor = inner_it->second;
}
fshare_factor = users[userid][bank].fairshare;
Contributor

This is where I'm out of my depth: should this be wrapped in a try/catch block? Will the program abort if an exception is uncaught?

Contributor

I'm not sure if it is even possible, but it would be good to test that failure is handled gracefully here if an invalid bank or userid is used at this point, so that either users[userid] or users[userid][bank] do not exist. (No problem though if that would be awkward to test)

Member Author

Yeah, if there is no entry for .fairshare it will throw an out_of_range exception, so I should probably wrap this in a try/catch block.

I'm not sure the best strategy is to reject the job at this point; what do you think? Is there a more appropriate way to return an error and stop the job from running other than returning flux_jobtap_reject_job ()? The reason I ask is if there is in fact no .fairshare value, then the priority calculation will be incorrect.

Contributor

Yeah, I think you would have to actually raise an exception on the job, flux_jobtap_raise_exception(), since it is too late to call flux_jobtap_reject_job ().

You could raise a non-sev-0 exception and set the priority to 0 to put the job on hold? Then if the problem is later fixed, the job does not have to be resubmitted?

Member Author

I think that makes sense. If the exception is a non-sev-0 exception, should the severity be set to a positive integer? Is there a typical integer used when a value is not available? Something like this:

flux_jobtap_raise_exception (p, FLUX_JOBTAP_CURRENT_JOB, "plugin",
                             2, "plugin: fairshare value not available");

Sorry if this is a dumb question.

Contributor

I've used severity 4 for "warning" exceptions since this corresponds to syslog LOG_WARNING.
Here you may want to use 3 which is LOG_ERR? It is kind of arbitrary.

If the job is also held, you can note that in the exception. Also, instead of using plugin: for the prefix, I'd suggest mf_priority: so users/admins know which plugin logged the exception.

Member Author

I'm not sure if it is even possible, but it would be good to test that failure is handled gracefully here if an invalid bank or userid is used at this point, so that either users[userid] or users[userid][bank] do not exist. (No problem though if that would be awkward to test)

I think what I can do is change the way fairshare is accessed from the users map to instead use the at () function, which will throw an out_of_range exception if either the user or the bank cannot be found:

fshare_factor = users.at (userid).at (bank).fairshare;

If I wrap this in a try-catch and an exception is thrown, I can raise an exception on the job with an error message saying that it can't find a fairshare value and just return a priority of 0.

I think I can also confirm that this behavior occurs by sending a fake payload in the sharness tests with a null fairshare key-value pair:

data = {"userid": str(userid), "bank": "account4", "default_bank": "account3", "fairshare": "", "max_jobs": "10"}

and then confirming the priority returned for a submitted job under that bank is 0.

Contributor

Clever testing strategy!

One formatting note. I know that our coding style requests a space between function and parens (). However for this C++ case of chaining function calls, I think you could drop that if you want, e.g. instead of

fshare_factor = users.at (userid).at (bank).fairshare;

you could do

fshare_factor = users.at(userid).at(bank).fairshare;

The second form seems more readable to me, but I leave the final decision to you!

Member Author

I agree with you. The space with chained function calls looks a little funky. I'll push up a quick fix to get rid of that space. 👍

Comment on lines 234 to 239
// make sure user has not already hit their max active jobs count
if (current_jobs >= max_jobs)
return flux_jobtap_reject_job (p, args,
"user has max number of jobs submitted");
Contributor

Just had a thought: We'll probably want a way to set max_jobs == UNLIMITED. Maybe 0 or -1 would work for that? Either way, maybe:

Suggested change
// make sure user has not already hit their max active jobs count
if (current_jobs >= max_jobs)
return flux_jobtap_reject_job (p, args,
"user has max number of jobs submitted");
// make sure user has not already hit their max active jobs count
if (max_jobs > 0 && current_jobs >= max_jobs)
return flux_jobtap_reject_job (p, args,
"user has max number of jobs submitted");

Member Author

Thanks for this suggestion! I just pushed up a fixup commit that adds this check for max_jobs > 0.

return -1;
}

bank = static_cast<char *> (flux_jobtap_job_aux_get (
Contributor

Again, check return of flux_jobtap_job_aux_get() here to avoid a crash.

FLUX_JOBTAP_CURRENT_JOB,
"mf_priority:bank"));

users[userid][bank].current_jobs--;
Contributor

same question about catching any exception on this line (honestly above my head though)

@cmoussa1 cmoussa1 force-pushed the add.max-jobs.limit branch 2 times, most recently from 345c1bf to f6396e9 Compare August 10, 2021 18:07
@cmoussa1
Member Author

Just pushed up a change that squashes the feedback-driven fixup commits to the plugin into a single fixup commit.

@grondo
Contributor

grondo commented Aug 10, 2021

As far as I'm concerned, @cmoussa1, feel free to squash the final fixup commit into your history.

Then I'll make another quick pass, but you may also want feedback from a C++ developer on the C++ bits before a final merge.

@cmoussa1 cmoussa1 force-pushed the add.max-jobs.limit branch from f6396e9 to cb5d754 Compare August 10, 2021 19:52
@cmoussa1
Member Author

cmoussa1 commented Aug 10, 2021

Thanks @grondo.


@dongahn: if you have some time, would you mind looking at commit 1345833? More specifically, I think some feedback on how key-value entries are fetched from the users map would be most helpful, especially on how to handle a failed lookup if it were to occur. My approach for these map lookups was to wrap the .at () function in a try-catch block, since it will throw an out_of_range exception if the lookup fails:

    try {
        fshare_factor = users.at(userid).at(bank).fairshare;
    } catch (const std::out_of_range& oor) {
        flux_jobtap_raise_exception (p, FLUX_JOBTAP_CURRENT_JOB, "plugin",
                                     3, "mf_priority: cannot find fairshare; "
                                     "holding job");
        return 0;
    }

In the priority calculation function, it will raise a job exception and return a priority of 0. It will also raise a job exception if it cannot decrement the user's active jobs count after a job has completed.

EDIT: small note - just noticed there's an unneeded #include <vector> in the plugin code, so I'll remove that after this round of review to not confuse commit hashes.

@grondo
Contributor

grondo commented Aug 10, 2021

If you can create a series of tests to actually trigger those exceptions and exercise the code in the catch block, that will go a long way to giving confidence that your approach is working!

@cmoussa1
Member Author

If you can create a series of tests to actually trigger those exceptions and exercise the code in the catch block, that will go a long way to giving confidence that your approach is working!

Is there a way to write sharness tests to look for those raised job exceptions?

@grondo
Contributor

grondo commented Aug 10, 2021

Yes, flux job wait-event -t TIMEOUT [OPTIONS] exception JOBID will wait for an exception event in the eventlog for JOBID. For an example of usage see t2271-job-dependency-after.t:

https://github.com/flux-framework/flux-core/blob/fac664ff9d1746fab720ec22a730cc050c4f9ce5/t/t2271-job-dependency-after.t#L75-L83

@cmoussa1
Member Author

Ah, okay, thanks. :-) Why don't I add some tests that trigger this exception before I ask @dongahn to review this, just in case things change slightly.

@cmoussa1 cmoussa1 force-pushed the add.max-jobs.limit branch from cb5d754 to 269a5e2 Compare August 11, 2021 15:08
@cmoussa1
Member Author

I just pushed up a fixup commit (269a5e2) that I think does a better job of handling the map lookups in users and will fail more safely than before.

Previously, there were map lookups scattered across a number of the callbacks (validate_cb(), priority_calculation(), and inactive_cb()), which left a point-of-failure in any of those functions that might be hard to track down and could result in a program crash. I think the discussion @grondo and I had about creating a bank_info struct and passing it via flux_jobtap_job_aux_set() finally clicked. This reduces the map lookups (using map::find()) in the plugin to just job.validate, and if any lookup fails there, the job will be rejected. All other accesses of bank information after a job gets validated can then be fetched with flux_jobtap_job_aux_get() with the following:

struct bank_info *b;

b = static_cast<bank_info *> (flux_jobtap_job_aux_get (
                                                p,
                                                FLUX_JOBTAP_CURRENT_JOB,
                                                "mf_priority:bank_info"));

...

fshare_factor = b->fairshare;

Hopefully this cleans up the code some more and makes it easier to track down bugs related to the map access if they do occur.

@dongahn
Member

dongahn commented Aug 11, 2021

OK, that's a good suggestion. If I understood your recommendation above, do you recommend I instead use map::find()? That way, if it fails to find, I can do one of two things:

This sounds reasonable to me.

I think it is actually outside the scope of this PR. But if applicable, I would later also consider adding C++ exception-safe callbacks for any other exceptions that could be raised from your callback. One example would be to use the eh_wrapper_t class: https://github.com/flux-framework/flux-sched/blob/master/qmanager/modules/qmanager_callbacks.cpp#L455.

@cmoussa1 cmoussa1 force-pushed the add.max-jobs.limit branch from 269a5e2 to 05d2d55 Compare August 11, 2021 19:39
@cmoussa1
Member Author

The fixup commits have been squashed, and the plugin no longer makes any use of C++ exception handling (just a job exception via flux_jobtap_raise_exception ()). All of the map lookups now occur only in validate_cb () instead of in multiple callbacks throughout the plugin, and the information needed for job priority calculation and active-job-count tracking is set/accessed with flux_jobtap_job_aux_set/get ().

Add a new column to the association_table in the
flux-accounting database: max_jobs, which represents
the max number of submitted jobs a user can have at one
time.
@cmoussa1 cmoussa1 force-pushed the add.max-jobs.limit branch from 05d2d55 to e519040 Compare August 20, 2021 15:14
Contributor

@grondo grondo left a comment


A couple questions inline after a quick skim. It could be that I'm confused about use of C++ map here, and if so I apologize for the noise!

Comment on lines 119 to 123
struct bank_info b;

b.fairshare = std::atof (fshare);
b.max_jobs = std::atoi (max_jobs);
b.current_jobs = 0;
Contributor

Would it save a couple assignments to do something like:

    struct bank_info *b;
    b = &users[std::atoi (uid)][bank];
    b->fairshare = std::atof (fshare);
    b->max_jobs = std::atoi (max_jobs);
    ...

Then you do not have to assign b to users[std::atoi (uid)][bank] below

Comment on lines 119 to 123
struct bank_info b;

b.fairshare = std::atof (fshare);
b.max_jobs = std::atoi (max_jobs);
b.current_jobs = 0;
Contributor

Also, I'm probably confused, but are you sure you want to reset current_jobs to zero when a user/bank is updated? It seems like if there are current active jobs for this user/bank the count should be left alone.

Member Author

Oh, no, I think you're absolutely right. Thanks for pointing this out. Every time there is an update, the current_jobs count would be reset to 0. I failed to realize that it already gets initialized to 0 when the bank_info object is created. So I should leave the current_jobs count alone.

Maybe I can add another payload update towards the end of the sharness test to confirm that current_jobs doesn't get reset when another update occurs. Maybe between the last two tests... 🤔

Contributor

A good functionality test might be to send an update and make sure the max_jobs limit is applied, then send another update increasing the max_jobs limit, e.g. by 1, and ensure the new limit is applied. This will check that limits can be updated, and also that sending an update doesn't reset the current job count in the plugin.

Member Author

I like that idea. I think the latest update I pushed up addresses that suggestion. A new payload update gets sent to the plugin with a new max_jobs limit of 4 (+1 to the user's previous max jobs limit of 3). This update occurs while the user has 3 active jobs running, so when the user submits another job (to increase their active count to 4), it should be submitted and run successfully. I added another job submission attempt as well to make sure that the user can't submit 5 or more active jobs. I think this should test two things:

  1. that the user's max_jobs count can be increased while the user has active jobs running, and
  2. that the user's current_jobs count is not reset when new payload data is sent to the plugin.

Add support for the priority plugin to keep count of
a user/bank combination's active jobs and reject jobs
that are submitted while the user already has their max
number of jobs currently active.

The number of active jobs is tracked via a "bank_info"
struct in the users map that contains user/bank information.
When a job enters job.validate, the user's current_jobs counter
is incremented, and it is decremented when the job enters
job.state.inactive.
If a user already has their max number of active jobs running
and tries to submit a new job, the new job will be rejected
with an error message.
Add the "max_jobs" key-value pair to each user's
payload information to be sent to the priority plugin.
Add a new test to t1001-mf-priority-basic.t that sends a
fake payload with an empty fairshare key-value pair to
ensure that if a job is submitted with a fairshare value of
0, the job will be rejected.
@cmoussa1 cmoussa1 force-pushed the add.max-jobs.limit branch from e519040 to 5dff09e Compare August 21, 2021 02:00
@codecov

codecov bot commented Aug 21, 2021

Codecov Report

Merging #131 (3849b57) into master (f1bb269) will decrease coverage by 0.23%.
The diff coverage is 86.36%.

❗ Current head 3849b57 differs from pull request most recent head 5dff09e. Consider uploading reports for the commit 5dff09e to get more accurate results

@@            Coverage Diff             @@
##           master     #131      +/-   ##
==========================================
- Coverage   85.76%   85.53%   -0.24%     
==========================================
  Files          20       20              
  Lines         906      933      +27     
==========================================
+ Hits          777      798      +21     
- Misses        129      135       +6     
Impacted Files Coverage Δ
src/plugins/mf_priority.cpp 88.60% <86.36%> (-5.63%) ⬇️

Contributor

@grondo grondo left a comment


This seems good to merge to me!

Just one question inline, but the answer and any discussion could come in a future issue or PR.

Comment on lines +231 to +234
// if a user's fairshare value is 0, that means they shouldn't be able
// to run jobs on a system
if (fairshare == 0)
return flux_jobtap_reject_job (p, args, "user fairshare value is 0");
Contributor

Question: if a user's fairshare value is 0, should they still be able to submit a job, with the job held until such time as fairshare > 0 and the max jobs limit is not exceeded?

Member Author

Good question. I guess I hadn't thought of just holding a job if a user's fairshare value is 0 instead of rejecting it; I wonder how SLURM currently handles submitted jobs with a fairshare value of 0.

If you don't mind, I can open an issue on this for some further discussion and in case we might need to change this behavior. It might be a good idea for me to get @ryanday36's input too on how they currently treat submitted jobs under this case.

Contributor

Sounds good to me!

@cmoussa1
Member Author

Thanks @grondo! I'll go ahead and open an issue per your question in the latest round of review to track it, and I'll set MWP in a bit.

@mergify mergify bot merged commit a70ab05 into flux-framework:master Aug 31, 2021
cmoussa1 added a commit to cmoussa1/flux-accounting that referenced this pull request Sep 1, 2021
Problem: When the max_jobs column was added to the flux-accounting DB
in flux-framework#131, it was not added to the function declaration of add_user(),
resulting in a NOT NULL constraint failure when attempting to add a
user. This error gets raised when calling the add-user subcommand
in flux-account.py.

Solution: Add max_jobs to the function header of add_user() and
to the SQL statement of a user when being added to the flux-accounting
DB.
@cmoussa1 cmoussa1 deleted the add.max-jobs.limit branch January 20, 2022 15:39
Labels
high priority (items that must be worked on for major milestones), merge-when-passing, new feature
3 participants