Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

plugin: disable queue validation on an unknown queue, no configured "default" queue #281

Merged
merged 6 commits into from
Oct 3, 2022
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
75 changes: 31 additions & 44 deletions src/plugins/mf_priority.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -29,10 +29,21 @@ extern "C" {
#include <vector>
#include <sstream>

// the plugin does not know about the association who submitted a job and will
// assign default values to the association until it receives information from
// flux-accounting
#define BANK_INFO_MISSING -9
#define NO_SUCH_QUEUE -5

// a queue is specified for a submitted job that flux-accounting does not know
// about
#define UNKNOWN_QUEUE 0

// no queue is specified for a submitted job
#define NO_QUEUE_SPECIFIED 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commit message nit: The commit message says:

Change the preprocessor directive INVALID_QUEUE to UNKNOWN_QUEUE

But it appears that here INVALID_QUEUE is still defined.

Also:

Change the preprocessor directive NO_DEFAULT_QUEUE to UNSPECIFIED_QUEUE,

I don't see a UNSPECIFIED_QUEUE, but there is a NO_QUEUE_SPECIFIED...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops, I totally messed that up, didn't I? I meant to update this commit message before posting the PR but completely forgot. I'll update the commit message to use both UNKNOWN_QUEUE (instead of INVALID_QUEUE) and NO_QUEUE_SPECIFIED (instead of UNSPECIFIED_QUEUE). Sorry for the confusion!

Copy link
Contributor

@grondo grondo Oct 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While you are in there, it might be good to add comments on each of these #defines to say more of what each means. Not critical, but if you are modifying the commit anyway...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, good suggestion. I will do that!


// a queue was specified for a submitted job that flux-accounting knows about and
// the association does not have permission to run jobs under
#define INVALID_QUEUE -6
#define NO_DEFAULT_QUEUE -7

std::map<int, std::map<std::string, struct bank_info>> users;
std::map<std::string, struct queue_info> queues;
Expand All @@ -50,6 +61,9 @@ struct bank_info {
int active;
};

// min_nodes_per_job, max_nodes_per_job, and max_time_per_job are not
// currently used or enforced in this plugin, so their values have no
// effect in queue limit enforcement.
struct queue_info {
int min_nodes_per_job;
int max_nodes_per_job;
Expand Down Expand Up @@ -126,13 +140,16 @@ static int get_queue_info (
{
std::map<std::string, struct queue_info>::iterator q_it;

// make sure that if a queue is passed in, it 1) exists, and 2) is a valid
// queue for the user to run jobs in
// make sure that if a queue is passed in, it is a valid queue for the
// user to run jobs in
if (queue != NULL) {
// check #1) the queue passed in exists in the queues map
// check #1) the queue passed in exists in the queues map;
// if the queue cannot be found, this means that flux-accounting
// does not know about the queue, and thus should return a default
// factor
q_it = queues.find (queue);
if (q_it == queues.end ())
return NO_SUCH_QUEUE;
return UNKNOWN_QUEUE;

// check #2) the queue passed in is a valid option to pass for user
std::vector<std::string>::iterator vect_it;
Expand All @@ -145,13 +162,8 @@ static int get_queue_info (
// add priority associated with the passed in queue to bank_info
return queues[queue].priority;
} else {
// no queue was specified, so use default queue and associated priority
q_it = queues.find ("default");

if (q_it == queues.end ())
return NO_DEFAULT_QUEUE;
else
return queues["default"].priority;
// no queue was specified, so just use a default queue factor
return NO_QUEUE_SPECIFIED;
}
}

Expand All @@ -174,25 +186,13 @@ int check_queue_factor (flux_plugin_t *p,
char *queue,
char *prefix = (char *) "")
{
if (queue_factor == NO_SUCH_QUEUE) {
flux_jobtap_raise_exception (p, FLUX_JOBTAP_CURRENT_JOB, "mf_priority",
0,
"%sQueue does not exist: %s",
prefix, queue);
return -1;
} else if (queue_factor == INVALID_QUEUE) {
if (queue_factor == INVALID_QUEUE) {
flux_jobtap_raise_exception (p, FLUX_JOBTAP_CURRENT_JOB,
"mf_priority", 0,
"%sQueue not valid for user: %s",
prefix, queue);
return -1;
}
else if (queue_factor == NO_DEFAULT_QUEUE) {
flux_jobtap_raise_exception (p, FLUX_JOBTAP_CURRENT_JOB,
"mf_priority", 0,
"No default queue exists");
return -1;
}

return 0;
}
Expand Down Expand Up @@ -723,17 +723,15 @@ static int validate_cb (flux_plugin_t *p,
return flux_jobtap_reject_job (p, args, "user/bank entry has been "
"disabled from flux-accounting DB");

// fetch priority associated with passed-in queue (or default queue)
// fetch priority associated with passed-in queue; if a queue cannot be
// found, use a default factor for the queue (UNKOWN_QUEUE). If a queue
// is found but is determined that a user does not have access to submit
// jobs to it, it can still be rejected.
bank_it->second.queue_factor = get_queue_info (queue, bank_it);

if (bank_it->second.queue_factor == NO_SUCH_QUEUE)
return flux_jobtap_reject_job (p, args, "Queue does not exist: %s",
queue);
else if (bank_it->second.queue_factor == INVALID_QUEUE)
if (bank_it->second.queue_factor == INVALID_QUEUE)
return flux_jobtap_reject_job (p, args, "Queue not valid for user: %s",
queue);
else if (bank_it->second.queue_factor == NO_DEFAULT_QUEUE)
return flux_jobtap_reject_job (p, args, "No default queue exists");

max_run_jobs = bank_it->second.max_run_jobs;
fairshare = bank_it->second.fairshare;
Expand Down Expand Up @@ -994,17 +992,6 @@ extern "C" int flux_plugin_init (flux_plugin_t *p)
|| flux_jobtap_service_register (p, "rec_q_update", rec_q_cb, p) < 0)
return -1;

struct queue_info *q;
q = &queues["default"];

// min_nodes_per_job, max_nodes_per_job, and max_time_per_job are not
// currently used or enforced in this plugin, so their values have no
// effect in queue limit enforcement.
q->min_nodes_per_job = 0;
q->max_nodes_per_job = 1024;
q->max_time_per_job = 64000;
q->priority = 1000;

return 0;
}

Expand Down
22 changes: 2 additions & 20 deletions t/t1013-mf-priority-queues.t
Original file line number Diff line number Diff line change
Expand Up @@ -61,19 +61,13 @@ test_expect_success 'stop the queue' '
flux queue stop
'

# test_expect_success 'trying to submit a job without specifying a queue should attempt to use default queue' '
# test_must_fail flux python ${SUBMIT_AS} 5011 -n1 hostname > no_default_queue.out 2>&1 &&
# test_debug "no_default_queue.out" &&
# grep "No default queue exists" no_default_queue.out
# '

test_expect_success 'adding a default queue allows users to run jobs without specifying a queue' '
test_expect_success 'users can run jobs without specifying a queue' '
flux account -p ${DB_PATH} add-queue default --priority=1000 &&
flux account-priority-update -p $(pwd)/FluxAccountingTest.db &&
jobid0=$(flux python ${SUBMIT_AS} 5011 -n1 hostname) &&
flux job wait-event -f json $jobid0 priority | jq '.context.priority' > job0.test &&
cat <<-EOF >job0.expected &&
10050000
50000
EOF
test_cmp job0.expected job0.test &&
flux job cancel $jobid0
Expand All @@ -99,13 +93,6 @@ test_expect_success 'submit a job using a queue the user does not belong to' '
grep "Queue not valid for user: expedite" unavail_queue.out
'

test_expect_success 'submit a job using a nonexistent queue' '
test_must_fail flux python ${SUBMIT_AS} 5011 --queue=foo \
-n1 hostname > bad_queue.out 2>&1 &&
test_debug "bad_queue.out" &&
grep "Queue does not exist: foo" bad_queue.out
'

test_expect_success 'submit a job using standby queue, which should not increase job priority' '
jobid1=$(flux python ${SUBMIT_AS} 5011 --job-name=standby \
--setattr=system.bank=account1 --queue=standby -n1 hostname) &&
Expand Down Expand Up @@ -167,9 +154,4 @@ test_expect_success 'reload mf_priority.so and update it with the sample test da
flux account-priority-update -p $(pwd)/FluxAccountingTest.db
'

test_expect_success 'ensure job exception was raised saying that the queue does not exist' '
flux job wait-event -v ${jobid6} exception > nonexistent_queue.test &&
grep "Queue does not exist: foo" nonexistent_queue.test
'

test_done
8 changes: 4 additions & 4 deletions t/t1018-mf-priority-disable-entry.t
Original file line number Diff line number Diff line change
Expand Up @@ -44,7 +44,7 @@ test_expect_success 'submit a job successfully under default bank' '
jobid1=$(flux mini submit -n1 hostname) &&
flux job wait-event -f json $jobid1 priority | jq '.context.priority' > job1.test &&
cat <<-EOF >job1.expected &&
10050000
50000
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be useful to state in the commit message why the priority changes are a result of the changes in validating queues since, at least to me, that is not an obvious effect. Though please correct me if this should be clear.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, looking back I realize this isn't as obvious as I initially thought, sorry about that.

The reason the priority changes for these jobs is because the queue factor is UNSPECIFIED_QUEUE, which is 0. This changes the resulting integer priority of the job from 1005000 to just 50000, as before, the multi-factor priority plugin had a "default" queue entry (and a factor of 1000) when an association submitted jobs under a default queue (like in the test cases above). I should make note of this in the commit message, and will push up a fix.

EOF
test_cmp job1.expected job1.test
'
Expand All @@ -53,7 +53,7 @@ test_expect_success 'submit a job successfully under second bank' '
jobid2=$(flux mini submit --setattr=system.bank=account2 -n1 hostname) &&
flux job wait-event -f json $jobid2 priority | jq '.context.priority' > job2.test &&
cat <<-EOF >job2.expected &&
10050000
50000
EOF
test_cmp job2.expected job2.test
'
Expand All @@ -78,7 +78,7 @@ test_expect_success 'submit a job successfully under second bank' '
jobid3=$(flux mini submit --setattr=system.bank=account2 -n1 hostname) &&
flux job wait-event -f json $jobid3 priority | jq '.context.priority' > job3.test &&
cat <<-EOF >job3.expected &&
10050000
50000
EOF
test_cmp job3.expected job3.test
'
Expand All @@ -92,7 +92,7 @@ test_expect_success 'try to submit job under new default user/bank entry' '
jobid4=$(flux mini submit -n1 hostname) &&
flux job wait-event -f json $jobid4 priority | jq '.context.priority' > job4.test &&
cat <<-EOF >job4.expected &&
10050000
50000
EOF
test_cmp job4.expected job4.test
'
Expand Down