qmanager: use SCHEDUTIL_HELLO_PARTIAL_OK flag
Problem: when the scheduler is reloaded, housekeeping jobs with
partially allocated resources are canceled rather than being sent to
the scheduler for re-allocation during the hello handshake.

This is because there was no way for the job manager to inform the
scheduler of the free R subset.  However, RFC 27 now specifies that the
sched.hello request may contain a 'partial-ok' flag.  If set, hello
responses may include a 'free' key containing an RFC 22 idset, with
ids corresponding to ranks of R that are free.

Schedutil wraps this so that if schedutil_create() is called with the
SCHEDUTIL_HELLO_PARTIAL_OK flag, then
- the hello request includes partial-ok
- free ranks, if any, are subtracted from R in each response message
  before calling the scheduler's callback
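The subtraction step described above can be sketched in isolation. This is only an illustration of the set arithmetic, not the schedutil implementation: `parse_idset` and `subtract_free` are hypothetical helpers, the parser accepts only a simplified comma-and-range form such as "0-3,7", and the real code operates on R objects rather than plain sets of ranks.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>
#include <sstream>
#include <string>

// Hypothetical helper: parse a simplified idset such as "0-3,7" into a
// set of ranks. Only plain integers and "lo-hi" ranges are handled.
std::set<int> parse_idset (const std::string &s)
{
    std::set<int> ids;
    std::istringstream in (s);
    std::string tok;
    while (std::getline (in, tok, ',')) {
        if (tok.empty ())
            continue;
        std::size_t dash = tok.find ('-');
        int lo = std::stoi (tok.substr (0, dash));
        int hi = (dash == std::string::npos) ? lo : std::stoi (tok.substr (dash + 1));
        assert (lo <= hi);
        for (int i = lo; i <= hi; ++i)
            ids.insert (i);
    }
    return ids;
}

// Hypothetical helper: return the ranks of the original allocation that
// are still held, i.e. the allocated ranks minus the free ranks.
std::set<int> subtract_free (const std::set<int> &allocated,
                             const std::set<int> &free_ranks)
{
    std::set<int> remaining;
    std::set_difference (allocated.begin (), allocated.end (),
                         free_ranks.begin (), free_ranks.end (),
                         std::inserter (remaining, remaining.begin ()));
    return remaining;
}
```

For example, a job originally allocated ranks "0-3" whose hello response reports "1,3" as free would be presented to the scheduler callback as still holding ranks 0 and 2.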

Note that the scheduling key (JGF), if present, always contains the
original, full resource set.

Set the SCHEDUTIL_HELLO_PARTIAL_OK flag in qmanager's schedutil_create()
call if flux-core defines it.
garlick committed Dec 17, 2024
1 parent 507fbb9 commit a49ba81
Showing 1 changed file with 6 additions and 1 deletion.
7 changes: 6 additions & 1 deletion qmanager/modules/qmanager.cpp
@@ -555,9 +555,14 @@ static std::shared_ptr<qmanager_ctx_t> qmanager_new (flux_t *h)
         ctx = nullptr;
         goto done;
     }
+    int schedutil_flags = 0;
+#ifdef SCHEDUTIL_HELLO_PARTIAL_OK
+    // flag was added in flux-core 0.70.0
+    schedutil_flags |= SCHEDUTIL_HELLO_PARTIAL_OK;
+#endif
     if (!(ctx->schedutil =
               schedutil_create (ctx->h,
-                                0,
+                                schedutil_flags,
                                 &ops,
                                 std::static_pointer_cast<qmanager_cb_ctx_t> (ctx).get ()))) {
         flux_log_error (ctx->h, "%s: schedutil_create", __FUNCTION__);