Integrate resource-match with the new scheduler interface #468
@grondo: could you post some pointers as to what files I should start taking a look at? Thanks.
The helper functions for scheduler integration were developed by @garlick and can be found in … For example uses, check out … We had planned to abstract …
Thanks @grondo. This will be my next priority.
I looked at some of these suggested files. They look great. My feeling, though, is that the scheduler loop service in flux-sched will have to be much more complex, and we will need to manage this complexity very carefully. Note that the original …
I also talked with @garlick and @grondo, and I will copy …
@garlick and @grondo: I assume you won't have queueing policies other than fcfs at the job-manager level, correct? So essentially … This is fine, but I thought I should double-check. When users want any out-of-order policy like backfilling at the flux-sched level, I assume sched will have to use "unlimited" to replicate the entire job queue. Also, initially the scheduler loop trigger events will be "alloc" and "free" only, since we have no resource events yet (additional resources joined; some resources detected to be down and/or excluded).
Correct on both counts. If this turns out to be too simplistic, let's talk.
Functionality-wise, this seems okay, so I will start to design based on these assumptions. If this appears to be too redundant, I will call for a discussion. BTW, there are things that the current interface solves pretty nicely for me: not needing finite state machines, not having to deal with individual events, and an easy-to-implement resilience scheme. But I realize I will probably still have to implement performance-optimization techniques like queue depth and delay scheduling at the scheduler level. That is fine, but I will see if there are opportunities to implement those at the core level, which could benefit all schedulers.
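To make my assumptions concrete, here is a rough sketch (not actual qmanager code) of the only two entry points I expect a scheduler module to need under this interface. The topic strings "sched.alloc"/"sched.free" and the "id" payload key are my guesses, not a confirmed part of the interface; the real glue lives in the helper functions mentioned above.

```c
/* Sketch of a scheduler module that only reacts to "alloc" and "free"
 * requests from the job-manager.  Topic strings and payload keys are
 * assumptions, not the verified interface. */
#include <stdint.h>
#include <syslog.h>
#include <flux/core.h>

static void alloc_cb (flux_t *h, flux_msg_handler_t *mh,
                      const flux_msg_t *msg, void *arg)
{
    int64_t id;

    /* Assumed payload key; the real request carries more fields. */
    if (flux_request_unpack (msg, NULL, "{s:I}", "id", &id) < 0) {
        flux_log_error (h, "alloc: unpack");
        return;
    }
    /* fcfs: enqueue the job and kick the scheduler loop; the response
     * is sent later, once resources have (or have not) been found. */
    flux_log (h, LOG_DEBUG, "alloc request for job %jd", (intmax_t)id);
}

static void free_cb (flux_t *h, flux_msg_handler_t *mh,
                     const flux_msg_t *msg, void *arg)
{
    /* Return the job's resources to the pool and re-run the loop. */
    flux_log (h, LOG_DEBUG, "free request");
}

static const struct flux_msg_handler_spec htab[] = {
    { FLUX_MSGTYPE_REQUEST, "sched.alloc", alloc_cb, 0 },
    { FLUX_MSGTYPE_REQUEST, "sched.free",  free_cb,  0 },
    FLUX_MSGHANDLER_TABLE_END,
};

int mod_main (flux_t *h, int argc, char **argv)
{
    flux_msg_handler_t **handlers;
    int rc;

    if (flux_msg_handler_addvec (h, htab, NULL, &handlers) < 0)
        return -1;
    rc = flux_reactor_run (flux_get_reactor (h), 0);
    flux_msg_handler_delvec (handlers);
    return rc;
}
```

The idea is to keep all queueing-policy logic (fcfs, backfilling, queue depth, delay scheduling) behind these two callbacks.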
Yeah. It would be awesome if the …
@garlick or @SteVwonder: I see from sched-dummy.c that the module load option is now … Did we decide to require this style of option passing for modules across the board at this point? I am implementing this part of the new …
@garlick: My rc1 script for qmanager currently fails because flux-core loads sched-simple by default. I can get around that by unloading sched-simple, if present, before loading qmanager in its rc1 script. Does that sound like a reasonable short-term solution? For the long haul, though, it seems we would need a way to query whether a conflicting module has been loaded so that it can be unloaded first.
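Something like this rc1 fragment is what I have in mind (a sketch only; the module names are the ones discussed here, and the actual rc file layout may differ):

```sh
#!/bin/sh
# Hypothetical rc1 fragment: make room for qmanager by unloading
# sched-simple if flux-core's rc1 already loaded it.
if flux module list | grep -q sched-simple; then
    flux module remove sched-simple
fi
flux module load qmanager
```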
That seems ok for now. I don't have any deeper thoughts on the subject right now 🙂
@garlick: A quick question. When you submit a jobspec with a 1h duration at this point, like flux job submit test.t60.json, and the scheduler responds to the …
@garlick: also, I unload … and I see:
2019-06-28T06:41:10.936714Z broker.err[0]: rc3: flux-module: cmb.rmmod[0] sched-simple: No such file or directory
flux-broker: module 'qmanager' was not cleanly shutdown
Any insight?
As discussed in the meeting, it's not required, but it is easier. If using optparse, just watch out for module argv[0] being the first argument rather than argv[1] in modules (you need to pass argv - 1, argc + 1).
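For example, something along these lines (a sketch only; the "queue-policy" option is made up, and I'm assuming the usual liboptparse calls optparse_create()/optparse_add_option_table()/optparse_parse_args()):

```c
/* Sketch: liboptparse in a module's mod_main(), using the
 * argv - 1 / argc + 1 shift described above. */
#include <syslog.h>
#include <flux/core.h>
#include <flux/optparse.h>

static struct optparse_option opts[] = {
    { .name = "queue-policy", .has_arg = 1, .arginfo = "POLICY",
      .usage = "Select queueing policy (hypothetical option)" },
    OPTPARSE_TABLE_END,
};

int mod_main (flux_t *h, int argc, char **argv)
{
    optparse_t *p = optparse_create ("qmanager");

    if (!p || optparse_add_option_table (p, opts) != OPTPARSE_SUCCESS)
        return -1;
    /* Module argv[0] is the first argument (no program-name slot),
     * so shift by one before handing it to optparse. */
    if (optparse_parse_args (p, argc + 1, argv - 1) < 0) {
        optparse_destroy (p);
        return -1;
    }
    flux_log (h, LOG_INFO, "queue-policy=%s",
              optparse_get_str (p, "queue-policy", "fcfs"));
    optparse_destroy (p);
    return 0;
}
```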
After execution completes, and execution always completes quickly because the actual launch isn't implemented yet. There is a way to simulate execution of the full duration (with a sleep in the exec system); see …
Modules are normally loaded in rc1 and unloaded in rc3, so maybe you need to provide an rc3 script also? They are not automatically unloaded.
I do have it; I will take a look at it again, though. BTW, if the module (sched-simple) loaded by its rc1 script got unloaded by someone else (as in this case), presumably another unload attempt from its rc3 script wouldn't lead to this error, would it?
This should be very useful!
PR #481 resolved this.
Had a discussion with @grondo yesterday. Though the scheduler interface isn't where he wants it to be, we thought it would be a good idea to start doing the integration work sooner rather than later. Once #467 is merged, I plan to take a crack at this.