-
Notifications
You must be signed in to change notification settings - Fork 42
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Broader verification of scheduler correctness #168
Comments
We currently have a very simple test using the simulator that launches 10 jobs and verifies their schedule under FCFS and EASY. So the framework is there to do the testing (assuming the simulator is still functioning under master, which is a separate issue that I'll be address this summer), but now we need some more extensive tests. I will look into this to see what I can find. At minimum, we should be able to find emulated/simulated schedulers that run EASY, Conservative, and the hybrid algorithms. We could then throw the same set of jobs to the Flux implementation and the other implementation, comparing their schedules. At best, there is (as @dongahn mentioned) a set of workloads and associated schedules that already exist. |
Agreed. |
@dongahn and I spoke on the phone today, and the idea of using SLURM's scheduler to verify Flux's scheduler came up. We are unsure of the exact scheduling policies that SLURM supports. @lipari's slides (specifically slide 34) make it seem that 4 years ago only EASY backfilling was supported by SLURM. @lipari, do you know what scheduling policies are currently supported by SLURM? Does it now support conservative & hybrid backfilling? |
Thanks @SteVwonder. Context for @lipari: an easy verification mechanism of our backfill policy implementations will significantly add to reaching our CTS-1 scheduler milestone. Verifying FCFS is a no brainer -- we can look at each job's |
The Slurm settings to enable backfill behavior which will serve as a reference are:
The SchedulerParameters settings are where you will tune the backfill behavior from conservative to liberal. Look at the slurm.conf man page to see all the options. Also take a look at vulcan to see the bf_ settings we use in production. |
On further review, it looks like there is not a setting to enable EASY backfill. Even looking at the newer Slurm on vulcan, I don't see a setting which will violate the conservative rule of not scheduling a lower priority job if it will delay the start of any higher priority job. |
Thanks @lipari for information. So it seems that we can use the current version of SLURM to validate our Conservative backfilling algorithm. But that we cannot use the current version to test Hybrid or EASY. I wonder if there is another scheduler with a simulator that we can test with. |
Do we have well-known large enough workloads (e.g., ESP) and expected schedules under different scheduling policies? Perhaps we should de-couple CTS-1 scheduler milestone and correctness verification efforts. In particular, I am using core-level scheduling (i.e., submitting jobs only with their requirements for cores not nodes) in which case verifying correctness might even be harder. Perhaps we should build necessary confidence on correctness using existing workloads and expected schedules (e.g., ESP) and ensure that our schedules matches with golden orders. Then, we assume that this orthogonal effort would validate the correctness of our algorithms, and for the CTS-1 scheduler milestone, then we just focus on perf/scal characteristics? |
@dongahn, that approach sound reasonable to me. I just found another scheduler simulator, Alea, that is built on top of GridSim that seems promising: I will look for some workloads with "golden orders". |
@SteVwonder: you might want to ping Suraj. I heard he was trying use a similar for grid computing for his dissertation and was pretty unhappy with their capabilities and software quality. |
@dongahn: will do that now Also, just another thought, if we can come up with some invariant that each scheduling algorithm must follow, we could perform a test that the invariant holds before/after each iteration. For example, we could ensure that the start time of each reserved job only decreases (and never increases) after each iteration. This is a more involved approach, but if we pick a strong enough invariant/set of invariants, then our algorithm would be provably correct for a given set of inputs as opposed to empirically correct. |
Oh. This is an excellent idea! |
One other thought: It would be useful for both the formal verification and debugging to dump the state of the resrc after every scheduler iteration. Maybe we can add a flag to sched that would save out the resrc every iteration for later analysis? |
@SteVwonder: there is already a debug routine that can print out the resrc state. All we need will be to add a sched load option to turn this on at an appropriate point. Note that this will dump lots of data -- in particular at large scale. Not sure how easily it is to do a formal verification using this data though... |
FYI -- the function is here |
That printing function is nice for human parsing, but not as nice for programmatic parsing. If we serialize the resrc out to JSON (which there is already a function for), then it will be easy to work with in Python. An invariant like "the reservation time of a job should be non-increasing over time" shouldn't be that hard to code up (I'm willing to do so). I just don't know if that is a strong enough invariant. And I agree, size could be an issue for a sufficiently large system. I think it should be doable for ~100 nodes and several thousand jobs. |
@SteVwonder: agreed. Does this function help? https://github.com/flux-framework/flux-sched/blob/master/resrc/resrc_tree.c#L147 Once a json object is built from the top resrc, one can use json stringfy function to dump it? e.g., |
@SteVwonder: BTW, unlike under an emulatuon/sumulaion, it just dawned on me that schedulung under a real system is not necessarily deterministic. Even if we try to control a job's running time tightly (using sleep etc) , the system noise will make it pretty difficult to control the order of resource and job events. Say job 1 using 2 nodes is supposed to end 10 seconds early than job 2, which then can run the highest priority reserved job (job e requesting 2 nodes). But system noise can delay freeing up of job 1 so that job 2 using 1 node gets deallocated first. Then, insteas if job 3, job4 which only requests 1 node may get backfilled. This kind ca change the schedule on the same workload run to run. And this will be pretty prevalent for a stress test workload. I think this really motivates fully validating under emulation and complementing that by using some best effort invariant based solution if we need additional assurance for the scheduler's behavior under a real system. |
Another Sat. morning idea while jogging... what's deterministic is: given a totally ordered job and resource events, each changing the current resource state, schuduling must be determininstic. Then one addtional idea is not only gather the workload (jobs) but also collect these state changing evets. Then these can be fed into a validator for correctness check? |
Final thoguht. If HOC doean't have paper or SW in this area, this can land a pretty nice publication. Essenntialy it is becoming increasing difficult to validate if an algoritm or an implenmentaion of this algorithm correctly enforce the target schedulig policy in particular in the midst of more constraints to consider. We can propose a conceptual model that can lead to a scheduler/rm agbostic validator and show the result suojg flux as prrof by demonstration. |
Traditional emulator and other test cases currently verify the correctness. |
In #159, @dongahn mentioned:
Opening as a separate issue to isolate the discussion here.
The text was updated successfully, but these errors were encountered: