
Broader verification of scheduler correctness #168

Closed
SteVwonder opened this issue Jun 9, 2016 · 21 comments

@SteVwonder
Member

In #159, @dongahn mentioned:

do you have an idea as to how we can test correctness of various scheduling algorithms for CTS-1 testing? Perhaps we can bring in some well-known workloads w/ known schedules?

Opening as a separate issue to isolate the discussion here.

@SteVwonder
Member Author

We currently have a very simple test using the simulator that launches 10 jobs and verifies their schedule under FCFS and EASY. So the framework is there to do the testing (assuming the simulator is still functioning under master, which is a separate issue that I'll be addressing this summer), but now we need some more extensive tests. I will look into this to see what I can find.

At minimum, we should be able to find emulated/simulated schedulers that run EASY, Conservative, and the hybrid algorithms. We could then throw the same set of jobs at both the Flux implementation and the other implementation and compare their schedules.

At best, there is (as @dongahn mentioned) a set of workloads and associated schedules that already exist.

@dongahn
Member

dongahn commented Jun 28, 2016

Agreed.

@SteVwonder
Member Author

@dongahn and I spoke on the phone today, and the idea of using SLURM's scheduler to verify Flux's scheduler came up. We are unsure of the exact scheduling policies that SLURM supports. @lipari's slides (specifically slide 34) make it seem that 4 years ago only EASY backfilling was supported by SLURM. @lipari, do you know what scheduling policies are currently supported by SLURM? Does it now support conservative & hybrid backfilling?

@dongahn
Member

dongahn commented Aug 12, 2016

Thanks @SteVwonder.

Context for @lipari: an easy verification mechanism for our backfill policy implementations would go a long way toward reaching our CTS-1 scheduler milestone.

Verifying FCFS is a no-brainer -- we can look at each job's starting-time timestamp within lwj.$jobid and make sure they are ordered. But I have no idea how easily I can verify the correctness of the backfill algorithms. I would like to cover EASY, CONSERVATIVE, and Hybrid with a relatively deep reservation depth (currently set to 100).
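
As a rough illustration of that FCFS check, here is a minimal Python sketch. It assumes the starting-time timestamps have already been pulled out of the lwj.$jobid entries into a plain dict keyed by job id; that input format is illustrative, not flux-sched's actual API.

```python
# Minimal sketch of the FCFS check described above: given each job's start
# time (assumed to have been extracted from the lwj.<jobid> entries
# beforehand), verify that jobs started in submission (jobid) order.

def verify_fcfs(start_times):
    """start_times: dict mapping integer jobid -> starting-time timestamp."""
    ordered = sorted(start_times.items())   # ascending jobid = submission order
    violations = []
    for (jid_a, t_a), (jid_b, t_b) in zip(ordered, ordered[1:]):
        if t_b < t_a:                        # a later submission started earlier
            violations.append((jid_a, jid_b))
    return violations

# Example: verify_fcfs({1: 100.0, 2: 110.0, 3: 105.0}) -> [(2, 3)],
# i.e., job 3 started before job 2, which pure FCFS would not allow.
```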

@lipari
Contributor

lipari commented Aug 12, 2016

The Slurm settings to enable backfill behavior which will serve as a reference are:

  • PriorityType=basic
  • SchedulerType=sched/backfill
  • SchedulerParameters=bf_...

The SchedulerParameters settings are where you will tune the backfill behavior from conservative to liberal. Look at the slurm.conf man page to see all the options. Also take a look at vulcan to see the bf_ settings we use in production.

@lipari
Contributor

lipari commented Aug 12, 2016

On further review, it looks like there is not a setting to enable EASY backfill. Even looking at the newer Slurm on vulcan, I don't see a setting which will violate the conservative rule of not scheduling a lower priority job if it will delay the start of any higher priority job.

@SteVwonder
Member Author

Thanks @lipari for the information. So it seems that we can use the current version of SLURM to validate our Conservative backfilling algorithm, but we cannot use it to test Hybrid or EASY.

I wonder if there is another scheduler with a simulator that we can test with.

@dongahn
Member

dongahn commented Aug 12, 2016

Do we have well-known large enough workloads (e.g., ESP) and expected schedules under different scheduling policies?

Perhaps we should de-couple CTS-1 scheduler milestone and correctness verification efforts.

In particular, I am using core-level scheduling (i.e., submitting jobs only with their requirements for cores, not nodes), in which case verifying correctness might be even harder.

Perhaps we should build the necessary confidence in correctness using existing workloads and expected schedules (e.g., ESP) and ensure that our schedules match the golden orders. Then we can assume that this orthogonal effort validates the correctness of our algorithms, and for the CTS-1 scheduler milestone we just focus on performance/scalability characteristics?

@SteVwonder
Member Author

SteVwonder commented Aug 12, 2016

@dongahn, that approach sounds reasonable to me.

I just found another scheduler simulator, Alea, built on top of GridSim, that seems promising:

I will look for some workloads with "golden orders".

@dongahn
Member

dongahn commented Aug 12, 2016

@SteVwonder: you might want to ping Suraj. I heard he was trying to use a similar simulator for grid computing for his dissertation and was pretty unhappy with its capabilities and software quality.

@SteVwonder
Member Author

@dongahn: will do that now

Also, just another thought, if we can come up with some invariant that each scheduling algorithm must follow, we could perform a test that the invariant holds before/after each iteration. For example, we could ensure that the start time of each reserved job only decreases (and never increases) after each iteration. This is a more involved approach, but if we pick a strong enough invariant/set of invariants, then our algorithm would be provably correct for a given set of inputs as opposed to empirically correct.
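
A minimal sketch of that invariant check follows, assuming the reserved jobs' start times can be snapshotted after each scheduler iteration into plain dicts; the snapshot mechanism and format are assumptions here, not existing flux-sched functionality.

```python
# Sketch of the proposed invariant: across consecutive scheduler iterations,
# a reserved job's start time should only move earlier (or stay the same),
# never later. Snapshots are assumed to be dicts of jobid -> reserved start
# time, one per iteration.

def check_reservation_invariant(snapshots):
    """snapshots: list of dicts (one per scheduler iteration)."""
    violations = []
    for i, (prev, curr) in enumerate(zip(snapshots, snapshots[1:]), start=1):
        for jobid, prev_start in prev.items():
            if jobid in curr and curr[jobid] > prev_start:
                # the reservation moved later between iteration i-1 and i
                violations.append((i, jobid, prev_start, curr[jobid]))
    return violations
```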

@dongahn
Member

dongahn commented Aug 12, 2016

Oh. This is an excellent idea!

@SteVwonder
Member Author

One other thought: It would be useful for both the formal verification and debugging to dump the state of the resrc after every scheduler iteration. Maybe we can add a flag to sched that would save out the resrc every iteration for later analysis?

@dongahn
Member

dongahn commented Aug 13, 2016

@SteVwonder: there is already a debug routine that can print out the resrc state. All we need is to add a sched load option to turn this on at an appropriate point. Note that this will dump lots of data -- in particular at large scale. Not sure how easy it is to do a formal verification using this data though...

@dongahn
Member

dongahn commented Aug 13, 2016

FYI -- the function is here

@SteVwonder
Member Author

SteVwonder commented Aug 13, 2016

That printing function is nice for human parsing, but not as nice for programmatic parsing. If we serialize the resrc out to JSON (which there is already a function for), then it will be easy to work with in Python. An invariant like "the reservation time of a job should be non-increasing over time" shouldn't be that hard to code up (I'm willing to do so). I just don't know if that is a strong enough invariant.

And I agree, size could be an issue for a sufficiently large system. I think it should be doable for ~100 nodes and several thousand jobs.
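
One possible shape for the parsing side, assuming the serialized resrc tree is a nested JSON object; the field names used here ("children", "reservations", "starttime") and the dump file name are hypothetical stand-ins for whatever the real serialization function emits.

```python
# Sketch of walking a (hypothetical) serialized resrc tree dumped per
# iteration and collecting jobid -> earliest reserved start time, to feed
# an invariant check like the one sketched earlier.
import json

def collect_reservations(node, out=None):
    if out is None:
        out = {}
    for jobid, info in node.get("reservations", {}).items():
        start = info.get("starttime")
        if start is not None:
            # keep the earliest reserved start time seen for each job
            out[jobid] = min(out.get(jobid, start), start)
    for child in node.get("children", []):
        collect_reservations(child, out)
    return out

# with open("resrc_iter_0042.json") as f:   # hypothetical per-iteration dump
#     snapshot = collect_reservations(json.load(f))
```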

@dongahn
Member

dongahn commented Aug 13, 2016

@SteVwonder: agreed. Does this function help?

https://github.com/flux-framework/flux-sched/blob/master/resrc/resrc_tree.c#L147

Once a JSON object is built from the top resrc, one can use the JSON stringify function to dump it?

e.g., json_object_to_json_string_ext (o, JSON_C_TO_STRING_PRETTY)

@dongahn
Member

dongahn commented Aug 13, 2016

@SteVwonder: BTW, unlike under an emulation/simulation, it just dawned on me that scheduling under a real system is not necessarily deterministic. Even if we try to control a job's running time tightly (using sleep, etc.), system noise will make it pretty difficult to control the order of resource and job events.

Say job 1, using 2 nodes, is supposed to end 10 seconds earlier than job 2, which would then allow the highest-priority reserved job (job 3, requesting 2 nodes) to run. But system noise can delay the freeing of job 1's resources, so that job 2, using 1 node, gets deallocated first. Then, instead of job 3, job 4, which only requests 1 node, may get backfilled. This kind of thing can change the schedule for the same workload from run to run. And it will be pretty prevalent for a stress-test workload.

I think this really motivates fully validating under emulation, and complementing that with some best-effort, invariant-based solution if we need additional assurance about the scheduler's behavior under a real system.

@dongahn
Member

dongahn commented Aug 13, 2016

Another Saturday-morning idea while jogging... what's deterministic is:

given a totally ordered sequence of job and resource events, each changing the current resource state, scheduling must be deterministic. So one additional idea is to not only gather the workload (jobs) but also collect these state-changing events. These could then be fed into a validator for a correctness check?
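
One possible shape for such a validator, sketched under the assumption that a totally ordered log of job submissions and resource releases is available; the event format and the reference policy (plain FCFS over a fixed node count) are illustrative only.

```python
# Sketch of the replay idea: feed the recorded, totally ordered event stream
# through a reference model of the target policy and compare the resulting
# start order with what the real scheduler produced.

def replay_fcfs(events, total_nodes):
    """events: time-ordered list of ("submit", jobid, nnodes) or ("free", jobid)."""
    free = total_nodes
    queue = []        # pending (jobid, nnodes), in submission order
    running = {}      # jobid -> nnodes
    start_order = []

    def try_start():
        nonlocal free
        # strict FCFS: only the head of the queue may start, no backfilling
        while queue and queue[0][1] <= free:
            jobid, nnodes = queue.pop(0)
            free -= nnodes
            running[jobid] = nnodes
            start_order.append(jobid)

    for ev in events:
        if ev[0] == "submit":
            queue.append((ev[1], ev[2]))
        elif ev[0] == "free":
            free += running.pop(ev[1])
        try_start()
    return start_order
```

The validator would then assert that this replayed start order matches the start order observed in the real run that produced the event log; a backfilling policy would need a richer reference model than the plain FCFS shown here.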

@dongahn
Member

dongahn commented Aug 13, 2016

Final thought. If HPC doesn't already have a paper or software in this area, this could land a pretty nice publication. Essentially, it is becoming increasingly difficult to validate whether an algorithm, or an implementation of that algorithm, correctly enforces the target scheduling policy, in particular in the midst of more constraints to consider. We could propose a conceptual model that leads to a scheduler/RM-agnostic validator and show the results using Flux as a proof by demonstration.

@dongahn
Member

dongahn commented Mar 31, 2018

The traditional emulator and other test cases currently verify the correctness.

@dongahn dongahn closed this as completed Mar 31, 2018