
A linear job search performs poorly for HTC load #182

Closed
dongahn opened this issue Aug 22, 2016 · 3 comments

Comments

dongahn (Member) commented Aug 22, 2016

We need an optimization for the linear job search scheme, as captured in flux-distribution#14: specifically, this code. Once the optimization is done, please rerun the tests.

dongahn (Member, Author) commented Aug 22, 2016

The scheduler currently has three job queues: pending, running, and complete. It performs only very basic operations on them: finding a job across all three queues, appending a job to a queue, and moving a job from one queue to another (e.g., from the pending queue to the running queue).
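As a simplified sketch of the current scheme (plain C stand-ins for the czmq zlist_t lists the real code uses; the job_t and queue_t types here are hypothetical, though q_find_job() is named after the real function):

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the scheduler's job record. */
typedef struct job {
    int64_t id;
    struct job *next;   /* singly-linked queue link */
} job_t;

/* A queue modeled as a linked list (zlist_t in the real code). */
typedef struct {
    job_t *head;
    job_t *tail;
} queue_t;

/* Append a job to the tail of a queue: O(1). */
static void q_append (queue_t *q, job_t *j)
{
    j->next = NULL;
    if (q->tail)
        q->tail->next = j;
    else
        q->head = j;
    q->tail = j;
}

/* Linear scan of one queue. */
static job_t *q_scan (queue_t *q, int64_t id)
{
    job_t *j;
    for (j = q->head; j; j = j->next)
        if (j->id == id)
            return j;
    return NULL;
}

/* Find a job across pending, running, and complete: O(N) overall,
 * where N is the total number of jobs across all three queues. */
static job_t *q_find_job (queue_t *pending, queue_t *running,
                          queue_t *complete, int64_t id)
{
    job_t *j;
    if ((j = q_scan (pending, id)) || (j = q_scan (running, id))
        || (j = q_scan (complete, id)))
        return j;
    return NULL;
}
```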

An obvious optimization is to add indexing based on job id.

Given that the scheduler needs only these limited operations on our queues, an easy solution is to add a zhash_t object and index each job in it on the first enqueue event (i.e., into the pending queue). The remaining changes are to remove a job from this hash table when it is reaped from the complete queue -- which isn't implemented yet, so I will mark it as TODO in the code -- and to make sure the table is initialized and finalized properly.

This adds a small runtime overhead on each job append, and of course a slight increase in memory, but it turns q_find_job() from O(N) into O(1), where N is the number of jobs in the queues.
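A minimal sketch of the proposed index, using a hand-rolled chained hash table in place of zhash_t (keying by job id and the insert/remove points follow the proposal above; the table size and helper names are hypothetical):

```c
#include <stdint.h>
#include <stdlib.h>

#define NBUCKETS 1024   /* hypothetical fixed table size */

typedef struct jent {
    int64_t id;
    void *job;            /* the job object kept in the queues */
    struct jent *next;    /* chain link within a bucket */
} jent_t;

typedef struct {
    jent_t *bucket[NBUCKETS];
} jobhash_t;

static size_t jh_slot (int64_t id)
{
    return (size_t)id % NBUCKETS;
}

/* Index a job on its first enqueue (into the pending queue): O(1). */
static void jh_insert (jobhash_t *h, int64_t id, void *job)
{
    jent_t *e = calloc (1, sizeof (*e));
    e->id = id;
    e->job = job;
    e->next = h->bucket[jh_slot (id)];
    h->bucket[jh_slot (id)] = e;
}

/* Expected constant-time replacement for the O(N) q_find_job(). */
static void *jh_find (jobhash_t *h, int64_t id)
{
    jent_t *e;
    for (e = h->bucket[jh_slot (id)]; e; e = e->next)
        if (e->id == id)
            return e->job;
    return NULL;
}

/* TODO in the proposal: delete the entry when the job is reaped
 * from the complete queue. */
static void jh_delete (jobhash_t *h, int64_t id)
{
    jent_t **ep = &h->bucket[jh_slot (id)];
    while (*ep) {
        if ((*ep)->id == id) {
            jent_t *e = *ep;
            *ep = e->next;
            free (e);
            return;
        }
        ep = &(*ep)->next;
    }
}
```

Because jobs move between queues by pointer, a single table keyed on job id stays valid across moves; only the first enqueue and the final reap touch it.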

Alternatively, one could keep a lookup table per queue, but that seems overkill at this point and adds higher overhead, since it incurs one additional hash delete and insert on each job move.

Also, one could introduce a composite data structure that abstracts out these operations (traditional enqueue plus constant-time find). But I propose we do that only if and when more complex operations are needed on our job queues.

dongahn (Member, Author) commented Aug 23, 2016

This optimization alone doesn't seem to help much:

  • Jobs Executed Per Second (JEPS) at <2 nodes, 1 broker per node, unit sizing policy, non-mpi sleep, FCFS>: 1.77

I realized that q_move_to_rqueue() and q_move_to_cqueue() can also perform poorly, because they use zlist_remove(), which also does a linear search, as in here. This needs to be optimized too. Now that I think about it, though, these may not be that bad with FCFS scheduling. Perhaps #183 is a bigger fish to fry.
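For reference, zlist_remove() must scan the list to locate the item before unlinking it, so each queue move is O(N) in the queue length. One possible fix (a sketch only, not what the code does today) is an intrusive doubly-linked list: the job carries its own links, so a move becomes a constant-time unlink plus append. The djob_t/dqueue_t names here are hypothetical:

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical job with intrusive links: the job knows its own
 * position, so no search is needed to remove it. */
typedef struct djob {
    int64_t id;
    struct djob *prev, *next;
} djob_t;

typedef struct {
    djob_t *head, *tail;
} dqueue_t;

static void dq_append (dqueue_t *q, djob_t *j)
{
    j->prev = q->tail;
    j->next = NULL;
    if (q->tail)
        q->tail->next = j;
    else
        q->head = j;
    q->tail = j;
}

/* O(1) unlink: no linear search, unlike zlist_remove(). */
static void dq_unlink (dqueue_t *q, djob_t *j)
{
    if (j->prev)
        j->prev->next = j->next;
    else
        q->head = j->next;
    if (j->next)
        j->next->prev = j->prev;
    else
        q->tail = j->prev;
    j->prev = j->next = NULL;
}

/* e.g., the pending -> running move done by q_move_to_rqueue(),
 * now in O(1). */
static void dq_move (dqueue_t *from, dqueue_t *to, djob_t *j)
{
    dq_unlink (from, j);
    dq_append (to, j);
}
```

Under FCFS the moved job is usually at the head of the pending queue, so the existing linear removal finds it immediately, which is why this may matter less than it first appears.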

lipari (Contributor) commented Sep 13, 2016

Closed via #190.
