Differentiate between compute and network based occupancy #7004

fjetter · 2022-09-05T16:05:40Z

Occupancy is an estimate of the work the scheduler has assigned to each worker. We compute this value in Scheduler._set_duration_estimate, which is invoked in a few places:

transition processing->memory (iff previously unknown duration)
_add_to_processing (i.e. whenever we assign a task to a worker)
_reevaluate_occupancy_worker (periodically, if scheduler CPU load allows it)

Occupancy is measured in seconds and is calculated by summing the expected processing time of all tasks assigned to a worker. At all times, the invariant sum(ws.processing.values()) ~ ws.occupancy should hold (modulo floating point arithmetic errors).

This processing time is defined as TaskPrefix.duration_average + get_comm_cost(TaskState, WorkerState), i.e. the average compute duration of the TaskPrefix (see Scheduler.get_task_duration) plus the estimated time to transfer all dependencies that are not yet on that worker (see Scheduler.get_comm_cost).
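To make the bookkeeping concrete, here is a minimal sketch of how a per-worker occupancy could be kept in sync with the per-task estimates. It is not the scheduler's actual code: `ToyWorker` and `assign` are hypothetical stand-ins, and the duration and transfer estimates are passed in as plain numbers instead of being looked up via Scheduler.get_task_duration and Scheduler.get_comm_cost.

```python
from dataclasses import dataclass, field


@dataclass
class ToyWorker:
    # Hypothetical stand-in for WorkerState: task key -> expected seconds
    processing: dict = field(default_factory=dict)
    occupancy: float = 0.0


def assign(ws, key, duration_average, comm_cost):
    """Record a task's expected processing time and update occupancy."""
    expected = duration_average + comm_cost
    # If the key already had an estimate, only apply the difference so the
    # sum over ws.processing keeps matching ws.occupancy.
    ws.occupancy += expected - ws.processing.get(key, 0.0)
    ws.processing[key] = expected


ws = ToyWorker()
assign(ws, "inc-1", duration_average=0.5, comm_cost=0.0)
assign(ws, "add-2", duration_average=1.0, comm_cost=0.25)
# Re-estimate an already assigned task, e.g. after its duration became known
assign(ws, "inc-1", duration_average=0.75, comm_cost=0.0)

# The invariant from the text, up to floating point error
assert abs(sum(ws.processing.values()) - ws.occupancy) < 1e-9
```

Note that compute time and transfer time end up in the same number here, which is exactly the mixing this issue is about.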

Occupancy is used for four purposes:

Scheduler.total_occupancy (sum over all workers) is used to define an adaptive target
Scheduler.total_occupancy (sum over all workers) is used to estimate worker saturation
WorkerState.processing is used to calculate the steal_time ratio in work stealing
WorkerState.occupancy is used to make a scheduling decision in Scheduler.worker_objective

With the exception of the work stealing case, all of these uses relate occupancy to the number of worker threads. Worker threads, however, do not affect network/gather-data performance.

Take Scheduler.worker_objective, which tries to calculate start_time, as an example. The actual start time should rather be:

wait_time_cpu: float = ws.compute_occupancy / ws.nthreads
wait_time_transfer: float = (ws.network_occupancy + comm_nbytes) / bandwidth

start_time = max(wait_time_transfer, wait_time_cpu)

This would likely increase the quality of our scheduling decisions and would very clearly avoid double counting problems like #7003.
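A runnable sketch of the proposed start time estimate follows. The attribute names compute_occupancy and network_occupancy come from the snippet above and do not exist on WorkerState today. `SplitOccupancyWorker` and `estimate_start_time` are hypothetical. The sketch assumes that network_occupancy is tracked in bytes, since it is added to comm_nbytes before dividing by the bandwidth.

```python
from dataclasses import dataclass


@dataclass
class SplitOccupancyWorker:
    # Hypothetical stand-in for a WorkerState with split occupancy
    nthreads: int
    compute_occupancy: float  # seconds of compute already queued
    network_occupancy: int    # bytes already queued for transfer (assumption)


def estimate_start_time(ws, comm_nbytes, bandwidth):
    """Estimate when a new task could start on ws, in seconds.

    Compute and transfer are treated as overlapping, so the slower of the
    two determines the start time instead of their sum.
    """
    wait_time_cpu = ws.compute_occupancy / ws.nthreads
    wait_time_transfer = (ws.network_occupancy + comm_nbytes) / bandwidth
    return max(wait_time_transfer, wait_time_cpu)


ws = SplitOccupancyWorker(
    nthreads=4, compute_occupancy=8.0, network_occupancy=50_000_000
)
bandwidth = 100_000_000  # bytes per second, illustrative value

# Small dependency: the CPU queue (8 s / 4 threads) is the bottleneck
cpu_bound = estimate_start_time(ws, 25_000_000, bandwidth)
# Large dependency: the transfer queue (400 MB at 100 MB/s) is the bottleneck
network_bound = estimate_start_time(ws, 350_000_000, bandwidth)
```

Dividing only the compute part by nthreads reflects the point above that worker threads do not speed up transfers.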

On top of that, this would add a significant observability component, since we could directly visualize how much network vs. compute work is expected from a worker. I could also see the ratio of the two values being an interesting metric to track (similar to what work stealing is trying to do with the steal ratio).

fjetter added enhancement Improve existing functionality or make things work better diagnostics performance discussion Discussing a topic with no specific actions yet scheduling scheduler labels Sep 5, 2022

This was referenced Sep 5, 2022

Timeboxed push for simplifying work stealing #6993

Closed

Root-ish tasks all schedule onto one worker #6573

Closed

gjoseph92 mentioned this issue Sep 7, 2022

Track CPU and network occupancy separately #7020

Closed

2 tasks

fjetter mentioned this issue Sep 9, 2022

Allow very fast keys and very expensive transfers as stealing candidates #7022

Merged

fjetter added the stealing label Sep 9, 2022

fjetter mentioned this issue Sep 9, 2022

Accurate occupancy calculation / occupancy replacement #7027

Open
