Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Differentiate between compute and network based occupancy #7004

Open
Tracked by #6993
fjetter opened this issue Sep 5, 2022 · 0 comments
Open
Tracked by #6993

Differentiate between compute and network based occupancy #7004

fjetter opened this issue Sep 5, 2022 · 0 comments
Labels
diagnostics discussion Discussing a topic with no specific actions yet enhancement Improve existing functionality or make things work better performance scheduler scheduling stealing

Comments

@fjetter
Copy link
Member

fjetter commented Sep 5, 2022

Occupancy is an estimation of work the scheduler assigns to every worker. We compute this value in Scheduler._set_duration_estimate which is invoked in a couple of places

Occupancy is measured in seconds and is calculated by summing the expected processing time of all tasks assigned to a worker. At all times, the invariant sum(ws.processing.values()) ~ ws.occupancy should hold (modulo floating point arithmetic errors).

This processing time is defined as TaskPrefix.duration_average + get_comm_cost(TaskState, WorkerState), i.e. the average compute duration of the TaskPrefix (see Scheduler.get_task_duration) and the estimated time to transfer all dependencies that are not, yet on that worker, see Scheduler.get_comm_cost

Occupancy is used for four purposes

  • Scheduler.total_occupancy (sum over all workers) is used to define an adaptive target
  • Scheduler.total_occupancy (sum over all workers) is used to estimate worker saturation
  • WorkerState.processing to calculate the steal_time ratio in work stealing
  • WorkerState.occupancy for making a scheduling decision in Scheduler.worker_objective

With the exception of the work stealing case, all other examples are very specifically referring to the number of worker threads. Worker threads do not impact network/gather data performance.

Taking Scheduler.worker_objective trying to calculate start_time as an example, the actual start time should rather be

wait_time_cpu: float = ws.compute_occupancy / ws.nthreads
wait_time_transfer: float = (ws.network_occupancy + comm_nbytes) / bandwidth

start_time = max(wait_time_transfer, wait_time_cpu)

This would likely increase the quality of our scheduling decisions and would very clearly avoid double counting problems like #7003

On top, this would add a significant observability component since we would directly visualize how much network vs compute work is expected from a worker. I could also see a ratio of the two values to be an interesting metric to track (similar to what work stealing is trying to do with the steal ratio)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
diagnostics discussion Discussing a topic with no specific actions yet enhancement Improve existing functionality or make things work better performance scheduler scheduling stealing
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant