Currently, the scheduler records resource consumption in one place when a task is assigned to a worker, but checks a different place when deciding whether a task can be assigned to a worker. As a result, current resource consumption levels are not considered in task scheduling.
The current scheduling appears to just consider which workers can run a task in theory: do they have enough of the resource to be able to run this task ever (even if none of it is available right now)?
For resources like GPUs, I suppose this makes sense: queuing extra tasks onto workers keeps them from going idle. Still, it's a little surprising. And the fact that worker_objective doesn't take current resource consumption into account seems likely to cause bad scheduling, since we could easily assign a task to a worker whose resource is currently used up while other workers have the resource available.
When a task gets assigned to a worker, consume_resources only adjusts the count in WorkerState.used_resources (distributed/scheduler.py, lines 2674 to 2675 at e0ea5df).
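As far as I can tell, the bookkeeping at those lines amounts to something like this (a paraphrased sketch, not the verbatim source):

```python
# Paraphrased sketch of SchedulerState.consume_resources (not the verbatim source):
# when task `ts` is assigned to worker `ws`, only the worker's tally of used
# resources is incremented; nothing else is updated.
def consume_resources(self, ts, ws):
    for resource, required in ts.resource_restrictions.items():
        ws.used_resources[resource] += required
```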
But when SchedulerState.valid_workers looks for which workers can run a task, it only checks self.resources[resource][address] and never looks at WorkerState.used_resources (distributed/scheduler.py, lines 2644 to 2652 at e0ea5df).
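The resource filter in that range boils down to roughly the following (again paraphrased; `workers_per_resource` and `valid` are just illustrative names):

```python
# Paraphrased sketch of the resource filter in SchedulerState.valid_workers:
# a worker qualifies if its *total* declared supply covers each requirement;
# WorkerState.used_resources never enters the comparison.
workers_per_resource = []
for resource, required in ts.resource_restrictions.items():
    supply = self.resources[resource]  # {worker address: total amount declared}
    workers_per_resource.append(
        {address for address, supplied in supply.items() if supplied >= required}
    )
valid = set.intersection(*workers_per_resource)
```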
So tasks will not enter the no-worker state just because all resources in the cluster are currently used up.
Instead, as usual, more tasks will get queued onto workers than they can run at once, and each worker itself takes care of only running the correct number of tasks at a time.
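A quick way to observe this on a local cluster (the "FOO" resource name and the sleepy task are purely illustrative):

```python
import time
from dask.distributed import Client, LocalCluster

# Two in-process workers, each declaring one unit of an abstract "FOO" resource.
cluster = LocalCluster(n_workers=2, threads_per_worker=2, processes=False,
                       resources={"FOO": 1})
client = Client(cluster)

def hold(i):
    time.sleep(5)
    return i

# Every task claims the whole "FOO" unit, so each worker can only run one at a time.
futures = [client.submit(hold, i, resources={"FOO": 1}) for i in range(10)]

time.sleep(1)
# All ten tasks are already assigned across the two workers rather than sitting
# in "no-worker"; each worker throttles execution locally.
print(client.processing())
```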
Is this intentional?
Why do we track resource counts in both self.resources and WorkerState.resources?
Why do we bother tracking WorkerState.used_resources if it's never actually used for scheduling decisions?
Note that changing this behavior would likely provide a viable temporary solution for #6360, a very common pain point for many users.

cc @mrocklin @fjetter
FWIW, I'm running into a problem where XGBoost does some weird thread management per node in my cluster: if the sum of the 'njobs' values of all xgboost training tasks assigned to a node is greater than its number of vCPUs, it uses only 1 vCPU in total for all tasks.
Hence I was looking to worker resource management to solve this problem (by making each task require the whole resource of the worker).
The problem stemming from this ticket is that the first task completes by itself using the full compute capacity (all 32 cores), but then all the remaining tasks are run at the same time (without assessing the resources available and queuing them one at a time accordingly), so CPU utilisation drops to 1 of 32 cores.
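A sketch of that kind of setup (placeholder names, toy data, and a placeholder scheduler address; the 32-core workers are assumed to have been started with something like `--resources "XGB=1"`):

```python
# Sketch: each 32-core worker declares one "XGB" slot, and every training task
# claims that whole slot, so a worker should run at most one xgboost task at a time.
from dask.distributed import Client

def train_one(seed):
    import numpy as np
    import xgboost as xgb

    rng = np.random.default_rng(seed)
    X = rng.normal(size=(1000, 10))
    y = rng.integers(0, 2, size=1000)
    dtrain = xgb.DMatrix(X, label=y)
    return xgb.train(
        {"nthread": 32, "objective": "binary:logistic"}, dtrain, num_boost_round=10
    )

client = Client("tcp://scheduler:8786")  # placeholder address
futures = [client.submit(train_one, seed, resources={"XGB": 1}) for seed in range(8)]
results = client.gather(futures)
```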