Improve work stealing for scaling situations #4920
base: main
Conversation
I completely forgot the performance reports: Before, After.
distributed/stealing.py (Outdated)

    thief = min(
        thieves, key=partial(self.scheduler.worker_objective, ts)
    )
How many thieves can there be here? If the answer is "maybe a lot" then we might need to be careful here.
I'm also curious how much impact this choice in particular had on the workload we're testing against.
> How many thieves can there be here? If the answer is "maybe a lot" then we might need to be careful here.

The worst case would be if all but one worker are idling; then we'd have W-1 potential thieves. I thought about sampling this down to 20 or so to avoid any extreme cases. Sampling and picking the best of the sample still sounds better than random. However...
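For illustration, the sampling idea could look roughly like this; `pick_thief_sampled` and the cap of 20 are made up for this sketch, only `worker_objective` comes from the snippet above:

```python
import random
from functools import partial

MAX_THIEF_CANDIDATES = 20  # illustrative cap, not a constant from the PR


def pick_thief_sampled(scheduler, ts, thieves):
    """Pick a thief by the scheduler objective, but only among a bounded
    random sample, so the cost stays small even with W-1 idle workers."""
    thieves = list(thieves)
    if len(thieves) > MAX_THIEF_CANDIDATES:
        thieves = random.sample(thieves, MAX_THIEF_CANDIDATES)
    return min(thieves, key=partial(scheduler.worker_objective, ts))
```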
> I'm also curious how much impact this choice in particular had on the workload we're testing against.

Nothing at all. This is the one change which didn't actually impact this particular example. It just felt right? I don't know. That's one of the things I want to test on different workloads.
In particular for workloads with actual payload data: if the dependents don't have data, this is a rather extreme stealing edge case, and I'd like to verify that this also behaves nicely for proper payloads.
In decide_worker we choose the best worker if there are fewer than 20 workers, and an incrementing worker if there are more.
In practice, though, I doubt that it matters much. The difference in occupancy between thieves and victims means that we're making the world significantly better with the dumb change. If this results in a suboptimal world, that's ok; we'll just improve it again in a future cycle if it becomes a problem.
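For reference, the decide_worker behaviour described above boils down to something like this sketch (names and the stand-in counter are illustrative, not the scheduler's actual code):

```python
import itertools
from functools import partial

_counter = itertools.count()  # stand-in for the scheduler's incrementing counter


def choose_candidate(scheduler, ts, candidates):
    """With a small pool, evaluate the objective for every candidate;
    with a large pool, fall back to a cheap incrementing (round-robin) pick."""
    candidates = list(candidates)
    if len(candidates) < 20:
        return min(candidates, key=partial(scheduler.worker_objective, ts))
    return candidates[next(_counter) % len(candidates)]
```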
> In decide_worker we choose the best worker if there are fewer than 20 workers, and an incrementing worker if there are more.

I also thought about just using decide_worker. After all, why would the logic be any different (given that decide_worker takes a whitelist, which here would be the thieves)? However, the important logic is in the objective function, which is why I picked that. If this thing turns out to be valuable, we can have a closer look.
> In practice, though, I doubt that it matters much. The difference in occupancy between thieves and victims means that we're making the world significantly better with the dumb change.

True. As I said, the current problem doesn't actually trigger anything where this would matter. If I cannot prove an improvement, I'll remove it again.
> The difference in occupancy

The only thing I would like to leverage here is to pick a thief with dependencies, if one exists. That's implicitly done by the worker objective but could also be made explicit. My gut tells me this can make a difference, but you might be right that this is only marginal.
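The implicit preference comes from the transfer-cost term of the objective: dependencies that already live on the candidate cost nothing to move. A simplified sketch (the formula is illustrative, not the real worker_objective; attribute names assume the scheduler's TaskState/WorkerState):

```python
def objective_sketch(ts, ws, bandwidth=100e6):
    """Lower is better: current backlog plus the estimated time to fetch
    the dependencies this candidate does not already hold.  A thief that
    holds some of the dependencies therefore wins implicitly."""
    comm_bytes = sum(
        dts.nbytes for dts in ts.dependencies if ws not in dts.who_has
    )
    return ws.occupancy + comm_bytes / bandwidth
```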
> we'll just improve it again in a future cycle if it becomes a problem.

There will not be another cycle to correct this unless the worker becomes saturated soon-ish, will there? Non-saturated workers are not victimized.
> There will not be another cycle to correct this unless the worker becomes saturated soon-ish, will there? Non-saturated workers are not victimized.

Correct, but the stolen task probably won't be run immediately anyway; it's probably at the end of the queue. As the computation continues progressing and some workers free up before others, the saturated/non-saturated split will grow and trigger another cycle of transfers if necessary.
FYI, tests break because the stealing now behaves differently and the tests reflect that (good!). Just wanted to get this out early in case somebody has input already.
distributed/scheduler.py (Outdated)

    - ts: TaskState, all_workers, valid_workers: set, objective
    + ts: TaskState,
    + all_workers: set,
    + valid_workers: Optional[set],
Cython doesn't handle typing objects like Optional yet, though it already happily accepts None and checks for that.
Suggested change:

    - valid_workers: Optional[set],
    + valid_workers: set,
I added Optional mostly for type checking, e.g. in IDEs. Is Cython having trouble with this, or does it simply ignore it?
AIUI these could be kind of expensive if we don't need them, which is why the original code didn't coerce them to sets. However, it was that way even prior to Cythonizing the Scheduler.
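To summarize the pattern discussed in this thread: keep the plain set annotation, keep passing None, and check for it explicitly. A sketch (the body is illustrative; only the signature shape mirrors the diff above):

```python
def decide_worker_sketch(ts, all_workers: set, valid_workers: set, objective):
    """valid_workers may still be None, meaning "no restriction".  The
    annotation stays a plain ``set`` because Cython does not handle
    typing.Optional yet, but it happily accepts None, so we check for it
    explicitly instead."""
    candidates = all_workers if valid_workers is None else valid_workers
    if not candidates:
        return None
    return min(candidates, key=objective)
```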
Summarizing things a bit here, I'm seeing two changes:
> Change 2B: relax the sentinel values in

I like this idea so much that I wish it was my own ;) #4920 (comment)
That's ok with me
Whoops! Well then it sounds like it must be a great idea :)
Ah, my thinking (or rather Gabe's thinking) was that you would keep the workers with dependencies in the set, but then mix in a few others from outside of the set.
Right... that sounds straightforward. There is a saying in German, "Manchmal sieht man den Wald vor lauter Bäumen nicht", which translates loosely to "Sometimes you lose sight of the forest because there are too many trees."
This saying is commonly known in English as well. It's a good saying.
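The "mix in a few others from outside of the set" idea mentioned above could be sketched like this (names and the number of extras are illustrative):

```python
import random


def candidate_thieves(thieves_with_data, thieves, n_extra=2):
    """Keep every idle worker that already holds dependencies, then mix in
    a couple of idle workers from outside that set, so a single
    dependency-holding worker cannot be handed every stolen task."""
    outsiders = [w for w in thieves if w not in thieves_with_data]
    extras = random.sample(outsiders, min(n_extra, len(outsiders)))
    return list(thieves_with_data) + extras
```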
(Code still needs to be updated.) I removed all the changes around decide_worker and focused solely on the blacklisting. [Comparison: "main / default" vs. "without blacklisting 'expensive' tasks"] What we can see is that the occupancy (timeseries chart, top) is much more effectively balanced if the small tasks are not blacklisted. I do not have proper perf reports or screenshots, but the task stream density behaves similarly: the well-balanced occupancy is much denser.
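For context, the "blacklisting" referred to here is a cost check along these lines (a sketch; names and thresholds are illustrative, not the scheduler's actual steal-ratio logic):

```python
def too_cheap_to_steal(compute_time, transfer_bytes,
                       bandwidth=100e6, min_compute=0.005, max_ratio=2.0):
    """A task is skipped ("blacklisted") for stealing when its expected
    compute time is tiny, or when shipping its dependencies would cost a
    multiple of the compute time itself."""
    if compute_time < min_compute:
        return True
    return transfer_bytes / bandwidth > max_ratio * compute_time
```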
(Branch updated from dc9586d to f42faec)
    if not thief:
        thief = thieves[i % len(thieves)]
nit: might be easier to read if this was moved into _maybe_pick_thief (and it was no longer maybe).
Right, I figured this is nicer because it felt weird passing i to _maybe_pick_thief.
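A sketch of the suggested refactor (the helper name mirrors the PR's, the body is illustrative):

```python
from functools import partial


def _pick_thief(scheduler, ts, thieves, thieves_with_data, i):
    """Fold the round-robin fallback into the helper so it always returns
    a thief, hence no longer "maybe".  ``i`` is the loop counter the
    caller already tracks."""
    if thieves_with_data:
        return min(
            thieves_with_data, key=partial(scheduler.worker_objective, ts)
        )
    # No idle worker holds dependencies: plain round robin.
    return thieves[i % len(thieves)]
```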
    # If there are potential thieves with dependencies we
    # should prefer them and pick the one which works best.
    # Otherwise just random/round robin
    if thieves_with_data:
If, say, just 1 idle worker holds dependencies, is it possible for that worker to get slammed with new tasks because of this? What has to happen between picking a worker as a thief and it getting removed from the idle set?
> If, say, just 1 idle worker holds dependencies, is it possible for that worker to get slammed with new tasks because of this?

Yes, this may hurt, particularly for sub-topologies like

    flowchart BT
        A --> B1
        A --> B2
        A --> B3

where A is very small and B* is very compute heavy, i.e. steal_ratio will always be small and tasks are even allowed to be stolen from "public", i.e. non-saturated, workers. I believe for other cases, i.e. if tasks are just stolen from saturated workers, this should be OK since decide_worker is much smarter about initial task placement.
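Concretely, that sub-topology is what you get from a fan-out like the following hypothetical snippet, where the root result is tiny and the downstream tasks dominate the runtime:

```python
import time
import dask


@dask.delayed
def small_root():
    return 1  # tiny payload: cheap to compute and to move


@dask.delayed
def heavy(x):
    time.sleep(60)  # compute-heavy relative to the root
    return x


a = small_root()
results = dask.compute(heavy(a), heavy(a), heavy(a))  # A --> B1, B2, B3
```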
> What has to happen between picking a worker as a thief and it getting removed from the idle set?

The steal request must be confirmed, which updates occupancies and recalculates the idle set. Somewhere on this PR a comment of mine suggests that _maybe_pick_thief should incorporate in_flight_occupancy to account for the time until this happens.
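A sketch of that suggestion, assuming the stealing extension's in_flight_occupancy mapping (worker -> occupancy already en route because of unconfirmed steal requests); the exact combination below is illustrative:

```python
from functools import partial


def pick_thief_inflight(scheduler, stealing, ts, thieves):
    """Bias the choice by occupancy that is already in flight to each
    candidate, so one idle worker does not receive every stolen task
    before the idle set gets recalculated."""
    def adjusted(ws):
        pending = stealing.in_flight_occupancy.get(ws, 0)
        return (ws.occupancy + pending, scheduler.worker_objective(ts, ws))
    return min(thieves, key=adjusted)
```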
Dropping the sentinels will emphasize the importance of proper measurements (#6115 (comment)), which is where I ran out of time last year. I myself would've only picked this up again after having a few solid benchmarks up and running; alternatively a lot of manual testing, or maybe both.
These are a few preliminary fixes which would close #4471. While this does not necessarily account for opportunity cost as I described in #4471 (comment), it still remedies the situation by
Tests missing (TODO), feedback welcome
black distributed / flake8 distributed / isort distributed