Withhold root tasks [no co assignment] #6614
Commits on Jun 22, 2022
Idea was that if a `SortedSet` of unrunnable tasks is too expensive, then insertion order is probably _approximately_ priority order, since higher-priority (root) tasks will be scheduled first. This would give us O(1) for all necessary operations, instead of O(log n) for adding and removing. Interestingly, the SortedSet implementation could be hacked to support O(1) `pop` and `popleft`, and inserting a min/max value. In the most common case (root tasks), we're always inserting a value that's greater than the max. Something like this might be the best tradeoff, since it gives us O(1) in the common case but still maintains the sorted guarantee, which is easier to reason about.
Commit afedccd
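A minimal sketch of the fast-path idea from the commit above, assuming a plain `collections.deque` as backing storage rather than the `SortedSet` the message refers to. The class name `FastPathSortedDeque` and its methods are hypothetical and not part of this PR. Values at or beyond either end are added in O(1); anything else falls back to a binary search plus an O(n) insert.

```python
import bisect
from collections import deque


class FastPathSortedDeque:
    """Hypothetical sorted container with O(1) inserts at either end."""

    def __init__(self):
        self._items = deque()

    def add(self, value):
        if not self._items or value >= self._items[-1]:
            # Common case (root tasks): a new maximum, so a plain append.
            self._items.append(value)
        elif value <= self._items[0]:
            # A new minimum is also O(1), on the left end.
            self._items.appendleft(value)
        else:
            # Rare case: find the slot, then pay for an O(n) insert.
            index = bisect.bisect_right(self._items, value)
            self._items.insert(index, value)

    def pop(self):
        """Remove and return the largest value."""
        return self._items.pop()

    def popleft(self):
        """Remove and return the smallest value."""
        return self._items.popleft()

    def __len__(self):
        return len(self._items)

    def __iter__(self):
        return iter(self._items)
```

The container stays sorted after every `add`, so the ordering guarantee holds even when a value arrives out of order; only the slow path costs more than O(1).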
Commit 6b6651b
Commit 6225d1a
improve reasonableness of task-state order
Now task states on the dashboard are listed in the logical order that tasks transition through.
Commit 1496abb
Commit 7457865
Only support floats for `worker-oversaturation`
Simpler, though I think basically just an int of 1 may be the most reasonable.
Commit 67e9bd2
Commit 2410a82
Commit 49d5ddd
driveby: WIP color task graph by worker
This is just a hack currently, but maybe it would actually be useful?
Commit b546997
Revert "driveby: WIP color task graph by worker"
This reverts commit df11f719b59aad11f39a27ccae7b2fd4dfd9243a.
Commit 2b44820
Commit e494e87
Commit ad417ed
Commit b4c698e
Fix co-assignment logic to consider queued tasks
When there were multiple root task groups, we were just re-using the last worker for every batch because it had nothing processing on it. Unintentionally this also fixes dask#6597 in some cases (because the first task goes to processing, but we measure queued, so we pick the same worker for both task groups)
Commit aa4e531
Revert "unused: `OrderedSet` collection"
This reverts commit fdd5fd9.
Commit b514e84
Commit 1835a89
WIP identify root task families
1. The family metric itself is flawed. Added linear chain traversal, but it's still not good. The maxsize is problematic and probably the wrong way to think about it?
   a) There's quite likely no maxsize parameter that will ever be right, because you could always have multiple independent crazy substructures that are each maxsize+1.
   b) Even when every task would be in the same family because they're all interconnected, there's still benefit to scheduling subsequent things together, even if you do partition. Minimizing priority partitions is always what you want. Maybe there's something where maxsize is not a hard cutoff, but a cutoff for where to split up interconnected structures?
2. Families probably need to be data structures? When a task completes, you'd like to know if it belongs to a family that actually has more tasks to run on that worker, vs the task just happens to look like it belongs to a family but was never scheduled as a rootish task.

Overall I like the family structure for scheduling up/down scaling, but figuring out how to identify them is tricky. Partitioning priority order is great because it totally avoids this problem, of course at the expense of scaling. Can we combine priority and graph structure to identify isolated families when reasonable, partition on priority when not?
Commit db42c22
Commit 0f6603c
Commit e10fdca
Commit 3eb1d68
Update docstring and add back logic for queuing disabled case
Commit c685b3c
Commit e1dda98
Commit f811246
Commit d347b32
worker-oversaturation -> worker-saturation
Just easier to explain this way
Commit 1990dd7
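To illustrate why a saturation factor may be easier to explain than an oversaturation one, here is a rough sketch of how such a float could translate into a per-worker task cap. The function names, the use of `ceil`, and the floor of one task are assumptions made for illustration, not read from this PR's diff.

```python
import math


def max_tasks(nthreads: int, saturation: float) -> int:
    """Hypothetical cap on how many tasks one worker may hold at once."""
    # Assumed rule: scale the thread count, round up, never go below one task.
    return max(math.ceil(nthreads * saturation), 1)


def is_full(n_processing: int, nthreads: int, saturation: float) -> bool:
    """True when the worker should not be handed another root task."""
    return n_processing >= max_tasks(nthreads, saturation)
```

Under these assumptions a factor of 1.0 means "as many tasks as threads", and values above 1.0 let a few extra tasks wait on the worker.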
Commit be1b9ca
Commit 85f9120
I think this fix is reasonable? I wonder if occupancy should include queued tasks though?
Commit bb08c8d
Commits on Jun 23, 2022
Test releasing previously queued paused tasks
Tasks shouldn't be both `no-worker` and in the queue. If all workers are paused, tasks will currently go to `no-worker`, even if they're queued. If we then try to schedule them (because a slot opens up from task completion, tasks released, new worker joining, etc.) we find an invalid state.
Commit 966d61f
driveby: fix transition debug log end state
This was logging the actual end state, instead of the recommended end state
Commit 15494f0
Refactor scheduling when no workers are running
If all workers were paused, we would put tasks in the `no-worker` state. Now that `queued` is a thing, we want queued tasks in this case to just stay on the queue, and not be added to `unrunnable`. This commit takes the opposite of @crusaderky's view in https://github.com/dask/distributed/pull/5665/files#r787886583, and makes `idle` always a subset of `running`. Even if pedantically, the name `idle` isn't quite accurate, `idle` is typically _used_ as the set of "prime candidate for new tasks", so we make it that way. We do this to maintain the invariant that `valid_workers` always returns None if the task doesn't have restrictions. Our root task detection logic relied on this, as did the `not ts.loose_restrictions` check. Otherwise, when some workers are paused, root tasks will no longer be scheduled in the typical way. There are other approaches here which might be simpler, which I'll explore in following commits.
Commit 546aa4a
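The invariant described above (`idle` is always a subset of `running`) can be sketched with a toy model. Everything here is hypothetical: the `Worker` dataclass, the two-member `Status` enum, and `update_sets` are stand-ins for illustration and do not mirror the scheduler's real classes.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    running = "running"
    paused = "paused"


@dataclass(eq=False)  # identity-based hashing so workers can live in sets
class Worker:
    name: str
    nthreads: int
    status: Status = Status.running
    n_processing: int = 0


def update_sets(ws: Worker, running: set, idle: set) -> None:
    """Keep `idle` a subset of `running` after any change to `ws`."""
    if ws.status is Status.running:
        running.add(ws)
    else:
        running.discard(ws)
    # A worker only counts as idle if it is running and has a free thread.
    if ws in running and ws.n_processing < ws.nthreads:
        idle.add(ws)
    else:
        idle.discard(ws)
```

Calling `update_sets` whenever a worker's status or load changes keeps paused workers out of `idle`, so `idle` stays usable as the set of prime candidates for new tasks.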
Don't send queued tasks to no-worker
A way more minimal fix than 5b9d825afb9ab3a61ab22afef3b047dde238bc5f, but not ideal because if only some workers are paused, we'll get root task overproduction on the others (because having `valid_workers` bypasses the root task detection logic).
Commit ffbb53b
Schedule rootish tasks when some workers are paused
`valid_workers` will return a set if some workers are paused, even if the task doesn't have restrictions. This is annoying and a bit misleading, but possibly less intrusive of a change than 5b9d825afb9ab3a61ab22afef3b047dde238bc5f?
Commit 65735f8
Commit 6bf710c
Decrease test_root_task_overproduction size
Workers seem to be running out of memory on CI. Probably different base unmanaged memory sizes than my machine. This is tricky.
Commit 25e6f3b
Commit 988b0cf
Commit b86fe0f
Fix co-assignment for binary operations
Bit of a hack, but closes dask#6597. I'd like to have a better metric for the batch size, but I think this is about as good as we can get. Any reasonably large number will do here.
Commit 7ebd1d9
Commits on Jun 24, 2022
Turn withholding off by default
Want to see if CI passes. This would be retaining current scheduling behavior. Task withholding would be behind a feature flag.
Commit 034f980
Commits on Aug 17, 2022
Commit 0af53b4
Remove redundant insert into `idle`
Already covered by `if p < nc` in `check_idle_saturated`. But the one removed here didn't check for `status == Status.running`
Commit a63d25b
Commit 5f7e7f1
Commit 9aeecc9
Commit dcb11e4
Commit 7dfc83e
Commits on Aug 18, 2022
Commit c99bbe8
Commit c1544f3
Commit 349712f
Commit 594585e
Commit b4f843d
fix `test_saturation_factor` again
Apparently they're just unpredictable
Commit 2db4db9
Commits on Aug 19, 2022
Commit 8395ef4
hackily consider queue in adaptive target
TODO this is one of the main things unhandled in this PR: how do we address occupancy? Do queued tasks contribute to total occupancy or not? In either case, how is that implemented?? (I don't really want to make a `queued_occ` dict tracking per-task occupancy, like we have for processing; that feels like overkill.)
Commit 36a60a5
Commit da04438
Commit c92236c
Commit e990b92
I mistakenly thought that in the transitions loop, new recommendations were processed after old ones. I believe it's the opposite (`dict.update` will add the new items at the end, `dict.popitem` will pop those new items off the end). It wouldn't be too hard to sort all the recommendations here, just some extra allocations and copies.
Commit 0d21c78
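The ordering described in the commit above follows from how Python dicts behave: `dict.update` appends new keys at the end and `dict.popitem` removes the most recently inserted item. A small sketch, using a hypothetical `drain` helper and made-up task keys rather than the scheduler's actual transitions loop:

```python
def drain(initial: dict, follow_ups: dict) -> list:
    """Process recommendations LIFO-style and return the processing order."""
    recs = dict(initial)
    order = []
    while recs:
        key, _finish = recs.popitem()  # pops the most recently inserted item
        order.append(key)
        # New recommendations produced by this transition land at the end,
        # so they are popped before any older, still-pending entries.
        recs.update(follow_ups.get(key, {}))
    return order


order = drain(
    {"a": "processing", "b": "processing"},
    {"b": {"c": "memory"}},
)
```

Here `order` ends up as `["b", "c", "a"]`: the new recommendation for `c` is handled before the older one for `a`.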
Commits on Aug 23, 2022
Commit b40cec1
Commit 12b94d0
Commit 4c0e768
Commits on Aug 24, 2022
Co-authored-by: crusaderky
Commit 14cc157
Commit f5d7be4
Commit 704b485
Commit 38e0598
Split up `decide_worker`, remove recs
This overhauls `decide_worker` into separate methods for different cases. More importantly, it explicitly turns `transition_waiting_processing` into the primary dispatch mechanism for ready tasks. All ready tasks (deps in memory) now always get recommended to processing, regardless of whether there are any workers in the cluster, whether they have restrictions, whether they're root-ish, etc. `transition_waiting_processing` then decides how to handle them (depending on whether they're root-ish or not), and calls the appropriate `decide_worker` method to search for a worker. If a worker isn't available, then it recommends them off to `queued` or `no-worker` (depending, again, on whether they're root-ish and the WORKER_SATURATION setting).

This also updates the `no-worker` state to better match `queued`. Before, `bulk_schedule_after_adding_worker` would send `no-worker` tasks to `waiting`, which would then send them to `processing`. This was weird, because in order to be in `no-worker`, they should already be ready to run (just in need of a worker). So going straight to `processing` makes more sense than sending a ready task back to waiting.

Finally, this adds a `SchedulerState.is_rootish` helper. Not quite the static field on a task @fjetter wants in dask#6922, but a step in that direction.
Commit d47e80d
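The dispatch rule described above can be condensed into a toy function. This is only a sketch under stated assumptions: `next_state` is a hypothetical name, `worker` stands for whatever the case-specific `decide_worker` method returned (`None` if it found nothing), and `queuing_enabled` stands in for the WORKER_SATURATION setting.

```python
from __future__ import annotations


def next_state(rootish: bool, queuing_enabled: bool, worker: str | None) -> str:
    """Return the state a ready task would move to in this toy model."""
    if worker is not None:
        # A worker was found, so the task really does go to processing.
        return "processing"
    if rootish and queuing_enabled:
        # Root-ish tasks wait on the scheduler-side queue.
        return "queued"
    # Everything else waits for a suitable worker to appear.
    return "no-worker"
```

The point of the design is that every ready task enters through one path, and the fallback state is decided in a single place instead of being spread across several transitions.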
remove no_worker->memory just to see what happens
The only valid way I can imagine any of these happening is `client.scatter` within a worker. If this is actually needed, I guess I should add an equivalent for queued?
Commit 2cc8631
Commit 100118a
Commit 842ee71
Commit 494fe48
Commit dd88b0d
Revert "remove no_worker->memory just to see what happens"
This reverts commit 2cc8631.
Commit e17c624
Commits on Aug 25, 2022
Commit 06d60fe
Commit 96d59eb
`test_root_task_overproduction` adaptive data size
Still maybe not a test that should run in CI, I just like how real-world it is. Let's see if picking the task size based on available memory helps on Windows.
Commit 3240a43
Commit 9344dd9
Commits on Aug 26, 2022
Commit aa8e1db
Commit ee1a754
Commit f36a6ac
Commit 78353e1
Commit 18b7bb5
Commit f3a66df
Commit c5f2746
Commit 4b2a209
remove `test_oversaturation_multiple_task_groups`
will add it back when we actually implement co-assignment
Commit 3cebe54
Commit 51dca31
Commits on Aug 27, 2022
Co-authored-by: crusaderky
Commit 14dc850
Commit b36064e
skip `test_root_task_overproduction` on Windows
I don't understand why it's flaking on Windows, but I imagine it's just because memory measurement and process memory overhead behave differently. It could really just run on Linux, but leaving it un-skipped for macOS right now out of convenience for macOS developers to run locally.
Commit 5b2bc02
Commits on Aug 29, 2022
Commit 1819a51
Commit 2952f6b
`decide_worker_rootish_queuing_enabled` no task
don't even need to pass it in right now; it's not used
Commit d00ea54
Commit 8ba4ced
Commits on Aug 30, 2022
Commit 63d863d
Commit b7704e3
Commit 00b54e7
Commit 12207e6
Commits on Aug 31, 2022
Commit 02c98b3
remove `test_near_memory_limit_workload`
feeling pretty good about just `test_graph_execution_width`
Commit 2b3f6ae
`handle_worker_status_change` in `retire_workers`
Using it as an API saves having to manage `running` and `idle` in multiple places
Commit 5e4d53d
Commit 333bbb2
avoid flaky `test_graph_execution_width`
hesitant on this, but I don't want to introduce a flaky test
Commit acc524f
Commit ba336b9
Commit 9d99d74
Commit 093d7dc