Fix transfer limiting in _select_keys_for_gather
#7071
75bd8eb to 0808455
Unit Test Results
See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.
15 files +1, 15 suites +1, 6h 7m 18s ⏱️ +30m 14s
For more details on these failures, see this check.
Results for commit ff4c4ed. ± Comparison against base commit 8c4133c.
♻️ This comment has been updated with latest results.
Thanks! Can confirm that this change fixes the performance regression we observe.
Mostly minor comments to do with naming.
distributed/worker_state_machine.py
Outdated
self.transfer_incoming_bytes
or to_gather
) and total_nbytes + ts.get_nbytes() > bytes_left_to_fetch:
if self._task_exceeds_transfer_limits(ts, total_nbytes):
Not for this PR, but it seems that this gather logic is somewhat eager to terminate. Suppose we have many tasks available, a few already in flight, and the top priority task is large (will exceed transfer limits). We'll never go further through the available list to check if we could actually eagerly transfer some lower-priority (but smaller) tasks.
This may be deliberate, but not sure.
Agreed, it's a rather aggressive stop criterion. The main reason we have it is that I haven't seen any evidence that would support a more elaborate one. The alternative would be to look at what's behind the task that exceeds the limits (or to use a completely different packing algorithm). Depending on how that's done, this might lead to a lot of unnecessary work. Suppose we have many tasks available and they are all roughly equally sized. Chances are pretty high that if the top-priority task no longer fits into the message, none of the following ones will either. I'm happy to iterate on this if it becomes an actual problem.
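To make the trade-off discussed in this thread concrete, here is a minimal sketch, not the actual code in distributed/worker_state_machine.py. `Task`, `select_until_first_overflow`, and `select_skipping_overflow` are hypothetical names, and the input list is assumed to be sorted by priority. The first function stops at the first task that exceeds the budget, mirroring the condition in the diff above; the second keeps scanning for smaller tasks that still fit.

```python
from dataclasses import dataclass


@dataclass
class Task:
    # Hypothetical stand-in for a task state; only key and size matter here.
    key: str
    nbytes: int


def select_until_first_overflow(tasks, bytes_left, in_flight=False):
    """Greedy selection that stops at the first task exceeding the budget."""
    selected, total = [], 0
    for ts in tasks:  # assumed to be ordered by priority
        # If nothing is in flight and nothing is selected yet, always take
        # the task, so a single oversized task is still fetched eventually.
        if (in_flight or selected) and total + ts.nbytes > bytes_left:
            break
        selected.append(ts.key)
        total += ts.nbytes
    return selected, total


def select_skipping_overflow(tasks, bytes_left, in_flight=False):
    """Hypothetical alternative: skip tasks that do not fit, keep scanning."""
    selected, total = [], 0
    for ts in tasks:
        if (in_flight or selected) and total + ts.nbytes > bytes_left:
            continue  # look further down the list for smaller tasks
        selected.append(ts.key)
        total += ts.nbytes
    return selected, total


# One large top-priority task followed by two small ones, with transfers
# already in flight: the early stop selects nothing, skipping finds both.
mixed = [Task("big", 100), Task("s1", 10), Task("s2", 10)]
early_mixed = select_until_first_overflow(mixed, 50, in_flight=True)
skip_mixed = select_skipping_overflow(mixed, 50, in_flight=True)

# Equally sized tasks: both strategies select the same keys, but the
# skipping variant walks the whole list to find that out.
equal = [Task(f"t{i}", 20) for i in range(4)]
early_equal = select_until_first_overflow(equal, 50)
skip_equal = select_skipping_overflow(equal, 50)
```

In the mixed case the skipping variant picks up the two small tasks that the early stop leaves behind. In the equally sized case both return the same selection, which is the scenario where the extra scanning would be wasted work.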
Co-authored-by: Lawrence Mitchell <[email protected]>
Two minor nits, otherwise looks great, thanks for the quick fix!
Co-authored-by: Lawrence Mitchell <[email protected]>
Thanks @hendrikmakait and @wence-
Closes
pre-commit run --all-files