Remove superfluous ShuffleSchedulerExtension.barriers #7389

Merged
merged 4 commits into from
Dec 12, 2022

Conversation

hendrikmakait
Member

@hendrikmakait hendrikmakait commented Dec 12, 2022

The use of the ShuffleSchedulerExtension.barriers dictionary can be replaced by the use of existing state-tracking collections and class-level key-generation methods.

  • Tests added / passed
  • Passes pre-commit run --all-files
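As a rough illustration of the idea (not the actual diff), a barrier key can be derived from the shuffle ID and recognised by its prefix, so no separate mapping has to be kept in sync. `_is_barrier_key`, `id_from_key` and `worker_for` appear in the reviewed code below; the class name, the `barrier_key` and `is_tracked_barrier` helpers, the prefix and the key format are assumptions made for this sketch.

```python
class ShuffleKeysSketch:
    """Hypothetical stand-in for the scheduler extension; not distributed's code."""

    # Assumed key format: "<prefix><shuffle id>"
    BARRIER_PREFIX = "shuffle-barrier-"

    def __init__(self) -> None:
        # Existing per-shuffle state, keyed by shuffle ID (values are placeholders)
        self.worker_for: dict = {}

    @classmethod
    def barrier_key(cls, shuffle_id: str) -> str:
        # Hypothetical helper: build the barrier key from the shuffle ID
        return cls.BARRIER_PREFIX + shuffle_id

    @classmethod
    def id_from_key(cls, key: str) -> str:
        # Inverse of barrier_key: strip the prefix to recover the shuffle ID
        assert key.startswith(cls.BARRIER_PREFIX)
        return key[len(cls.BARRIER_PREFIX):]

    @classmethod
    def _is_barrier_key(cls, key: str) -> bool:
        return key.startswith(cls.BARRIER_PREFIX)

    def is_tracked_barrier(self, key: str) -> bool:
        """True if key is the barrier of a shuffle that is still tracked."""
        return self._is_barrier_key(key) and self.id_from_key(key) in self.worker_for
```

With this shape, the set of known barriers is implied by the keys of `worker_for`, which is why a dedicated `barriers` dictionary becomes redundant.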

@hendrikmakait hendrikmakait marked this pull request as draft December 12, 2022 10:27
@hendrikmakait hendrikmakait marked this pull request as ready for review December 12, 2022 11:44
@github-actions
Contributor

github-actions bot commented Dec 12, 2022

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

    18 files  ±0      18 suites ±0   8h 23m 27s ⏱️ +23m 3s
 3 255 tests  +1   3 169 ✔️ +1       85 💤 ±0   1 ❌ ±0
29 304 runs   +9  28 076 ✔️ +7    1 227 💤 +2   1 ❌ ±0

For more details on these failures, see this check.

Results for commit 6fede6c. ± Comparison against base commit 047b082.

♻️ This comment has been updated with latest results.

Comment on lines 773 to 777
if not self._is_barrier_key(key):
    return
shuffle_id = self.id_from_key(key)
if shuffle_id not in self.worker_for:
    return
Member

Note: I ran a couple of tests and don't think we suffer any significant performance impact here. However, I would like to ask for caution with this sort of refactoring in a transition hook. String comparisons and replacements are not cheap, relatively speaking. Transitions are among the hottest loops we have, and it typically pays off to be a bit cautious here. Everything past these guards is allowed to be slow since we only execute it for barriers, but the guards themselves are evaluated for every task in a potentially very large graph.

This change introduces an overhead of 100-200ns compared to the earlier version, so I think this is fine. However, this can quickly spiral out of control if the logic in these methods becomes more complex or we open the transition to other allowed finished states, etc.
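To make the point about guards concrete, here is a minimal sketch of a transition hook in which the cheap checks run for every task while the slow path runs only for barriers. The hook signature is simplified, and the prefix, the key names and the `make_hook` helper are assumptions for illustration, not the extension's real code.

```python
PREFIX = "shuffle-barrier-"  # assumed key prefix


def make_hook(tracked_ids: set, slow_path_calls: list):
    """Build a transition hook that records every slow-path invocation."""

    def transition(key: str, start: str, finish: str) -> None:
        # Cheap guards: evaluated for every task transition
        if finish != "forgotten":
            return
        if not key.startswith(PREFIX):
            return
        shuffle_id = key[len(PREFIX):]
        if shuffle_id not in tracked_ids:
            return
        # Slow path: only reached for barriers of tracked shuffles
        slow_path_calls.append(shuffle_id)

    return transition


calls: list = []
hook = make_hook({"abc"}, calls)
# A large graph of ordinary tasks plus a single barrier
keys = [f"task-{i}" for i in range(100_000)] + [PREFIX + "abc"]
for key in keys:
    hook(key, "memory", "forgotten")
print(len(keys), "guard evaluations,", len(calls), "slow-path call(s)")
```

Any extra work added to the guards is multiplied by the number of tasks in the graph, whereas work added after them is paid once per barrier.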

Member Author

@hendrikmakait hendrikmakait Dec 12, 2022

Out of curiosity, could you share the snippet you used to benchmark this change? I ran a very simple test that showed no meaningful difference for the latest commit. (However, 9c2a9c4 had a ~200ns overhead when I checked runtimes.)

Member Author

Note: We could inline _is_barrier_key to shave off the function call overhead.

Member

I ran something like this

adict = {f"foo-{x}": True for x in range(10)}

a_long_key = "shuffle-barrier-123566886512341"
finish = "forgotten"  # state under test; not defined in the snippet as posted

def foo_main():
    # Baseline: scan the dict's values for the key
    if finish != "forgotten":
        return
    if a_long_key in adict.values():
        return

def is_barrier():
    return a_long_key.startswith("shuffle-barrier")

def foo():
    # Variant: prefix check, strip the prefix, then dict lookup
    if finish != "forgotten":
        return
    if not is_barrier():
        return
    assert a_long_key.startswith("shuffle-barrier")
    key = a_long_key.replace("shuffle-barrier", "")
    if key in adict:
        return
    return


Member

> Note: We could inline _is_barrier_key to shave off the function call overhead.

Don't worry about it. I merely wanted to ensure some awareness.

Member Author

@hendrikmakait hendrikmakait Dec 12, 2022

Thanks for sharing; the difference in our results makes sense. I had focused on the common case where the key is forgotten but is not a shuffle barrier, which is coincidentally faster in your example.

> I merely wanted to ensure some awareness.

Appreciated!
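The two cases discussed in this thread can be timed separately. The sketch below mirrors the shape of the benchmark snippet above rather than the real extension code: `barriers`, `worker_for`, the prefix and the key strings are all made up, and the measured numbers will vary with machine and Python version, so none are quoted here.

```python
import timeit

PREFIX = "shuffle-barrier-"  # assumed key prefix
# Hypothetical old-style mapping: shuffle ID -> barrier key
barriers = {f"id-{i}": f"{PREFIX}id-{i}" for i in range(10)}
# Hypothetical existing state keyed by shuffle ID
worker_for = {f"id-{i}": None for i in range(10)}


def old_guard(key: str) -> bool:
    # Linear scan over the mapping's values
    return key in barriers.values()


def new_guard(key: str) -> bool:
    if not key.startswith(PREFIX):
        return False  # common case: exits after a single prefix check
    return key[len(PREFIX):] in worker_for


def time_ns(func, key: str, number: int = 100_000) -> float:
    """Average wall time per call in nanoseconds."""
    return timeit.timeit(lambda: func(key), number=number) / number * 1e9


if __name__ == "__main__":
    cases = [("barrier key", PREFIX + "id-9"), ("other key", "inc-123566886512341")]
    for label, key in cases:
        print(label, round(time_ns(old_guard, key)), round(time_ns(new_guard, key)))
```

Timing only the barrier key exercises the full path of the new guard, while timing only an ordinary key exercises its early exit, which would explain why the two measurements disagreed.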

@hendrikmakait hendrikmakait self-assigned this Dec 12, 2022
@hendrikmakait
Member Author

No related failures on CI

@fjetter fjetter merged commit c4af791 into dask:main Dec 12, 2022