Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Shuffle metrics 2/4: Add background metrics #8365

Merged
merged 1 commit into from
Dec 20, 2023

Conversation

crusaderky
Copy link
Collaborator

@crusaderky crusaderky commented Nov 19, 2023

Please read: #7943 (comment)
There are two commits in this PR. All but the last are the previous PRs in the chain.

Copy link
Contributor

github-actions bot commented Nov 19, 2023

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

       27 files  ±  0         27 suites  ±0   11h 40m 34s ⏱️ + 2m 18s
  3 947 tests +  3    3 835 ✔️ ±  0     110 💤 +1  2 +2 
49 649 runs  +27  47 346 ✔️ +21  2 301 💤 +4  2 +2 

For more details on these failures, see this check.

Results for commit faa5bd7. ± Comparison against base commit 4d41d32.

♻️ This comment has been updated with latest results.

@crusaderky crusaderky marked this pull request as ready for review November 19, 2023 23:37
@crusaderky crusaderky requested a review from fjetter as a code owner November 19, 2023 23:37
size += s
self.sizes[part_id] -= s
except IndexError:
break
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pretty sure this is a SyntaxError since the break is outside a loop

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit confused how this isn't lighting up CI

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because it breaks the inner while size < self.max_message_size:, not the outer one

distributed/shuffle/_buffer.py Outdated Show resolved Hide resolved
else:
# Unserialized numpy arrays
with context_meter.meter("serialize", func=thread_time):
frames = list(concat(pickle_bytelist(shard) for shard in shards))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I imagine this list call could have a non-trivial impact on performance since it forces the entire list of shards to be serialized and the serialized bytes to be retained before anything is written to disk, doesn't it?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In theory, yes.
In practice, no, because it uses pickle buffers. There shouldn't be any use cases where the buffer will be temporarily deep-copied (as we do not cater for object dtypes). So the only thing that will be temporarily kept in memory is the pickle metadata, which is always trivial (~120 bytes per shard, while the worst use case we currently care to monitor is 8 kiB worth of buffer per shard).

This said, I reworked the code to remove the list, so that we avoid any and all doubts.

@crusaderky
Copy link
Collaborator Author

I suggest to revert this. I'm hesitant to factor out a while loop body with loop control statements into a function. This is a non-trivial refactoring and objectively not necessarily simpler to understand

reverted

@crusaderky crusaderky force-pushed the shuffle/metrics2 branch 2 times, most recently from 9fe4b2e to d7d8558 Compare November 30, 2023 16:29
@crusaderky crusaderky force-pushed the shuffle/metrics2 branch 2 times, most recently from 00d5f57 to a0a0f03 Compare December 12, 2023 18:01
@crusaderky crusaderky force-pushed the shuffle/metrics2 branch 2 times, most recently from ac648fd to 8fc2f5e Compare December 18, 2023 14:06
Copy link
Member

@hendrikmakait hendrikmakait left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @crusaderky!

@crusaderky crusaderky force-pushed the shuffle/metrics2 branch 3 times, most recently from 0b80228 to faa5bd7 Compare December 19, 2023 15:51
@crusaderky crusaderky merged commit 952b650 into dask:main Dec 20, 2023
29 of 31 checks passed
@crusaderky crusaderky deleted the shuffle/metrics2 branch December 20, 2023 00:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Use metering for P2P shuffling instrumentation
3 participants