Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Track active shuffle by stage #446

Merged
merged 5 commits into from
Nov 27, 2018
Merged

Conversation

rynorris
Copy link

@rynorris rynorris commented Nov 27, 2018

This is an optimization to #427

In the original PR we only track whether a shuffle dependency is active at the job-level, meaning we cannot scale up and down executors during a long job.

This PR extends the functionality by tracking the dependencies between stages and shuffles so we can mark the shuffle blocks as inactive earlier.

Unit tests included, and also tested on a local k8s cluster.

See toy example:
image

In this screenshot, stage 5.0 repartitions to 4 partitions, stage 6.0 to 2 partitions, and stage 7.0 to 1 partition. The exact query used was:

spark.range(0, 100).rdd
  .repartition(4).map(x => { Thread.sleep(1000); x })
  .repartition(2).map(x => { Thread.sleep(200); x })
  .repartition(1).map(x => { Thread.sleep(500); x })
  .collect()

You can see that once stage 6.0 is done, executors 6 and 8 are removed since their shuffle data is no longer needed. However executors 5 and 7 are both kept around during the last stage even though there's only one task, because they both hold shuffle data necessary for the final repartition to run.

Copy link

@lwwmanning lwwmanning left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pushed a fix for a nit, otherwise lgtm

@robert3005 robert3005 merged commit 4baa2ce into master Nov 27, 2018
@robert3005 robert3005 deleted the rn/stage-level-shuffle-tracking branch November 27, 2018 14:01
robert3005 pushed a commit that referenced this pull request Jan 6, 2019
lwwmanning pushed a commit that referenced this pull request Mar 13, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants