Track active shuffle by stage #446

rynorris · 2018-11-27T04:06:00Z

This is an optimization of #427.

In the original PR, we only tracked whether a shuffle dependency was active at the job level, which meant executors could not be scaled up or down during a long-running job.

This PR extends that functionality by tracking the dependencies between stages and shuffles, so a shuffle's blocks can be marked inactive as soon as no remaining stage in the job needs them, rather than only when the job finishes.
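To make the difference from job-level tracking concrete, here is a minimal, hypothetical sketch of stage-level shuffle bookkeeping. The names `StageShuffleTracker`, `jobStarted`, `stageCompleted`, and `isShuffleActive` are illustrative assumptions, not the code in this PR or a Spark API. The sketch assumes that every stage of a job, together with the shuffle IDs it reads, is known when the job starts; a shuffle then stays active exactly as long as some unfinished stage still reads it.

```scala
import scala.collection.mutable

// Hypothetical sketch, not this PR's implementation: stage-level shuffle tracking.
// A shuffle is considered active while at least one unfinished stage reads it.
final class StageShuffleTracker {
  // Unfinished stage ID -> IDs of the shuffles that stage reads.
  private val pendingReads = mutable.Map.empty[Int, Set[Int]]

  // Assumed to be called once per job with every stage and the shuffles it reads.
  def jobStarted(stageReads: Map[Int, Set[Int]]): Unit = synchronized {
    pendingReads ++= stageReads
  }

  // A finished stage no longer keeps its input shuffles alive.
  def stageCompleted(stageId: Int): Unit = synchronized {
    pendingReads -= stageId
  }

  // True if any unfinished stage still needs this shuffle's blocks.
  def isShuffleActive(shuffleId: Int): Boolean = synchronized {
    pendingReads.valuesIterator.exists(_.contains(shuffleId))
  }
}
```

Registering the whole job up front, rather than each stage when it is submitted, avoids briefly marking a shuffle inactive in the gap between the stage that writes it finishing and the stage that reads it starting.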

Unit tests included, and also tested on a local k8s cluster.

See the toy example below.

In this screenshot, stage 5.0 repartitions to 4 partitions, stage 6.0 to 2 partitions, and stage 7.0 to 1 partition. The exact query used was:

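```scala
// Run in spark-shell, where `spark` is the predefined SparkSession.
spark.range(0, 100).rdd
  .repartition(4).map(x => { Thread.sleep(1000); x }) // shuffle to 4 partitions, sleep 1 s per record
  .repartition(2).map(x => { Thread.sleep(200); x })  // shuffle to 2 partitions, sleep 200 ms per record
  .repartition(1).map(x => { Thread.sleep(500); x })  // shuffle to 1 partition, sleep 500 ms per record
  .collect()
```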

You can see that once stage 6.0 finishes, executors 6 and 8 are removed because their shuffle data is no longer needed. However, executors 5 and 7 are both kept around during the last stage, even though it has only one task, because both hold shuffle data that the final repartition needs.
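The executor behavior above follows from a simple rule, sketched here under assumptions: an idle executor may be released only if none of the shuffle data it holds belongs to a still-active shuffle. `ExecutorState`, `ExecutorRemoval.removableExecutors`, and the shuffle IDs used below are hypothetical, chosen to loosely mirror the screenshot rather than taken from this PR or from Spark.

```scala
// Hypothetical model of what the allocation logic knows about an executor.
final case class ExecutorState(id: String, runningTasks: Int, shuffleIdsHeld: Set[Int])

object ExecutorRemoval {
  // An executor is removable when it is idle and holds no blocks of an active shuffle.
  def removableExecutors(executors: Seq[ExecutorState], activeShuffles: Set[Int]): Seq[String] =
    executors.collect {
      case e if e.runningTasks == 0 && e.shuffleIdsHeld.intersect(activeShuffles).isEmpty => e.id
    }
}
```

For instance, if executors 5 and 7 each hold blocks of shuffle 2 (the one the final stage reads), executors 6 and 8 hold only blocks of shuffle 1, and all four are idle, then `ExecutorRemoval.removableExecutors(execs, Set(2))` returns `Seq("6", "8")`. Under job-level tracking, shuffle 1 would still count as active until the job ended, so none of the four could be removed.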

lwwmanning

Pushed a fix for a nit; otherwise LGTM.

Ryan Norris added 2 commits November 27, 2018 10:56

Track shuffle at the stage level to make dynamic allocation more dynamic

9e08bfc

More tests

65b71c1

rynorris requested review from robert3005, lwwmanning and mccheah November 27, 2018 04:06

Ryan Norris and others added 2 commits November 27, 2018 12:38

Scalastyle

e26f3d5

nit for concurrency safety

47d2878

lwwmanning approved these changes Nov 27, 2018


nit

2a9aa83

robert3005 approved these changes Nov 27, 2018


robert3005 merged commit 4baa2ce into master Nov 27, 2018

robert3005 deleted the rn/stage-level-shuffle-tracking branch November 27, 2018 14:01

robert3005 pushed a commit that referenced this pull request Jan 6, 2019

Track active shuffles by stage (#446)

92fe984

lwwmanning pushed a commit that referenced this pull request Mar 13, 2019

Track active shuffles by stage (#446)

3e8b1f4

lwwmanning mentioned this pull request Mar 13, 2019

[SPARK-24432] Support dynamic allocation without external shuffle service apache/spark#24083

Closed
