SPARK-2565. Update ShuffleReadMetrics as blocks are fetched #1507

sryza · 2014-07-21T06:37:06Z

No description provided.

SparkQA · 2014-07-21T06:43:31Z

QA tests have started for PR 1507. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16901/consoleFull

SparkQA · 2014-07-21T08:29:56Z

QA results for PR 1507:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16901/consoleFull

andrewor14 · 2014-07-22T02:10:26Z

@kayousterhout

kayousterhout · 2014-07-22T16:30:22Z

core/src/main/scala/org/apache/spark/executor/TaskMetrics.scala

    }
+    shuffleReadMetrics = Some(merged)


Why is this outside the synchronized block?

kayousterhout · 2014-07-22T18:02:25Z

At a high level, this depends on one of your other patches (#1056?) to incrementally send updates, right? Is the idea that mergeShuffleReadMetrics will get called incrementally as the task runs, before sending partial results back to the driver?

sryza · 2014-07-22T18:10:26Z

Exactly. The idea is to call mergeShuffleReadMetrics when we're about to send the metrics update.
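
For readers following the thread, here is a minimal sketch of the pattern being discussed: each shuffle dependency gets its own metrics object, and the merge runs under the same lock just before an update is sent. The class bodies are simplified stand-ins and not the code from this patch; `depsShuffleReadMetrics` and `remoteBytesRead` are assumed names, while `createShuffleReadMetricsForDependency`, `mergeShuffleReadMetrics`, and `localBlocksFetched` come from the diff excerpts in this review.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical, simplified stand-in for Spark's ShuffleReadMetrics.
class ShuffleReadMetrics {
  var remoteBlocksFetched: Int = 0
  var localBlocksFetched: Int = 0
  var remoteBytesRead: Long = 0L // assumed field, for illustration only
}

// Hypothetical, simplified stand-in for Spark's TaskMetrics.
class TaskMetrics {
  // One metrics object per shuffle dependency, registered for merging later.
  private val depsShuffleReadMetrics = new ArrayBuffer[ShuffleReadMetrics]()
  private var _shuffleReadMetrics: Option[ShuffleReadMetrics] = None

  // Pure accessor, so it is declared without parentheses.
  def shuffleReadMetrics: Option[ShuffleReadMetrics] = synchronized { _shuffleReadMetrics }

  def createShuffleReadMetricsForDependency(): ShuffleReadMetrics = synchronized {
    val metrics = new ShuffleReadMetrics
    depsShuffleReadMetrics += metrics
    metrics
  }

  // Called right before a metrics update is sent. The assignment stays inside
  // the synchronized block so readers never see a stale or half-built value.
  def mergeShuffleReadMetrics(): Unit = synchronized {
    val merged = new ShuffleReadMetrics
    for (m <- depsShuffleReadMetrics) {
      merged.remoteBlocksFetched += m.remoteBlocksFetched
      merged.localBlocksFetched += m.localBlocksFetched
      merged.remoteBytesRead += m.remoteBytesRead
    }
    _shuffleReadMetrics = Some(merged)
  }
}
```

In this sketch the fetcher mutates its own per-dependency object as blocks arrive, and the reporter calls `mergeShuffleReadMetrics()` whenever it is about to send an update, so in-progress tasks show partial totals.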

SparkQA · 2014-07-22T18:43:35Z

QA tests have started for PR 1507. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16978/consoleFull

kayousterhout · 2014-07-22T19:33:29Z

core/src/main/scala/org/apache/spark/storage/BlockFetcherIterator.scala

        if (address == blockManagerId) {
-          numLocal = blockInfos.size
+          readMetrics.localBlocksFetched += blockInfos.size


Maybe we don't care about this, but it results in the metrics reporting that local blocks have been fetched before they're actually read from disk. Would it be too annoying to move this to where the blocks actually get read?
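
One way to picture the suggestion is to bump the counter lazily, as each local block is actually read, instead of adding `blockInfos.size` up front. The following is only a sketch under that assumption: `ReadMetrics`, `LocalReadSketch`, and the `readBlock` function are hypothetical names, not part of BlockFetcherIterator.

```scala
// Hypothetical, minimal metrics holder for this illustration.
class ReadMetrics {
  var localBlocksFetched: Int = 0
}

object LocalReadSketch {
  // Returns a lazy iterator over local blocks. The counter is incremented
  // only when a block is actually read, i.e. when the iterator is advanced.
  def localBlockIterator[A](
      blockIds: Seq[String],
      readBlock: String => A, // hypothetical stand-in for the disk read
      readMetrics: ReadMetrics): Iterator[A] = {
    blockIds.iterator.map { id =>
      val block = readBlock(id)
      readMetrics.localBlocksFetched += 1
      block
    }
  }
}
```

Because `Iterator.map` is lazy, the reported count trails the reads by at most the block currently being processed, rather than jumping to the total before any disk access happens.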

kayousterhout · 2014-07-22T19:35:20Z

Thanks Sandy!! Just a few more small things.

SparkQA · 2014-07-22T20:20:12Z

QA results for PR 1507:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16978/consoleFull

kayousterhout · 2014-07-22T23:06:56Z

Jenkins, retest this please

SparkQA · 2014-07-22T23:13:20Z

QA tests have started for PR 1507. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16995/consoleFull

SparkQA · 2014-07-23T00:52:31Z

QA results for PR 1507:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16995/consoleFull

sryza · 2014-08-02T22:29:50Z

Just tested this and observed the shuffle bytes read going up for in-progress tasks.

andrewor14 · 2014-08-05T05:58:04Z

core/src/main/scala/org/apache/spark/storage/BlockFetcherIterator.scala

@@ -191,7 +184,7 @@ object BlockFetcherIterator {
        }
      }
      logInfo("Getting " + _numBlocksToFetch + " non-empty blocks out of " +
-        (numLocal + numRemote) + " blocks")
+        totalBlocks + " blocks")


Is this ever used other than for logging?

SparkQA · 2014-08-06T20:49:34Z

QA tests have started for PR 1507. This patch DID NOT merge cleanly!
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18047/consoleFull

pwendell · 2014-08-06T20:57:19Z

core/src/main/scala/org/apache/spark/executor/TaskMetrics.scala

+   * dependency, and merge these metrics before reporting them to the driver. This method returns
+   * a ShuffleReadMetrics for a dependency and registers it for merging later.
+   */
+  def createShuffleReadMetricsForDependency(): ShuffleReadMetrics = synchronized {


Can this be private[spark]?

SparkQA · 2014-08-06T20:59:33Z

QA tests have started for PR 1507. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18049/consoleFull

SparkQA · 2014-08-06T21:55:39Z

QA results for PR 1507:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18049/consoleFull

SparkQA · 2014-08-06T22:04:22Z

QA results for PR 1507:
- This patch PASSES unit tests.

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18047/consoleFull

SparkQA · 2014-08-07T07:24:39Z

QA tests have started for PR 1507. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18113/consoleFull

SparkQA · 2014-08-07T08:11:30Z

QA results for PR 1507:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18113/consoleFull

andrewor14 · 2014-08-07T16:35:16Z

It's failing the 3 flaky tests that have been failing many PRs lately... test this please

SparkQA · 2014-08-07T16:39:23Z

QA tests have started for PR 1507. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18126/consoleFull

SparkQA · 2014-08-07T17:29:50Z

QA results for PR 1507:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18126/consoleFull

pwendell · 2014-08-08T00:11:58Z

core/src/main/scala/org/apache/spark/executor/TaskMetrics.scala

   */
  private var _shuffleReadMetrics: Option[ShuffleReadMetrics] = None

-  def shuffleReadMetrics = _shuffleReadMetrics
+  def shuffleReadMetrics() = _shuffleReadMetrics


nit: since this doesn't mutate internal state, the original lack of parentheses is the correct style.
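
The convention referred to here is the usual Scala one for zero-argument methods: omit the parentheses on pure accessors and keep them on methods with side effects. A tiny illustration, using a hypothetical `Counter` class unrelated to the patch:

```scala
class Counter {
  private var _count: Int = 0

  // Pure accessor: no side effects, so it is declared without parentheses.
  def count: Int = _count

  // Mutates internal state, so it is declared (and called) with parentheses.
  def increment(): Unit = { _count += 1 }
}
```

Call sites then read `counter.count` for the accessor and `counter.increment()` for the mutation, which signals to the reader which calls change state.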

pwendell · 2014-08-08T01:09:35Z

Okay, I merged this with the minor style change. Thanks Sandy!

Author: Sandy Ryza <[email protected]>

Closes #1507 from sryza/sandy-spark-2565 and squashes the following commits:

74dad41 [Sandy Ryza] SPARK-2565. Update ShuffleReadMetrics as blocks are fetched

(cherry picked from commit 4c51098)
Signed-off-by: Patrick Wendell <[email protected]>

SPARK-2565. Update ShuffleReadMetrics as blocks are fetched

74dad41

asfgit closed this in 4c51098 Aug 8, 2014