SPARK-2565. Update ShuffleReadMetrics as blocks are fetched #1507

Closed
wants to merge 1 commit into from

@sryza sryza commented Jul 21, 2014

No description provided.

@SparkQA

SparkQA commented Jul 21, 2014

QA tests have started for PR 1507. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16901/consoleFull

@SparkQA

SparkQA commented Jul 21, 2014

QA results for PR 1507:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16901/consoleFull

@andrewor14
Contributor

@kayousterhout

}
shuffleReadMetrics = Some(merged)
Contributor

Why is this outside the synchronized block?

@kayousterhout
Contributor

At a high level, this depends on one of your other patches (#1056?) to incrementally send updates, right? Is the idea that mergeShuffleReadMetrics will get called incrementally as the task runs, before sending partial results back to the driver?

@sryza
Contributor Author

sryza commented Jul 22, 2014

Exactly. The idea is to call mergeShuffleReadMetrics when we're about to send the metrics update.
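
A minimal sketch of the design discussed here, for illustration only: each shuffle dependency gets its own metrics object, and the per-dependency values are merged just before a metrics update is sent. `ReadMetricsSketch` and `TaskMetricsSketch` are hypothetical stand-ins, not Spark's actual classes; only the method names `createShuffleReadMetricsForDependency` and `mergeShuffleReadMetrics` come from this thread. The sketch assumes each per-dependency object has a single writer thread.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical, simplified stand-in for ShuffleReadMetrics.
// Fields are @volatile so a reporting thread sees values written by the fetching thread.
class ReadMetricsSketch {
  @volatile var remoteBytesRead: Long = 0L
  @volatile var localBlocksFetched: Int = 0
  @volatile var remoteBlocksFetched: Int = 0
}

// Hypothetical stand-in for the task-level metrics holder.
class TaskMetricsSketch {
  private val depsMetrics = ArrayBuffer.empty[ReadMetricsSketch]
  private var merged: Option[ReadMetricsSketch] = None

  // Accessor without parentheses: it does not mutate state.
  def shuffleReadMetrics: Option[ReadMetricsSketch] = synchronized { merged }

  // Returns a fresh metrics object for one dependency and registers it for later merging.
  def createShuffleReadMetricsForDependency(): ReadMetricsSketch = synchronized {
    val metrics = new ReadMetricsSketch
    depsMetrics += metrics
    metrics
  }

  // Sums the per-dependency metrics; intended to run right before an update is sent.
  // The assignment to `merged` is kept inside the synchronized block in this sketch.
  def mergeShuffleReadMetrics(): Unit = synchronized {
    val total = new ReadMetricsSketch
    for (m <- depsMetrics) {
      total.remoteBytesRead += m.remoteBytesRead
      total.localBlocksFetched += m.localBlocksFetched
      total.remoteBlocksFetched += m.remoteBlocksFetched
    }
    merged = Some(total)
  }
}
```

Because the merge recomputes the totals from scratch each time, calling it repeatedly while blocks are still arriving yields monotonically growing values without double counting.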

@SparkQA

SparkQA commented Jul 22, 2014

QA tests have started for PR 1507. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16978/consoleFull

if (address == blockManagerId) {
-numLocal = blockInfos.size
+readMetrics.localBlocksFetched += blockInfos.size
Contributor

Maybe we don't care about this...but this results in the metrics reporting that local blocks have been fetched before they're actually read from disk. Is it too annoying to move this to where the blocks actually get read?
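
One way to picture the suggestion, as a sketch under stated assumptions rather than the code in this patch: bump the counter inside the lazy iterator that reads each local block, so the metric only moves when a block is actually read. `LocalFetchCounter`, `LocalFetchSketch`, and the `readBlock` parameter are hypothetical names introduced for illustration.

```scala
// Hypothetical counter standing in for readMetrics.localBlocksFetched.
final class LocalFetchCounter {
  var localBlocksFetched: Int = 0
}

object LocalFetchSketch {
  // Builds a lazy iterator over local blocks.
  // `readBlock` is a hypothetical stand-in for the real disk read.
  def localBlocksIterator[A](
      blockIds: Seq[String],
      readBlock: String => A,
      metrics: LocalFetchCounter): Iterator[A] = {
    blockIds.iterator.map { id =>
      val data = readBlock(id)
      // Count the block only after the read has happened.
      metrics.localBlocksFetched += 1
      data
    }
  }
}
```

Since `Iterator.map` is lazy, nothing is counted until the consumer pulls a block, which matches the behavior asked for above.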

@kayousterhout
Contributor

Thanks Sandy!! Just a few more small things.

@SparkQA

SparkQA commented Jul 22, 2014

QA results for PR 1507:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16978/consoleFull

@kayousterhout
Contributor

Jenkins, retest this please

@SparkQA

SparkQA commented Jul 22, 2014

QA tests have started for PR 1507. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16995/consoleFull

@SparkQA

SparkQA commented Jul 23, 2014

QA results for PR 1507:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16995/consoleFull

@sryza
Contributor Author

sryza commented Aug 2, 2014

Just tested this and observed the shuffle bytes read going up for in-progress tasks.

@@ -191,7 +184,7 @@ object BlockFetcherIterator {
}
}
logInfo("Getting " + _numBlocksToFetch + " non-empty blocks out of " +
-(numLocal + numRemote) + " blocks")
+totalBlocks + " blocks")
Contributor

Is this ever used other than for logging?

Contributor Author

Naw

@SparkQA

SparkQA commented Aug 6, 2014

QA tests have started for PR 1507. This patch DID NOT merge cleanly!
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18047/consoleFull

* dependency, and merge these metrics before reporting them to the driver. This method returns
* a ShuffleReadMetrics for a dependency and registers it for merging later.
*/
def createShuffleReadMetricsForDependency(): ShuffleReadMetrics = synchronized {
Contributor

can this be private[spark]?

@SparkQA

SparkQA commented Aug 6, 2014

QA tests have started for PR 1507. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18049/consoleFull

@SparkQA

SparkQA commented Aug 6, 2014

QA results for PR 1507:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18049/consoleFull

@SparkQA

SparkQA commented Aug 6, 2014

QA results for PR 1507:
- This patch PASSES unit tests.

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18047/consoleFull

@SparkQA

SparkQA commented Aug 7, 2014

QA tests have started for PR 1507. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18113/consoleFull

@SparkQA

SparkQA commented Aug 7, 2014

QA results for PR 1507:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18113/consoleFull

@andrewor14
Contributor

It's failing the 3 flaky tests that have been failing many PRs lately... test this please

@SparkQA

SparkQA commented Aug 7, 2014

QA tests have started for PR 1507. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18126/consoleFull

@SparkQA

SparkQA commented Aug 7, 2014

QA results for PR 1507:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18126/consoleFull

*/
private var _shuffleReadMetrics: Option[ShuffleReadMetrics] = None

-def shuffleReadMetrics = _shuffleReadMetrics
+def shuffleReadMetrics() = _shuffleReadMetrics
Contributor

nit: since this doesn't mutate internal state the original lack of parentheses is correct style.
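
For readers unfamiliar with the convention behind this nit, here is a small illustrative sketch. `CounterSketch` is a hypothetical class, not part of the patch: methods that only read state are declared without parentheses, while methods with side effects are declared with empty parentheses.

```scala
// Hypothetical class illustrating the Scala convention for zero-argument methods.
class CounterSketch {
  private var _count: Int = 0

  // Pure accessor: no side effects, so it is declared without parentheses.
  def count: Int = _count

  // Mutator: changes internal state, so it is declared with empty parentheses.
  def increment(): Unit = {
    _count += 1
  }
}
```

At the call site this reads as `c.count` for the accessor and `c.increment()` for the mutation, which makes side effects visible to the caller.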

@pwendell
Contributor

pwendell commented Aug 8, 2014

Okay I merged this with the minor style change. Thanks Sandy!

asfgit pushed a commit that referenced this pull request Aug 8, 2014
Author: Sandy Ryza <[email protected]>

Closes #1507 from sryza/sandy-spark-2565 and squashes the following commits:

74dad41 [Sandy Ryza] SPARK-2565. Update ShuffleReadMetrics as blocks are fetched
(cherry picked from commit 4c51098)

Signed-off-by: Patrick Wendell <[email protected]>
@asfgit asfgit closed this in 4c51098 Aug 8, 2014
xiliu82 pushed a commit to xiliu82/spark that referenced this pull request Sep 4, 2014
Author: Sandy Ryza <[email protected]>

Closes apache#1507 from sryza/sandy-spark-2565 and squashes the following commits:

74dad41 [Sandy Ryza] SPARK-2565. Update ShuffleReadMetrics as blocks are fetched
sunchao pushed a commit to sunchao/spark that referenced this pull request Jun 2, 2023
…nabled` by default (apache#1507)

### What changes were proposed in this pull request?

This is a backport from Spark 3.3(d762205) to 3.2.

rdar://99066157 ([SPARK-39846][CORE] Enable spark.dynamicAllocation.shuffleTracking.enabled by default)

This PR aims to enable `spark.dynamicAllocation.shuffleTracking.enabled` by default in Apache Spark 3.4 when `spark.dynamicAllocation.enabled=true` and `spark.shuffle.service.enabled=false`

### Why are the changes needed?

Here is a brief history around `spark.dynamicAllocation.shuffleTracking.enabled`.
- Apache Spark 3.0.0 added it via SPARK-27963 for K8s environment.
  > One immediate use case is the ability to use dynamic allocation on Kubernetes, which doesn't yet have that service.
- Apache Spark 3.1.1 made K8s GA via SPARK-33005 and started to use it widely in K8s.
- Apache Spark 3.2.0 started to support shuffle data recovery on the reused PVCs via SPARK-35593
- Apache Spark 3.3.0 removed `Experimental` tag from it via SPARK-39322.
- Apache Spark 3.4.0 will enable it by default via SPARK-39846 (this PR) to help Spark K8s users use dynamic allocation more easily.

### Does this PR introduce _any_ user-facing change?

The `Core` migration guide is updated.

### How was this patch tested?

Pass the CIs including K8s IT GitHub Action job.