[SPARK-19545][YARN] Fix compile issue for Spark on Yarn when building against Hadoop 2.6.0~2.6.3

## What changes were proposed in this pull request?

The rolled-log pattern setters on `LogAggregationContext` were only added in Hadoop 2.6.4, so Spark fails to compile when built against Hadoop 2.6.0~2.6.3. This change reverts to calling those methods through reflection so the build works against the older versions as well.
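
As a rough illustration of the reflection approach (not the actual patch), the sketch below looks up a setter by name at runtime and falls back gracefully when the class does not have it. `OldContext`, `NewContext`, `ReflectiveCall` and `trySetString` are hypothetical stand-ins invented for this example; only `getClass.getMethod`, `invoke` and `NonFatal` correspond to calls used in the change itself.

```scala
import scala.util.control.NonFatal

// Hypothetical stand-in for an older API version that lacks the setter.
class OldContext

// Hypothetical stand-in for a newer API version that has the setter.
class NewContext {
  private var includePattern: String = ""
  def setRolledLogsIncludePattern(pattern: String): Unit = { includePattern = pattern }
  def getIncludePattern: String = includePattern
}

object ReflectiveCall {
  /** Calls `methodName(value)` on `target` via reflection; returns true on success. */
  def trySetString(target: AnyRef, methodName: String, value: String): Boolean = {
    try {
      // Resolved at runtime, so this compiles even if the method is absent.
      val method = target.getClass.getMethod(methodName, classOf[String])
      method.invoke(target, value)
      true
    } catch {
      // NoSuchMethodException (and other non-fatal errors) land here.
      case NonFatal(e) =>
        println(s"Ignoring $methodName: not supported by ${target.getClass.getName} ($e)")
        false
    }
  }

  def main(args: Array[String]): Unit = {
    println(trySetString(new NewContext, "setRolledLogsIncludePattern", "spark*"))
    println(trySetString(new OldContext, "setRolledLogsIncludePattern", "spark*"))
  }
}
```

The trade-off is that a misspelled method name is no longer caught by the compiler; it only shows up as a warning at runtime.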

## How was this patch tested?

Manual verification.

Author: jerryshao <[email protected]>

Closes #16884 from jerryshao/SPARK-19545.
jerryshao authored and srowen committed Feb 10, 2017
1 parent d5593f7 commit 8e8afb3
Showing 2 changed files with 23 additions and 8 deletions.
6 changes: 3 additions & 3 deletions docs/running-on-yarn.md
@@ -445,7 +445,7 @@ To use a custom metrics.properties for the application master and executors, upd
  This will be used with YARN's rolling log aggregation, to enable this feature in YARN side
  <code>yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds</code> should be
  configured in yarn-site.xml.
- This feature can only be used with Hadoop 2.6.1+. The Spark log4j appender needs be changed to use
+ This feature can only be used with Hadoop 2.6.4+. The Spark log4j appender needs be changed to use
  FileAppender or another appender that can handle the files being removed while its running. Based
  on the file name configured in the log4j configuration (like spark.log), the user should set the
  regex (spark*) to include all the log files that need to be aggregated.
@@ -524,8 +524,8 @@ pre-packaged distribution.
  1. In the `yarn-site.xml` on each node, add `spark_shuffle` to `yarn.nodemanager.aux-services`,
  then set `yarn.nodemanager.aux-services.spark_shuffle.class` to
  `org.apache.spark.network.yarn.YarnShuffleService`.
- 1. Increase `NodeManager's` heap size by setting `YARN_HEAPSIZE` (1000 by default) in `etc/hadoop/yarn-env.sh`
- to avoid garbage collection issues during shuffle.
+ 1. Increase `NodeManager's` heap size by setting `YARN_HEAPSIZE` (1000 by default) in `etc/hadoop/yarn-env.sh`
+ to avoid garbage collection issues during shuffle.
  1. Restart all `NodeManager`s in your cluster.

  The following extra configuration options are available when the shuffle service is running on YARN:
@@ -245,12 +245,27 @@ private[spark] class Client(
     }

     sparkConf.get(ROLLED_LOG_INCLUDE_PATTERN).foreach { includePattern =>
-      val logAggregationContext = Records.newRecord(classOf[LogAggregationContext])
-      logAggregationContext.setRolledLogsIncludePattern(includePattern)
-      sparkConf.get(ROLLED_LOG_EXCLUDE_PATTERN).foreach { excludePattern =>
-        logAggregationContext.setRolledLogsExcludePattern(excludePattern)
+      try {
+        val logAggregationContext = Records.newRecord(classOf[LogAggregationContext])
+
+        // These two methods were added in Hadoop 2.6.4, so we still need to use reflection to
+        // avoid compile error when building against Hadoop 2.6.0 ~ 2.6.3.
+        val setRolledLogsIncludePatternMethod =
+          logAggregationContext.getClass.getMethod("setRolledLogsIncludePattern", classOf[String])
+        setRolledLogsIncludePatternMethod.invoke(logAggregationContext, includePattern)
+
+        sparkConf.get(ROLLED_LOG_EXCLUDE_PATTERN).foreach { excludePattern =>
+          val setRolledLogsExcludePatternMethod =
+            logAggregationContext.getClass.getMethod("setRolledLogsExcludePattern", classOf[String])
+          setRolledLogsExcludePatternMethod.invoke(logAggregationContext, excludePattern)
+        }
+
+        appContext.setLogAggregationContext(logAggregationContext)
+      } catch {
+        case NonFatal(e) =>
+          logWarning(s"Ignoring ${ROLLED_LOG_INCLUDE_PATTERN.key} because the version of YARN " +
+            "does not support it", e)
       }
-      appContext.setLogAggregationContext(logAggregationContext)
     }

     appContext