SPARK-1465: Spark compilation is broken with the latest hadoop-2.4.0 release #396

Closed
wants to merge 3 commits

Conversation


@xgong commented Apr 11, 2014

YARN-1824 changes the APIs (addToEnvironment, setEnvFromInputString) in Apps, which causes the Spark build to break when built against version 2.4.0. To fix this, create Spark's own functions providing that functionality, which will not break compilation against 2.3 and other 2.x versions.
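For illustration only, a minimal sketch of what such Spark-side helpers could look like (the object name EnvUtil and the exact behavior are assumptions, not the code in this patch):

import java.util.{Map => JMap}

// Sketch of Spark-side replacements for the YARN Apps helpers, so the build
// no longer depends on the @Private methods whose signatures changed in 2.4.0.
object EnvUtil {

  // Append a value to an environment variable, creating it if absent
  // (roughly the behavior of Apps.addToEnvironment).
  def addToEnvironment(env: JMap[String, String], key: String, value: String): Unit = {
    val existing = env.get(key)
    if (existing == null || existing.isEmpty) {
      env.put(key, value)
    } else {
      env.put(key, existing + java.io.File.pathSeparator + value)
    }
  }

  // Parse a comma-separated list of KEY=VALUE pairs into the environment map
  // (roughly the behavior of Apps.setEnvFromInputString, without variable expansion).
  def setEnvFromInputString(env: JMap[String, String], envString: String): Unit = {
    if (envString != null && !envString.isEmpty) {
      for (kv <- envString.split(",")) {
        val parts = kv.split("=", 2)
        if (parts.length == 2) addToEnvironment(env, parts(0), parts(1))
      }
    }
  }
}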

@AmplabJenkins

Can one of the admins verify this patch?

@pwendell
Contributor

I thought that YARN 2.X offered stable APIs?

@xgong
Author

xgong commented Apr 11, 2014

The addToEnvironment and setEnvFromInputString APIs in Apps have been changed in the recent 2.4.0 release; both of them introduced an extra parameter. As a result, Spark compilation is broken.

@pwendell
Contributor

@sryza @tgraves - could you guys clarify the YARN policy on breaking APIs? In cases like this, should we be filing bug reports against YARN for breaking API stability?

@pwendell
Contributor

I noticed this API has a @Private annotation in the YARN code - does that mean we shouldn't be using it?

@sryza
Contributor

sryza commented Apr 11, 2014

The justification for breaking compatibility was that the API was marked @Private. We're working to fix this in YARN-1931 - the missing API will be added back in 2.4.1 and later. There's still some discussion there on whether to make the API public. The issue with the old API is that it couldn't handle Windows clients submitting to Linux servers (and vice versa), so we'll want to switch to the new API anyway to avoid this issue. If the API is made public, we can use reflection to call it going forward. Otherwise, we'll need to go with @xgong's approach.
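If the API does become public, the reflection path might look roughly like this (the four-argument signature below is an assumption about the post-YARN-1824 method, not a confirmed API):

import java.util.{Map => JMap}

// Sketch: call Apps.addToEnvironment reflectively so one Spark build works
// against both the old and the new YARN method signatures.
def addToEnvReflectively(env: JMap[String, String], key: String, value: String): Unit = {
  val appsClass = Class.forName("org.apache.hadoop.yarn.util.Apps")
  try {
    // Assumed newer signature that takes an explicit separator argument.
    val m = appsClass.getMethod("addToEnvironment",
      classOf[JMap[String, String]], classOf[String], classOf[String], classOf[String])
    m.invoke(null, env, key, value, java.io.File.pathSeparator)
  } catch {
    case _: NoSuchMethodException =>
      // Fall back to the pre-2.4.0 three-argument signature.
      val m = appsClass.getMethod("addToEnvironment",
        classOf[JMap[String, String]], classOf[String], classOf[String])
      m.invoke(null, env, key, value)
  }
}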

@pwendell
Contributor

@sryza so I'm wondering - is YARN supposed to be usable entirely without @Private APIs for framework writers?

@sryza
Contributor

sryza commented Apr 12, 2014

Yeah, it is supposed to be.

@pwendell
Contributor

So is this just a gap between the intention and reality? Or is this something that Spark really shouldn't be using directly?

@tgravescs
Contributor

Spark shouldn't be using it directly, since it got marked as private in the Hadoop 2.2 release. I believe Spark was using that API before the 2.2 release, so it was easy to miss.
Also, when it was changed to private, MapReduce was not updated to stop using it, so Hadoop is breaking its own API rules.

These functions are utility functions and could be used by many types of applications, so ideally a new public class with these functions would be created in YARN.

I think we should commit this PR (after review), since Spark on YARN can't be run against the 2.4 release right now; if a new YARN utility class is created later, we can look at using it then.

@tgravescs
Contributor

Also note I filed https://issues.apache.org/jira/browse/SPARK-1472 to go through the rest of the YARN APIs.

import org.apache.hadoop.io.Text
import org.apache.hadoop.mapred.JobConf
import org.apache.hadoop.security.Credentials
import org.apache.hadoop.security.UserGroupInformation
import org.apache.hadoop.util.Shell
Contributor

Shell is marked as limited private:

@InterfaceAudience.LimitedPrivate({"HDFS", "MapReduce"})

So we shouldn't use it.

Contributor

This doesn't really matter now, but this also doesn't compile for 0.23.

Please make sure to try it on both 0.23 and 2.x builds. If you don't have those environments let me know.

@xgong
Author

xgong commented Apr 15, 2014

Removed the usage of org.apache.hadoop.util.Shell and created a Spark version of the getEnvironmentVariableRegex() function.
Successfully did the Maven compilation for hadoop-2.4.0 and hadoop-0.23.10.
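For context, a Spark-side version of that helper might look roughly like this (the regexes are illustrative of the %VAR% vs. $VAR distinction, not necessarily the exact patterns in the patch):

// Sketch: OS-dependent regex for environment-variable references, so Spark
// does not need org.apache.hadoop.util.Shell, which is LimitedPrivate.
def getEnvironmentVariableRegex(): String = {
  val osName = System.getProperty("os.name")
  if (osName != null && osName.startsWith("Windows")) {
    "%([A-Za-z_][A-Za-z0-9_]*?)%"       // matches %VAR% on Windows
  } else {
    "\\$([A-Za-z_][A-Za-z0-9_]*)"       // matches $VAR on Unix-like systems
  }
}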

@xgong
Author

xgong commented Apr 15, 2014

@tgravescs Would you mind reviewing this again? Thanks.

var childEnvs = envString.split(",")
var p = Pattern.compile(getEnvironmentVariableRegex())
for (cEnv <- childEnvs) {
  var parts = cEnv.split("=") // split on '='
Contributor

@sryza @tgravescs does Hadoop not support env variables that have = inside of quoted strings?

Contributor

I've noticed this as an issue as well. There's definitely room for improvement here.

Contributor

Yeah, unfortunately there are a few issues with Hadoop/MR with parsing configs and env variables that can limit their use.
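Purely as a sketch of one possible improvement for a follow-up (not what this patch does, and `env` below is just a placeholder for the target map), splitting each pair only on the first '=' would at least preserve values that themselves contain '=':

// Sketch: limit the split so only the first '=' separates key from value,
// letting a value like "-Dfoo=bar" survive. Commas inside quoted values
// would still be a problem, since the outer split(",") is unchanged.
for (cEnv <- envString.split(",")) {
  val parts = cEnv.split("=", 2)   // at most two parts: key and the rest
  if (parts.length == 2) {
    env.put(parts(0), parts(1))
  }
}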

@tgravescs
Contributor

Changes look good other than the 2 extra imports. At least it's equivalent to what we had before; we can perhaps improve upon the quoting under another JIRA.

@xgong
Author

xgong commented Apr 16, 2014

@tgravescs Thanks for the review. I have removed these two extra imports.

@tgravescs
Contributor

Looks good. Thanks Xuan. I committed this to master and to branch-1.0

asfgit pushed a commit that referenced this pull request Apr 16, 2014
…release

YARN-1824 changes the APIs (addToEnvironment, setEnvFromInputString) in Apps, which causes the Spark build to break when built against version 2.4.0. To fix this, create Spark's own functions providing that functionality, which will not break compilation against 2.3 and other 2.x versions.

Author: xuan <[email protected]>
Author: xuan <[email protected]>

Closes #396 from xgong/master and squashes the following commits:

42b5984 [xuan] Remove two extra imports
bc0926f [xuan] Remove usage of org.apache.hadoop.util.Shell
be89fa7 [xuan] fix Spark compilation is broken with the latest hadoop-2.4.0 release

(cherry picked from commit 725925c)
Signed-off-by: Thomas Graves <[email protected]>
@asfgit closed this in 725925c Apr 16, 2014
pwendell added a commit to pwendell/spark that referenced this pull request May 12, 2014
Setting load defaults to true in executor

This preserves the behavior in earlier releases. If properties are set for the executors via `spark-env.sh` on the slaves, then they should take precedence over Spark defaults. This is useful if system administrators are setting properties for a standalone cluster, such as shuffle locations.

/cc @andrewor14 who initially reported this issue.
pdeyhim pushed a commit to pdeyhim/spark-1 that referenced this pull request Jun 25, 2014
…release

YARN-1824 changes the APIs (addToEnvironment, setEnvFromInputString) in Apps, which causes the Spark build to break when built against version 2.4.0. To fix this, create Spark's own functions providing that functionality, which will not break compilation against 2.3 and other 2.x versions.

Author: xuan <[email protected]>
Author: xuan <[email protected]>

Closes apache#396 from xgong/master and squashes the following commits:

42b5984 [xuan] Remove two extra imports
bc0926f [xuan] Remove usage of org.apache.hadoop.util.Shell
be89fa7 [xuan] fix Spark compilation is broken with the latest hadoop-2.4.0 release
mccheah pushed a commit to mccheah/spark that referenced this pull request Nov 28, 2018
bzhaoopenstack pushed a commit to bzhaoopenstack/spark that referenced this pull request Sep 11, 2019
AKSK auth needs OS_PROJECT_ID added to the HTTP header, so we should add it into secrets.yaml and the role.

Related-Bug: theopenlab/openlab#130