SPARK-1465: Spark compilation is broken with the latest hadoop-2.4.0 release #396

Closed
wants to merge 3 commits

Conversation


@xgong commented Apr 11, 2014

YARN-1824 changes the APIs (addToEnvironment, setEnvFromInputString) in Apps, which causes the Spark build to break when built against version 2.4.0. To fix this, create Spark's own functions providing that functionality, which will not break compilation against 2.3 and other 2.x versions.
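For illustration only, a minimal sketch of what such Spark-side helpers could look like (the object name EnvUtil and the exact behavior are assumptions, not the code in this patch):

import java.util.{Map => JMap}

// Sketch of Spark-side replacements for the YARN Apps helpers, so the build
// no longer depends on the @Private methods whose signatures changed in 2.4.0.
object EnvUtil {

  // Append a value to an environment variable, creating it if absent
  // (roughly the behavior of Apps.addToEnvironment).
  def addToEnvironment(env: JMap[String, String], key: String, value: String): Unit = {
    val existing = env.get(key)
    if (existing == null || existing.isEmpty) {
      env.put(key, value)
    } else {
      env.put(key, existing + java.io.File.pathSeparator + value)
    }
  }

  // Parse a comma-separated list of KEY=VALUE pairs into the environment map
  // (roughly the behavior of Apps.setEnvFromInputString, without variable expansion).
  def setEnvFromInputString(env: JMap[String, String], envString: String): Unit = {
    if (envString != null && !envString.isEmpty) {
      for (kv <- envString.split(",")) {
        val parts = kv.split("=", 2)
        if (parts.length == 2) addToEnvironment(env, parts(0), parts(1))
      }
    }
  }
}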

@AmplabJenkins

Can one of the admins verify this patch?

@pwendell
Contributor

I thought that YARN 2.X offered stable APIs?

@xgong
Author

xgong commented Apr 11, 2014

The addToEnvironment and setEnvFromInputString APIs in Apps have been changed in the recent 2.4.0 release; both of them introduced an extra parameter. As a result, Spark compilation is broken.

@pwendell
Contributor

@sryza @tgraves - could you guys clarify the YARN policy on breaking APIs? In cases like this, should we be filing bug reports against YARN for breaking API stability?

@pwendell
Contributor

I noticed this API has a @Private annotation in the YARN code - does that mean we shouldn't be using it?

@sryza
Contributor

sryza commented Apr 11, 2014

The justification for breaking compatibility was that the API was marked @Private. We're working to fix this in YARN-1931 - the missing API will be added back in 2.4.1 and later. There's still some discussion there on whether to make the API public. The issue with the old API is that it couldn't handle Windows clients submitting to Linux servers (and vice versa), so we'll want to switch to the new API anyway to avoid this issue. If the API is made public, we can use reflection to call it going forward. Otherwise, we'll need to go with @xgong's approach.
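If the API does become public, the reflection path might look roughly like this (the four-argument signature below is an assumption about the post-YARN-1824 method, not a confirmed API):

import java.util.{Map => JMap}

// Sketch: call Apps.addToEnvironment reflectively so one Spark build works
// against both the old and the new YARN method signatures.
def addToEnvReflectively(env: JMap[String, String], key: String, value: String): Unit = {
  val appsClass = Class.forName("org.apache.hadoop.yarn.util.Apps")
  try {
    // Assumed newer signature that takes an explicit separator argument.
    val m = appsClass.getMethod("addToEnvironment",
      classOf[JMap[String, String]], classOf[String], classOf[String], classOf[String])
    m.invoke(null, env, key, value, java.io.File.pathSeparator)
  } catch {
    case _: NoSuchMethodException =>
      // Fall back to the pre-2.4.0 three-argument signature.
      val m = appsClass.getMethod("addToEnvironment",
        classOf[JMap[String, String]], classOf[String], classOf[String])
      m.invoke(null, env, key, value)
  }
}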

@pwendell
Contributor

@sryza so I'm wondering - is YARN supposed to be usable entirely without @Private APIs for framework writers?

@sryza
Contributor

sryza commented Apr 12, 2014

Yeah, it is supposed to be.

@pwendell
Contributor

So is this just a gap between the intention and reality? Or is this something that Spark really shouldn't be using directly?

@tgravescs
Contributor

Spark shouldn't be using it directly, since it got marked as private in the Hadoop 2.2 release. I believe Spark was using that API before the 2.2 release, so it was easy to miss.
Also, when it was changed to private, MapReduce was not updated to stop using it, so Hadoop is breaking its own API rules.

These functions are utility functions and could be used by many types of applications, so ideally a new public class with these functions would be created in YARN.

I think we should commit this PR (after review), since Spark on YARN can't be run against the 2.4 release right now; if a new YARN utility class is created later, we can look at using it then.

@tgravescs
Contributor

Also note I filed https://issues.apache.org/jira/browse/SPARK-1472 to go through the rest of the YARN APIs.

import org.apache.hadoop.io.Text
import org.apache.hadoop.mapred.JobConf
import org.apache.hadoop.security.Credentials
import org.apache.hadoop.security.UserGroupInformation
import org.apache.hadoop.util.Shell
Contributor

Shell is marked as limited private:

@InterfaceAudience.LimitedPrivate({"HDFS", "MapReduce"})

So we shouldn't use it.

Contributor

This doesn't really matter now, but this also doesn't compile for 0.23.

Please make sure to try it on both 0.23 and 2.x builds. If you don't have those environments let me know.

@xgong
Author

xgong commented Apr 15, 2014

Removed the usage of org.apache.hadoop.util.Shell and created a Spark version of the getEnvironmentVariableRegex() function.
Successfully did the Maven compilation for hadoop-2.4.0 and hadoop-0.23.10.
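For context, a Spark-side version of that helper might look roughly like this (the regexes are illustrative of the %VAR% vs. $VAR distinction, not necessarily the exact patterns in the patch):

// Sketch: OS-dependent regex for environment-variable references, so Spark
// does not need org.apache.hadoop.util.Shell, which is LimitedPrivate.
def getEnvironmentVariableRegex(): String = {
  val osName = System.getProperty("os.name")
  if (osName != null && osName.startsWith("Windows")) {
    "%([A-Za-z_][A-Za-z0-9_]*?)%"       // matches %VAR% on Windows
  } else {
    "\\$([A-Za-z_][A-Za-z0-9_]*)"       // matches $VAR on Unix-like systems
  }
}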

@xgong
Author

xgong commented Apr 15, 2014

@tgravescs Would you mind reviewing this again? Thanks.

var childEnvs = envString.split(",")
var p = Pattern.compile(getEnvironmentVariableRegex())
for (cEnv <- childEnvs) {
  var parts = cEnv.split("=") // split on '='
Contributor

@sryza @tgravescs does Hadoop not support env variables that have = inside of quoted strings?

Contributor

I've noticed this as an issue as well. There's definitely room for improvement here.

Contributor

Yeah, unfortunately there are a few issues with Hadoop/MR with parsing configs and env variables that can limit their use.
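Purely as a sketch of one possible improvement for a follow-up (not what this patch does, and `env` below is just a placeholder for the target map), splitting each pair only on the first '=' would at least preserve values that themselves contain '=':

// Sketch: limit the split so only the first '=' separates key from value,
// letting a value like "-Dfoo=bar" survive. Commas inside quoted values
// would still be a problem, since the outer split(",") is unchanged.
for (cEnv <- envString.split(",")) {
  val parts = cEnv.split("=", 2)   // at most two parts: key and the rest
  if (parts.length == 2) {
    env.put(parts(0), parts(1))
  }
}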

@tgravescs
Contributor

Changes look good other than the 2 extra imports. At least it's equivalent to what we had before; we can perhaps improve upon the quoting under another JIRA.

@xgong
Author

xgong commented Apr 16, 2014

@tgravescs Thanks for the review. I have removed these two extra imports.

@tgravescs
Contributor

Looks good. Thanks Xuan. I committed this to master and to branch-1.0

asfgit pushed a commit that referenced this pull request Apr 16, 2014
…release

YARN-1824 changes the APIs (addToEnvironment, setEnvFromInputString) in Apps, which causes the Spark build to break when built against version 2.4.0. To fix this, create Spark's own functions providing that functionality, which will not break compilation against 2.3 and other 2.x versions.

Author: xuan <[email protected]>
Author: xuan <[email protected]>

Closes #396 from xgong/master and squashes the following commits:

42b5984 [xuan] Remove two extra imports
bc0926f [xuan] Remove usage of org.apache.hadoop.util.Shell
be89fa7 [xuan] fix Spark compilation is broken with the latest hadoop-2.4.0 release

(cherry picked from commit 725925c)
Signed-off-by: Thomas Graves <[email protected]>
@asfgit closed this in 725925c Apr 16, 2014
pwendell added a commit to pwendell/spark that referenced this pull request May 12, 2014
Setting load defaults to true in executor

This preserves the behavior in earlier releases. If properties are set for the executors via `spark-env.sh` on the slaves, then they should take precedence over Spark defaults. This is useful if system administrators are setting properties for a standalone cluster, such as shuffle locations.

/cc @andrewor14 who initially reported this issue.
pdeyhim pushed a commit to pdeyhim/spark-1 that referenced this pull request Jun 25, 2014
…release

YARN-1824 changes the APIs (addToEnvironment, setEnvFromInputString) in Apps, which causes the Spark build to break when built against version 2.4.0. To fix this, create Spark's own functions providing that functionality, which will not break compilation against 2.3 and other 2.x versions.

Author: xuan <[email protected]>
Author: xuan <[email protected]>

Closes apache#396 from xgong/master and squashes the following commits:

42b5984 [xuan] Remove two extra imports
bc0926f [xuan] Remove usage of org.apache.hadoop.util.Shell
be89fa7 [xuan] fix Spark compilation is broken with the latest hadoop-2.4.0 release
mccheah pushed a commit to mccheah/spark that referenced this pull request Nov 28, 2018
bzhaoopenstack pushed a commit to bzhaoopenstack/spark that referenced this pull request Sep 11, 2019
AKSK auth needs OS_PROJECT_ID added to the HTTP header, so we should add it into secrets.yaml and the role.

Related-Bug: theopenlab/openlab#130