SPARK-1465: Spark compilation is broken with the latest hadoop-2.4.0 release #396
Conversation
Can one of the admins verify this patch?
I thought that YARN 2.X offered stable APIs?
The addToEnvironment and setEnvFromInputString APIs in Apps were changed in the recent 2.4.0 release. Both of them gained an extra parameter, so the Spark compilation is broken.
I noticed this API has a @Private annotation in the YARN code - does that mean we shouldn't be using it?
The justification for breaking compatibility was that the API was marked @Private. We're working to fix this in YARN-1931 - the missing API will be added back in 2.4.1 and later. There's still some discussion there on whether to make the API public. The issue with the old API is that it couldn't handle Windows clients submitting to Linux servers (and vice versa), so we'll want to switch to the new API anyway to avoid this issue. If the API is made public, we can use reflection to call it going forward. Otherwise, we'll need to go with @xgong's approach.
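The reflection idea mentioned above can be sketched roughly as follows: probe for the newer four-argument overload first and fall back to the older three-argument one when it is absent. This is a hypothetical, self-contained illustration - the `Apps` class here is a local mock (only the old signature exists), not Hadoop's `org.apache.hadoop.yarn.util.Apps`, and the separator handling is an assumption.

```java
import java.lang.reflect.Method;
import java.util.HashMap;
import java.util.Map;

public class EnvCompat {
    // Mock of the pre-2.4.0 utility class: only the old 3-arg signature exists.
    // Note the hardcoded ':' separator - exactly the cross-platform problem
    // (Windows client vs. Linux server) described in the discussion above.
    public static class Apps {
        public static void addToEnvironment(Map<String, String> env,
                                            String key, String value) {
            String old = env.get(key);
            env.put(key, old == null ? value : old + ":" + value);
        }
    }

    public static void addToEnvironment(Map<String, String> env,
                                        String key, String value, String sep) {
        try {
            // Prefer the newer 4-arg overload if the running version has it.
            Method m = Apps.class.getMethod("addToEnvironment",
                    Map.class, String.class, String.class, String.class);
            m.invoke(null, env, key, value, sep);
        } catch (NoSuchMethodException e) {
            // Older version: fall back to the 3-arg overload (separator lost).
            try {
                Method m = Apps.class.getMethod("addToEnvironment",
                        Map.class, String.class, String.class);
                m.invoke(null, env, key, value);
            } catch (ReflectiveOperationException inner) {
                throw new RuntimeException(inner);
            }
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> env = new HashMap<>();
        addToEnvironment(env, "CLASSPATH", "/opt/a.jar", ":");
        addToEnvironment(env, "CLASSPATH", "/opt/b.jar", ":");
        System.out.println(env.get("CLASSPATH")); // → /opt/a.jar:/opt/b.jar
    }
}
```

Since the mock only declares the old signature, the sketch always takes the fallback path here; against a real 2.4.1+ Hadoop jar the first `getMethod` probe would succeed instead.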
Yeah, it is supposed to be.
So is this just a gap between the intention and reality? Or is this something that Spark really shouldn't be using directly?
Spark shouldn't be using it directly, since it was marked private in the Hadoop 2.2 release. I believe Spark was using that API before the 2.2 release, so it was easy to miss. These are utility functions that could be used by many types of applications, so ideally a new public class in YARN with these functions would be created. I think we should commit this PR (after review), since Spark on YARN can't be run against the 2.4 release right now; then, if a new YARN utility class is created, we can look at using that.
Also note I filed https://issues.apache.org/jira/browse/SPARK-1472 to go through the rest of the YARN APIs.
import org.apache.hadoop.io.Text
import org.apache.hadoop.mapred.JobConf
import org.apache.hadoop.security.Credentials
import org.apache.hadoop.security.UserGroupInformation
import org.apache.hadoop.util.Shell
Shell is marked as limited private:
@InterfaceAudience.LimitedPrivate({"HDFS", "MapReduce"})
So we shouldn't use it.
This doesn't really matter now, but this also doesn't compile for 0.23.
Please make sure to try it on both the 0.23 and 2.x builds. If you don't have those environments, let me know.
Removed the usage of org.apache.hadoop.util.Shell, and created a Spark version of the getEnvironmentVariableRegex() function.
@tgravescs Would you mind reviewing this again? Thanks
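A local replacement for Shell.getEnvironmentVariableRegex() might look like the following. This is a hypothetical sketch: the exact patterns (%VAR% on Windows, $VAR on Unix-like systems) are assumptions about what Hadoop's utility returns, not a copy of the Spark or Hadoop code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EnvRegex {
    // Returns an OS-appropriate regex for environment-variable references,
    // with the variable name in capture group 1.
    public static String getEnvironmentVariableRegex() {
        boolean isWindows = System.getProperty("os.name")
                .toLowerCase().startsWith("windows");
        // Windows shells expand %VAR%; Unix shells expand $VAR.
        return isWindows ? "%([A-Za-z_][A-Za-z0-9_]*?)%"
                         : "\\$([A-Za-z_][A-Za-z0-9_]*)";
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile(getEnvironmentVariableRegex());
        Matcher m = p.matcher("$JAVA_HOME/bin/java");
        while (m.find()) {
            // Prints the variable name without the sigil (on a Unix host)
            System.out.println(m.group(1));
        }
    }
}
```

Keeping this helper inside Spark avoids depending on Shell, which (as noted above) is @InterfaceAudience.LimitedPrivate to HDFS and MapReduce.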
var childEnvs = envString.split(",")
var p = Pattern.compile(getEnvironmentVariableRegex())
for (cEnv <- childEnvs) {
  var parts = cEnv.split("=") // split on '='
@sryza @tgravescs does Hadoop not support env variables that have = inside of quoted strings?
I've noticed this as an issue as well. There's definitely room for improvement here.
Yeah, unfortunately there are a few issues with how Hadoop/MR parses configs and env variables that can limit their use.
Changes look good other than the two extra imports. At least it's equivalent to what we had before; we can perhaps improve upon the quoting under another JIRA.
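The quoting limitation discussed above can be seen in a standalone sketch (this is an illustration, not the Spark code): splitting each "KEY=VALUE" entry on every '=' truncates any value that itself contains '='. Splitting with a limit of 2 keeps the rest of the value intact, though commas inside quoted values would still break the outer split.

```java
import java.util.HashMap;
import java.util.Map;

public class EnvParse {
    // Parses "K1=V1,K2=V2,..." into a map, splitting each entry only on the
    // first '=' so values like "-Da=b" survive.
    public static Map<String, String> parse(String envString) {
        Map<String, String> env = new HashMap<>();
        for (String entry : envString.split(",")) {
            // limit = 2: split on the first '=' only; values may contain '='
            String[] parts = entry.split("=", 2);
            if (parts.length == 2) {
                env.put(parts[0], parts[1]);
            }
        }
        return env;
    }

    public static void main(String[] args) {
        Map<String, String> env = parse("JAVA_OPTS=-Da=b,HOME=/home/user");
        System.out.println(env.get("JAVA_OPTS")); // → -Da=b
    }
}
```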
@tgravescs Thanks for the review. I have removed these two extra imports.
Looks good. Thanks Xuan. I committed this to master and to branch-1.0.
…release

YARN-1824 changes the APIs (addToEnvironment, setEnvFromInputString) in Apps, which causes the Spark build to break if built against version 2.4.0. To fix this, create Spark's own functions for that functionality, which will not break compiling against 2.3 and other 2.x versions.

Author: xuan <[email protected]>
Author: xuan <[email protected]>

Closes #396 from xgong/master and squashes the following commits:
42b5984 [xuan] Remove two extra imports
bc0926f [xuan] Remove usage of org.apache.hadoop.util.Shell
be89fa7 [xuan] fix Spark compilation is broken with the latest hadoop-2.4.0 release

(cherry picked from commit 725925c)
Signed-off-by: Thomas Graves <[email protected]>