[SPARK-12699][SPARKR] R driver process should start in a clean state #10652

Closed · wants to merge 3 commits

Conversation

felixcheung (Member)

Currently the R worker process is launched with the --vanilla option, which brings it up in a clean state (without an init profile or workspace data; see https://stat.ethz.ch/R-manual/R-devel/library/base/html/Startup.html). However, the R process for the Spark driver is not.

We should do the same for the driver because:

  1. It would make the driver consistent with the R worker process - for instance, a library might otherwise be loaded in the driver but not in the worker.
  2. SparkR depends on .libPaths() and .First(), so it could, for example, be broken by something in the user's workspace.

Here are the proposed changes (a sketch of the resulting command line follows below):

  1. When starting the sparkR shell (except that saving/restoring the workspace is still allowed, since the driver/shell is local)
  2. When launching the R driver in cluster mode
  3. In cluster mode, when calling R to install a shipped R package

This is discussed in PR #10171
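
For illustration, here is a minimal Scala sketch of what assembling such a clean-state R command line could look like. The names (RCleanStateSketch, buildRCommand) are hypothetical and not the actual Spark code, and the exact executable and arguments used by RRunner or the sparkR shell may differ.

```scala
object RCleanStateSketch {
  // Startup flags used in this PR to bring R up in a (mostly) clean state:
  // no saved workspace, no site file, no environment file, no restored data.
  private val cleanStateFlags =
    Seq("--no-save", "--no-site-file", "--no-environ", "--no-restore")

  // Hypothetical helper: prepend the clean-state flags to an R invocation.
  def buildRCommand(rExecutable: String, extraArgs: String*): Seq[String] =
    rExecutable +: (cleanStateFlags ++ extraArgs)

  def main(args: Array[String]): Unit = {
    // Prints: R --no-save --no-site-file --no-environ --no-restore --file=my_job.R
    println(buildRCommand("R", "--file=my_job.R").mkString(" "))
  }
}
```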

@shivaram @sun-rui

@felixcheung (Member, Author)

jenkins, retest this please

@@ -36,7 +36,8 @@ private[deploy] object RPackageUtils extends Logging {
   private final val hasRPackage = "Spark-HasRPackage"

   /** Base of the shell command used in order to install R packages. */
-  private final val baseInstallCmd = Seq("R", "CMD", "INSTALL", "-l")
+  private final val baseInstallCmd = Seq("R", "--no-save", "--no-site-file", "--no-environ",
+    "--no-restore", "CMD", "INSTALL", "-l")
Contributor

I guess these options do not make sense for R package installation?

Member Author

I actually think they do - it's easier to install the package in a clean state than to debug a job that failed because the package failed to install.

Contributor

This is just an installation that does not start an R session, so these options won't be used?

Member Author

It actually would load the same site file, saved session, etc. when launching R with R CMD - look for "R CMD" in https://stat.ethz.ch/R-manual/R-devel/library/base/html/Startup.html
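
To make that concrete, here is a hypothetical sketch of how the new baseInstallCmd could be turned into the full install command line; libDir and pkgPath are illustrative placeholders, and the actual assembly in RPackageUtils may differ.

```scala
object InstallCmdSketch {
  // Mirrors the new baseInstallCmd from the diff above; the clean-state flags
  // are passed to the R front end before "CMD INSTALL".
  private val baseInstallCmd = Seq("R", "--no-save", "--no-site-file", "--no-environ",
    "--no-restore", "CMD", "INSTALL", "-l")

  def main(args: Array[String]): Unit = {
    // Illustrative paths only; the real values come from the job's staged files.
    val libDir = "/tmp/sparkr-libs"
    val pkgPath = "/tmp/shipped-r-package"
    val fullCmd = baseInstallCmd ++ Seq(libDir, pkgPath)
    // Prints:
    // R --no-save --no-site-file --no-environ --no-restore CMD INSTALL -l /tmp/sparkr-libs /tmp/shipped-r-package
    println(fullCmd.mkString(" "))
  }
}
```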

@shivaram (Contributor)

So I'm not completely sure this is a good idea. Users might have their own R environment setup scripts in their home directory (the site-file or init-file from the R docs you linked to) that they expect to work on the driver side. The executor side is much more limited in terms of what code runs there (i.e. it is invisible to the user), so I don't think the same expectations apply to it?

@felixcheung (Member, Author)

The driver could also be running in YARN cluster mode, in which case a clean state might make sense?
To me this is just about reducing the level of variability, and this was brought up in PR #10171.
I could also change this to apply the flags only to the driver in cluster mode, but not to the sparkR shell.

@shivaram (Contributor)

Yeah doing it just for the cluster mode driver seems fine to me.

@SparkQA commented Jan 20, 2016

Test build #49735 has finished for PR 10652 at commit 78eb194.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@sun-rui (Contributor) commented Jan 20, 2016

RRunner is not only for running the driver on a cluster; it is also used for running an R script locally in client mode.

@felixcheung (Member, Author)

@sun-rui do you mean spark-submit foo.R?

@sun-rui (Contributor) commented Jan 20, 2016

@felixcheung, yes, something like that

@felixcheung (Member, Author)

I don't know if there is a way to distinguish that.
It could be spark-submit, or the SparkSubmit class could be invoked from Oozie with the job running in YARN client mode, in which case the driver is actually running on a worker node - possibly the same node that is running executors.

I guess we could explicitly bypass this when the cluster manager is LOCAL?

@sun-rui (Contributor) commented Jan 20, 2016

It is possible to get the deploy mode from "spark.submit.deployMode" and check whether it is "client". You can take a look at https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/api/r/RUtils.scala#L49
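
A minimal sketch of that kind of check, assuming the property can be read from the JVM system properties; the helper name isClientDeployMode is hypothetical, and the actual code in RUtils.scala may obtain the setting differently (e.g. via SparkConf).

```scala
object DeployModeSketch {
  // Defaults to "client" when the property is not set, per the suggestion above.
  def isClientDeployMode: Boolean =
    sys.props.getOrElse("spark.submit.deployMode", "client") == "client"

  def main(args: Array[String]): Unit = {
    println(s"client deploy mode? $isClientDeployMode")
  }
}
```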

@felixcheung (Member, Author)

I realize that; my point is that even in client mode the driver could be running on a worker machine, as in the case where the Spark job is submitted from another YARN app.

To elaborate, one possible source of issues is running a Spark job in YARN client mode from a workflow engine (e.g. Oozie). In such a case, the driver/client actually runs on an arbitrary worker node of the cluster.

If we think picking up a random profile that way is OK, then I guess I could change it to add the flags only when deployMode is cluster.
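
If the flags do end up being applied only in cluster mode, the conditional assembly could look roughly like the following hedged sketch; deployMode would come from spark.submit.deployMode as discussed above, and all names here are illustrative rather than the actual Spark code.

```scala
object ConditionalFlagsSketch {
  private val cleanStateFlags =
    Seq("--no-save", "--no-site-file", "--no-environ", "--no-restore")

  // Add the clean-state flags only when the driver runs inside the cluster,
  // so a local client-mode driver still picks up the user's own profile.
  def driverCommand(rExecutable: String, deployMode: String, extraArgs: String*): Seq[String] = {
    val flags = if (deployMode == "cluster") cleanStateFlags else Seq.empty[String]
    rExecutable +: (flags ++ extraArgs)
  }

  def main(args: Array[String]): Unit = {
    // "cluster" gets the flags, "client" does not.
    println(driverCommand("R", "cluster", "--file=driver.R").mkString(" "))
    println(driverCommand("R", "client", "--file=driver.R").mkString(" "))
  }
}
```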
