[SPARK-19387][SPARKR] Tests do not run with SparkR source package in CRAN check

## What changes were proposed in this pull request?

- this is caused by changes in SPARK-18444 and SPARK-18643: we no longer install Spark when `master = ""` (default). It is also related to SPARK-18449, since the real `master` value is not known at the time the R code in `sparkR.session` is run (`master` cannot default to "local" since it could be overridden by the spark-submit command line or spark config)
- as a result, running SparkR as a package in an IDE works fine, but the CRAN check does not, because it launches the tests via a non-interactive script
- the fix is to add a check to the beginning of each test file and the vignettes (see the sketch below); the same would also work by changing `sparkR.session()` to `sparkR.session(master = "local")` in tests, but I think being more explicit is better
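
For reference, a minimal sketch of the check at the top of a test driver, mirroring the `run-all.R` change in the diff below (the comments here are illustrative, not part of the patch):

```
library(SparkR)

# Make sure a Spark distribution is available before any test starts a session.
# install.spark() short-circuits when SPARK_HOME already points to an installation,
# and otherwise downloads the release matching packageVersion("SparkR").
install.spark()

test_package("SparkR")
```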

## How was this patch tested?

Tested this by reverting the version to 2.1, since the check needs to download the release jar with a matching version. But since there are changes in 2.2 (specifically around SparkR ML) that are incompatible with 2.1, some tests fail in this configuration. This will need to be ported to branch-2.1 and retested with the 2.1 release jar.

Manually, as follows:
```
# modify DESCRIPTION to revert version to 2.1.0
SPARK_HOME=/usr/spark R CMD build pkg
# run cran check without SPARK_HOME
R CMD check --as-cran SparkR_2.1.0.tar.gz
```
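
(With no SPARK_HOME set, `install.spark()` derives the download version from `packageVersion("SparkR")`, so the DESCRIPTION revert above is what makes the check fetch the released 2.1.0 jar.)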

Author: Felix Cheung <[email protected]>

Closes #16720 from felixcheung/rcranchecktest.

(cherry picked from commit a3626ca)
Signed-off-by: Shivaram Venkataraman <[email protected]>
felixcheung authored and shivaram committed Feb 14, 2017
1 parent f837ced commit 7763b0b
Showing 4 changed files with 21 additions and 7 deletions.
R/pkg/R/install.R (13 additions, 3 deletions)
@@ -21,9 +21,9 @@
 #' Download and Install Apache Spark to a Local Directory
 #'
 #' \code{install.spark} downloads and installs Spark to a local directory if
-#' it is not found. The Spark version we use is the same as the SparkR version.
-#' Users can specify a desired Hadoop version, the remote mirror site, and
-#' the directory where the package is installed locally.
+#' it is not found. If SPARK_HOME is set in the environment, and that directory is found, that is
+#' returned. The Spark version we use is the same as the SparkR version. Users can specify a desired
+#' Hadoop version, the remote mirror site, and the directory where the package is installed locally.
 #'
 #' The full url of remote file is inferred from \code{mirrorUrl} and \code{hadoopVersion}.
 #' \code{mirrorUrl} specifies the remote path to a Spark folder. It is followed by a subfolder
@@ -68,6 +68,16 @@
 #' \href{http://spark.apache.org/downloads.html}{Apache Spark}
 install.spark <- function(hadoopVersion = "2.7", mirrorUrl = NULL,
                           localDir = NULL, overwrite = FALSE) {
+  sparkHome <- Sys.getenv("SPARK_HOME")
+  if (isSparkRShell()) {
+    stopifnot(nchar(sparkHome) > 0)
+    message("Spark is already running in sparkR shell.")
+    return(invisible(sparkHome))
+  } else if (!is.na(file.info(sparkHome)$isdir)) {
+    message("Spark package found in SPARK_HOME: ", sparkHome)
+    return(invisible(sparkHome))
+  }
+
   version <- paste0("spark-", packageVersion("SparkR"))
   hadoopVersion <- tolower(hadoopVersion)
   hadoopVersionName <- hadoopVersionName(hadoopVersion)
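
A rough usage sketch of the new short-circuit, assuming SPARK_HOME points at an existing installation such as the /usr/spark path used in the manual test above:

```
# SPARK_HOME set and valid: install.spark() returns that directory without downloading.
Sys.setenv(SPARK_HOME = "/usr/spark")
sparkHome <- install.spark()
# message: "Spark package found in SPARK_HOME: /usr/spark"

# SPARK_HOME unset (and not in the sparkR shell): fall back to downloading the
# release that matches packageVersion("SparkR") into a local directory.
Sys.unsetenv("SPARK_HOME")
sparkHome <- install.spark(hadoopVersion = "2.7")
```
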
R/pkg/R/sparkR.R (2 additions, 4 deletions)
@@ -588,13 +588,11 @@ processSparkPackages <- function(packages) {
 sparkCheckInstall <- function(sparkHome, master, deployMode) {
   if (!isSparkRShell()) {
     if (!is.na(file.info(sparkHome)$isdir)) {
-      msg <- paste0("Spark package found in SPARK_HOME: ", sparkHome)
-      message(msg)
+      message("Spark package found in SPARK_HOME: ", sparkHome)
       NULL
     } else {
       if (interactive() || isMasterLocal(master)) {
-        msg <- paste0("Spark not found in SPARK_HOME: ", sparkHome)
-        message(msg)
+        message("Spark not found in SPARK_HOME: ", sparkHome)
         packageLocalDir <- install.spark()
         packageLocalDir
       } else if (isClientMode(master) || deployMode == "client") {
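
Aside on the simplification above: `message()` concatenates its arguments with no separator, just like `paste0()`, so the temporary `msg` variable is unnecessary. For example:

```
sparkHome <- "/usr/spark"
message("Spark package found in SPARK_HOME: ", sparkHome)
# prints the same line as
message(paste0("Spark package found in SPARK_HOME: ", sparkHome))
```
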
R/pkg/tests/run-all.R (3 additions, 0 deletions)
@@ -21,4 +21,7 @@ library(SparkR)
 # Turn all warnings into errors
 options("warn" = 2)
 
+# Setup global test environment
+install.spark()
+
 test_package("SparkR")
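
The `install.spark()` call here is the core of the fix: `R CMD check --as-cran` runs this file through a non-interactive script, so the install-on-demand path in `sparkCheckInstall()` does not trigger, and the session would otherwise fail to find a Spark installation.
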
R/pkg/vignettes/sparkr-vignettes.Rmd (3 additions, 0 deletions)
@@ -44,6 +44,9 @@ library(SparkR)
 
 We use default settings in which it runs in local mode. It auto downloads Spark package in the background if no previous installation is found. For more details about setup, see [Spark Session](#SetupSparkSession).
 
+```{r, include=FALSE}
+install.spark()
+```
 ```{r, message=FALSE, results="hide"}
 sparkR.session()
 ```
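
The `include=FALSE` chunk is evaluated when the vignette is built, but neither its code nor its output appears in the rendered document, so the install step stays invisible to readers of the vignette.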
