This repository has been archived by the owner on Jan 9, 2020. It is now read-only.

SparkR Support #507

Conversation

@ifilonenko (Member) commented Sep 24, 2017

What changes were proposed in this pull request?

Initial Spark R support

How was this patch tested?

  • Initial submission step
  • Unit Tests
  • Docker files (tested)
  • Integration Tests

@@ -71,6 +76,7 @@ private[spark] class DriverConfigurationStepsOrchestrator(
.map(_.split(","))
.getOrElse(Array.empty[String]) ++
additionalMainAppPythonFile.toSeq ++
additionalMainAppRFile.toSeq ++
@ifilonenko (Member, Author):

Important here: similar to the Python primary resource, the R file is distributed via --files.
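
For illustration, a minimal sketch of what this means at submission time, using Spark's SparkLauncher API; the master URL and file path below are assumptions, not values from this PR:

import org.apache.spark.launcher.SparkLauncher

// Submit an R primary resource; the same file is also shipped through
// --files (spark.files), mirroring the orchestrator change quoted above.
val sparkApp = new SparkLauncher()
  .setMaster("k8s://https://192.168.99.100:8443") // assumed minikube endpoint
  .setDeployMode("cluster")
  .setAppResource("src/test/R/dataframe.R")       // R primary resource
  .addFile("src/test/R/dataframe.R")              // distributed via --files
  .launch()
sparkApp.waitFor()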

ADD R /opt/spark/R

RUN apk add --no-cache R && \
    rm -r /root/.cache
@ifilonenko (Member, Author):

Open to any recommendations here?

@ifilonenko self-assigned this Sep 24, 2017
@ifilonenko (Member, Author):

Upon merging, this PR closes #506

@ifilonenko (Member, Author) commented Sep 24, 2017

To run the integration tests in a proper R environment, R_HOME must be defined in the testing environment, which means everyone would need to install R. Is that something that would be an issue, or something that can be assumed if someone is building out a full dev environment for Spark? @foxish @erikerlandson
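
(For context, the requirement boils down to a pre-flight check like the sketch below; the failure style and message are assumptions, not the PR's actual code.)

// Fail fast when the test environment lacks an R installation.
val rHome = sys.env.getOrElse("R_HOME",
  sys.error("R_HOME must be set (i.e. R must be installed) to run the SparkR integration tests"))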

@foxish (Member) commented Sep 24, 2017

Hmm, interesting. Does the submitting node have the R dependency, or the driver?

@ifilonenko changed the title from [WIP] Spark R Support to SparkR Support on Sep 24, 2017
@ifilonenko (Member, Author):

The R dependency exists because we need to mimic the make-distribution environment in target/docker/R, so that when we run ADD R /opt/spark/R the R directory is already packaged in the Docker build context; this is similar to how we set up PySpark.
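
(A rough sketch of that staging step, assuming a make-distribution-style layout; the paths and helper name are illustrative only.)

import java.nio.file.{Files, Path, Paths, StandardCopyOption}
import scala.collection.JavaConverters._

// Mirror the distribution's R/ directory into the Docker build context so
// that `ADD R /opt/spark/R` can find it when the image is built.
def stageRDirectory(source: Path, target: Path): Unit = {
  for (p <- Files.walk(source).iterator().asScala) {
    val dest = target.resolve(source.relativize(p))
    if (Files.isDirectory(p)) Files.createDirectories(dest)
    else Files.copy(p, dest, StandardCopyOption.REPLACE_EXISTING)
  }
}

stageRDirectory(Paths.get("R"), Paths.get("target/docker/R"))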

@ifilonenko (Member, Author):

@ssuchter @varunkatta The integration tests will pass once R is installed and R_HOME is defined in the Jenkins environment.

PR is otherwise ready for review

@ifilonenko (Member, Author):

rerun integration tests please


@ifilonenko (Member, Author):

PR is ready for review @foxish @erikerlandson

@liyinan926 (Member):

@ifilonenko you also need to update sbin/build-push-docker-images.sh to add the new images.

@ifilonenko (Member, Author):

rerun unit tests please

@ifilonenko (Member, Author):

Ready for merging to branch-2.2 unless there are any other concerns. @erikerlandson @foxish @liyinan926

@liyinan926 (Member):

LGTM. Thanks for the work!

@ifilonenko force-pushed the branch-2.2-kubernetes branch from 2e71189 to 71bbbf0 on September 28, 2017 03:23
@ifilonenko (Member, Author):

rerun unit tests please

@ifilonenko (Member, Author):

Unless there are any further comments, I think this is ready to merge

@ifilonenko (Member, Author):

All set to merge? @foxish @erikerlandson @liyinan926

@liyinan926 (Member):

I think it's all good.

test("Run SparkR Job on file locally") {
assume(testBackend.name == MINIKUBE_TEST_BACKEND)

launchStagingServer(SSLOptions(), None)
Reviewer:

Don't think we need a staging server if the file is in the image.

Reviewer:

Actually I just think the test is misnamed - looks like this test is shipping the file while the test below expects the file to be on the container.

@ifilonenko (Member, Author):

src/test/R/dataframe.R is what we are pushing up to the RSS (resource staging server), which is why it is being launched. Hmm, isn't that file stored locally? What would be the correct naming convention?

val exitCode = process.waitFor()
if (exitCode != 0) {
  logInfo(s"exitCode: $exitCode")
val exitCodePython = process.waitFor()
Reviewer:

I've always been wondering - would it be possible to do this bootstrapping from Maven and not in the test code? Seems like this should be an environment step. Docker images should theoretically be built at the Maven step as well but we know that this is harder to do.

@ifilonenko (Member, Author):

It technically is possible but, as we discussed for PySpark, I wasn't able to figure that out. Any recommendations would be helpful.
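
(For reference, the in-test bootstrapping amounts to a shell-out along these lines; the script, flag, and SPARK_HOME handling are assumptions, not the PR's actual command. Moving it into a Maven phase, e.g. via exec-maven-plugin, would make it an environment step as suggested above.)

import java.io.File
import scala.sys.process._

// Assumed location of the Spark checkout; illustrative only.
val sparkHome = new File(sys.env.getOrElse("SPARK_HOME", "."))

// Build the artifacts the tests need before they run.
val exitCode = Process(Seq("./dev/make-distribution.sh", "--r"), sparkHome).!
if (exitCode != 0) sys.error(s"Bootstrap failed with exit code $exitCode")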

@felixcheung:

Hi - what's left for this work? This #507 (comment)?

@mccheah commented Oct 13, 2017

I left some comments. Also would like to see this tested in a production environment, but maybe we can just merge it and follow up as feedback comes in.

@ifilonenko (Member, Author):

@mccheah, as per your last comment, is this okay to merge then?

@mccheah commented Oct 16, 2017

Can merge when CI passes - I just updated the branch.

@ifilonenko (Member, Author):

ready for merging: @foxish

@foxish (Member) commented Oct 18, 2017

Nvm. Please ignore last comment. That was in the merge commit.

@foxish foxish merged commit f94499b into apache-spark-on-k8s:branch-2.2-kubernetes Oct 18, 2017
puneetloya pushed a commit to puneetloya/spark that referenced this pull request Mar 11, 2019
* initial R support without integration tests

* finished sparkR integration

* case sensitive file names in unix

* revert back to previous lower case in dockerfile

* addition into the build-push-docker-images
ifilonenko pushed a commit to bloomberg/apache-spark-on-k8s that referenced this pull request Mar 19, 2019
ifilonenko pushed a commit to bloomberg/apache-spark-on-k8s that referenced this pull request Apr 4, 2019