SparkR Support #507
Conversation
@@ -71,6 +76,7 @@ private[spark] class DriverConfigurationStepsOrchestrator(
      .map(_.split(","))
      .getOrElse(Array.empty[String]) ++
      additionalMainAppPythonFile.toSeq ++
      additionalMainAppRFile.toSeq ++
Important here that, similar to the Python primary resource, the R file is distributed via --files.
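For readers without the full diff in front of them, a hedged reconstruction of the surrounding expression is below; submissionSparkConf and additionalPythonFiles are assumptions about the enclosing code, and only the lines in the hunk above are from the PR itself.

import org.apache.spark.SparkConf

// Hedged sketch (not the exact orchestrator code): the R primary resource is
// appended to spark.files next to the Python primary resource, so spark-submit
// ships it to the driver the same way --files does.
def resolveSparkFiles(
    submissionSparkConf: SparkConf,               // assumed conf handle
    additionalMainAppPythonFile: Option[String],
    additionalMainAppRFile: Option[String],
    additionalPythonFiles: Seq[String]): Seq[String] = {
  submissionSparkConf.getOption("spark.files")
    .map(_.split(",").toSeq)
    .getOrElse(Seq.empty[String]) ++
    additionalMainAppPythonFile.toSeq ++
    additionalMainAppRFile.toSeq ++
    additionalPythonFiles
}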
ADD R /opt/spark/R

RUN apk add --no-cache R && \
    rm -r /root/.cache
Open to any recommendations here?
Upon merging, this PR closes #506
To run integration tests in a proper R environment, R_HOME must be defined in the testing environment, which means everyone would need to install R. Is that something that would be an issue, or something that can be assumed if someone is building out a full dev environment for Spark? @foxish @erikerlandson
Hmm, interesting. Does the submitting node have the R dependency, or the driver?
The R dependency is because there is a need to mimic the make-distribution environment in
@ssuchter @varunkatta Integration tests will pass once R is installed and R_HOME is defined in the Jenkins environment. The PR is otherwise ready for review.
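One way to keep the suite usable on machines without R would be to gate the SparkR test on the environment; a minimal sketch is below, assuming a ScalaTest-style suite, and the R_HOME guard is an illustration rather than code from this PR.

import org.scalatest.FunSuite

// Illustration only: skip the SparkR integration test instead of failing it
// when the host has no R installation, since building and running SparkR
// requires R_HOME to point at a local R installation.
class SparkRPrerequisiteSketch extends FunSuite {
  test("Run SparkR job (requires a local R installation)") {
    assume(sys.env.contains("R_HOME"), "R_HOME must be defined to run SparkR tests")
    // ... launch the staging server and submit the R job as in the real suite ...
  }
}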
rerun integration tests please
rerun integration tests please
rerun integration tests please
PR is ready for review @foxish @erikerlandson
@ifilonenko you also need to update
rerun unit tests please
Ready for merging to branch-2.2 unless there are any other concerns @erikerlandson @foxish @liyinan926
LGTM. Thanks for the work!
rerun unit tests please
Unless there are any further comments, I think this is ready to merge.
All set to merge? @foxish @erikerlandson @liyinan926
I think it's all good.
test("Run SparkR Job on file locally") { | ||
assume(testBackend.name == MINIKUBE_TEST_BACKEND) | ||
|
||
launchStagingServer(SSLOptions(), None) |
Don't think we need a staging server if the file is in the image.
Actually I just think the test is misnamed - looks like this test is shipping the file while the test below expects the file to be on the container.
src/test/R/dataframe.R is what we are pushing up to the RSS (resource staging server), which is why it is being launched. Hmm, isn't that file stored locally? What would be the correct naming convention?
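For illustration, the naming question could be resolved along the following lines; these test names and the container-local example path are hypothetical, not from the PR.

import org.scalatest.FunSuite

// Hypothetical naming that separates the two scenarios under discussion:
// one test ships a locally stored file through the resource staging server,
// the other expects the file to already exist inside the container image.
class SparkRTestNamingSketch extends FunSuite {
  test("Run SparkR job, shipping a local file via the resource staging server") {
    // would launch the staging server and submit src/test/R/dataframe.R from the host
  }
  test("Run SparkR job on a file already present in the container image") {
    // would submit a container-local URI such as local:///opt/spark/examples/src/main/r/dataframe.R
  }
}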
val exitCode = process.waitFor()
if (exitCode != 0) {
  logInfo(s"exitCode: $exitCode")
val exitCodePython = process.waitFor()
I've always been wondering - would it be possible to do this bootstrapping from Maven and not in the test code? Seems like this should be an environment step. Docker images should theoretically be built at the Maven step as well but we know that this is harder to do.
It technically is possible, but as we discussed for PySpark, I wasn't able to figure that out. Any recommendations would be helpful.
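For context, the bootstrapping being discussed is essentially a shell-out from the test harness; a hedged sketch follows, in which the SPARK_HOME handling and the R/install-dev.sh script name are assumptions about the surrounding build, not code lifted from this PR.

import scala.sys.process._

// Hedged sketch of in-test bootstrapping: build the SparkR package before the
// integration tests run, mimicking what make-distribution does. R_HOME must be
// set for the R build script to succeed.
object SparkRBootstrapSketch {
  def buildSparkR(): Unit = {
    val sparkHome = sys.env.getOrElse("SPARK_HOME", ".")
    val process = Process(Seq("sh", s"$sparkHome/R/install-dev.sh")).run()
    val exitCode = process.exitValue() // blocks until the script finishes
    if (exitCode != 0) {
      sys.error(s"Building SparkR failed with exit code $exitCode")
    }
  }
}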
Hi - what's left for this work? This #507 (comment)?
I left some comments. I would also like to see this tested in a production environment, but maybe we can just merge it and follow up as feedback comes in.
@mccheah, as per your last comment, is this okay to merge then?
Can merge when CI passes - I just updated the branch.
…apache-spark-on-k8s into branch-2.2-kubernetes
Ready for merging: @foxish
Nvm, please ignore the last comment. That was in the merge commit.
* initial R support without integration tests
* finished sparkR integration
* case sensitive file names in unix
* revert back to previous lower case in dockerfile
* addition into the build-push-docker-images
What changes were proposed in this pull request?
Initial Spark R support
How was this patch tested?