
[SPARK-12168][SPARKR] Add automated tests for conflicted function in R #10171

Closed · wants to merge 2 commits

Conversation

felixcheung (Member)

Currently this is what is reported when loading the SparkR package in R (we would probably add is.nan as well):

Loading required package: methods

Attaching package: ‘SparkR’

The following objects are masked from ‘package:stats’:

    cov, filter, lag, na.omit, predict, sd, var

The following objects are masked from ‘package:base’:

    colnames, colnames<-, intersect, rank, rbind, sample, subset,
    summary, table, transform

Adding this test gives us an automated way to track changes to the set of masked methods. The second part of the test checks for those functions that would not be accessible without a namespace/package prefix.

Incidentally, this might point to how we could fix those inaccessible functions in base or stats.
Looking for feedback on adding this test.
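
To make the idea concrete, here is a minimal sketch of how conflicts() could drive such a check with testthat (the test name and expected list are illustrative, not the exact patch):

    # Sketch: after SparkR is attached on top of base/stats, ask R which
    # SparkR names mask something else on the search path, and fail the
    # test if the set drifts from what we expect.
    library(testthat)
    library(SparkR)

    test_that("functions masked by SparkR are as expected", {
      masked <- conflicts(detail = TRUE)[["package:SparkR"]]
      expected <- c("cov", "filter", "lag", "na.omit", "predict", "sd", "var",
                    "colnames", "colnames<-", "intersect", "rank", "rbind",
                    "sample", "subset", "summary", "table", "transform")
      expect_true(all(expected %in% masked))
    })

Calling the masked originals then requires an explicit prefix, e.g. stats::filter(x, rep(1, 3)).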

SparkQA commented Dec 7, 2015

Test build #47252 has finished for PR 10171 at commit 7c7c5d3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

shivaram (Contributor) commented Dec 7, 2015

Thanks @felixcheung -- this is an interesting approach. Do you know if conflicts captures data structures or S4 classes with the same name as well? Also, would we need to start R with --vanilla for this test case to pass? If somebody has loaded, say, dplyr, will this fail?

felixcheung (Member, Author)

@shivaram conflicts would return any object that is masked, so that could involve local objects/variables. You can see an example with lm below.
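
A quick illustration (a sketch run in a fresh session; exact output depends on what else is attached):

    > lm <- function(x) x     # a local object masking stats::lm
    > conflicts(detail = TRUE)
    $.GlobalEnv
    [1] "lm"

    $`package:stats`
    [1] "lm"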

I have verified this via SPARK_HOME/R/run-tests.sh, with this load order:

base ... -> stats ... -> testthat -> SparkR

I guess it is possible that something in a profile, or loading with an explicit library(..., pos = "package:base"), could change this, but since testthat suggests running tests from R CMD check, I think we should have full control over which packages are loaded.

For now I assume we want to detect conflicts with base and stats only, but we could certainly add dplyr or S4Vectors in additional tests (though we would then have many more conflicts, as discovered in that earlier investigation).

sun-rui (Contributor) commented Dec 14, 2015

To avoid interference from user-loaded packages in this test, we could:

  1. Add the --vanilla option to R when running tests;
  2. or, is it possible to pick only conflicts with the base/stats packages and ignore conflicts with other packages? (a sketch follows below)
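
For option 2, a rough sketch (the filtering step is hypothetical, not code from this PR):

    # Keep only the SparkR conflicts whose counterpart lives in base or stats,
    # so conflicts introduced by other user-attached packages are ignored.
    sparkrConflicts <- conflicts(detail = TRUE)[["package:SparkR"]]
    baseStatsOnly <- sparkrConflicts[sparkrConflicts %in%
                                     c(ls("package:base"), ls("package:stats"))]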

felixcheung (Member, Author)

I think we should run the tests with --vanilla (or equivalent) to keep the environment clean.

Though it seems we don't have a way to specify options to Rscript:
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/RRunner.scala#L49

I have a hack to get this to work, but shouldn't we support that officially? Like a spark.r.driver.command.options conf?

gatorsmile (Member)

Can anybody stop @3ourroom?

sun-rui (Contributor) commented Dec 15, 2015

Just set "spark.r.driver.command" to be "Rscript --vanilla " seems workable?

felixcheung (Member, Author)

Well, that was the first thing I tried :)

Exception in thread "main" java.io.IOException: Cannot run program "Rscript --vanilla": error=2, No such file or directory
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
    at org.apache.spark.deploy.RRunner$.main(RRunner.scala:90)

ProcessBuilder assumes the entire string is the name of the executable.
One possible trick is to do a rCommand.split(' ').toSeq at
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/RRunner.scala#L81

Another trick is to set spark.r.driver.command to another script that calls Rscript --vanilla; I have this working.

sun-rui (Contributor) commented Dec 15, 2015

@felixcheung, maybe we should always set the "--vanilla" option in RRunner, as is done in RRDD? See https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/api/r/RRDD.scala#L401.
Another option is to provide users a way to specify not only the R command but also its command-line options. The question is whether that is really needed.

srowen (Member) commented Dec 15, 2015

@gatorsmile I reported the spam to GitHub, who said Apache had to block them. I was about to contact them, but realized I don't see the spam comments anymore. Is it because I blocked the user, or do you all also see that they're gone?

gatorsmile (Member)

@srowen! It sounds like his mail app was hacked. Hopefully it will not happen again. Thank you! I have not seen any new spam messages since this morning.

felixcheung (Member, Author)

@shivaram what do you think about adding --vanilla to RRunner here?
It would be consistent, since the worker/daemon is already running with --vanilla, and users would still be able to have their desired environment/profile/init file/workspace when starting SparkR programmatically (i.e. with sparkR.init(), but not with sparkR or spark-submit something.R).

SparkQA commented Dec 28, 2015

Test build #48357 has finished for PR 10171 at commit fa4869d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

felixcheung (Member, Author)

@shivaram how about I open a separate PR for adding --vanilla to RRunner?

felixcheung (Member, Author)

@shivaram please review when you get a chance!

shivaram (Contributor)

LGTM. Thanks for this PR and apologies for the delay in getting back on this. Merging to master.

asfgit closed this in 37fefa6 on Jan 20, 2016