[SPARK-6807] [SparkR] Merge recent SparkR-pkg changes #5436
Conversation
Test build #29928 has finished for PR 5436 at commit
@davies could you reopen this PR or try to create it from a fresh branch? Even though the diff looks fine, this has 250 commits or so and will mess up the commit message.
@shivaram Should we combine these commits into a single huge commit? We will lose the history anyway, I think it's fine.
Hmm, does it work if you cherry-pick these new commits from the sparkr-sql branch to a new spark branch? If we are doing the one big commit, let's add the SparkR PR numbers or JIRA links in the description.
I think the cherry-pick may not work, because we changed the directory structure. I will try to collect all the commit messages into the description.
Instead of using a list[list[list[]]], use specific constructors for schema and field objects.
Fail worker early if dependency is missing
[SPARKR-92] Phase 2: implement sum(rdd)
[SPARKR-199] Change takeOrdered, top to fetch one partition at a time
[SPARKR-188] Add profiling of R execution on worker side Conflicts: pkg/inst/worker/worker.R
[SPARKR-154] Phase 3: implement intersection().
[SPARKR-163] Support sampleByKey() Conflicts: pkg/R/pairRDD.R
[SPARKR-154] Phase 4: implement subtract() and subtractByKey().
Refactored `structType` and `structField` so that they can be used to create schemas from R for use with `createDataFrame`. Moved everything to `schema.R` Added new methods to `SQLUtils.scala` for handling `StructType` and `StructField` on the JVM side
Refactored to use the new `structType` and `structField` functions.
New version uses takes a `StructType` from R and creates a DataFrame. Commented out the `tojson` version since we don't currently use it.
Updated `NAMESPACE`, `DESCRIPTION`, and unit tests for new schema functions. Deleted `SQLTypes.R` since everything has been moved to `schema.R`.
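The commits above replace the old nested-list schema representation with dedicated constructor functions. A minimal sketch of the resulting R-side API, based on the function names in the commit messages (the `sqlContext` handle and the sample data are assumptions, not taken from this PR):

```r
# Build a schema with the new constructors from schema.R,
# instead of the old list[list[list[]]] representation.
schema <- structType(
  structField("name", "string"),
  structField("age", "integer")
)

# Use the schema to turn local data into a DataFrame.
rows <- list(list("Alice", 30L), list("Bob", 25L))
df <- createDataFrame(sqlContext, rows, schema)
```

On the JVM side, `structType` and `structField` map onto the new helper methods added to `SQLUtils.scala` for constructing `StructType` and `StructField` objects.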
Fixes combineByKey
…tractByKey() for RDD.
[SPARKR-154] Phase 2: implement cartesian().
Test build #30275 has finished for PR 5436 at commit
@davies We need to add a license header to schema.R
Test build #672 has started for PR 5436 at commit
Test build #30276 has finished for PR 5436 at commit
Test build #30277 has finished for PR 5436 at commit
@shivaram this PR is ready for review
@@ -39,8 +39,34 @@ private[r] object SQLUtils {
     arr.toSeq
   }

-  def createDF(rdd: RDD[Array[Byte]], schemaString: String, sqlContext: SQLContext): DataFrame = {
-    val schema = DataType.fromJson(schemaString).asInstanceOf[StructType]
+  def createStructType(fields : Seq[StructField]): StructType = {
minor style nit: no space between `fields` and `:` here
Thanks @davies -- This is looking pretty good to me. I had a minor style comment. cc @cafreeman @sun-rui (who authored some of the original changes)
Test build #30435 timed out for PR 5436 at commit
Test build #687 has started for PR 5436 at commit
Jenkins, retest this please
Test build #30465 has finished for PR 5436 at commit
LGTM. Merging this.
This PR pulls in recent changes from SparkR-pkg, including `cartesian`, `intersection`, `sampleByKey`, `subtract`, `subtractByKey`, and `except` for RDDs, plus new APIs for `StructType` and `StructField`.
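A quick sketch of how a couple of the new RDD-level functions from this list might be exercised from SparkR (the `sc` context and the sample RDDs are assumptions for illustration, not taken from this PR):

```r
# Hypothetical local data; rdd1 and rdd2 are illustrative names.
rdd1 <- parallelize(sc, list(1, 2, 3, 4))
rdd2 <- parallelize(sc, list(3, 4, 5))

# New set-style operations merged in this PR:
collect(intersection(rdd1, rdd2))  # elements common to both RDDs
collect(subtract(rdd1, rdd2))      # elements of rdd1 not in rdd2
```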