[SPARK-5990] [MLLIB] Model import/export for IsotonicRegression #5270

Closed
yanboliang wants to merge 5 commits from yanboliang/spark-5990

Conversation

yanboliang
Contributor

Model import/export for IsotonicRegression

@SparkQA

SparkQA commented Mar 30, 2015

Test build #29410 has started for PR 5270 at commit 2b2f5a1.

@SparkQA

SparkQA commented Mar 30, 2015

Test build #29410 has finished for PR 5270 at commit 2b2f5a1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class Data(boundaries: Array[Double], predictions: Array[Double], isotonic: Boolean)
  • This patch does not change any dependencies.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29410/

def thisClassName: String = "org.apache.spark.mllib.regression.IsotonicRegressionModel"

/** Model data for model import/export */
case class Data(boundaries: Array[Double], predictions: Array[Double], isotonic: Boolean)
Contributor

It would be easier to inspect the data file if we put each interval as a record. For example:

boundary prediction
0.0 -1.0
1.0 0.5
2.0 1.0

We can save isotonic as a value in the metadata.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29715/

def thisClassName: String = "org.apache.spark.mllib.regression.IsotonicRegressionModel"

/** Model data for model import/export */
case class Data(intervals: Array[(Double, Double)])
Contributor

My suggestion was

case class Data(boundary: Double, prediction: Double)

And then save each (boundary, prediction) pair as a record:

sqlContext.createDataFrame(boundaries.zip(predictions).map { case (b, p) => Data(b, p) })
  .saveAsParquetFile(dataPath(path))
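
To illustrate the one-record-per-interval layout without a Spark dependency, here is a minimal plain-Scala sketch of the round trip. The `IntervalRecords` object and its `toRecords`/`fromRecords` helpers are hypothetical names used only for this illustration; they are not part of the patch. The sketch assumes the records come back in the order they were written.

```scala
// Stand-in for the per-interval record suggested above.
case class Data(boundary: Double, prediction: Double)

// Hypothetical helper object, for illustration only (not part of the patch).
object IntervalRecords {
  // Save side: pair each boundary with its prediction, one record per interval.
  def toRecords(boundaries: Array[Double], predictions: Array[Double]): Seq[Data] = {
    require(boundaries.length == predictions.length, "arrays must have equal length")
    boundaries.zip(predictions).map { case (b, p) => Data(b, p) }.toSeq
  }

  // Load side: split the records back into the two parallel arrays.
  // Assumes the records are already in their original order.
  def fromRecords(records: Seq[Data]): (Array[Double], Array[Double]) = {
    val (b, p) = records.map(d => (d.boundary, d.prediction)).unzip
    (b.toArray, p.toArray)
  }

  def main(args: Array[String]): Unit = {
    // Sample values taken from the earlier review comment.
    val records = toRecords(Array(0.0, 1.0, 2.0), Array(-1.0, 0.5, 1.0))
    records.foreach(r => println(s"${r.boundary} ${r.prediction}"))
  }
}
```

With this layout the `isotonic` flag is no longer part of each record, which is why the earlier comment proposes storing it once in the metadata.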

@SparkQA

SparkQA commented Apr 9, 2015

Test build #29953 has started for PR 5270 at commit 49600cc.

@SparkQA

SparkQA commented Apr 9, 2015

Test build #29953 has finished for PR 5270 at commit 49600cc.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class Data(boundary: Double, prediction: Double)
  • This patch does not change any dependencies.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29953/

@mengxr
Contributor

mengxr commented Apr 9, 2015

test this please

@SparkQA

SparkQA commented Apr 9, 2015

Test build #29960 has started for PR 5270 at commit 49600cc.

@SparkQA

SparkQA commented Apr 9, 2015

Test build #29960 has finished for PR 5270 at commit 49600cc.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class Data(boundary: Double, prediction: Double)
  • This patch does not change any dependencies.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29960/


import org.apache.spark.mllib.util.Loader._

private object SaveLoadV1_0 {
Contributor

remove one space after private

@SparkQA

SparkQA commented Apr 20, 2015

Test build #30593 has started for PR 5270 at commit f80ec1b.

@SparkQA

SparkQA commented Apr 20, 2015

Test build #30593 has finished for PR 5270 at commit f80ec1b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class Data(boundary: Double, prediction: Double)
  • This patch does not change any dependencies.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30593/

predictions: Array[Double],
isotonic: Boolean): Unit = {
val sqlContext = new SQLContext(sc)
import sqlContext.implicits._
Contributor

Remove this line because no implicits are used.

@mengxr
Contributor

mengxr commented Apr 20, 2015

LGTM except minor inline comments.

val sameModel = IsotonicRegressionModel.load(sc, path)
assert(model.boundaries === sameModel.boundaries)
assert(model.predictions === sameModel.predictions)
assert(model.isotonic == model.isotonic)
Contributor

== -> ===
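
For context on why the other assertions use `===`: in Scala, `==` on two arrays compares references rather than contents, whereas ScalaTest's `===` compares arrays element by element. The plain-Scala sketch below shows the difference using `sameElements`; the `EqualityCheck` object and `sameArrays` helper are hypothetical names for this illustration. Separately, the quoted assertion compares `model.isotonic` with itself, so it holds regardless of what was loaded; comparing against `sameModel.isotonic` is presumably the intent.

```scala
// Hypothetical helper object, for illustration only (not part of the patch).
object EqualityCheck {
  // Element-wise comparison of two arrays.
  def sameArrays(a: Array[Double], b: Array[Double]): Boolean = a.sameElements(b)

  def main(args: Array[String]): Unit = {
    val saved = Array(0.0, 1.0, 2.0)
    val loaded = Array(0.0, 1.0, 2.0)
    println(saved == loaded)           // reference comparison on arrays
    println(sameArrays(saved, loaded)) // content comparison
  }
}
```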

@SparkQA

SparkQA commented Apr 21, 2015

Test build #30635 has started for PR 5270 at commit 872028d.

@SparkQA

SparkQA commented Apr 21, 2015

Test build #30635 has finished for PR 5270 at commit 872028d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class Data(boundary: Double, prediction: Double)
  • This patch does not change any dependencies.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30635/

@mengxr
Contributor

mengxr commented Apr 21, 2015

Merged into master. Thanks!

@asfgit asfgit closed this in 1f2f723 Apr 21, 2015
@yanboliang yanboliang deleted the spark-5990 branch April 24, 2015 10:02
nemccarthy pushed a commit to nemccarthy/spark that referenced this pull request Jun 19, 2015
Model import/export for IsotonicRegression

Author: Yanbo Liang

Closes apache#5270 from yanboliang/spark-5990 and squashes the following commits:

872028d [Yanbo Liang] fix code style
f80ec1b [Yanbo Liang] address comments
49600cc [Yanbo Liang] address comments
429ff7d [Yanbo Liang] store each interval as a record
2b2f5a1 [Yanbo Liang] Model import/export for IsotonicRegression
4 participants