[SPARK-5990] [MLLIB] Model import/export for IsotonicRegression #5270
Test build #29410 has started for PR 5270.
Test build #29410 has finished for PR 5270.
Test PASSed.
def thisClassName: String = "org.apache.spark.mllib.regression.IsotonicRegressionModel"

/** Model data for model import/export */
case class Data(boundaries: Array[Double], predictions: Array[Double], isotonic: Boolean)
It would be easier to inspect the data file if we put each interval as a record. For example:
| boundary | prediction |
|---|---|
| 0.0 | -1.0 |
| 1.0 | 0.5 |
| 2.0 | 1.0 |
We can save isotonic as a value in the metadata.
Test FAILed.
def thisClassName: String = "org.apache.spark.mllib.regression.IsotonicRegressionModel"

/** Model data for model import/export */
case class Data(intervals: Array[(Double, Double)])
My suggestion was
case class Data(boundary: Double, prediction: Double)
And then save each (boundary, prediction) pair as a record:
sqlContext.createDataFrame(boundaries.zip(predictions).map { case (b, p) => Data(b, p) })
.saveAsParquetFile(dataPath(path))
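A minimal sketch of the record-per-interval layout suggested above, written with plain Scala collections in place of a DataFrame so it runs without Spark. The helper names `toRecords` and `fromRecords` are hypothetical and not part of this PR; sorting on load is an assumption, made because a stored data set need not preserve row order.

```scala
// One record per interval, as suggested in the review comment.
case class Data(boundary: Double, prediction: Double)

object IntervalRecords {
  // Hypothetical helper: pair each boundary with its prediction.
  def toRecords(boundaries: Array[Double], predictions: Array[Double]): Seq[Data] = {
    require(boundaries.length == predictions.length, "arrays must have equal length")
    boundaries.zip(predictions).map { case (b, p) => Data(b, p) }.toSeq
  }

  // Hypothetical helper: rebuild the two arrays from the records.
  // Records are sorted by boundary first (assumption: stored row order is not guaranteed).
  def fromRecords(records: Seq[Data]): (Array[Double], Array[Double]) = {
    val sorted = records.sortBy(_.boundary)
    (sorted.map(_.boundary).toArray, sorted.map(_.prediction).toArray)
  }
}
```

With this layout each row of the data file is one (boundary, prediction) pair, which matches the table in the earlier comment.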
Test build #29953 has started for PR 5270.
Test build #29953 has finished for PR 5270.
Test FAILed.
test this please
Test build #29960 has started for PR 5270.
Test build #29960 has finished for PR 5270.
Test FAILed.
import org.apache.spark.mllib.util.Loader._

private object SaveLoadV1_0 {
remove one space after private
Test build #30593 has started for PR 5270.
Test build #30593 has finished for PR 5270.
Test PASSed.
predictions: Array[Double],
isotonic: Boolean): Unit = {
  val sqlContext = new SQLContext(sc)
  import sqlContext.implicits._
Remove this line because no implicits are used.
LGTM except minor inline comments.
val sameModel = IsotonicRegressionModel.load(sc, path)
assert(model.boundaries === sameModel.boundaries)
assert(model.predictions === sameModel.predictions)
assert(model.isotonic == model.isotonic)
== -> ===
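Apart from the operator, the quoted assertion compares model.isotonic with itself, so it would pass whatever value was loaded. A small sketch of a round-trip check that compares against the reloaded copy, using a stand-in `ToyModel` class and a hypothetical `sameAs` helper (neither is part of Spark or this PR) and plain `==` so it runs without ScalaTest:

```scala
// Stand-in for the real model; only the fields the test compares.
case class ToyModel(boundaries: Seq[Double], predictions: Seq[Double], isotonic: Boolean)

object RoundTripCheck {
  // True only when every field of the reloaded model matches the original.
  def sameAs(model: ToyModel, sameModel: ToyModel): Boolean =
    model.boundaries == sameModel.boundaries &&
      model.predictions == sameModel.predictions &&
      model.isotonic == sameModel.isotonic // compare with the loaded copy, not with model itself
}
```

A check written this way fails when the reloaded isotonic flag differs from the original, which the self-comparison cannot detect.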
Test build #30635 has started for PR 5270.
Test build #30635 has finished for PR 5270.
Test PASSed.
Merged into master. Thanks!
Model import/export for IsotonicRegression

Author: Yanbo Liang <[email protected]>

Closes apache#5270 from yanboliang/spark-5990 and squashes the following commits:

872028d [Yanbo Liang] fix code style
f80ec1b [Yanbo Liang] address comments
49600cc [Yanbo Liang] address comments
429ff7d [Yanbo Liang] store each interval as a record
2b2f5a1 [Yanbo Liang] Model import/export for IsotonicRegression