[SPARK-2852][MLLIB] Separate model from IDF/StandardScaler algorithms #1814
QA tests have started for PR 1814. This patch merges cleanly.
    val withMean: Boolean,
    val withStd: Boolean,
    val mean: BV[Double],
    val factor: BV[Double]) extends VectorTransformer {
Since users may want to know the variance of the training set, should we have a constructor like this?

class StandardScalerModel private[mllib] (
    val withMean: Boolean,
    val withStd: Boolean,
    val mean: BV[Double],
    val variance: BV[Double]) {
  // Per-feature scaling factor: 1 / stddev, or 0.0 where the variance is zero.
  lazy val factor = {
    val temp = variance.clone
    var i = 0
    while (i < temp.size) {
      temp(i) = if (temp(i) != 0.0) 1.0 / math.sqrt(temp(i)) else 0.0
      i += 1
    }
    temp
  }
}
done.
QA results for PR 1814:
QA tests have started for PR 1814. This patch merges cleanly.
QA results for PR 1814:
LGTM. Merged into both master and branch-1.1. Thanks!
This is part of SPARK-2828:

1. separate IDF model from IDF algorithm (which generates a model)
2. separate StandardScaler model from StandardScaler

CC: dbtsai

Author: Xiangrui Meng <[email protected]>

Closes #1814 from mengxr/feature-api-update and squashes the following commits:

40d863b [Xiangrui Meng] move mean and variance to model
48a0fff [Xiangrui Meng] separate Model from StandardScaler algorithm
89f3486 [Xiangrui Meng] update IDF to separate Model from Algorithm

(cherry picked from commit b9e9e53)
Signed-off-by: Xiangrui Meng <[email protected]>
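This is part of SPARK-2828:

1. separate IDF model from IDF algorithm (which generates a model)
2. separate StandardScaler model from StandardScaler

CC: @dbtsai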
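To illustrate the split this PR describes, here is a minimal, dependency-free sketch in which an algorithm object's `fit` produces a model that holds the mean and variance and does the transforming. The names `SimpleScaler` and `SimpleScalerModel` are hypothetical, plain arrays stand in for MLlib vectors and RDDs, and the use of the unbiased sample variance is an assumption of this sketch, not a statement about the merged code.

```scala
// Hypothetical names for illustration only; this is not the MLlib API.
final class SimpleScalerModel(
    val withMean: Boolean,
    val withStd: Boolean,
    val mean: Array[Double],
    val variance: Array[Double]) {

  // 1 / stddev per feature; 0.0 where the variance is zero, as in the review comment.
  lazy val factor: Array[Double] =
    variance.map(v => if (v != 0.0) 1.0 / math.sqrt(v) else 0.0)

  // Optionally center by the stored mean, then optionally scale by the factor.
  def transform(x: Array[Double]): Array[Double] =
    Array.tabulate(x.length) { i =>
      val centered = if (withMean) x(i) - mean(i) else x(i)
      if (withStd) centered * factor(i) else centered
    }
}

final class SimpleScaler(withMean: Boolean, withStd: Boolean) {
  // The algorithm only computes statistics and hands them to the model.
  def fit(data: Seq[Array[Double]]): SimpleScalerModel = {
    require(data.nonEmpty, "need at least one row")
    val n = data.length
    val dim = data.head.length
    val mean = Array.tabulate(dim)(j => data.map(_(j)).sum / n)
    // Assumption: unbiased sample variance (divide by n - 1); 0.0 for a single row.
    val variance = Array.tabulate(dim) { j =>
      if (n > 1) data.map(r => math.pow(r(j) - mean(j), 2)).sum / (n - 1) else 0.0
    }
    new SimpleScalerModel(withMean, withStd, mean, variance)
  }
}
```

Because the model keeps `variance` rather than a precomputed factor, callers can inspect the training-set statistics directly, which is the point raised in the review.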