[SPARK-2852][MLLIB] Separate model from IDF/StandardScaler algorithms #1814

Closed

mengxr (Contributor) commented Aug 6, 2014

This is part of SPARK-2828:

  1. separate IDF model from IDF algorithm (which generates a model)
  2. separate StandardScaler model from StandardScaler
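
The separation described in the list above can be sketched roughly as follows. This is only an illustration of the pattern, not the MLlib code: `ToyIDF` and `ToyIDFModel` are hypothetical names, plain arrays stand in for MLlib vectors and RDDs, and the smoothed formula `log((m + 1) / (df + 1))` is an assumption chosen for the example.

```scala
// Hypothetical, simplified names; these are not the MLlib classes.
// The "algorithm" holds no fitted state and produces a model from data.
final class ToyIDF {
  def fit(docs: Seq[Array[Double]]): ToyIDFModel = {
    require(docs.nonEmpty, "need at least one document")
    val numTerms = docs.head.length
    val m = docs.size
    // Document frequency: number of documents in which each term appears.
    val df = Array.tabulate(numTerms)(j => docs.count(d => d(j) > 0.0))
    // Smoothed IDF, assumed here for illustration: log((m + 1) / (df + 1)).
    val idf = df.map(c => math.log((m + 1.0) / (c + 1.0)))
    new ToyIDFModel(idf)
  }
}

// The "model" holds only the fitted state and applies it to new vectors.
final class ToyIDFModel(val idf: Array[Double]) {
  def transform(tf: Array[Double]): Array[Double] = {
    require(tf.length == idf.length, "dimension mismatch")
    tf.zip(idf).map { case (t, w) => t * w }
  }
}

object ToyIDFExample {
  def main(args: Array[String]): Unit = {
    val docs = Seq(Array(1.0, 0.0, 2.0), Array(0.0, 0.0, 1.0), Array(3.0, 0.0, 1.0))
    val model = new ToyIDF().fit(docs)
    println(model.transform(Array(1.0, 1.0, 1.0)).mkString(", "))
  }
}
```

With this split, the fitted weights live on the model and can be inspected or reused without refitting, while the algorithm object stays free of training state.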

CC: @dbtsai

SparkQA commented Aug 6, 2014

QA tests have started for PR 1814. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18063/consoleFull

val withMean: Boolean,
val withStd: Boolean,
val mean: BV[Double],
val factor: BV[Double]) extends VectorTransformer {

Member (inline review comment on the lines above):

Since users may want to know the variance of the training set, should the constructor take the variance and derive the factor lazily? For example:

class StandardScalerModel private[mllib] (
    val withMean: Boolean,
    val withStd: Boolean,
    val mean: BV[Double],
    val variance: BV[Double]) {

  // Per-feature scaling factor 1 / sqrt(variance); 0.0 where the variance is zero.
  lazy val factor: BV[Double] = {
    val temp = variance.copy
    var i = 0
    while (i < temp.size) {
      temp(i) = if (temp(i) != 0.0) 1.0 / math.sqrt(temp(i)) else 0.0
      i += 1
    }
    temp
  }
}
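
As a standalone illustration of the factor computation suggested above, here is a minimal sketch that uses plain arrays instead of Breeze vectors. The object name `ScalingFactorSketch` and the `scale` helper are hypothetical and exist only for this example; they are not part of the patch.

```scala
// Hypothetical helper, not part of MLlib: plain arrays stand in for BV[Double].
object ScalingFactorSketch {
  // 1 / sqrt(variance) per feature; zero-variance features get a factor of 0.0.
  def factor(variance: Array[Double]): Array[Double] =
    variance.map(v => if (v != 0.0) 1.0 / math.sqrt(v) else 0.0)

  // Applies optional centering and optional scaling to one vector.
  def scale(
      x: Array[Double],
      mean: Array[Double],
      variance: Array[Double],
      withMean: Boolean,
      withStd: Boolean): Array[Double] = {
    require(x.length == mean.length && x.length == variance.length, "dimension mismatch")
    val f = factor(variance)
    Array.tabulate(x.length) { i =>
      val centered = if (withMean) x(i) - mean(i) else x(i)
      if (withStd) centered * f(i) else centered
    }
  }

  def main(args: Array[String]): Unit = {
    val scaled = scale(Array(3.0, 5.0), Array(1.0, 5.0), Array(4.0, 0.0), withMean = true, withStd = true)
    println(scaled.mkString(", "))
  }
}
```

Storing the variance and deriving the factor from it keeps the training statistic visible to users while the transform still uses the precomputed multiplier.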

Contributor Author (reply):

done.

SparkQA commented Aug 6, 2014

QA results for PR 1814:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds the following public classes (experimental):
class StandardScaler(withMean: Boolean, withStd: Boolean) {

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18063/consoleFull

SparkQA commented Aug 7, 2014

QA tests have started for PR 1814. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18111/consoleFull

SparkQA commented Aug 7, 2014

QA results for PR 1814:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds the following public classes (experimental):
class StandardScaler(withMean: Boolean, withStd: Boolean) extends Logging {

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18111/consoleFull

dbtsai (Member) commented Aug 7, 2014

LGTM. Merged into both master and branch-1.1. Thanks!

asfgit pushed a commit that referenced this pull request Aug 7, 2014
This is part of SPARK-2828:

1. separate IDF model from IDF algorithm (which generates a model)
2. separate StandardScaler model from StandardScaler

CC: dbtsai

Author: Xiangrui Meng <[email protected]>

Closes #1814 from mengxr/feature-api-update and squashes the following commits:

40d863b [Xiangrui Meng] move mean and variance to model
48a0fff [Xiangrui Meng] separate Model from StandardScaler algorithm
89f3486 [Xiangrui Meng] update IDF to separate Model from Algorithm

(cherry picked from commit b9e9e53)
Signed-off-by: Xiangrui Meng <[email protected]>
@asfgit asfgit closed this in b9e9e53 Aug 7, 2014
xiliu82 pushed a commit to xiliu82/spark that referenced this pull request Sep 4, 2014