[SPARK-11520][ML] RegressionMetrics should support instance weights #9907

Closed
wants to merge 1 commit into from

Lewuathe
Contributor

This will be important to improve LinearRegressionSummary, which currently has a mix of weighted and unweighted metrics.

@SparkQA

SparkQA commented Nov 23, 2015

Test build #46525 has finished for PR 9907 at commit 1d4a5fd.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@Lewuathe
Contributor Author

@mengxr @jkbradley Could you review this? Thanks.

}.aggregate(new MultivariateOnlineSummarizer())(
  (summary, v) => summary.add(v),
  (summary, sample) => summary.add(sample._1, sample._2),

We should rename this summary variable; it is used three times for different objects.

Also, sample is a bit too generic here, and the _1 and _2 accessors are not very intuitive. I suggest you unpack the tuple fully: { case (currentSummary, (vec, weight)) => currentSummary.add(vec, weight) }
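
The unpacking suggested above can be sketched in plain Scala, without Spark. This is only an illustration: RunningSummary is a hypothetical stand-in for MultivariateOnlineSummarizer, summarize is an invented helper, and foldLeft over a local Seq stands in for the RDD aggregate call in the patch.

```scala
// Hypothetical stand-in for MultivariateOnlineSummarizer (not Spark's class).
// It only tracks the total weight and the weighted sum of the values.
final case class RunningSummary(weightSum: Double = 0.0, weightedTotal: Double = 0.0) {
  def add(value: Double, weight: Double): RunningSummary =
    RunningSummary(weightSum + weight, weightedTotal + weight * value)

  def mean: Double = if (weightSum == 0.0) 0.0 else weightedTotal / weightSum
}

object TupleUnpackingSketch {
  // Each sample is a (value, weight) pair. The pattern match names both the
  // accumulator and the tuple fields, so no _1 / _2 accessors are needed.
  def summarize(samples: Seq[(Double, Double)]): RunningSummary =
    samples.foldLeft(RunningSummary()) {
      case (currentSummary, (value, weight)) => currentSummary.add(value, weight)
    }
}
```

Naming the accumulator currentSummary also avoids reusing summary for several different objects.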

@thunterdb
Contributor

@Lewuathe thanks for your patch. I think it will require more work in RegressionMetrics to fully implement weighted metrics. We need to make the following changes:

  • expose weightSum in MultivariateStatisticalSummary (as a developer API)
  • the computations of SSreg and SStot should take the weights into account
  • in RegressionMetrics, all references to summary.count should be replaced by summary.weightSum
    Given that the default weights are 1, weightSum equals count for unweighted data, so this should give the same result as before.
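
The changes listed above can be sketched in plain Scala as follows. This is a rough illustration under stated assumptions, not the RegressionMetrics implementation: WeightedRegressionSketch, Metrics and compute are hypothetical names, each row is assumed to be a (prediction, label, weight) triple, and the formulas are the usual weighted generalisations of SSerr, SStot and SSreg with weightSum taking the place of count.

```scala
object WeightedRegressionSketch {
  // Hypothetical result holder; the field names follow the comment above.
  final case class Metrics(weightSum: Double, ssErr: Double, ssTot: Double, ssReg: Double) {
    // weightSum replaces the instance count in the denominator.
    def meanSquaredError: Double = ssErr / weightSum
    def r2: Double = 1.0 - ssErr / ssTot
  }

  // rows: (prediction, label, weight); assumes a positive total weight.
  def compute(rows: Seq[(Double, Double, Double)]): Metrics = {
    def sq(x: Double): Double = x * x

    val weightSum = rows.map { case (_, _, weight) => weight }.sum
    // Weighted mean of the labels.
    val labelMean = rows.map { case (_, label, weight) => weight * label }.sum / weightSum

    // Every squared term is scaled by its instance weight.
    val ssErr = rows.map { case (pred, label, weight) => weight * sq(label - pred) }.sum
    val ssTot = rows.map { case (_, label, weight) => weight * sq(label - labelMean) }.sum
    val ssReg = rows.map { case (pred, _, weight) => weight * sq(pred - labelMean) }.sum

    Metrics(weightSum, ssErr, ssTot, ssReg)
  }
}
```

With every weight set to 1.0, weightSum equals the row count and the sums reduce to the unweighted ones. A row with weight 2.0 contributes the same as the same row listed twice with weight 1.0.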
