-
Notifications
You must be signed in to change notification settings - Fork 28.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SPARK-4111 [MLlib] add regression metrics #2978
Conversation
Can one of the admins verify this patch? |
* Computes R^2^, the coefficient of determination. | ||
* @return | ||
*/ | ||
def r2_socre(): Double = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo in name
Update the title with SPARK-XXXX [MLLIB] |
private lazy val summarizer: MultivariateOnlineSummarizer = { | ||
val summarizer: MultivariateOnlineSummarizer = valuesAndPreds.map{ | ||
case (value,pred) => Vectors.dense( | ||
Array(value, pred, value - pred, math.abs(value - pred), math.pow(value - pred, 2.0)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see that stats for pred
are used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it's not used and I have remove it in a new commit.
Rename re_score() and remove unused column. |
* @return | ||
*/ | ||
def r2_score(): Double = { | ||
1 - summarizer.mean(3) * summarizer.count / (summarizer.variance(0) * (summarizer.count - 1)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this might be worth a comment to explain what sums of squares you are trying to compute in the numerator and denominator. A link to the definition might be good, here and for explained variance, since they are related.
This is picky now, but you might write out "meanAverageError" instead of saying "mae". Is "r2_score" style-wise correct vs "r2Score"? (Sorry should have thought of that.) Finally consider using |
private lazy val summarizer: MultivariateOnlineSummarizer = { | ||
val summarizer: MultivariateOnlineSummarizer = valuesAndPreds.map{ | ||
case (value,pred) => Vectors.dense( | ||
Array(value, value - pred, math.abs(value - pred), math.pow(value - pred, 2.0)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also picky but you can avoid math.pow
and avoid computing value - pred
3 times here with a local var. Might be cleaner. This LGTM for what it's worth.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The third and the forth columns are not necessary. You can use normL1
and normL2
on the second column:
Rename parameter and function names to be consistent with spark naming rules. |
ok to test |
test this please |
Test build #22457 has started for PR 2978 at commit
|
Test build #22457 timed out for PR 2978 at commit |
Test FAILed. |
* @param predictionAndObservations an RDD of (prediction,observation) pairs. | ||
*/ | ||
@Experimental | ||
class RegressionMetrics(predictionAndObservations: RDD[(Double, Double)]) extends Logging { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
space after ,
*/ | ||
private lazy val summarizer: MultivariateOnlineSummarizer = { | ||
val summarizer: MultivariateOnlineSummarizer = predictionAndObservations.map{ | ||
case (prediction,observation) => Vectors.dense( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
space after ,
Test build #22527 has started for PR 2978 at commit
|
Test build #22528 has started for PR 2978 at commit
|
Test build #22527 has finished for PR 2978 at commit
|
Test PASSed. |
Test build #22528 has finished for PR 2978 at commit
|
Test PASSed. |
LGTM. Merged into master. Thanks! |
Add RegressionMetrics.scala as regression metrics used for evaluation and corresponding test case RegressionMetricsSuite.scala.