[SPARK-24103][ML][MLLIB] ML Evaluators should use weight column - added weight column for binary classification evaluator #17084
Test build #73526 has finished for PR 17084 at commit
@sethah @Lewuathe @thunterdb @WeichenXu123 @jkbradley @actuaryzhang @srowen would you be able to take a look? I've split the larger pull request into three parts as suggested.
ping @sethah @Lewuathe @thunterdb @WeichenXu123 @jkbradley @actuaryzhang @srowen could you please take a look? thank you!
Thanks for the PR. I think this is helpful. Will take a look next week. Quite swamped recently.
gentle ping @actuaryzhang
mllib/src/main/scala/org/apache/spark/ml/evaluation/BinaryClassificationEvaluator.scala
mllib/src/main/scala/org/apache/spark/ml/evaluation/BinaryClassificationEvaluator.scala
mllib/src/main/scala/org/apache/spark/mllib/evaluation/BinaryClassificationMetrics.scala
mllib/src/main/scala/org/apache/spark/mllib/evaluation/binary/BinaryConfusionMatrix.scala
mllib/src/main/scala/org/apache/spark/mllib/evaluation/BinaryClassificationMetrics.scala
@imatiach-msft Thanks for the PR. Added a couple of comments. Sorry for the delayed review.
gentle ping @imatiach-msft.
yes, will update the PR, thanks for the ping
98652cf to cf59c62
Jenkins, test this please
Test build #78597 has finished for PR 17084 at commit
Test build #78598 has finished for PR 17084 at commit
the pip packaging failure seems to be unrelated to the code... let me try this again
Jenkins, test this please
Test build #78606 has finished for PR 17084 at commit
gentle ping @actuaryzhang, thanks!
60fc2a7 to e89a030
Jenkins, test this please
Test build #89405 has finished for PR 17084 at commit
gentle ping @sethah @Lewuathe @thunterdb @WeichenXu123 @jkbradley @actuaryzhang @srowen could you please take a look? I've merged with the latest code. Thank you!
e89a030 to 1d182a2
Test build #102046 has finished for PR 17084 at commit
793a284 to e78076b
Test build #102177 has finished for PR 17084 at commit
I did a quick first pass review, thanks for working on this :)
mllib/src/main/scala/org/apache/spark/mllib/evaluation/binary/BinaryLabelCounter.scala
...ain/scala/org/apache/spark/mllib/evaluation/binary/BinaryClassificationMetricComputers.scala
...ain/scala/org/apache/spark/mllib/evaluation/binary/BinaryClassificationMetricComputers.scala
...ain/scala/org/apache/spark/mllib/evaluation/binary/BinaryClassificationMetricComputers.scala
...ain/scala/org/apache/spark/mllib/evaluation/binary/BinaryClassificationMetricComputers.scala
mllib/src/main/scala/org/apache/spark/ml/evaluation/BinaryClassificationEvaluator.scala
3c459bd to 955b022
Test build #102510 has finished for PR 17084 at commit
@holdenk @srowen @actuaryzhang would you be able to take another look at this PR when you have a chance? I've updated to latest, fixed all failing tests, and I think I have addressed all of the comments. Thank you!
It's looking OK; does PySpark need an update, and does the multiclass evaluator need an update to match?
mllib/src/main/scala/org/apache/spark/mllib/evaluation/BinaryClassificationMetrics.scala
Ah right, long since lost track. Yeah it would make sense to update it here too. I don't think R has an evaluator?
Test build #102621 has finished for PR 17084 at commit
Test build #102720 has finished for PR 17084 at commit
mllib/src/test/scala/org/apache/spark/mllib/evaluation/BinaryClassificationMetricsSuite.scala
mllib/src/test/scala/org/apache/spark/mllib/evaluation/BinaryClassificationMetricsSuite.scala
mllib/src/test/scala/org/apache/spark/mllib/evaluation/BinaryClassificationMetricsSuite.scala
mllib/src/test/scala/org/apache/spark/ml/evaluation/BinaryClassificationEvaluatorSuite.scala
Test build #102733 has finished for PR 17084 at commit
d8a5865 to 00bfec1
Test build #102734 has finished for PR 17084 at commit
@srowen I think I have addressed all pending comments and the tests are currently passing for this PR. I couldn't find any evaluators in the R code, but then again I am not as familiar with SparkR (however, I've used sparklyr before: https://github.com/rstudio/sparklyr). Please let me know if there are any other comments that need to be addressed. Thank you!
Merged to master
What changes were proposed in this pull request?
The evaluators BinaryClassificationEvaluator, RegressionEvaluator, and MulticlassClassificationEvaluator, and the corresponding metrics classes BinaryClassificationMetrics, RegressionMetrics, and MulticlassMetrics, should take sample weights into account when computing metrics.
I've closed PR #16557, as recommended, in favor of creating three pull requests, one for each of the evaluators (binary/regression/multiclass), to make them easier to review and update.
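To illustrate what "using sample weight data" means for the binary case, here is a minimal sketch in plain Scala with no Spark dependency. It is not the code in this PR: the names `Scored`, `WeightedConfusion`, and `WeightedBinaryMetrics` are hypothetical, and the conventions for empty denominators are this sketch's own choice. The only idea it shows is that each record adds its weight, rather than 1, to the confusion-matrix cell it falls into.

```scala
// Illustrative sketch only (hypothetical names, not Spark's implementation).
// A scored record: model score, true label (0.0 or 1.0), and a sample weight.
final case class Scored(score: Double, label: Double, weight: Double = 1.0)

// Confusion-matrix cells hold sums of weights instead of integer counts.
final case class WeightedConfusion(tp: Double, fp: Double, tn: Double, fn: Double) {
  // Conventions for empty denominators are an assumption of this sketch.
  def precision: Double = if (tp + fp == 0.0) 1.0 else tp / (tp + fp)
  def recall: Double = if (tp + fn == 0.0) 0.0 else tp / (tp + fn)
}

object WeightedBinaryMetrics {
  // Accumulate each record's weight into the cell selected by
  // (predicted positive?, actually positive?) at the given threshold.
  def confusion(data: Seq[Scored], threshold: Double): WeightedConfusion =
    data.foldLeft(WeightedConfusion(0.0, 0.0, 0.0, 0.0)) { (c, r) =>
      val predictedPositive = r.score >= threshold
      val actualPositive = r.label > 0.5
      (predictedPositive, actualPositive) match {
        case (true, true)   => c.copy(tp = c.tp + r.weight)
        case (true, false)  => c.copy(fp = c.fp + r.weight)
        case (false, true)  => c.copy(fn = c.fn + r.weight)
        case (false, false) => c.copy(tn = c.tn + r.weight)
      }
    }
}
```

With every weight left at the default of 1.0, this reduces to the usual unweighted counts, which is the behavior one would expect to be preserved when no weight column is set.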
How was this patch tested?
I added tests to the metrics and evaluator classes.