[SPARK-24103][ML][MLLIB] ML Evaluators should use weight column - added weight column for binary classification evaluator #17084

imatiach-msft · 2017-02-27T18:16:35Z

What changes were proposed in this pull request?

The evaluators BinaryClassificationEvaluator, RegressionEvaluator, and MulticlassClassificationEvaluator and the corresponding metrics classes BinaryClassificationMetrics, RegressionMetrics and MulticlassMetrics should use sample weight data.

As recommended, I've closed PR #16557 in favor of three separate pull requests, one per evaluator (binary, regression, multiclass), to make them easier to review and update.
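
For readers unfamiliar with what a weight column changes in a binary metric, here is a minimal pure-Python sketch. It is not the code in this PR and not Spark's implementation; the function name `weighted_roc_auc` and the `(score, label, weight)` row layout are assumptions made for illustration. Each row adds its weight, rather than a count of 1, to the positive or negative mass at its score, and the area under the ROC curve is then computed with the trapezoidal rule.

```python
from typing import Dict, Iterable, Tuple


def weighted_roc_auc(rows: Iterable[Tuple[float, float, float]]) -> float:
    """Area under the ROC curve for (score, label, weight) rows.

    Hypothetical helper for illustration only. Labels are 1.0 (positive)
    or 0.0 (negative); weights must be non-negative.
    """
    # Accumulate weighted positive and negative mass per distinct score.
    by_score: Dict[float, Tuple[float, float]] = {}
    for score, label, weight in rows:
        if weight < 0:
            raise ValueError("weights must be non-negative")
        pos, neg = by_score.get(score, (0.0, 0.0))
        if label > 0.5:
            pos += weight
        else:
            neg += weight
        by_score[score] = (pos, neg)

    total_pos = sum(p for p, _ in by_score.values())
    total_neg = sum(n for _, n in by_score.values())
    if total_pos == 0 or total_neg == 0:
        raise ValueError("need positive weight on both classes")

    # Sweep thresholds from the highest score down, adding one trapezoid
    # per distinct score to the running area.
    tp = fp = 0.0
    prev_tpr = prev_fpr = 0.0
    auc = 0.0
    for score in sorted(by_score, reverse=True):
        pos, neg = by_score[score]
        tp += pos
        fp += neg
        tpr = tp / total_pos
        fpr = fp / total_neg
        auc += (fpr - prev_fpr) * (tpr + prev_tpr) / 2.0
        prev_tpr, prev_fpr = tpr, fpr
    return auc


if __name__ == "__main__":
    rows = [(0.9, 1.0, 1.0), (0.8, 0.0, 1.0), (0.7, 1.0, 2.0), (0.3, 0.0, 1.0)]
    print(weighted_roc_auc(rows))
```

In this sketch, giving a row a weight of 2 produces the same result as listing that row twice with weight 1, which is a quick sanity check for any weighted metric.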

How was this patch tested?

I added tests to the metrics and evaluators classes.

SparkQA · 2017-02-27T19:10:24Z

Test build #73526 has finished for PR 17084 at commit 98652cf.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
class BinaryClassificationMetrics @Since("2.2.0") (

imatiach-msft · 2017-02-27T22:39:57Z

@sethah @Lewuathe @thunterdb @WeichenXu123 @jkbradley @actuaryzhang @srowen would you be able to take a look? I've split the larger pull request into three parts as suggested.

imatiach-msft · 2017-03-16T04:51:53Z

ping @sethah @Lewuathe @thunterdb @WeichenXu123 @jkbradley @actuaryzhang @srowen could you please take a look? thank you!

actuaryzhang · 2017-03-16T05:19:02Z

Thanks for the PR. I think this is helpful. Will take a look next week. Quite swamped recently.

HyukjinKwon · 2017-05-11T14:36:59Z

gentle ping @actuaryzhang

mllib/src/main/scala/org/apache/spark/ml/evaluation/BinaryClassificationEvaluator.scala

mllib/src/main/scala/org/apache/spark/mllib/evaluation/BinaryClassificationMetrics.scala

mllib/src/main/scala/org/apache/spark/mllib/evaluation/binary/BinaryConfusionMatrix.scala

mllib/src/main/scala/org/apache/spark/mllib/evaluation/BinaryClassificationMetrics.scala

actuaryzhang · 2017-05-13T18:17:10Z

@imatiach-msft Thanks for the PR. Added a couple of comments. Sorry for the delayed review.

HyukjinKwon · 2017-06-19T04:36:11Z

gentle ping @imatiach-msft .

imatiach-msft · 2017-06-26T03:41:52Z

yes, will update the PR, thanks for the ping

imatiach-msft · 2017-06-26T04:20:36Z

Jenkins, test this please

SparkQA · 2017-06-26T04:56:36Z

Test build #78597 has finished for PR 17084 at commit cf59c62.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
class BinaryClassificationMetrics @Since("2.2.0") (

SparkQA · 2017-06-26T04:59:43Z

Test build #78598 has finished for PR 17084 at commit 60fc2a7.

This patch fails PySpark pip packaging tests.
This patch merges cleanly.
This patch adds no public classes.

imatiach-msft · 2017-06-26T05:25:07Z

The pip packaging failure seems to be unrelated to the code... let me try this again.

imatiach-msft · 2017-06-26T05:25:15Z

Jenkins, test this please

SparkQA · 2017-06-26T06:27:20Z

Test build #78606 has finished for PR 17084 at commit 60fc2a7.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

imatiach-msft · 2017-06-27T04:11:16Z

gentle ping @actuaryzhang, thanks!

imatiach-msft · 2018-04-16T16:43:38Z

Jenkins, test this please

SparkQA · 2018-04-16T17:53:07Z

Test build #89405 has finished for PR 17084 at commit e89a030.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

imatiach-msft · 2018-04-16T18:39:42Z

gentle ping @sethah @Lewuathe @thunterdb @WeichenXu123 @jkbradley @actuaryzhang @srowen could you please take a look? I've merged with the latest code. Thank you!

SparkQA · 2019-02-05T07:07:28Z

Test build #102046 has finished for PR 17084 at commit 793a284.

This patch fails PySpark pip packaging tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
class BinaryClassificationMetrics @Since("3.0.0") (

SparkQA · 2019-02-11T07:10:43Z

Test build #102177 has finished for PR 17084 at commit e78076b.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
class BinaryClassificationMetrics @Since("3.0.0") (

holdenk

I did a quick first pass review, thanks for working on this :)

mllib/src/main/scala/org/apache/spark/mllib/evaluation/binary/BinaryLabelCounter.scala

...ain/scala/org/apache/spark/mllib/evaluation/binary/BinaryClassificationMetricComputers.scala

mllib/src/main/scala/org/apache/spark/ml/evaluation/BinaryClassificationEvaluator.scala

SparkQA · 2019-02-19T17:44:29Z

Test build #102510 has finished for PR 17084 at commit 955b022.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

imatiach-msft · 2019-02-21T03:10:48Z

@holdenk @srowen @actuaryzhang would you be able to take another look at this PR when you have a chance? I've updated to the latest code, fixed all failing tests, and I think I have addressed all of the comments. Thank you!

srowen

It's looking OK; does PySpark need an update, and does the multiclass evaluator need an update to match?

mllib/src/main/scala/org/apache/spark/mllib/evaluation/BinaryClassificationMetrics.scala

imatiach-msft · 2019-02-22T04:32:47Z

@srowen The multiclass weight column PR is already merged: #17086
The PySpark side will need to be updated. Should that be part of this PR or a separate PR?

srowen · 2019-02-22T04:34:47Z

Ah right, long since lost track. Yeah it would make sense to update it here too. I don't think R has an evaluator?

SparkQA · 2019-02-22T06:11:31Z

Test build #102621 has finished for PR 17084 at commit 079e114.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2019-02-24T06:12:49Z

Test build #102720 has finished for PR 17084 at commit a571adc.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
class BinaryClassificationEvaluator(JavaEvaluator, HasLabelCol, HasRawPredictionCol, HasWeightCol,

mllib/src/test/scala/org/apache/spark/mllib/evaluation/BinaryClassificationMetricsSuite.scala

mllib/src/test/scala/org/apache/spark/ml/evaluation/BinaryClassificationEvaluatorSuite.scala

python/pyspark/mllib/evaluation.py

SparkQA · 2019-02-25T03:59:29Z

Test build #102733 has finished for PR 17084 at commit d8a5865.

This patch fails Scala style tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
class MulticlassMetrics @Since("1.1.0") (predictionAndLabels: RDD[_ <: Product])

SparkQA · 2019-02-25T05:20:40Z

Test build #102734 has finished for PR 17084 at commit 00bfec1.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
class MulticlassMetrics @Since("1.1.0") (predictionAndLabels: RDD[_ <: Product])

imatiach-msft · 2019-02-25T22:45:11Z

@srowen I think I have addressed all pending comments, and the tests are currently passing for this PR. I couldn't find any evaluators in the R code, but then again I am not as familiar with SparkR (although I've used sparklyr before: https://github.com/rstudio/sparklyr). Please let me know if there are any other comments that need to be addressed. Thank you!

srowen · 2019-02-25T23:17:06Z

Merged to master

imatiach-msft mentioned this pull request Feb 27, 2017

[SPARK-18693][ML][MLLIB] ML Evaluators should use weight column #16557

Closed

actuaryzhang reviewed May 13, 2017

HyukjinKwon mentioned this pull request Jun 25, 2017

[INFRA] Close stale PRs #18417

Closed

imatiach-msft force-pushed the ilmat/binary-evalute branch from 98652cf to cf59c62 Compare June 26, 2017 03:55

imatiach-msft force-pushed the ilmat/binary-evalute branch from 60fc2a7 to e89a030 Compare April 16, 2018 16:43

imatiach-msft changed the title ~~[SPARK-18693][ML][MLLIB] ML Evaluators should use weight column - added weight column for binary classification evaluator~~ [SPARK-24103][ML][MLLIB] ML Evaluators should use weight column - added weight column for binary classification evaluator May 14, 2018

imatiach-msft mentioned this pull request Dec 11, 2018

[SPARK-24102][ML][MLLIB] ML Evaluators should use weight column - added weight column for regression evaluator #17085

Closed

imatiach-msft force-pushed the ilmat/binary-evalute branch from e89a030 to 1d182a2 Compare February 5, 2019 04:45

imatiach-msft force-pushed the ilmat/binary-evalute branch from 793a284 to e78076b Compare February 11, 2019 05:49

holdenk reviewed Feb 11, 2019

srowen requested changes Feb 15, 2019

mllib/src/main/scala/org/apache/spark/ml/evaluation/BinaryClassificationEvaluator.scala (outdated, resolved)

imatiach-msft added 9 commits February 18, 2019 23:57

Added weight column for binary classification evaluator

8d31e1e

Updated based on comments - fixed since tag, renamed vars, added check for weight col

7f8d5ad

updated version

af96c45

updated based on comments

16a0326

made code more similar to other two PRs

3d9104c

fix MIMA

d99f6d1

updated based on comments and fixed style

fd80790

updated based on comments

bde3069

fixed failing tests

955b022

imatiach-msft force-pushed the ilmat/binary-evalute branch from 3c459bd to 955b022 Compare February 19, 2019 04:58

srowen reviewed Feb 22, 2019

mllib/src/main/scala/org/apache/spark/mllib/evaluation/BinaryClassificationMetrics.scala (outdated, resolved)

updated doc based on comments

079e114

updated python code

a571adc

srowen reviewed Feb 24, 2019

updated based on new comments

00bfec1

imatiach-msft force-pushed the ilmat/binary-evalute branch from d8a5865 to 00bfec1 Compare February 25, 2019 04:03

srowen approved these changes Feb 25, 2019

srowen closed this in b66be0e Feb 25, 2019

imatiach-msft mentioned this pull request Mar 25, 2019

[SPARK-24102][ML][MLLIB][PYSPARK][FOLLOWUP] Added weight column to pyspark API for regression evaluator and metrics #24197

Closed
