Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-18693][ML][MLLIB] ML Evaluators should use weight column #16557

Closed

Conversation

imatiach-msft
Copy link
Contributor

What changes were proposed in this pull request?

The evaluators BinaryClassificationEvaluator, RegressionEvaluator, and MulticlassClassificationEvaluator and the corresponding metrics classes BinaryClassificationMetrics, RegressionMetrics and MulticlassMetrics should use sample weight data.
The updates to the regression metrics were based on (and updated with new changes based on comments):
https://issues.apache.org/jira/browse/SPARK-11520
("RegressionMetrics should support instance weights")
but the pull request was closed as the changes were never checked in.

How was this patch tested?

This is still a work in progress, I will be adding more tests soon. I took the regression tests from:
#9907
Which was closed as a stale PR but I updated it with some changes.

@SparkQA
Copy link

SparkQA commented Jan 12, 2017

Test build #71247 has finished for PR 16557 at commit 2b624c0.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class BinaryClassificationMetrics @Since(\"2.2.0\") (
  • class MulticlassMetrics @Since(\"1.1.0\") (predAndLabelsWithOptWeight: RDD[_])

@SparkQA
Copy link

SparkQA commented Jan 12, 2017

Test build #71270 has finished for PR 16557 at commit f6d6fef.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@imatiach-msft
Copy link
Contributor Author

it looks like a random test timed out:
org.apache.spark.scheduler.BasicSchedulerIntegrationSuite.job with fetch failure
Error Details
java.util.concurrent.TimeoutException: Futures timed out after [1 second]
I will try to run again.

@imatiach-msft
Copy link
Contributor Author

Jenkins retest this please

@imatiach-msft
Copy link
Contributor Author

Jenkins, retest this please

@SparkQA
Copy link

SparkQA commented Jan 12, 2017

Test build #71272 has finished for PR 16557 at commit f6d6fef.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 13, 2017

Test build #71278 has finished for PR 16557 at commit 54fe922.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 13, 2017

Test build #71291 has started for PR 16557 at commit 397c26b.

@imatiach-msft
Copy link
Contributor Author

Jenkins, retest this please

1 similar comment
@imatiach-msft
Copy link
Contributor Author

Jenkins, retest this please

@imatiach-msft imatiach-msft force-pushed the ilmat/evaluate-with-weights branch from 397c26b to 808ca6b Compare January 13, 2017 15:55
@imatiach-msft
Copy link
Contributor Author

Jenkins, retest this please

@imatiach-msft
Copy link
Contributor Author

Jenkins doesn't seem to be working ...

@imatiach-msft
Copy link
Contributor Author

@sethah @Lewuathe @thunterdb @WeichenXu123 @jkbradley would you be able to take a look at the changes to add a weight column to binary/multiclass/regression evaluators/metrics classes? It looks like you are familiar with this code. Thank you!

@imatiach-msft
Copy link
Contributor Author

Jenkins, retest this please

@SparkQA
Copy link

SparkQA commented Jan 13, 2017

Test build #71331 has finished for PR 16557 at commit 808ca6b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class BinaryClassificationMetrics @Since(\"2.2.0\") (
  • class MulticlassMetrics @Since(\"1.1.0\") (predAndLabelsWithOptWeight: RDD[_])

@SparkQA
Copy link

SparkQA commented Jan 13, 2017

Test build #71334 has finished for PR 16557 at commit 808ca6b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class BinaryClassificationMetrics @Since(\"2.2.0\") (
  • class MulticlassMetrics @Since(\"1.1.0\") (predAndLabelsWithOptWeight: RDD[_])

@imatiach-msft imatiach-msft changed the title [SPARK-18693][ML][MLLIB][WIP] ML Evaluators should use weight column [SPARK-18693][ML][MLLIB] ML Evaluators should use weight column Jan 13, 2017
@imatiach-msft imatiach-msft force-pushed the ilmat/evaluate-with-weights branch from 808ca6b to 228bbfb Compare January 17, 2017 18:54
@imatiach-msft
Copy link
Contributor Author

Jenkins, retest this please

@SparkQA
Copy link

SparkQA commented Jan 17, 2017

Test build #71523 has finished for PR 16557 at commit 228bbfb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class BinaryClassificationMetrics @Since(\"2.2.0\") (
  • class MulticlassMetrics @Since(\"1.1.0\") (predAndLabelsWithOptWeight: RDD[_])

@SparkQA
Copy link

SparkQA commented Jan 17, 2017

Test build #71531 has finished for PR 16557 at commit e2873f2.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@imatiach-msft
Copy link
Contributor Author

Jenkins, retest this please

@SparkQA
Copy link

SparkQA commented Jan 18, 2017

Test build #71544 has finished for PR 16557 at commit e2873f2.

  • This patch fails build dependency tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@imatiach-msft
Copy link
Contributor Author

Jenkins, retest this please

1 similar comment
@imatiach-msft
Copy link
Contributor Author

Jenkins, retest this please

@imatiach-msft imatiach-msft force-pushed the ilmat/evaluate-with-weights branch from e2873f2 to d589dfe Compare January 18, 2017 15:39
@SparkQA
Copy link

SparkQA commented Jan 18, 2017

Test build #71604 has finished for PR 16557 at commit e2873f2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 18, 2017

Test build #71607 has finished for PR 16557 at commit d589dfe.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 19, 2017

Test build #71670 has finished for PR 16557 at commit c1f5f09.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class SparkListenerExecutorBlacklisted(
  • case class SparkListenerExecutorUnblacklisted(time: Long, executorId: String)
  • case class SparkListenerNodeBlacklisted(
  • case class SparkListenerNodeUnblacklisted(time: Long, hostId: String)
  • case class QualifiedTableName(database: String, name: String)
  • class FindHiveSerdeTable(session: SparkSession) extends Rule[LogicalPlan]

@imatiach-msft imatiach-msft force-pushed the ilmat/evaluate-with-weights branch from c1f5f09 to ba68f72 Compare January 23, 2017 16:28
@imatiach-msft
Copy link
Contributor Author

Jenkins, retest this please

@imatiach-msft
Copy link
Contributor Author

ping @sethah @Lewuathe @thunterdb @WeichenXu123 @jkbradley would you be able to take a look at the changes to add a weight column to binary/multiclass/regression evaluators/metrics classes? It looks like you are familiar with this code. Thank you!

@SparkQA
Copy link

SparkQA commented Jan 23, 2017

Test build #71858 has finished for PR 16557 at commit ba68f72.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 23, 2017

Test build #71860 has finished for PR 16557 at commit 6dbb0ad.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@imatiach-msft
Copy link
Contributor Author

ping @sethah @Lewuathe @thunterdb @WeichenXu123 @jkbradley would you be able to take a look at the changes to add a weight column to binary/multiclass/regression evaluators/metrics classes? It looks like you are familiar with this code. Thank you!

1 similar comment
@imatiach-msft
Copy link
Contributor Author

ping @sethah @Lewuathe @thunterdb @WeichenXu123 @jkbradley would you be able to take a look at the changes to add a weight column to binary/multiclass/regression evaluators/metrics classes? It looks like you are familiar with this code. Thank you!

@imatiach-msft
Copy link
Contributor Author

@srowen @yanboliang might you be able to take a look at this PR? Is it possibly too large and I should break it up into 3 PRs, one per evaluator/metrics class?

@srowen
Copy link
Member

srowen commented Jan 26, 2017

I wouldn't ping that frequently, please. I don't feel qualified to review this myself.

@sethah
Copy link
Contributor

sethah commented Jan 26, 2017

+1 for breaking it up, maybe starting with regression. Also, just because something hasn't been reviewed in two weeks does not mean that there is no interest in it. Two weeks is not all that long (I've seen valuable PRs sit for much longer than that) and it likely just means people are busy. As Sean pointed out, it's probably not necessary to ping once per day.

@thunterdb
Copy link
Contributor

I agree, let's break this PR. It will go faster, and some changes may require longer discussions.

@imatiach-msft imatiach-msft force-pushed the ilmat/evaluate-with-weights branch from 6dbb0ad to a0fc4c3 Compare February 27, 2017 18:03
@imatiach-msft
Copy link
Contributor Author

ok, I will close this and create three new PRs, one for each of the evaluators

@imatiach-msft
Copy link
Contributor Author

I've created 3 PRs, located here:
#17084
#17085
#17086

@SparkQA
Copy link

SparkQA commented Feb 27, 2017

Test build #73525 has finished for PR 16557 at commit a0fc4c3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

asfgit pushed a commit that referenced this pull request Nov 9, 2018
…ed weight column for multiclass classification evaluator

## What changes were proposed in this pull request?

The evaluators BinaryClassificationEvaluator, RegressionEvaluator, and MulticlassClassificationEvaluator and the corresponding metrics classes BinaryClassificationMetrics, RegressionMetrics and MulticlassMetrics should use sample weight data.

I've closed the PR: #16557
 as recommended in favor of creating three pull requests, one for each of the evaluators (binary/regression/multiclass) to make it easier to review/update.

Note: I've updated the JIRA to:
https://issues.apache.org/jira/browse/SPARK-24101
Which is a child of JIRA:
https://issues.apache.org/jira/browse/SPARK-18693

## How was this patch tested?

I added tests to the metrics class.

Closes #17086 from imatiach-msft/ilmat/multiclass-evaluate.

Authored-by: Ilya Matiach <[email protected]>
Signed-off-by: Sean Owen <[email protected]>
holdenk pushed a commit to holdenk/spark that referenced this pull request Jan 5, 2019
…ed weight column for regression evaluator

## What changes were proposed in this pull request?

The evaluators BinaryClassificationEvaluator, RegressionEvaluator, and MulticlassClassificationEvaluator and the corresponding metrics classes BinaryClassificationMetrics, RegressionMetrics and MulticlassMetrics should use sample weight data.

I've closed the PR: apache#16557
 as recommended in favor of creating three pull requests, one for each of the evaluators (binary/regression/multiclass) to make it easier to review/update.

The updates to the regression metrics were based on (and updated with new changes based on comments):
https://issues.apache.org/jira/browse/SPARK-11520
 ("RegressionMetrics should support instance weights")
 but the pull request was closed as the changes were never checked in.

## How was this patch tested?

I added tests to the metrics class.

Closes apache#17085 from imatiach-msft/ilmat/regression-evaluate.

Authored-by: Ilya Matiach <[email protected]>
Signed-off-by: Sean Owen <[email protected]>
jackylee-ch pushed a commit to jackylee-ch/spark that referenced this pull request Feb 18, 2019
…ed weight column for multiclass classification evaluator

## What changes were proposed in this pull request?

The evaluators BinaryClassificationEvaluator, RegressionEvaluator, and MulticlassClassificationEvaluator and the corresponding metrics classes BinaryClassificationMetrics, RegressionMetrics and MulticlassMetrics should use sample weight data.

I've closed the PR: apache#16557
 as recommended in favor of creating three pull requests, one for each of the evaluators (binary/regression/multiclass) to make it easier to review/update.

Note: I've updated the JIRA to:
https://issues.apache.org/jira/browse/SPARK-24101
Which is a child of JIRA:
https://issues.apache.org/jira/browse/SPARK-18693

## How was this patch tested?

I added tests to the metrics class.

Closes apache#17086 from imatiach-msft/ilmat/multiclass-evaluate.

Authored-by: Ilya Matiach <[email protected]>
Signed-off-by: Sean Owen <[email protected]>
jackylee-ch pushed a commit to jackylee-ch/spark that referenced this pull request Feb 18, 2019
…ed weight column for regression evaluator

## What changes were proposed in this pull request?

The evaluators BinaryClassificationEvaluator, RegressionEvaluator, and MulticlassClassificationEvaluator and the corresponding metrics classes BinaryClassificationMetrics, RegressionMetrics and MulticlassMetrics should use sample weight data.

I've closed the PR: apache#16557
 as recommended in favor of creating three pull requests, one for each of the evaluators (binary/regression/multiclass) to make it easier to review/update.

The updates to the regression metrics were based on (and updated with new changes based on comments):
https://issues.apache.org/jira/browse/SPARK-11520
 ("RegressionMetrics should support instance weights")
 but the pull request was closed as the changes were never checked in.

## How was this patch tested?

I added tests to the metrics class.

Closes apache#17085 from imatiach-msft/ilmat/regression-evaluate.

Authored-by: Ilya Matiach <[email protected]>
Signed-off-by: Sean Owen <[email protected]>
srowen pushed a commit that referenced this pull request Feb 25, 2019
…ed weight column for binary classification evaluator

## What changes were proposed in this pull request?

The evaluators BinaryClassificationEvaluator, RegressionEvaluator, and MulticlassClassificationEvaluator and the corresponding metrics classes BinaryClassificationMetrics, RegressionMetrics and MulticlassMetrics should use sample weight data.

I've closed the PR: #16557
as recommended in favor of creating three pull requests, one for each of the evaluators (binary/regression/multiclass) to make it easier to review/update.

## How was this patch tested?
I added tests to the metrics and evaluators classes.

Closes #17084 from imatiach-msft/ilmat/binary-evalute.

Authored-by: Ilya Matiach <[email protected]>
Signed-off-by: Sean Owen <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants