[SPARK-6332] [MLlib] compute calibration curve for binary classifier #5025
Conversation
@robert-dodier Thanks for the PR! I added a couple of clarification questions to the JIRA.
ok to test
Can one of the admins verify this patch?
@@ -141,6 +141,7 @@ private[spark] abstract class ProbabilisticClassificationModel[
 *
 * WARNING: Not all models output well-calibrated probability estimates! These probabilities
 * should be treated as confidences, not precise probabilities.
+ * See also BinaryClassificationMetrics.calibration to assess calibration.
Use "[[DoubleBrackets]]" to generate an API doc link.
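For illustration, the Scaladoc cross-reference the reviewer suggests might look like the following sketch (the fully qualified class name shown is an assumption, not taken from the diff):

```scala
/**
 * WARNING: Not all models output well-calibrated probability estimates! These probabilities
 * should be treated as confidences, not precise probabilities.
 * See also [[org.apache.spark.mllib.evaluation.BinaryClassificationMetrics]] to assess calibration.
 */
```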
Can you resolve merge conflicts? I made a quick pass for style and organization; did not check correctness. Overall it looks like there is quite a bit of repeated code between here and
o ProbabilisticClassifier.scala: mention calibration in comments
o BinaryClassificationMetrics.scala: adapting code for ROC to calibration; incomplete and commented out for now
o BinaryClassificationMetricsSuite.scala: tests for calibration
types to what calibration actually returns.
same as spark master again.
I'm going to close this pull request. If this is still relevant and you are interested in pushing it forward, please open a new pull request. Thanks!
This PR contains an implementation of a calibration method in the class BinaryClassificationMetrics. The code was adapted from the method for ROC curve construction. Tests on small data sets have been added to BinaryClassificationMetricsSuite, and the current version of the code passes those tests.
In this implementation, the return value of the new method is an RDD[((Double, Double), (Double, Long))]. The first pair describes each bin and the second pair describes its contents. In the first pair, the two values are the least and greatest scores in the bin. In the second pair, the two values are the proportion of positive examples in the bin and the number of examples in the bin. I chose this representation to preserve as much information as possible. However, a simpler representation might be better; let's talk about that if anyone is interested.
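To make the proposed return type concrete, here is a minimal plain-Scala sketch of the binning described above, using an in-memory Seq in place of an RDD (the method name `calibration` matches the PR; the equal-count binning strategy and the `numBins` parameter are assumptions for illustration, not the PR's actual implementation):

```scala
object CalibrationSketch {
  // Sort (score, label) pairs by score, split them into roughly
  // equal-count bins, and for each bin return:
  //   ((least score, greatest score), (fraction positive, bin size))
  // mirroring the RDD[((Double, Double), (Double, Long))] shape
  // described in the PR.
  def calibration(scoresAndLabels: Seq[(Double, Double)],
                  numBins: Int): Seq[((Double, Double), (Double, Long))] = {
    val sorted = scoresAndLabels.sortBy(_._1)
    val binSize = math.max(1, sorted.size / numBins)
    sorted.grouped(binSize).map { bin =>
      val scores = bin.map(_._1)
      val posFraction = bin.count(_._2 > 0.5).toDouble / bin.size
      ((scores.min, scores.max), (posFraction, bin.size.toLong))
    }.toSeq
  }

  def main(args: Array[String]): Unit = {
    val data = Seq((0.1, 0.0), (0.2, 0.0), (0.6, 1.0), (0.9, 1.0))
    calibration(data, 2).foreach(println)
  }
}
```

For a well-calibrated classifier, the fraction of positives in each bin should roughly track the scores spanned by that bin, which is what plotting these pairs as a calibration curve would show.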