
[MLLIB] [SPARK-2222] Add multiclass evaluation metrics #1155

Closed
wants to merge 15 commits

Conversation

avulanov
Contributor

Adding two classes:

  1. MulticlassMetrics implements various multiclass evaluation metrics
  2. MulticlassMetricsSuite implements unit tests for MulticlassMetrics

@AmplabJenkins

Can one of the admins verify this patch?

@xiejuncs

Nice work.

I am reading the implementation of MulticlassMetrics. According to your code, for the micro average you calculate the recall and then set the precision and F1 measure equal to the recall. I am not sure whether this makes sense.

According to this post: http://rushdishams.blogspot.com/2011/08/micro-and-macro-average-of-precision.html

Assume we have just three classes. For each class we have three counts: true positives (tp), false positives (fp), and false negatives (fn). Hence we have tp1, fp1, and fn1 for class 1, and so on.

For Micro-Average Precision: (tp1 + tp2 + tp3) / (tp1 + tp2 + tp3 + fp1 + fp2 + fp3)
For Micro-Average Recall: (tp1 + tp2 + tp3) / (tp1 + tp2 + tp3 + fn1 + fn2 + fn3)
For Micro-Average F1 Measure: the harmonic mean of the micro-averaged precision and recall.

Based on the above definitions, recall and precision should not be the same. Is that correct?
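
For reference, here are those formulas as a minimal Scala sketch (hypothetical counts, not code from this patch):

// Hypothetical per-class counts for three classes.
val tp = Array(10.0, 8.0, 6.0)
val fp = Array(2.0, 3.0, 1.0)
val fn = Array(1.0, 2.0, 3.0)
val microPrecision = tp.sum / (tp.sum + fp.sum) // 24 / 30 = 0.8
val microRecall = tp.sum / (tp.sum + fn.sum)    // 24 / 30 = 0.8
val microF1 = 2 * microPrecision * microRecall / (microPrecision + microRecall)
// Here sum(fp) happens to equal sum(fn), so precision and recall coincide;
// the reply below explains why that always holds in the multiclass case.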

@avulanov
Contributor Author

The micro-averaged precision and recall are equal for a multiclass classifier because sum(fn_i) = sum(fp_i): both are just the sum of all off-diagonal elements of the confusion matrix. The F1 measure, being the harmonic mean of two equal numbers, also equals P and R. For more details, please refer to the book "Introduction to Information Retrieval" by Manning et al.
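
A quick sanity check of that identity with a made-up 3x3 confusion matrix (a sketch, not code from the patch): every off-diagonal entry is a false positive for its predicted class and a false negative for its actual class, so the two totals must match.

// Hypothetical confusion matrix: rows = actual class, columns = predicted class.
val confusion = Array(
  Array(5.0, 1.0, 0.0),
  Array(2.0, 6.0, 1.0),
  Array(0.0, 3.0, 4.0))
val n = confusion.length
// fp(j) = column sum minus the diagonal; fn(i) = row sum minus the diagonal.
val fp = (0 until n).map(j => (0 until n).map(i => confusion(i)(j)).sum - confusion(j)(j))
val fn = (0 until n).map(i => confusion(i).sum - confusion(i)(i))
assert(fp.sum == fn.sum) // both equal the sum of all off-diagonal entries (7.0 here)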

@xiejuncs

It makes sense. You are right: sum(fn_i) = sum(fp_i), so recall and precision are the same. Thanks very much.

@SpyderRivera

👍

@mengxr
Copy link
Contributor

mengxr commented Jul 2, 2014

Jenkins, add to whitelist.

@mengxr
Contributor

mengxr commented Jul 2, 2014

Jenkins, test this please.

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@AmplabJenkins

Merged build finished.

@AmplabJenkins

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16297/

* Evaluator for multiclass classification.
* NB: type Double both for prediction and label is retained
* for compatibility with model.predict that returns Double
* and MLUtils.loadLibSVMFile that loads class labels as Double

It is not necessary to mention loadLibSVMFile in particular here. This is a "global" assumption in MLlib.

@SparkQA

SparkQA commented Jul 14, 2014

QA tests have started for PR 1155. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16619/consoleFull

@SparkQA

SparkQA commented Jul 14, 2014

QA results for PR 1155:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds the following public classes (experimental):
class MulticlassMetrics(predictionAndLabels: RDD[(Double, Double)]) {
* (equals to precision for multiclass classifier

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16619/consoleFull

@mengxr
Contributor

mengxr commented Jul 14, 2014

@avulanov In Scala, "for" is slower than "while". See https://issues.scala-lang.org/browse/SI-1338 for example. So please replace the for loop with two while loops in your implementation.
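
For illustration, such a rewrite might look like the following sketch (the counts map is hypothetical, not the patch's actual data structure); it also caches the matrix size in n, as suggested elsewhere in the review:

// Replacing "for (i <- 0 until n; j <- 0 until n) ..." with nested while loops.
val n = 3
val counts = Map((0, 1) -> 2.0, (1, 1) -> 5.0) // hypothetical (row, column) -> count
val values = Array.ofDim[Double](n * n)
var i = 0
while (i < n) {
  var j = 0
  while (j < n) {
    values(i + j * n) = counts.getOrElse((i, j), 0.0) // column-major layout
    j += 1
  }
  i += 1
}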

* as in "labels"
*/
lazy val confusionMatrix: Matrix = {
val transposedFlatMatrix = Array.ofDim[Double](labels.size * labels.size)

Save labels.size to n? Btw, I'm not sure whether we should use lazy val here because the result matrix could be 1000x1000, different from other lazy vals used here.

@avulanov
Contributor Author

@mengxr I've addressed your comments. Thanks for pointing me to the Scala issue.

@SparkQA

SparkQA commented Jul 15, 2014

QA tests have started for PR 1155. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16670/consoleFull

@mengxr
Contributor

mengxr commented Jul 15, 2014

@avulanov I made some minor updates and sent you a PR at avulanov#1 . If it looks good to you, please merge that PR and the changes should show up here. Thanks!

@avulanov
Contributor Author

@mengxr done!

@SparkQA

SparkQA commented Jul 15, 2014

QA tests have started for PR 1155. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16671/consoleFull

@SparkQA

SparkQA commented Jul 15, 2014

QA results for PR 1155:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds the following public classes (experimental):
class MulticlassMetrics(predictionAndLabels: RDD[(Double, Double)]) {
* (equals to precision for multiclass classifier

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16671/consoleFull

@asfgit asfgit closed this in 04b01bb Jul 15, 2014
@mengxr
Contributor

mengxr commented Jul 15, 2014

Merged. Thanks for your contribution!

@avulanov
Contributor Author

Thanks! I'll be glad to contribute more.

xiliu82 pushed a commit to xiliu82/spark that referenced this pull request Sep 4, 2014
Adding two classes:
1) MulticlassMetrics implements various multiclass evaluation metrics
2) MulticlassMetricsSuite implements unit tests for MulticlassMetrics

Author: Alexander Ulanov <[email protected]>
Author: unknown <[email protected]>
Author: Xiangrui Meng <[email protected]>

Closes apache#1155 from avulanov/master and squashes the following commits:

2eae80f [Alexander Ulanov] Merge pull request apache#1 from mengxr/avulanov-master
5ebeb08 [Xiangrui Meng] minor updates
79c3555 [Alexander Ulanov] Addressing reviewers comments mengxr
0fa9511 [Alexander Ulanov] Addressing reviewers comments mengxr
f0dadc9 [Alexander Ulanov] Addressing reviewers comments mengxr
4811378 [Alexander Ulanov] Removing println
87fb11f [Alexander Ulanov] Addressing reviewers comments mengxr. Added confusion matrix
e3db569 [Alexander Ulanov] Addressing reviewers comments mengxr. Added true positive rate and false positive rate. Test suite code style.
a7e8bf0 [Alexander Ulanov] Addressing reviewers comments mengxr
c3a77ad [Alexander Ulanov] Addressing reviewers comments mengxr
e2c91c3 [Alexander Ulanov] Fixes to mutliclass metics
d5ce981 [unknown] Comments about Double
a5c8ba4 [unknown] Unit tests. Class rename
fcee82d [unknown] Unit tests. Class rename
d535d62 [unknown] Multiclass evaluation
asfgit pushed a commit that referenced this pull request Nov 1, 2014
Implementation of various multi-label classification measures, including: Hamming-loss, strict and default Accuracy, macro-averaged Precision, Recall and F1-measure based on documents and labels, micro-averaged measures: https://issues.apache.org/jira/browse/SPARK-2329

Multi-class measures are currently in the following pull request: #1155

Author: Alexander Ulanov <[email protected]>
Author: avulanov <[email protected]>

Closes #1270 from avulanov/multilabelmetrics and squashes the following commits:

fc8175e [Alexander Ulanov] Merge with previous updates
43a613e [Alexander Ulanov] Addressing reviewers comments: change Set to Array
517a594 [avulanov] Addressing reviewers comments: Scala style
cf4222b [avulanov] Addressing reviewers comments: renaming. Added label method that returns the list of labels
1843f73 [Alexander Ulanov] Scala style fix
79e8476 [Alexander Ulanov] Replacing fold(_ + _) with sum as suggested by srowen
ca46765 [Alexander Ulanov] Cosmetic changes: Apache header and parameter explanation
40593f5 [Alexander Ulanov] Multi-label metrics: Hamming-loss, strict and normal accuracy, fix to macro measures, bunch of tests
ad62df0 [Alexander Ulanov] Comments and scala style check
154164b [Alexander Ulanov] Multilabel evaluation metics and tests: macro precision and recall averaged by docs, micro and per-class precision and recall averaged by class
@tolgap

tolgap commented Jan 19, 2015

@avulanov You have added a class called MulticlassMetrics, but I do not understand how it operates on multiclass classification. I would understand the usage if it accepted RDD[(Vector, Vector)], but it uses RDD[(Double, Double)], which seems to me like binary classification?

Can you give me an example for... say: the MNIST dataset (10 output neurons). Thanks!

@avulanov
Contributor Author

@tolgap As the documentation suggests, MulticlassMetrics accepts predictionAndLabels, an RDD of (prediction, label) pairs, where prediction is the predicted class label and label is the actual one. Each Double is a class label rather than a score, so any number of distinct classes is supported, not just two.

For example:

import org.apache.spark.mllib.util.MLUtils
import org.apache.spark.mllib.classification.ANNClassifier
import org.apache.spark.mllib.evaluation.MulticlassMetrics

/* load mnist data */
val data = MLUtils.loadLibSVMFile(sc, "mnist_file_in_svm_format")
val split = data.randomSplit(Array(0.9, 0.1), 11L)
val training = split(0)
val test = split(1)
/* train ANN with hidden layer of 32 neurons */
/* (input and output layer sizes will be derived from the data) */
val model = ANNClassifier.train(training, Array[Int](32), 40, 1.0, 1e-4)
val predictionAndLabels = test.map(lp => (model.predict(lp.features), lp.label))
val metrics = new MulticlassMetrics(predictionAndLabels)
println("Accuracy: " + metrics.precision)

@tolgap

tolgap commented Jan 21, 2015

@avulanov How many neurons does the output layer have in this case, 1 or 10? My current implementation has an output layer of 10 neurons, e.g.:

val output = Array[Double](7.466E-4, 4.16464E-9, 0.0, 0.0, 0.99462, /*..*/)

Here the example is most likely the digit 4 (the fifth element has the highest probability).

@avulanov
Contributor Author

@tolgap ANNClassifier will create 10 output neurons for MNIST, since 10 is the number of distinct labels derived from the data. Each class is usually encoded with a separate output neuron, especially when there are no explicit relations (or ordering) between classes. If you wish to learn more, there is a good explanation here: http://www.faqs.org/faqs/ai-faq/neural-nets/part2/index.html
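
To connect this with the RDD[(Double, Double)] question: the 10 output activations collapse to a single Double label via an argmax. A hypothetical sketch, reusing the output vector from the previous comment (not necessarily ANNClassifier's exact internals):

// Hypothetical activations of the 10 output neurons for one MNIST example.
val output = Array[Double](7.466e-4, 4.16464e-9, 0.0, 0.0, 0.99462, 0.0, 0.0, 0.0, 0.0, 0.0)
// The predicted label is the index of the most active output neuron.
val predicted: Double = output.indexOf(output.max).toDouble // 4.0, i.e. the digit 4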
