Conversation
Hey @acphile, thanks for submitting the PR.
CI supported jobs: [centos-cpu, sanity, miscellaneous, centos-gpu, windows-cpu, website, unix-gpu, edge, windows-gpu, clang, unix-cpu]
python/mxnet/gluon/metric.py
Outdated
for label, pred in zip(labels, preds):
    self.metrics.update_binary_stats(label, pred)
...
if self.average == "macro":
In fact, macro-averaged F1 does not mean averaging the F1 of each batch. I think we should revise it to match https://scikit-learn.org/stable/modules/generated/sklearn.metrics.f1_score.html .
Averaging F1 per batch already existed in metric.py before my changes. The same calculation also exists in MAE, MSE, RMSE, and PearsonCorrelation. Should I remove all of them accordingly? In sklearn, the "macro" average seems to be used when calculating F1 for multiclass/multilabel targets, but our F1 currently only supports binary classification, so I would need to extend it.
Let's remove it and make it similar to sklearn. This is in fact the reason why I never use the metric class in MXNet.
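A minimal sketch of the sklearn-style behavior being suggested (class and method names are illustrative, plain numpy stands in for the metric internals): accumulate per-class confusion counts across all batches, then compute per-class F1 once at the end and average over classes, rather than averaging per-batch F1 values.

```python
import numpy as np

class MacroF1:
    """Illustrative macro-F1: average of per-class F1 over accumulated counts."""

    def __init__(self, num_classes):
        self.tp = np.zeros(num_classes)
        self.fp = np.zeros(num_classes)
        self.fn = np.zeros(num_classes)

    def update(self, labels, preds):
        # labels, preds: 1-D integer class arrays for one batch
        for c in range(len(self.tp)):
            self.tp[c] += np.sum((preds == c) & (labels == c))
            self.fp[c] += np.sum((preds == c) & (labels != c))
            self.fn[c] += np.sum((preds != c) & (labels == c))

    def get(self):
        precision = self.tp / np.maximum(self.tp + self.fp, 1e-12)
        recall = self.tp / np.maximum(self.tp + self.fn, 1e-12)
        f1 = 2 * precision * recall / np.maximum(precision + recall, 1e-12)
        return f1.mean()  # macro average over classes, not over batches
```

Because the counts are pooled before averaging, the result is independent of how the data was split into batches, matching `sklearn.metrics.f1_score(average="macro")`.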
python/mxnet/gluon/metric.py
Outdated
mae = numpy.abs(label - pred).mean()
...
if self.average == "macro":
It's very strange to have macro-averaged MAE. See scikit-learn: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.mean_absolute_error.html#sklearn.metrics.mean_absolute_error
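A sketch of the alternative being suggested (illustrative names): keep a running sum of absolute errors and a sample count, so the final value is the mean over all samples seen, as in `sklearn.metrics.mean_absolute_error`, instead of averaging per-batch MAE values (which over-weights small batches).

```python
import numpy as np

class MAE:
    """Illustrative streaming MAE over all samples, not over batches."""

    def __init__(self):
        self.sum_abs_err = 0.0
        self.num_samples = 0

    def update(self, label, pred):
        self.sum_abs_err += float(np.abs(label - pred).sum())
        self.num_samples += label.size

    def get(self):
        return self.sum_abs_err / max(self.num_samples, 1)
```

With batches of sizes 2 and 1, averaging per-batch MAEs would give a different (batch-size-dependent) answer than this pooled version.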
Thank you @acphile! Some comments:
- @sxjscience suggested compatibility with sklearn's metrics. If so, we should have a mechanism to ensure compatibility and correctness. One way is to add tests that compare the output of sklearn to the output of the gluon metric for different inputs. Such tests may even include random data to cover edge cases (cf. https://en.wikipedia.org/wiki/Fuzzing).
- We currently support get() vs. get_global() and reset() vs. reset_local(), but the global functionality is not used anywhere in MXNet and there may not be a good, widely used use-case for it. To make our metric API more pythonic and easier to understand, we may remove the global support.
- @sxjscience suggests removing the macro support because it's not correct and not widely used.
- Your code needs to pass the sanity checks for coding style: http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Fsanity/detail/PR-18083/1/pipeline
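A sketch of such a fuzz-style compatibility test. In the actual test suite this would compare the gluon metric against sklearn; here a plain-numpy reference stands in so the sketch is self-contained, and both function names are illustrative.

```python
import numpy as np

def metric_mae(label, pred):
    # stand-in for the gluon metric implementation under test
    return float(np.abs(label - pred).sum() / label.size)

def reference_mae(label, pred):
    # stand-in for sklearn.metrics.mean_absolute_error in the real test
    return float(np.mean(np.abs(label - pred)))

# random inputs of varying shapes exercise edge cases the fixed tests miss
rng = np.random.default_rng(0)
for _ in range(100):
    n = int(rng.integers(1, 50))
    label = rng.normal(size=n)
    pred = rng.normal(size=n)
    assert abs(metric_mae(label, pred) - reference_mae(label, pred)) < 1e-9
```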
python/mxnet/gluon/metric.py
Outdated
return 2 * self.precision * self.recall / (self.precision + self.recall)
else:
    return 0.
return (1 + self.beta ** 2) * self.precision * self.recall / numpy.maximum(self.beta ** 2 * self.precision + self.recall, 1e-12)
...
@property
def global_fscore(self):
Should this method be removed, since you dropped the global state?
This method actually refers to the micro calculation for F1; it is not related to the original global support.
Would it make sense to adjust the name?
I think it is ok to use global_fscore since it is in a private container class.
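The F-beta expression from the diff, shown standalone for clarity. The `1e-12` floor in the denominator avoids division by zero when precision and recall are both zero, in which case the score degrades gracefully to 0.

```python
import numpy as np

def fbeta(precision, recall, beta=1.0):
    """F-beta from precision and recall; beta=1 gives the usual F1."""
    return ((1 + beta ** 2) * precision * recall
            / np.maximum(beta ** 2 * precision + recall, 1e-12))
```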
python/mxnet/gluon/metric.py
Outdated
@@ -24,9 +24,9 @@
import numpy
Instead of using numpy, we can use mxnet.numpy, as it runs asynchronously and has GPU support. The summary states of a metric should be stored on CPU, but if, for example, data and label are on GPU when input to the metric, we can calculate the sufficient statistics on GPU.
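A sketch of that split, with plain numpy standing in for mxnet.numpy so it runs anywhere (class name and calls are illustrative): the heavy reduction happens where the data lives, and only small scalar summaries are kept on CPU. With mxnet.numpy, the `int(...)` conversion would be the synchronization point that pulls the scalar result across.

```python
import numpy as np

class StreamingAccuracy:
    """Illustrative metric: device-side reduction, CPU-side scalar state."""

    def __init__(self):
        # summary state lives on CPU as plain Python numbers
        self.num_correct = 0
        self.num_total = 0

    def update(self, label, pred):
        # with mxnet.numpy this elementwise compare + sum would run
        # asynchronously on the data's device (e.g. GPU); only the scalar
        # result crosses to CPU
        self.num_correct += int((pred == label).sum())
        self.num_total += label.size

    def get(self):
        return self.num_correct / max(self.num_total, 1)
```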
python/mxnet/gluon/metric.py
Outdated
label = label.as_np_ndarray().astype('int32')
if self.class_type == "binary":
    self._set(1)
    if len(numpy.unique(label)) > 2:
This will trigger synchronization (as we need to wait for the result of the np.unique operator). Could we make error checking that triggers synchronization optional?
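One way to make that optional, sketched with illustrative names (plain numpy standing in for mxnet.numpy): guard the synchronizing `unique` check behind a constructor flag, so the default update path stays fully asynchronous and users only pay the sync cost when they ask for input validation.

```python
import numpy as np

class BinaryMetric:
    """Illustrative metric with opt-in input validation."""

    def __init__(self, validate_input=False):
        self.validate_input = validate_input

    def update(self, label, pred):
        if self.validate_input:
            # with mxnet.numpy, np.unique forces a wait for the result;
            # only pay that cost when validation is requested
            if len(np.unique(label)) > 2:
                raise ValueError("binary metric received more than two label values")
        # ... accumulate sufficient statistics asynchronously ...
```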
@mxnet-bot run ci [centos-cpu, sanity, centos-gpu, windows-cpu, unix-gpu, windows-gpu, unix-cpu]
Jenkins CI successfully triggered: [centos-gpu, centos-cpu, windows-gpu, unix-cpu, sanity, unix-gpu, windows-cpu]
Let's disable it here, because it blocks this PR.
You can go ahead and try to reproduce the issue locally. The reproducer has been available for more than a month at #17886 (comment).
No, it's unrelated and should be a separate, isolated PR. Each PR should serve one purpose. That way, we can focus discussions, have single-purpose commits, and also allow reverting.
Let's disable it in #18204.
@mxnet-bot run ci [unix-gpu]
Jenkins CI successfully triggered: [unix-gpu]
@mxnet-bot run ci [unix-cpu, windows-gpu]
Jenkins CI successfully triggered: [unix-cpu, windows-gpu]
tests/python/tensorrt/test_cvnets.py
Outdated
@@ -29,7 +28,12 @@
def get_classif_model(model_name, use_tensorrt, ctx=mx.gpu(0), batch_size=128):
    mx.contrib.tensorrt.set_use_fp16(False)
    h, w = 32, 32
    net = gluoncv.model_zoo.get_model(model_name, pretrained=True)
    model_url = "https://raw.githubusercontent.com/dmlc/web-data/master/gluoncv/models/"
Please don't hardcode master in the URL here. The repository may change and will then break the CI. Instead, use the commit ID: https://raw.githubusercontent.com/dmlc/web-data/221ce5b7c6d5b0777a1e3471f7f03ff98da90a0a/gluoncv/models
@mxnet-bot run ci [windows-gpu]
Jenkins CI successfully triggered: [windows-gpu]
@mxnet-bot run ci [unix-cpu]
Jenkins CI successfully triggered: [unix-cpu]
@acphile this PR fails nightly CD while running the nightly Python unit tests. The following tests fail: We'll need to revert this PR and fix the failures before re-merging it. Here's the link to the revert PR: #18318
This reverts commit effbb8b.
The reason it fails is that the CI check in this PR was run too long ago. I should have restarted it before merging the PR. Meantime master changed and caused some additional changes to be necessary. They are in https://github.com/apache/incubator-mxnet/pull/18312/files |
* finish 5 changes
* move metric.py to gluon, replace mx.metric with mx.gluon.metric in python/mxnet/
* fix importError
* replace mx.metric with mx.gluon.metric in tests/python
* remove global support
* remove macro support
* rewrite BinaryAccuracy
* extend F1 to multiclass/multilabel
* add tests for new F1, remove global tests
* use mxnet.numpy instead of numpy
* fix sanity
* rewrite ce and ppl, improve some details
* use mxnet.numpy.float64
* remove sklearn
* remove reset_local() and get_global in other files
* fix test_mlp
* replace mx.metric with mx.gluon.metric in example
* fix context difference
* Disable -DUSE_TVM_OP on GPU builds
* Fix disable tvm op for gpu runs
* use label.ctx in metric.py; remove gluoncv dependency in test_cvnets
* fix sanity
* fix importError
* remove nose

Co-authored-by: Ubuntu <[email protected]>
Co-authored-by: Leonard Lausen <[email protected]>
Description
Changes based on #18046:
a. improve Class MAE (and MSE, RMSE)
b. improve Class _BinaryClassification
c. improve Class TopKAccuracy
d. add Class MeanCosineSimilarity
e. add Class MeanPairwiseDistance
Checklist
Essentials
Please feel free to remove inapplicable items for your PR.
Changes
Comments