Cannot use average="none" in v0.11? #1425

Open
carbocation opened this issue Jan 3, 2023 · 9 comments · May be fixed by #2725

Labels: help wanted (Extra attention is needed), question (Further information is requested)

Comments
@carbocation

carbocation commented Jan 3, 2023

🐛 Bug

Using version 0.11 and trying to use Dice with average="none", I get the following error:

ValueError: The `reduce` none is not valid.

Additional context

There is a check for the allowed average values in the Dice class. This permits "none".

However, comparing v0.11 and v0.10.3, it looks like a second check was added further down in the class in PR #1252:

if average not in ["micro", "macro", "samples"]:
    raise ValueError(f"The `reduce` {average} is not valid.")

Since that check doesn't allow an average of "none", and since average isn't redefined above it, I think this secondary check added in #1252 causes the bug.
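
For reference, a minimal reproduction sketch (assuming torchmetrics==0.11.x; num_classes=3 is an arbitrary choice, not from the original report):

from torchmetrics import Dice

# The first validation accepts average="none", but the secondary check
# quoted above rejects it during construction.
dice = Dice(num_classes=3, average="none")
# ValueError: The `reduce` none is not valid.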

@carbocation carbocation added bug / fix Something isn't working help wanted Extra attention is needed labels Jan 3, 2023
@github-actions

github-actions bot commented Jan 3, 2023

Hi! Thanks for your contribution, great first issue!

@Borda Borda added question Further information is requested and removed bug / fix Something isn't working labels Jan 4, 2023
@blazejdolicki

Hey,

Is there any progress on this? :)

I stumbled upon the same problem and needed a quick fix, so I dug a bit deeper. It seems more complicated than just removing this if statement. TL;DR: I wrote a quick and dirty fix which allows instantiating the Dice class with average set to "weighted" or "none" without any errors. However, I did not check whether the computed score is actually correct. Feel free to check it out:
blazejdolicki@5a22643

Furthermore, I don't understand some of the changes made by #1252 in __init__ of Dice. For instance, right now it assigns kwargs["reduce"] only after calling super().__init__(**kwargs), so the new value of kwargs["reduce"] is never used. I have very little insight into the library implementation, but this doesn't seem right. Looking at that PR, it seems that the "weighted" and "none" averages weren't taken into account at all in the changes.
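
To illustrate the ordering problem, here is a small, purely hypothetical sketch (the class names are illustrative, not the actual torchmetrics code): mutating kwargs after the parent constructor has already consumed it has no effect.

class Parent:
    def __init__(self, **kwargs):
        # The parent reads "reduce" at construction time.
        self.reduce = kwargs.get("reduce", "micro")

class Child(Parent):
    def __init__(self, average="micro", **kwargs):
        super().__init__(**kwargs)
        # Too late: Parent.__init__ has already run, so this value
        # is never seen by the parent.
        kwargs["reduce"] = average

c = Child(average="none")
print(c.reduce)  # prints "micro", not "none"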

Please let me know if I'm missing something.

@blazejdolicki

Update: when updating the metric state, I ran into a new error regarding a shape mismatch of self.tp. It seems that I needed to change how self.reduce was initialized. My guess is that self.reduce used to be initialized in the super class (Metric), but in #1252 it was moved to the Dice class with an incorrect initialization. Here is a commit with the update: blazejdolicki@f94b9e1
Now the training script finishes without errors.

@SkafteNicki
Member

Hi everyone, thanks for reporting this issue.
We are actually in the process of completely deprecating Dice, as it is mathematically the same as F1Score, so please use that metric instead. We have some work that has been going on in PR #1090 (which is sadly a bit stagnant at the moment) that is going to deprecate Dice and instead add GeneralizedDice as a metric.
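
For anyone landing here, a minimal sketch of the suggested replacement (the tensors are made-up examples, not from this thread; assumes a torchmetrics version with the class-wise classification API):

import torch
from torchmetrics.classification import MulticlassF1Score

# Per-class F1 as a stand-in for per-class Dice, per the suggestion above.
f1 = MulticlassF1Score(num_classes=3, average="none")

preds = torch.tensor([0, 1, 2, 1])
target = torch.tensor([0, 1, 1, 1])

print(f1(preds, target))  # one F1 value per class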

@Jesse-Phitidis

Jesse-Phitidis commented Feb 17, 2023

> Hi everyone, thanks for reporting this issue. We are actually in the process of completely deprecating Dice, as it is mathematically the same as F1Score, so please use that metric instead. We have some work that has been going on in PR #1090 (which is sadly a bit stagnant at the moment) that is going to deprecate Dice and instead add GeneralizedDice as a metric.

I cannot get the expected Dice result using MulticlassF1Score. See the toy example below, where I calculate the Dice for classes 1 and 2 (ignoring 0). My implementation and the MONAI implementation agree, but the F1 score does not (and it also still returns a tensor whose length equals the number of classes, including class 0).

from monai.metrics import DiceMetric as dice
from torchmetrics.classification import MulticlassF1Score as F1_score
import torch
from torch.nn.functional import one_hot

calc = dice(include_background=False, reduction='mean_batch')
F1 = F1_score(num_classes=3, multidim_average='global', average='none', ignore_index=0)

def with_monai(pred, gt, calc):
    pred = one_hot(pred, num_classes=3)
    gt = one_hot(gt, num_classes=3)
    pred = torch.permute(pred, (0,3,1,2))
    gt = torch.permute(gt, (0,3,1,2))
    return calc(pred, gt)

def my_dice(pred, gt):
    pred1map = (pred==1).type(torch.int64)
    pred2map = (pred==2).type(torch.int64)
    gt1map = (gt==1).type(torch.int64)
    gt2map = (gt==2).type(torch.int64)
    c1 = 2*torch.sum(pred1map * gt1map)/ (torch.sum(pred1map) + torch.sum(gt1map))
    c2 = 2*torch.sum(pred2map * gt2map)/ (torch.sum(pred2map) + torch.sum(gt2map))
    return torch.tensor([c1, c2])

test_pred = torch.tensor([[[1, 1], [2, 0]]]).type(torch.int64)
test_gt = torch.tensor([[[1, 0], [2, 1]]]).type(torch.int64)

print(test_pred.shape)
# torch.Size([1, 2, 2])

print(test_pred)
# tensor([[[1, 1],
#          [2, 0]]])

print(test_gt)
# tensor([[[1, 0],
#          [2, 1]]])

print(F1(test_pred, test_gt))
# tensor([0.0000, 0.6667, 1.0000])

print(with_monai(test_pred, test_gt, calc))
# tensor([[0.5000, 1.0000]])

print(my_dice(test_pred, test_gt))
# tensor([0.5000, 1.0000])

@SkafteNicki SkafteNicki added this to the v1.2.0 milestone Aug 18, 2023
@SkafteNicki SkafteNicki modified the milestones: v1.2.0, v1.3.0 Aug 30, 2023
@Jesse-Phitidis

Jesse-Phitidis commented Sep 13, 2023 via email

@Borda Borda changed the title from Cannot use average="none" in version 0.11? to Cannot use average="none" in v0.11? on Sep 14, 2023
@Borda
Member

Borda commented Sep 14, 2023

The issue is due to my misunderstanding of the ignore_index argument.

Would you think that some rewording is needed?

@Jesse-Phitidis

Jesse-Phitidis commented Sep 14, 2023 via email

@lirfu

lirfu commented Sep 25, 2023

I can confirm the same behaviour of ignore_index for other multiclass classification metrics (like precision, recall, and F1) on v1.2.0.

When using the ignore_index value, that class is simply filtered out from the labels, so the corresponding row of the confusion matrix is zero. Filtering is done here: stat_scores.py

When calculating with macro averaging, the value is biased towards 0 because the TP count of the ignored class is 0, which leads to a per-class score of 0 for it. Averaging is performed here: compute.py

The weighted averaging removes this problem because the weight of the ignored class is set to 0, but then the other classes are re-weighted as well according to their frequency.

I agree with @Jesse-Phitidis: the current implementation either accounts for all classes (bias towards class imbalance) or doesn't account for false positives (promotes trivial classifiers). The parameter ignore_index sounds like it ignores that class in the final macro or weighted averaging (which would give the more complete result), but it actually just ignores any TP & FP for that class: if y=0 and the model says 1, that is OK (but not vice versa).
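
As a small illustration of the behaviour described above (the tensors are made up, and the comments describe the behaviour reported in this thread for v1.2.0, which may differ in other releases):

import torch
from torchmetrics.classification import MulticlassF1Score

preds = torch.tensor([1, 1, 2, 0])
target = torch.tensor([1, 0, 2, 1])

# Per-class scores: the ignored class still occupies a slot, reported as 0.
per_class = MulticlassF1Score(num_classes=3, average="none", ignore_index=0)
print(per_class(preds, target))

# Macro average: per the description above, the zeroed slot of the ignored
# class is included in the mean, pulling the score towards 0.
macro = MulticlassF1Score(num_classes=3, average="macro", ignore_index=0)
print(macro(preds, target))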

My suggestion is to keep the current ignore_index (with more elaborate docs on its effect) for backwards compatibility with any unforeseen usages people might have for it, and to add a new parameter background_index to stop the background class from contributing to the final macro and weighted averaging, while keeping its influence on the FPs. Combining the two parameters also fixes the bias towards 0 of the current implementation.

@Borda Borda modified the milestones: v1.3.0, v1.4.0, v1.3.x Jan 11, 2024
@SkafteNicki SkafteNicki linked a pull request Sep 8, 2024 that will close this issue