
Confusion about num_classes #108

Closed
dvd42 opened this issue Jun 26, 2020 · 8 comments

Labels
question (Further information is requested)

Comments

@dvd42 commented Jun 26, 2020

Hi, I was looking through the code and other posted issues, and it is still not clear to me what the number of classes should be. For COCO it is set to 91 (90 + 1 for the no-object class), as explained here. However, as seen in the code that builds the model:

detr/models/detr.py

Lines 36 to 38 in 10a2c75

hidden_dim = transformer.d_model
self.class_embed = nn.Linear(hidden_dim, num_classes + 1)
self.bbox_embed = MLP(hidden_dim, hidden_dim, 4, 3)

A +1 is added in the classification layer for the no-object class. So if I have a dataset with X classes (not counting the background), what should I set num_classes to?

P.S: Thanks for this great project!!! :)

@alcinos (Contributor) commented Jun 26, 2020

Hi @dvd42
Thank you for your interest in DETR.

The explanation you're pointing at was slightly incorrect; I fixed it.
You should always use num_classes = max_id + 1, where max_id is the highest class ID in your dataset.
For example, if you have 4 classes with IDs 1, 23, 24, 56, then you will use num_classes=57. DETR will then reserve ID 57 for the "no-object" class.
In general, you should try to make your IDs consecutive if possible, but it doesn't really matter if there are a few "holes".
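
As a minimal sketch of this rule (the names here are illustrative, not part of DETR's API):

dataset_class_ids = [1, 23, 24, 56]  # class IDs actually used in your annotations
num_classes = max(dataset_class_ids) + 1  # 57 in this example
# DETR reserves ID num_classes (here 57) for the "no-object" class, and the
# classification head therefore outputs num_classes + 1 = 58 logits.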

I think I have answered your question, and as such I'm closing this. Feel free to reach out if you have further concerns.

alcinos closed this as completed on Jun 26, 2020
@fmassa (Contributor) commented Jun 26, 2020

Thanks for fixing my wrong answer, @alcinos!

@woctezuma commented Aug 17, 2020

Just to be clear about the +1 in the original question: I think it is only there because class labels are indexed starting from 1, as in:

>>> labels = torch.randint(1, 91, (4, 11))

So let us say that you have N labels, indexed from 1 to N (with no "hole"). You would feed num_classes equal to N+1 to DETR, so that DETR assigns the no-object class the ID num_classes = N+1. Then, in nn.Linear, the +1 is there so that the output has sufficient length, where:

  • the prediction for ID n°0 is dummy,
  • the predictions for IDs n°1 to n°num_classes match our convention (N object classes n°1...N, plus one no-object class n°N+1).

self.class_embed = nn.Linear(hidden_dim, num_classes + 1)

The good news is that the code should still work fine even if the user were to start indexing the classes at 0.
It is compatible with both conventions, as long as the parameter num_classes is actually max ID + 1, as explained above. The only issue is that the parameter name can be confusing.
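
To make the layout concrete, here is a small hypothetical sketch (toy sizes, not code from the repository):

import torch
import torch.nn as nn

N = 4                # object classes labelled 1..N, with no holes
num_classes = N + 1  # the value fed to DETR; also the "no-object" ID
hidden_dim = 256

class_embed = nn.Linear(hidden_dim, num_classes + 1)  # N + 2 = 6 logits
logits = class_embed(torch.randn(1, hidden_dim))
# logits[..., 0]      -> dummy slot (no class uses ID 0 in this convention)
# logits[..., 1:N+1]  -> the N real classes, IDs 1..N
# logits[..., N+1]    -> the "no-object" class
assert logits.shape[-1] == N + 2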

@yangsenius commented Sep 9, 2020

Hi @alcinos.

I have read many issues and your comments about the num_classes problem, but I still want to make sure my understanding is right.

The labels of the COCO dataset run from 1 to 90. So in DETR, num_classes = 90 + 1 = 91, and

self.class_embed = nn.Linear(hidden_dim, num_classes + 1)

My questions are:

  1. Does this mean self.class_embed will output a 92-dim class vector for each query? And is the first dim (i.e., index 0 of the vector) never used for any class, not even the no-object class (because the last dim, index 91, is for no-object)?

  2. If the answer to question 1 is yes, I would like to ask: can we use the first dim of the class vector as the no-object logit when the labels do not contain ID 0, such as dataset labels = [1, 2, 3, 4]?
    In this way, we would set num_classes = len(dataset labels) and only change self.num_classes to 0 in the following code:

target_classes_o = torch.cat([t["labels"][J] for t, (_, J) in zip(targets, indices)])
target_classes = torch.full(src_logits.shape[:2], 0,  # 0 instead of self.num_classes
                            dtype=torch.int64, device=src_logits.device)
target_classes[idx] = target_classes_o

Can it work well?

@woctezuma commented Sep 9, 2020

Not Alcinos, but:

  1. Yes.

  2. The convention used by DETR has no real downside, as far as I understand. Sure, it is not the most optimized solution, but:

  • the convention can deal with small gaps in the numbering of categories, without the need to keep a mapping of indices,
  • the convention works fine whether the first category is labelled with index n°0 or index n°1,
  • the network does not seem to suffer from a few dummy labels with zero examples in the training dataset.

In terms of minimizing time spent debugging and errors/issues encountered by other users, this convention is a good trade-off.

@yangsenius commented

Thanks @woctezuma.

The matcher, in

cost_class = -out_prob[:, tgt_ids]

always takes the predictions for the target object classes as the cost. So I think the change I propose above would not affect the matching either.
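
As a toy illustration of that indexing (random tensors, not the actual matcher code):

import torch

# 3 queries, 6 logits: ID 0 dummy, IDs 1..4 real classes, ID 5 no-object.
out_prob = torch.rand(3, 6).softmax(-1)
tgt_ids = torch.tensor([2, 4])  # ground-truth class IDs present in the image

# Only the columns of the target IDs enter the classification cost; the dummy
# column 0 and the no-object column 5 are never selected here.
cost_class = -out_prob[:, tgt_ids]
print(cost_class.shape)  # torch.Size([3, 2]): one cost per (query, target) pair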

@alcinos (Contributor) commented Sep 9, 2020

@yangsenius As noted by @woctezuma, there is no theoretical issue with using 0 as the "no-object" label, but I personally don't see any good reason to do it.
You'll run into issues if you forget about this and label a true class with label 0.
Also note that the assumption that the "no-object" class is the last one is used throughout the code, and I'm not sure I'd be able to list all the places where this assumption is made. Some examples that come to mind are the postprocessor and most of our visualization code. If you don't want to spend time debugging, and you don't have a strong, compelling reason to make this change, I'd suggest sticking to the current convention.
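
For instance, the postprocessor reads out class scores by dropping the last logit, which only works if "no-object" sits at the end (simplified from PostProcess in models/detr.py; the toy tensor below just stands in for real model output):

import torch
import torch.nn.functional as F

out_logits = torch.randn(2, 100, 92)  # (batch, queries, num_classes + 1) for COCO
# The slice [..., :-1] assumes the "no-object" logit is the LAST one;
# moving it to index 0 would silently break this readout.
prob = F.softmax(out_logits, -1)
scores, labels = prob[..., :-1].max(-1)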

Best of luck.

@yangsenius commented

Thank you very much for your suggestions!
