What does "HO_weight" and "binary_weight" mean? #36
Comments
The two sets of weights are used for classification under a long-tail data distribution. The more training samples a class has, the smaller its loss weight (it has more chances to improve, so each update can be small). Meanwhile, a rare HOI class with fewer samples needs a larger weight.
Thank you for your reply. It seems that these weights handle the class-wise imbalance, but focal loss is designed for heavy hard/easy or pos/neg imbalance. How can it be used for this problem? I've tried applying focal loss or GHM loss to train the binary classifier, but the results are almost the same as training with binary cross-entropy loss and balanced sampling.
Yep, the performance of the various loss tricks is comparable on HICO-DET; in our experiments the log loss weight performs best for HOI classification. For the (many) extremely rare classes, all of these tricks contribute very little.
BTW, each HOI uses a Sigmoid for binary classification because of the multi-label setting (i.e., one person can perform multiple actions simultaneously). Thus we compute the sum of the 600 Sigmoid cross entropies as the HOI loss.
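For concreteness, here is a minimal NumPy sketch (not the repo's actual TensorFlow code) of a weighted multi-label sigmoid cross entropy of the kind described above: each of the 600 HOI classes gets its own sigmoid, and the per-class cross entropies are scaled by a class weight (the role HO_weight plays) before being summed. The shapes and the exact point where the weight enters are my assumptions.

```python
import numpy as np

def weighted_multilabel_sigmoid_ce(logits, labels, class_weights):
    """logits, labels: (batch, 600); class_weights: (600,), e.g. HO_weight."""
    class_weights = np.asarray(class_weights, dtype=np.float64)
    # Numerically stable per-class sigmoid cross entropy
    # (same form as tf.nn.sigmoid_cross_entropy_with_logits).
    ce = np.maximum(logits, 0) - logits * labels + np.log1p(np.exp(-np.abs(logits)))
    # Rare HOI classes get larger weights, frequent ones smaller weights.
    weighted = ce * class_weights[None, :]
    # Sum over the 600 classes, average over the batch.
    return weighted.sum(axis=1).mean()
```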
Thanks, your answer helps me a lot.
Hi @DirtyHarryLYL, I'm still confused about the imbalance of pos and neg samples for each class. Since you use an image-centric sampling strategy, all the candidate box pairs in each training batch come from the same image, and you update the whole model with a Sigmoid cross-entropy loss. However, it may happen that, say, for the first 10000 images in one epoch there is not a single positive sample for HOI class n (0 ≤ n ≤ 599), so the model would only learn from the negative samples of class n and always predict 0 for it, which may kill that classifier. How did you deal with this imbalance? Besides, I noticed that the model predicts a 1x2 vector for each binary label. Why not use a single scalar, since one number is enough to represent the probability of "interactiveness"?
In each mini-batch, i.e., the samples from one image, we feed a fixed number of pos and neg human-object pairs into the model (e.g., 15 pos pairs and 60 neg pairs). We have tried both a single scalar in 0~1 and a 1x2 vector (two probabilities for pos and neg); the results are comparable. In the initial version we chose the 1x2 vector for convenience of analysis and did not change it in later versions. You could also try other output formats in your experiments.
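A rough sketch of that fixed-ratio sampling, under my own assumptions (NumPy, resampling with replacement when an image has too few candidates); the 15/60 numbers are the example values from the reply above, not a confirmed setting:

```python
import numpy as np

def sample_pairs(pair_is_positive, num_pos=15, num_neg=60, rng=None):
    """pair_is_positive: (num_pairs,) bool/0-1 array over one image's H-O pairs."""
    rng = np.random.default_rng() if rng is None else rng
    pair_is_positive = np.asarray(pair_is_positive).astype(bool)
    pos_idx = np.flatnonzero(pair_is_positive)
    neg_idx = np.flatnonzero(~pair_is_positive)
    picked = []
    if len(pos_idx) > 0:
        picked.append(rng.choice(pos_idx, num_pos, replace=len(pos_idx) < num_pos))
    if len(neg_idx) > 0:
        picked.append(rng.choice(neg_idx, num_neg, replace=len(neg_idx) < num_neg))
    # Indices of the pairs fed to the model for this image (one mini-batch).
    return np.concatenate(picked) if picked else np.empty(0, dtype=int)
```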
Thank you.
No problem~
Could you provide the formula for the weights in detail? I found that k*f(1/(n^i/N)) cannot reproduce the weights in the HICO file, and the weights in your code are different: 9.609821 and 13.670264.
The formula is just the simple frequency as probability, i.e., k*lg(1/frequency). The negative sample count has two parts:
To my knowledge, the 600 and 597 HOIs have the same gt pair numbers, so the results are different.
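If it helps, here is a hedged reconstruction of that weighting, reading lg as log base 10, so that w_i = k * lg(1/frequency_i) = k * lg(N_total / n_i). The scale factor k and the ground-truth pair counts are assumptions on my part; the actual HO_weight and binary_weight values in the repo are precomputed constants.

```python
import numpy as np

def hoi_class_weights(gt_pair_counts, k=1.0):
    """gt_pair_counts: (600,) number of ground-truth pairs per HOI class."""
    counts = np.asarray(gt_pair_counts, dtype=np.float64)
    freq = counts / counts.sum()        # empirical class frequency
    return k * np.log10(1.0 / freq)     # rarer class -> larger weight
```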
Thanks for your reply. Yeah, I made a mistake. This weight affects the performance a lot.
No problem~ Yeah, long-tail data distribution learning is still an open question; the studies on losses, data sampling, and latent-space learning are very interesting.
Hi @DirtyHarryLYL! Thanks a lot for your great work!
I noticed that in lib/networks/TIN_HICO.py you added two extra weights, self.HO_weight and self.binary_weight, to the classification scores from the HOI and binary classifiers, which differs from the iCAN code. May I ask why you multiply the weights with the raw classification scores, and how are the weights generated? Thanks!