Adding DETA to 🤗 Transformers #3
Hi Niels, thanks for this effort! I'm delighted to hear about it, and also glad that it was relatively straightforward to implement.
This pre-NMS topk is important because it reduces the number of predictions fed into NMS; if the number of predictions is too high, NMS becomes too slow. In your case of 300 predictions, NMS runs fast enough, so this topk step is not needed and can be removed. In general, for a custom dataset, it's fine to run something like the sketch below.
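A minimal sketch of that clamping, assuming logits shaped like Deformable DETR's `PostProcess` input (the exact variable names in the DETA repo may differ):

```python
import torch

def topk_before_nms(out_logits, max_topk=10000):
    """Select at most `max_topk` highest-scoring predictions before NMS.

    out_logits: (batch, num_queries, num_classes) classification logits.
    """
    prob = out_logits.sigmoid()  # DETA scores classes with per-class sigmoid
    # Clamp the topk so single-class datasets (300 queries * 1 class = 300
    # predictions) don't trigger a "topk is out of range" error.
    k = min(max_topk, out_logits.shape[1] * out_logits.shape[2])
    topk_values, topk_indexes = torch.topk(prob.view(out_logits.shape[0], -1), k, dim=1)
    return topk_values, topk_indexes
```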
Cool! I've uploaded our models to three model repos: https://huggingface.co/jozhang97/deta-resnet-50, https://huggingface.co/jozhang97/deta-swin-l and https://huggingface.co/jozhang97/deta-swin-l-o365. Is this what you mean? Let me know if there is anything else I can do to help.
Cool, I tried it by fine-tuning DETA-resnet-50 with the exact same training hyperparameters as my DETR tutorial (300 steps, a learning rate of 1e-4 for the Transformer and 1e-5 for the backbone, weight decay of 1e-4 and gradient clipping of 0.1). The result on one of the validation images suggests this might still need some tweaking (the results after fine-tuning DETR looked a lot better, see the bottom of this notebook); it could be a matter of postprocessing settings or training settings. However, basic training seems to work, so I'll first integrate the model into the library, and you can then perhaps go over my fine-tuning tutorial and see if some things can be improved, if you're up for that of course.
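For reference, those hyperparameters correspond to an optimizer setup along these lines (a minimal sketch, assuming `model` is the DETA model instance):

```python
import torch

# Separate learning rates: 1e-4 for the Transformer, 1e-5 for the backbone
param_dicts = [
    {
        "params": [p for n, p in model.named_parameters()
                   if "backbone" not in n and p.requires_grad],
    },
    {
        "params": [p for n, p in model.named_parameters()
                   if "backbone" in n and p.requires_grad],
        "lr": 1e-5,
    },
]
optimizer = torch.optim.AdamW(param_dicts, lr=1e-4, weight_decay=1e-4)

# Gradient clipping of 0.1 is applied inside the training loop, e.g.:
# torch.nn.utils.clip_grad_norm_(model.parameters(), 0.1)
```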
Ahh yes, I reckon it's due to the NMS postprocessing. NMS is not ideal in crowded scenes (see the Soft-NMS paper). Hopefully it can be quickly fixed by tweaking the NMS box threshold at this line. If not, perhaps NMS variants like Soft-NMS would do better. I'd be happy to take a stab at this.
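For context, that NMS step boils down to a call like the following sketch (using `torchvision.ops.batched_nms`; the IoU threshold is the knob being discussed):

```python
from torchvision.ops import batched_nms

# boxes: (N, 4) in (x1, y1, x2, y2) format, scores: (N,), labels: (N,)
# Lowering iou_threshold suppresses more overlapping boxes; raising it keeps
# more detections, which can help in crowded scenes.
keep = batched_nms(boxes, scores, labels, iou_threshold=0.7)
boxes, scores, labels = boxes[keep], scores[keep], labels[keep]
```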
Hi @NielsRogge, thanks for making the tutorial! A quick note on the demo results: DETA uses sigmoid activation for classification (see here), while the DETR visualization code above seems to be using softmax. Please make sure this is changed accordingly. Best,
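Concretely, the difference in the visualization code looks like this (a sketch; `logits` is the raw classification head output of shape `(batch, num_queries, num_classes)`, with DETR carrying one extra "no object" class):

```python
# DETR-style: softmax over classes, dropping the trailing "no object" class
probs_detr = logits.softmax(-1)[..., :-1]

# DETA-style: independent per-class sigmoid (no "no object" class)
probs_deta = logits.sigmoid()
```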
Hi @xingyizhou, yes, I'm aware of that (the author of Conditional DETR mentioned it to me when I ported Conditional DETR to 🤗 Transformers 😄). The …
I'd like to merge DETA into HuggingFace Transformers, and was wondering at which organization we can host the checkpoints. I see in the thread above that you uploaded the original checkpoints to your personal account, but where would you like the HF-compatible DETA checkpoints to be hosted? Typically, they are hosted as part of an organization (like the University of Texas). Kindly let me know! Kind regards, Niels
Hi @NielsRogge …
Hi @jozhang97, yes, I've uploaded the HF models to your personal account. Maybe as a next step, would it be possible to look into fine-tuning DETA on a custom dataset? The relevant docs are here: https://huggingface.co/docs/transformers/tasks/object_detection (one would need to replace DetrForObjectDetection with DetaForObjectDetection, as well as the image processor).
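A minimal sketch of that swap, assuming the converted checkpoint lives at `jozhang97/deta-resnet-50` and using `num_labels=1` for the single-class balloon dataset:

```python
from transformers import AutoImageProcessor, DetaForObjectDetection

checkpoint = "jozhang97/deta-resnet-50"
processor = AutoImageProcessor.from_pretrained(checkpoint)
model = DetaForObjectDetection.from_pretrained(
    checkpoint,
    num_labels=1,                  # e.g. the single "balloon" class
    ignore_mismatched_sizes=True,  # reinitialize the classification head
)
```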
Hi, we've created a demo for DETA: https://huggingface.co/spaces/hysts/DETA. But the model often has low confidences even for seemingly easy examples, and the results don't look that impressive despite the 63 AP on COCO (YOLO models, for instance, would recognize all objects in the demo images above). Could you take a look?
Hi, Both sigmoid and NMS are used. The app uses the …
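For anyone reproducing this: the confidence cutoff is applied at post-processing time, roughly as in the sketch below (assuming the standard `post_process_object_detection` API of the image processor; DETA's processor may expose an additional NMS threshold argument):

```python
# target_sizes holds the original (height, width) of each input image
results = processor.post_process_object_detection(
    outputs,
    target_sizes=target_sizes,
    threshold=0.3,  # lower this to surface low-confidence detections
)[0]
```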
Hi DETA authors,
As this work is very nice and builds upon DETR and Deformable DETR, both of which are available in 🤗 Transformers, it was relatively straightforward to implement DETA as well (the only differences being a tweak in the loss function and the postprocessing).
Here's a notebook that illustrates inference with DETA models: https://colab.research.google.com/drive/1epI4ejrD0dbrSR9vRRhEPE7duoALqIk9?usp=sharing.
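For convenience, the core of that notebook boils down to something like this sketch (the repo name assumes the converted ResNet-50 checkpoint mentioned above):

```python
import requests
import torch
from PIL import Image
from transformers import AutoImageProcessor, DetaForObjectDetection

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

checkpoint = "jozhang97/deta-resnet-50"
processor = AutoImageProcessor.from_pretrained(checkpoint)
model = DetaForObjectDetection.from_pretrained(checkpoint)

inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

target_sizes = torch.tensor([image.size[::-1]])  # (height, width)
results = processor.post_process_object_detection(
    outputs, target_sizes=target_sizes, threshold=0.5
)[0]
for score, label, box in zip(results["scores"], results["labels"], results["boxes"]):
    print(model.config.id2label[label.item()], round(score.item(), 3), box.tolist())
```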
Now I'd also like to make a fine-tuning tutorial, illustrating how to fine-tune DETA on a custom dataset. For that I'm taking my original DETR fine-tuning tutorial and tweaking it for DETA. However, I ran into a question: I'm fine-tuning on the "balloon" dataset, which consists of only one class (balloon), and during inference I get an error stating that "topk is out of range". This is because of this line, which selects the top 10,000 scores; when fine-tuning on a single class, the number of queries * the number of classes = 300 * 1 = 300, which is smaller than 10,000. So I was wondering what the recommendation is when fine-tuning on a dataset with only a single class (or, more generally, on any custom dataset).
Also, I'm currently hosting the DETA checkpoints under my personal username on HuggingFace.
It would be cool if you could create an organization on the 🤗 Hub and host the checkpoints there (or under your own personal username if you prefer). That way, you can also write model cards (READMEs) for those repositories, etc. It seems there's already an org for UT-data-bootcamp, but I'm not sure we should host the checkpoints there.
Let me know what you think!
Open-sourcely yours,
Niels
ML Engineer @ HF