Model Converge Problem #5
Comments
What do you mean by not working? Does training fail to converge at all, or is the performance just poor? It would be useful if you could provide more information.
I tried three vision transformers. Two of them train on the same dataset and converge easily. Your NesT transformer fails to converge from the beginning: after 100 epochs, there is no improvement in accuracy or loss. When I train another transformer on the same dataset, it works fine.
Did you train these methods from scratch, or fine-tune them from their pre-trained checkpoints? This can make an important difference. Our scripts currently only train from scratch, but fine-tuning from our pre-trained models should be easy to set up.
Hi, in our experiments we have not seen convergence issues with our method. It would be great if you could provide more detailed training info so I can help, e.g. which other methods you trained, what the setup (scripts) is, and how many devices you used. Otherwise, it is hard to diagnose.
Hi, from my experience with the architecture, it is very sensitive to warm-up epochs. When I used the implementation from timm with the warm-up schedule of PyTorch Lightning, it diverged. But when I followed their warm-up implementation, it worked fine. The problem also happened to me once when I did not use any augmentation by mistake.
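For anyone comparing warm-up behaviour between frameworks, here is a minimal sketch of a linear warm-up followed by cosine decay, computed per step. It is not the schedule from the NesT repository or from timm. The function name `lr_at_step` and all of its parameters are hypothetical and chosen only for illustration. It may help for printing the learning rate your own scheduler produces during the first steps and checking it against what you expect.

```python
import math


def lr_at_step(step, base_lr, warmup_steps, total_steps, min_lr=0.0):
    """Learning rate at a 0-indexed step: linear warm-up, then cosine decay.

    Illustrative only; names and defaults are assumptions, not the
    schedule used by any particular repository.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    if warmup_steps > 0 and step < warmup_steps:
        # Ramp linearly from base_lr / warmup_steps up to base_lr.
        return base_lr * (step + 1) / warmup_steps
    # Fraction of the post-warm-up phase completed, clamped to [0, 1].
    decay_steps = max(1, total_steps - warmup_steps)
    progress = min(1.0, max(0.0, (step - warmup_steps) / decay_steps))
    # Cosine decay from base_lr down to min_lr.
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    # Print the first few steps to see how quickly the rate ramps up.
    for s in range(0, 12):
        print(s, lr_at_step(s, base_lr=1e-3, warmup_steps=10, total_steps=100))
```

If the rate printed for the first steps is already at or near the base learning rate, the warm-up is probably not being applied the way you intended (for example, it may be stepped per epoch instead of per iteration).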
I am training on a medium-scale dataset of 100,000 images. The learning rate and weight decay are the same as in your config, but it is still not working. Any opinion?
Regards,
Khawar Islam