
Model Converge Problem #5

Open
khawar-islam opened this issue Jul 17, 2021 · 8 comments

Comments

@khawar-islam

I am training on a medium-scale dataset that consists of 100,000 images. The learning rate and weight decay are the same as in your config, but the model still does not converge. Any opinion?

Regards,
Khawar Islam

@zizhaozhang
Collaborator

What do you mean by not working: does training diverge completely, or is the performance just poor? It would be useful if you could provide more information.
Two suggestions:

  • It is better to do a data diagnosis first.
  • Train a standard ResNet on your dataset to see whether it is a data issue or a model issue (a sketch of this check follows below).
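
A minimal sketch of that ResNet sanity check, assuming timm and a plain PyTorch loop; the model name, class count, optimizer settings, and data loader are placeholder assumptions, not the repo's training script:

```python
import torch
import timm
from torch.utils.data import DataLoader

# Placeholder baseline: a standard ResNet-50 on the same 100k-image dataset.
# If this also fails to converge, the issue is more likely in the data or
# labels than in the NesT model itself.
model = timm.create_model("resnet50", pretrained=False, num_classes=100)  # class count assumed
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-4)
criterion = torch.nn.CrossEntropyLoss()

def train_one_epoch(loader: DataLoader) -> float:
    model.train()
    total = 0.0
    for images, targets in loader:  # loader wraps whatever Dataset holds the images
        optimizer.zero_grad()
        loss = criterion(model(images), targets)
        loss.backward()
        optimizer.step()
        total += loss.item()
    return total / len(loader)
```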

@khawar-islam
Author

I tried three vision transformers, and two of them work on the same dataset and converge easily. Your NesT transformer diverges completely from the beginning, and after 100 epochs there is no improvement in accuracy or loss.

When I train another transformer on the same dataset, it works fine.

@zizhaozhang
Collaborator

Did you train these methods from scratch or fine-tune from their pre-trained checkpoints? That can matter sometimes.
Our scripts currently only train from scratch, but it should be easy to fine-tune using our pre-trained models.

@khawar-islam
Author

> Did you train these methods from scratch or fine-tune from their pre-trained checkpoints? That can matter sometimes.

I am training from scratch.

> Our scripts currently only train from scratch, but it should be easy to fine-tune using our pre-trained models.

From scratch.

@zizhaozhang
Collaborator

Hi, in our experiments we do not see convergence issues with our method. It would be great if you could provide more detailed training information so I can help, e.g. which other methods you trained, what your setup (scripts) is, and how many devices you used. Otherwise it is hard to diagnose.

@Euruson

Euruson commented Nov 17, 2021

The network is somewhat "sensitive". I used AdamW with learning rate decay and found that training crashed when the learning rate was adjusted. Note that I used the PyTorch implementation in timm.

[attached image]

@Freder-chen

> The network is somewhat "sensitive". I used AdamW with learning rate decay and found that training crashed when the learning rate was adjusted. Note that I used the PyTorch implementation in timm.

In my training runs this was a rare occurrence, and I recommend using gradient clipping.
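
A minimal sketch of that gradient-clipping suggestion in PyTorch, assuming the timm NesT model and AdamW mentioned in the thread; the model variant, learning rate, and max_norm value are illustrative assumptions:

```python
import torch
import timm

model = timm.create_model("nest_tiny", num_classes=1000)  # model variant assumed
optimizer = torch.optim.AdamW(model.parameters(), lr=2.5e-4, weight_decay=0.05)
criterion = torch.nn.CrossEntropyLoss()

def train_step(images, targets):
    optimizer.zero_grad()
    loss = criterion(model(images), targets)
    loss.backward()
    # Clip the global gradient norm before the optimizer step to dampen the
    # rare loss spikes described above; max_norm=1.0 is a common default.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
    return loss.item()
```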

@abdohelmy

Hi, from my experience with this architecture, it is very sensitive to the warm-up epochs. When I used the timm implementation with the PyTorch Lightning warm-up schedule, training diverged, but when I followed their warm-up implementation it worked fine. The problem also happened to me once when I accidentally trained without any augmentation.
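
A minimal sketch of a linear warm-up followed by cosine decay using timm's scheduler, in the spirit of the warm-up that worked here; the epoch counts and learning rates are illustrative assumptions, not the repo's exact recipe:

```python
import torch
import timm
from timm.scheduler import CosineLRScheduler

model = timm.create_model("nest_tiny", num_classes=1000)  # model variant assumed
optimizer = torch.optim.AdamW(model.parameters(), lr=2.5e-4, weight_decay=0.05)

scheduler = CosineLRScheduler(
    optimizer,
    t_initial=300,        # total training epochs (assumed)
    lr_min=1e-5,
    warmup_t=20,          # linear warm-up epochs (assumed)
    warmup_lr_init=1e-6,  # start the warm-up from a very small LR
)

num_epochs = 300
for epoch in range(num_epochs):
    # ... run one training epoch here ...
    scheduler.step(epoch + 1)  # timm schedulers are stepped with the epoch index
```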
