Hello, I am not sure where to start with troubleshooting the following issue.
I am trying to train NLA-SLR on WLASL-2000. When training Video-64, the top-1 per-class accuracy stays stuck at 0.05; that is, the model is not learning at all. I use configs/rgb_frame64.yaml without any changes, and the WLASL data is scaled to 256x256 with black padding.
I train on 2x A100 40GB with batch_size: 4. When I use prediction.py to test the pretrained Video-64 model, I obtain 51% accuracy, so the data must be fine. I also tried training on WLASL-100 and stopped at epoch 25 because there was no progress either (validation accuracy was stuck at 1%).
I modified the code to output training accuracy, and it seems that the model is overfitting like crazy: training accuracy reaches 99% while validation accuracy stays flat.
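For anyone who wants to make the same train-versus-validation comparison, here is a minimal sketch of a top-1 per-class accuracy computation. It is not the repository's implementation; the function name `per_class_top1` and the toy label lists are hypothetical, and it assumes integer class labels with one top-1 prediction per sample.

```python
from collections import defaultdict


def per_class_top1(labels, predictions):
    """Mean of per-class top-1 accuracies, so every class counts equally."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for label, pred in zip(labels, predictions):
        total[label] += 1
        if label == pred:
            correct[label] += 1
    if not total:
        raise ValueError("no samples given")
    # Average the accuracy of each class instead of pooling all samples.
    return sum(correct[c] / total[c] for c in total) / len(total)


# Toy data (hypothetical): perfect on the training split, poor on validation.
train_acc = per_class_top1([0, 0, 1, 2], [0, 0, 1, 2])
val_acc = per_class_top1([0, 0, 1, 2], [0, 1, 1, 0])
print(f"train={train_acc:.2f} val={val_acc:.2f}")
```

A large gap between the two numbers, as in the toy data, is the overfitting pattern described above.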
I made some adjustments to the default config: I doubled the batch size and halved the learning rate. Around epoch 50 the model seems to start actually learning something useful, and validation accuracy goes up to 27%. I will see if I am able to reproduce the paper results this way. It still looks like a very overfit model.
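A rough sketch of that adjustment, expressed on a plain dictionary rather than on the actual YAML file. The key name `learning_rate` and its starting value are placeholders, not values taken from configs/rgb_frame64.yaml, and `scale_config` is a hypothetical helper.

```python
def scale_config(cfg, batch_factor=2, lr_factor=0.5):
    """Return a copy of cfg with the batch size and learning rate rescaled."""
    new_cfg = dict(cfg)
    new_cfg["batch_size"] = int(cfg["batch_size"] * batch_factor)
    new_cfg["learning_rate"] = cfg["learning_rate"] * lr_factor
    return new_cfg


# batch_size: 4 is the value mentioned in this thread;
# learning_rate here is a placeholder, not the repository default.
base = {"batch_size": 4, "learning_rate": 1e-3}
adjusted = scale_config(base)
print(adjusted)
```

The original dictionary is left untouched, so both settings can be compared side by side.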
On top of this, it seems to be necessary to disable Ampere GPU optimizations; otherwise even training accuracy is stuck at 1% and the model is completely broken.
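The comment above does not say which optimizations were disabled. One plausible reading, and it is only an assumption, is TF32 math on Ampere GPUs in a PyTorch training script. Under that assumption, a minimal sketch of turning it off looks like this (`disable_tf32` is a hypothetical helper name; the two flags are PyTorch's documented TF32 switches):

```python
def disable_tf32():
    """Turn off TF32 for matmuls and cuDNN convolutions (assumed meaning
    of "Ampere GPU optimizations"; not confirmed by the thread)."""
    import torch  # imported here so the helper can be defined without torch

    torch.backends.cuda.matmul.allow_tf32 = False  # matrix multiplications
    torch.backends.cudnn.allow_tf32 = False        # cuDNN convolutions
    return (
        torch.backends.cuda.matmul.allow_tf32,
        torch.backends.cudnn.allow_tf32,
    )
```

Calling `disable_tf32()` once at the start of the training script, before the model is built, should be enough if TF32 is indeed the cause.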
Before training Video-64, you may try to pretrain each single stream (RGB and keypoints) separately. This progressive training strategy is very helpful.
I have the same issue with keypoints. Were you able to solve it, @foxcpp? I used a smaller learning rate for videos and it worked, although the accuracy is still not high. But for keypoints, it gets stuck at 0.05.