
NLA-SLR: Pretrained model works fine, training gets stuck at 0.05% accuracy #57

Open
foxcpp opened this issue May 2, 2024 · 4 comments


foxcpp commented May 2, 2024

Hello, I'm not sure where to start with troubleshooting the following issue.

I am trying to train NLA-SLR on WLASL-2000. When training Video-64, top-1 per-class accuracy seems to be stuck at 0.05%, i.e. the model is not learning at all. I use configs/rgb_frame64.yaml without any changes; the WLASL data is scaled to 256x256 with black padding.
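
For reference, the 256x256 scaling with black padding was done by a small standalone script along these lines (a minimal sketch using Pillow, not the repo's own preprocessing code):

    # Resize so the longer side is 256 while keeping the aspect ratio,
    # then fill the borders with black to get a 256x256 frame.
    from PIL import Image, ImageOps

    def to_256_with_black_padding(frame: Image.Image, size: int = 256) -> Image.Image:
        return ImageOps.pad(frame, (size, size), color=(0, 0, 0))

    # Example: to_256_with_black_padding(Image.open("frame_0001.jpg")).save("out.jpg")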

I train on 2x A100 40GB with batch_size: 4. When using prediction.py to test the pretrained Video-64 model, I obtain 51% accuracy, so the data must be fine. I also tried training on WLASL-100 but stopped at epoch 25 since there was no progress either (validation accuracy was stuck at 1%).

I modified the code to output training accuracy, and it seems the model is overfitting badly, with training accuracy reaching 99%.
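
The training-accuracy logging is a small addition along these lines (a minimal sketch; the repo's actual training loop and variable names differ):

    import torch

    class RunningTop1Accuracy:
        """Accumulate top-1 training accuracy over an epoch; call update() once per batch."""

        def __init__(self) -> None:
            self.correct = 0
            self.total = 0

        def update(self, logits: torch.Tensor, labels: torch.Tensor) -> float:
            preds = logits.argmax(dim=1)              # top-1 prediction per sample
            self.correct += (preds == labels).sum().item()
            self.total += labels.numel()
            return self.correct / max(self.total, 1)  # running top-1 accuracy so far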


foxcpp commented May 7, 2024

I made some adjustments to the default config: doubled the batch size and halved the learning rate. Around epoch 50 the model seems to start actually learning something useful; validation accuracy goes up to 27%. I'll see whether I can reproduce the paper's results this way. It still looks like a heavily overfit model.


foxcpp commented May 7, 2024

    torch.backends.cuda.matmul.allow_tf32 = False  # disable TF32 for CUDA matmuls
    torch.backends.cudnn.allow_tf32 = False        # disable TF32 for cuDNN convolutions

In addition to the config changes above, it seems necessary to disable the Ampere TF32 optimizations with the two lines above; otherwise even the training accuracy is stuck at 1% and the model is completely broken.
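
For completeness, on recent PyTorch builds (>= 1.12, an assumption) the matmul half of this can also be expressed via the float32 matmul precision API; either way, the lines have to run at the top of the training script, before any training starts:

    import torch

    # "highest" forces true float32 matmuls instead of TF32 on Ampere GPUs,
    # i.e. the same effect as torch.backends.cuda.matmul.allow_tf32 = False.
    torch.set_float32_matmul_precision("highest")
    torch.backends.cudnn.allow_tf32 = False  # cuDNN convolutions are controlled separately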

2000ZRL (Collaborator) commented May 14, 2024

Before training Video-64, you may try to pretrain each single stream (RGB and keypoints) separately. This progressive training strategy is very helpful.
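
A rough sketch of wiring the pretrained single streams into the fused model, assuming the checkpoints are plain state dicts and that the two-stream network prefixes its sub-modules with something like rgb_stream. / kp_stream. (both names are placeholders, not the repo's actual layout):

    import torch

    def init_from_single_streams(model: torch.nn.Module,
                                 rgb_ckpt: str = "rgb_only_best.pth",
                                 kp_ckpt: str = "keypoint_only_best.pth") -> None:
        """Copy pretrained single-stream weights into the fused two-stream model."""
        fused_state = {}
        for ckpt_path, prefix in [(rgb_ckpt, "rgb_stream."), (kp_ckpt, "kp_stream.")]:
            single = torch.load(ckpt_path, map_location="cpu")  # assumed to be a plain state dict
            fused_state.update({prefix + k: v for k, v in single.items()})
        # strict=False tolerates parameters (e.g. the fusion head) that exist only in the joint model
        model.load_state_dict(fused_state, strict=False)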

@pooyafayyaz

I have the same issue with keypoints. Were you able to solve it @foxcpp? I used a smaller learning rate for videos and it worked, though the accuracy is still not high. For keypoints, however, it stays stuck at 0.05.
