
NLA-SLR: Pretrained model works fine, training gets stuck at 0.05% accuracy #57

Open
foxcpp opened this issue May 2, 2024 · 4 comments


foxcpp commented May 2, 2024

Hello, I'm not sure where to start with troubleshooting the following issue.

I am trying to train NLA-SLR on WLASL-2000. When training Video-64, top-1 per-class accuracy seems to be stuck at 0.05%, i.e. the model is not learning at all. I use configs/rgb_frame64.yaml without any changes; the WLASL data is scaled to 256x256 with black padding.
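
For reference, the 256x256 scaling with black padding was done by a small standalone script along these lines (a minimal sketch using Pillow, not the repo's own preprocessing code):

    # Resize so the longer side is 256 while keeping the aspect ratio,
    # then fill the borders with black to get a 256x256 frame.
    from PIL import Image, ImageOps

    def to_256_with_black_padding(frame: Image.Image, size: int = 256) -> Image.Image:
        return ImageOps.pad(frame, (size, size), color=(0, 0, 0))

    # Example: to_256_with_black_padding(Image.open("frame_0001.jpg")).save("out.jpg")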

I train on 2x A100 40GB with batch_size: 4. When using prediction.py to test the pretrained Video-64 model, I obtain 51% accuracy, so the data must be fine. I also tried training on WLASL-100 but stopped at epoch 25 since there was no progress either (validation accuracy was stuck at 1%).

I modified the code to output training accuracy, and it seems the model is overfitting badly, with training accuracy reaching 99%.
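
The training-accuracy logging is a small addition along these lines (a minimal sketch; the repo's actual training loop and variable names differ):

    import torch

    class RunningTop1Accuracy:
        """Accumulate top-1 training accuracy over an epoch; call update() once per batch."""

        def __init__(self) -> None:
            self.correct = 0
            self.total = 0

        def update(self, logits: torch.Tensor, labels: torch.Tensor) -> float:
            preds = logits.argmax(dim=1)              # top-1 prediction per sample
            self.correct += (preds == labels).sum().item()
            self.total += labels.numel()
            return self.correct / max(self.total, 1)  # running top-1 accuracy so far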


foxcpp commented May 7, 2024

I made some adjustments to the default config: doubled the batch size and halved the learning rate. Around epoch 50 the model seems to start actually learning something useful; validation accuracy goes up to 27%. I'll see whether I can reproduce the paper's results this way. It still looks like a heavily overfit model.


foxcpp commented May 7, 2024

    torch.backends.cuda.matmul.allow_tf32 = False  # disable TF32 for CUDA matmuls
    torch.backends.cudnn.allow_tf32 = False        # disable TF32 for cuDNN convolutions

In addition to the config changes above, it seems necessary to disable the Ampere TF32 optimizations with the two lines above; otherwise even the training accuracy is stuck at 1% and the model is completely broken.
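
For completeness, on recent PyTorch builds (>= 1.12, an assumption) the matmul half of this can also be expressed via the float32 matmul precision API; either way, the lines have to run at the top of the training script, before any training starts:

    import torch

    # "highest" forces true float32 matmuls instead of TF32 on Ampere GPUs,
    # i.e. the same effect as torch.backends.cuda.matmul.allow_tf32 = False.
    torch.set_float32_matmul_precision("highest")
    torch.backends.cudnn.allow_tf32 = False  # cuDNN convolutions are controlled separately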

2000ZRL (Collaborator) commented May 14, 2024

Before training Video-64, you may try to pretrain each single stream (RGB and keypoints) separately. This progressive training strategy is very helpful.
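
A rough sketch of wiring the pretrained single streams into the fused model, assuming the checkpoints are plain state dicts and that the two-stream network prefixes its sub-modules with something like rgb_stream. / kp_stream. (both names are placeholders, not the repo's actual layout):

    import torch

    def init_from_single_streams(model: torch.nn.Module,
                                 rgb_ckpt: str = "rgb_only_best.pth",
                                 kp_ckpt: str = "keypoint_only_best.pth") -> None:
        """Copy pretrained single-stream weights into the fused two-stream model."""
        fused_state = {}
        for ckpt_path, prefix in [(rgb_ckpt, "rgb_stream."), (kp_ckpt, "kp_stream.")]:
            single = torch.load(ckpt_path, map_location="cpu")  # assumed to be a plain state dict
            fused_state.update({prefix + k: v for k, v in single.items()})
        # strict=False tolerates parameters (e.g. the fusion head) that exist only in the joint model
        model.load_state_dict(fused_state, strict=False)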

@pooyafayyaz

I have the same issue with keypoints. Were you able to solve it @foxcpp? I used a smaller learning rate for videos and it worked, though the accuracy is still not high. For keypoints, however, it stays stuck at 0.05.
