
Cannot reproduce the accuracy of ResNet #5593

Closed
Ezra-Yu opened this issue Mar 11, 2022 · 3 comments

Comments


Ezra-Yu commented Mar 11, 2022

I have run the ResNet baseline from this blog. The reported accuracy of the baseline is 76.16, but I cannot reproduce it. Here are my results:
Acc@1 75.878 Acc@5 92.856
Acc@1 75.382 Acc@5 92.574
Acc@1 75.490 Acc@5 92.714

My environment is:
CUDA 11.3
PyTorch 1.10.2
torchvision 0.11.3

All the settings are default; I just ran:
torchrun --nproc_per_node=8 train.py --model resnet50

cc @datumbox

@datumbox (Contributor) commented:

@Ezra-Yu Thanks for reporting.

The baseline accuracy reported in the blog post is for TorchVision's previously released pre-trained model. I verified its accuracy with:

torchrun --nproc_per_node=1 train.py --test-only --prototype --weights ResNet50_Weights.IMAGENET1K_V1 --model resnet50 -b 1
Acc@1 76.132 Acc@5 92.864

Note that minor differences in the 2nd decimal are expected (see #4559), but the accuracy of the released pre-trained model matches.
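
For reference, a similar check can be done directly in Python. This is a minimal sketch assuming a later torchvision release where the multi-weight API is stable (in 0.11.x it was still under the prototype area, hence the --prototype flag above); it loads the same IMAGENET1K_V1 checkpoint and prints the metadata recorded with the weights rather than re-running the full ImageNet evaluation:

# Minimal sketch (assumes torchvision >= 0.13, where the multi-weight API is stable).
# Loads the same pre-trained checkpoint used above and prints the metadata
# recorded with the weights; it does not re-run ImageNet validation.
import torch
from torchvision.models import resnet50, ResNet50_Weights

weights = ResNet50_Weights.IMAGENET1K_V1
model = resnet50(weights=weights)
model.eval()

print(weights.meta)  # includes the reference accuracies for this checkpoint

# The matching preprocessing (resize / crop / normalization) ships with the weights:
preprocess = weights.transforms()
with torch.inference_mode():
    out = model(preprocess(torch.zeros(3, 224, 224)).unsqueeze(0))
print(out.shape)  # torch.Size([1, 1000])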

According to the references, you are using the correct command / hyper-parameters for training the model, so the listed command should give you a similar model. The difference you report is significant, but so is the variance across your own runs: this model is trained for very few epochs and hasn't fully converged. To investigate further, I tried to locate the training log of the released model, but it was trained many years ago and the person who trained it is no longer with our team, so I wasn't able to find it. I think the most likely scenario is that the person who trained it got lucky with a good initialization. This can be confirmed by doing a few more runs.
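
As a rough sanity check on that variance argument, here is a small, hypothetical back-of-the-envelope calculation over the three Acc@1 numbers reported above (not part of the original thread); the run-to-run spread already covers a large portion of the gap to 76.132:

# Quick check of the run-to-run variance, using only the three Acc@1
# numbers reported in this issue (hypothetical helper snippet).
from statistics import mean, stdev

runs = [75.878, 75.382, 75.490]   # reported Acc@1 of the three training runs
released = 76.132                 # Acc@1 of the released pre-trained model

mu, sigma = mean(runs), stdev(runs)
print(f"mean={mu:.3f}  stdev={sigma:.3f}  gap={released - mu:.3f}")
# mean=75.583  stdev=0.261  gap=0.549 -> the gap is roughly two sample standard
# deviations, which with only three runs is hard to distinguish from a lucky init.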

Concerning the other accuracies reported in the blog post: those models were trained in 2021 using the scripts in our repo, and I can confirm these are the numbers we got on our runs. You should be able to fully reproduce them with the specific scripts.

I'm going to close the issue to keep things tidy but if you have concerns please feel free to reopen. Thanks!


Ezra-Yu commented Mar 11, 2022

Thank you for your quick reply.

So, do you mean that, following the LR optimization step in the blog, I can fully reproduce the reported result?


datumbox commented Mar 11, 2022

If you run this command, you should be able to fully reproduce the updated result:

torchrun --nproc_per_node=8 train.py --model resnet50 --batch-size 128 --lr 0.5 \
--lr-scheduler cosineannealinglr --lr-warmup-epochs 5 --lr-warmup-method linear \
--auto-augment ta_wide --epochs 600 --random-erase 0.1 --weight-decay 0.00002 \
--norm-weight-decay 0.0 --label-smoothing 0.1 --mixup-alpha 0.2 --cutmix-alpha 1.0 \
--train-crop-size 176 --model-ema --val-resize-size 232 --ra-sampler --ra-reps 4
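
To make the LR-related flags a bit more concrete, here is a hedged sketch of how a linear warm-up followed by cosine annealing can be composed in plain PyTorch. This mirrors what --lr-warmup-method linear --lr-warmup-epochs 5 --lr-scheduler cosineannealinglr select in the reference script, but it is an approximation, not the script's exact code:

# Sketch of the LR schedule selected by the flags above: 5 epochs of linear
# warm-up followed by cosine annealing over the remaining epochs.
# Approximation of the reference script's behaviour, not its exact code.
import torch

epochs, warmup_epochs, base_lr = 600, 5, 0.5

model = torch.nn.Linear(10, 10)                      # stand-in for resnet50
optimizer = torch.optim.SGD(model.parameters(), lr=base_lr, momentum=0.9)

warmup = torch.optim.lr_scheduler.LinearLR(
    optimizer, start_factor=0.01, total_iters=warmup_epochs)
cosine = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=epochs - warmup_epochs)
scheduler = torch.optim.lr_scheduler.SequentialLR(
    optimizer, schedulers=[warmup, cosine], milestones=[warmup_epochs])

for epoch in range(epochs):
    # ... one training epoch with the recipe's augmentations would run here ...
    scheduler.step()                                  # stepped once per epoch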

People from the community have already reproduced it successfully, and actually improved upon it. See #5201.
