Cannot reproduce the accuracy of ResNet #5593
Comments
@Ezra-Yu Thanks for reporting. The baseline accuracy reported on the blogpost is for the previously released pretrained model of TorchVision. I verified its accuracy by:
Note that minor differences in the second decimal place are expected (see #4559), but the accuracy of the released pre-trained model matches. According to the references, you are using the correct command and hyperparameters for training the model, so the listed command should give you a similar model. The difference that you report is significant, but so is the variance of the accuracies that you get. This is because this model is trained for very few epochs and hasn't fully converged.

To investigate further, I tried to locate the training log of the released model, but unfortunately it was trained many years ago and the person who trained it is no longer with our team, so I wasn't able to locate it. I think the most likely scenario is that the person who trained it got lucky with a good initialization. This can be confirmed by doing a few more runs.

Concerning any other accuracy reported in the blog post, the models were trained in 2021 using the scripts of our repo, and I can confirm these are the numbers we got on our runs. You should be able to fully reproduce them with the specific scripts.

I'm going to close the issue to keep things tidy, but if you have concerns please feel free to reopen. Thanks!
Thank you for your quick reply. So, do you mean that from
If you run this command, you should be able to fully reproduce the updated result:
People from the community have already successfully reproduced it and actually improved upon it. See #5201
I have run the ResNet baseline from this blog post. The reported accuracy of the baseline is 76.16, but I cannot get the reported accuracy. Here are my results:
Acc@1 75.878 Acc@5 92.856
Acc@1 75.382 Acc@5 92.574
Acc@1 75.490 Acc@5 92.714
My environment is:
cuda 11.3
pytorch 1.10.2
torchvision 0.11.3
All the settings are default; I just ran:
torchrun --nproc_per_node=8 train.py --model resnet50
cc @datumbox
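To put the three runs above next to the reported baseline, here is a minimal sketch that computes their mean and sample standard deviation. The numbers are copied from this issue; the helper name `summarize` is hypothetical, and with only three runs the spread is a rough estimate at best, so a few more runs would be needed before drawing conclusions.

```python
from statistics import mean, stdev

# Acc@1 values copied from the three runs reported in this issue.
acc1_runs = [75.878, 75.382, 75.490]
# Baseline Acc@1 quoted from the blog post.
reported_acc1 = 76.16


def summarize(runs, reported):
    """Hypothetical helper: return (mean, sample std, gap to the reported value)."""
    m = mean(runs)
    s = stdev(runs)  # sample standard deviation (divides by n - 1)
    return m, s, reported - m


m, s, gap = summarize(acc1_runs, reported_acc1)
print(f"mean Acc@1 = {m:.3f}, std = {s:.3f}, gap to reported = {gap:.3f}")
```

Comparing the gap with the run-to-run standard deviation gives a quick sense of whether the reported number is plausibly within normal variation.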