
Small discrepancy in accuracy & Some results of pretrained classifiers are missing in doc #3152

Closed
yoshitomo-matsubara opened this issue Dec 10, 2020 · 10 comments · Fixed by #3360


@yoshitomo-matsubara
Contributor

📚 Documentation

With torch==1.7.0 and torchvision==0.8.1, I found

  1. a small discrepancy in accuracy (doc vs. code)
  2. pretrained ShuffleNet V2 (x0.5) and MNASNet 0.5 models are available, but their numbers are missing in the doc

Environment

  • Ubuntu 18.04 LTS
  • Python 3.6.9
  • torch: 1.7.0
  • torchvision: 0.8.1

I used `pipenv run python train.py --test-only --pretrained --model ${model} --data-path /home/yoshitom/dataset/ilsvrc2012/ -b 32` to get the accuracy of the pretrained models.
For Inception v3, the following change was made to train.py, as specified in README.md, as part of commit aa753263b8e7b3180225e3ad1e6e5434a5f42882:

dataset_test = torchvision.datasets.ImageFolder(
    valdir,
    transforms.Compose([
        transforms.Resize(342),
        transforms.CenterCrop(299),
        transforms.ToTensor(),
        normalize,
    ]))
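
(For reference, the default evaluation transform in train.py for the other models uses the standard 256/224 pipeline; the sketch below is only for comparison, reusing the same valdir and normalize as above, and assuming the usual 256-pixel resize followed by a 224-pixel center crop:)

dataset_test = torchvision.datasets.ImageFolder(
    valdir,
    transforms.Compose([
        transforms.Resize(256),       # shorter side resized to 256, aspect ratio preserved
        transforms.CenterCrop(224),   # 224x224 center crop (Inception v3 needs 299x299 instead)
        transforms.ToTensor(),
        normalize,
    ]))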
| Model | Top-1 error (in doc) | Top-1 error (by code) | Top-5 error (in doc) | Top-5 error (by code) |
|---|---|---|---|---|
| AlexNet | 43.45 | 43.376 | 20.91 | 20.944 |
| VGG-11 | 30.98 | 31.128 | 11.37 | 11.342 |
| VGG-13 | 30.07 | 30.016 | 10.75 | 10.694 |
| VGG-16 | 28.41 | 28.372 | 9.62 | 9.632 |
| VGG-19 | 27.62 | 27.64 | 9.12 | 9.15 |
| VGG-11 with batch normalization | 29.62 | 29.592 | 10.19 | 10.276 |
| VGG-13 with batch normalization | 28.45 | 28.382 | 9.63 | 9.64 |
| VGG-16 with batch normalization | 26.63 | 26.524 | 8.5 | 8.464 |
| VGG-19 with batch normalization | 25.76 | 25.784 | 8.15 | 8.152 |
| ResNet-18 | 30.24 | 30.356 | 10.92 | 11.018 |
| ResNet-34 | 26.7 | 26.734 | 8.58 | 8.57 |
| ResNet-50 | 23.85 | 23.988 | 7.13 | 7.066 |
| ResNet-101 | 22.63 | 22.686 | 6.44 | 6.444 |
| ResNet-152 | 21.69 | 21.75 | 5.94 | 6.018 |
| SqueezeNet 1.0 | 41.9 | 42 | 19.58 | 19.512 |
| SqueezeNet 1.1 | 41.81 | 41.816 | 19.38 | 19.486 |
| Densenet-121 | 25.35 | 25.528 | 7.83 | 8.026 |
| Densenet-169 | 24 | 24.372 | 7 | 7.19 |
| Densenet-201 | 22.8 | 23.068 | 6.43 | 6.61 |
| Densenet-161 | 22.35 | 22.854 | 6.2 | 6.398 |
| Inception v3 | 22.55 | 22.668 | 6.44 | 6.624 |
| GoogleNet | 30.22 | 30.256 | 10.47 | 10.456 |
| ShuffleNet V2 | 30.64 | 30.598 | 11.68 | 11.626 |
| MobileNet V2 | 28.12 | 28.15 | 9.71 | 9.666 |
| ResNeXt-50-32x4d | 22.38 | 22.372 | 6.3 | 6.32 |
| ResNeXt-101-32x8d | 20.69 | 20.79 | 5.47 | 5.444 |
| Wide ResNet-50-2 | 21.49 | 21.536 | 5.91 | 5.936 |
| Wide ResNet-101-2 | 21.16 | 21.088 | 5.72 | 5.656 |
| MNASNet 1.0 | 26.49 | 26.598 | 8.456 | 8.546 |
| ShuffleNet V2 (x0.5) | (missing) | 39.354 | (missing) | 18.304 |
| MNASNet 0.5 | (missing) | 32.17 | (missing) | 12.544 |
@datumbox
Contributor

@yoshitomo-matsubara Thanks a lot for reporting!

Given that the discrepancies are very small, it's unclear whether this is the result of changes in the code or of differences in setup (GPU count, ImageNet copy, etc.) between our infra and yours. I'll leave the ticket open so that we can reproduce the numbers on our side and confirm.

@yoshitomo-matsubara
Contributor Author

Sure. Just FYI, I used only one GPU and the ILSVRC 2012 validation dataset to get the above numbers.
Hope this helps.

@ain-soph
Contributor

ain-soph commented Dec 30, 2020

Hi, I also ran into this problem recently.
I can reproduce your results with train.py and get exactly the same numbers you posted.

However, when I check the validation results with my own script, I get yet another set of numbers.
Here's a sample script that I think is simpler:

import torch
import torch.utils.data
import torchvision
import torchvision.transforms

root = './data/'    # Need to modify
transform = torchvision.transforms.Compose([
    torchvision.transforms.Resize(256),
    torchvision.transforms.CenterCrop(224),
    torchvision.transforms.ToTensor(),
    torchvision.transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])
dataset = torchvision.datasets.ImageNet(root=root, split='val', transform=transform)
dataloader = torch.utils.data.DataLoader(dataset, batch_size=128, pin_memory=True, num_workers=8)
model = torchvision.models.resnet.resnet18(pretrained=True)
model.cuda()
model.eval()

correct = 0
total = 0
with torch.no_grad():  # no gradients needed for evaluation
    for _input, _target in dataloader:
        _input = _input.to(device='cuda', non_blocking=True)
        _target = _target.to(device='cuda', non_blocking=True)
        _output = model(_input)
        correct += int((_output.argmax(1) == _target).int().sum())
        total += int(_target.size(0))
print(correct / total)  # top-1 accuracy

Note: my actual run used the explicit-size variants of the resize/crop transforms:

torchvision.transforms.Resize((256, 256)),
torchvision.transforms.CenterCrop((224, 224)),

With those, the result for ResNet-18 is 31.066 and for ResNet-101 it is 22.876, even a little worse than the train.py results.

I'm very confused as well. I've checked almost everything suspicious (random seed, AverageMeter / SmoothedValue, batch_size, sampler, transform) but found nothing that affects the result.

So what part of train.py makes the accuracy a little higher? The distributed model?

@yoshitomo-matsubara
Contributor Author

Hi @ain-soph ,

My best guess from your sample code is that the small accuracy difference between yours and mine is caused by `Resize((256, 256))`, as long as you're using the ILSVRC 2012 dataset.

`Resize(256)` in the example code is not equivalent to `Resize((256, 256))` in your sample code; see https://pytorch.org/docs/stable/torchvision/transforms.html#torchvision.transforms.Resize
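
A quick way to see the difference (a minimal illustration, assuming a hypothetical 500x375 PIL image; the exact size doesn't matter):

from PIL import Image
from torchvision import transforms

img = Image.new('RGB', (500, 375))              # hypothetical blank 500x375 image
print(transforms.Resize(256)(img).size)         # (341, 256): shorter side -> 256, aspect ratio preserved
print(transforms.Resize((256, 256))(img).size)  # (256, 256): both sides forced to 256, aspect ratio ignored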

@ain-soph
Contributor

ain-soph commented Dec 30, 2020

> Hi @ain-soph ,
>
> My best guess from your sample code is that the small accuracy difference between yours and mine is caused by `Resize((256, 256))`, as long as you're using the ILSVRC 2012 dataset.
>
> `Resize(256)` in the example code is not equivalent to `Resize((256, 256))` in your sample code; see https://pytorch.org/docs/stable/torchvision/transforms.html#torchvision.transforms.Resize

You are certainly correct! I just validated that. Thanks a lot and happy new year!
My numbers are now exactly the same as yours.

I think the docs may already be out of date, even though I don't know what has changed.

I've tested multiple settings, including multiple GPUs with nn.DataParallel, but I haven't tested DDP yet.

@yoshitomo-matsubara
Contributor Author

Hi @datumbox
Is there any update on these discrepancies between the accuracy in the doc and by code?

> @yoshitomo-matsubara Thanks a lot for reporting!
>
> Given that the discrepancies are very small, it's unclear whether this is the result of changes in the code or of differences in setup (GPU count, ImageNet copy, etc.) between our infra and yours. I'll leave the ticket open so that we can reproduce the numbers on our side and confirm.

@datumbox
Contributor

datumbox commented Feb 6, 2021

@yoshitomo-matsubara Sorry, I did not have time to check this. I'll try to run the two missing models, ShuffleNet V2 (x0.5) and MNASNet 0.5, soon. Could you let me know which of the other models that seem incorrect has the biggest discrepancy?

@yoshitomo-matsubara
Contributor Author

@datumbox Thank you for the response!
For the discrepancies, you would probably want to start with the DenseNet models (acc by code - acc in doc):

  • DenseNet-161: +0.504
  • DenseNet-169: +0.372
  • DenseNet-201: +0.268
  • DenseNet-121: +0.178

@datumbox
Contributor

datumbox commented Feb 8, 2021

@yoshitomo-matsubara Thanks for flagging. I confirmed that there are some discrepancies in the numbers. Especially for DenseNet, the reason is that the models were trained with Lua Torch and later ported to PyTorch (see #116). I've rerun the inference for all the models and you can see the numbers in #3360. Let me know if you see any other issues.

@datumbox datumbox closed this as completed Feb 8, 2021
@datumbox datumbox reopened this Feb 8, 2021
@yoshitomo-matsubara
Contributor Author

Thank you @datumbox for rerunning inference for all the models and updating the doc!
