
Small discrepancy in accuracy & Some results of pretrained classifiers are missing in doc #3152

Closed
yoshitomo-matsubara opened this issue Dec 10, 2020 · 10 comments · Fixed by #3360


@yoshitomo-matsubara
Contributor

📚 Documentation

With torch==1.7.0 and torchvision==0.8.1, I found

  1. a small discrepancy in accuracy (doc vs. code)
  2. pretrained ShuffleNet V2 (x0.5) and MNASNet 0.5 models are available, but their numbers are missing in the doc

Environment

  • Ubuntu 18.04 LTS
  • Python 3.6.9
  • torch: 1.7.0
  • torchvision: 0.8.1

I used `pipenv run python train.py --test-only --pretrained --model ${model} --data-path /home/yoshitom/dataset/ilsvrc2012/ -b 32` to get the accuracy of the pretrained models.
For Inception v3, the following change was made to train.py, as specified in README.md, as part of commit aa753263b8e7b3180225e3ad1e6e5434a5f42882:

dataset_test = torchvision.datasets.ImageFolder(
    valdir,
    transforms.Compose([
        transforms.Resize(342),
        transforms.CenterCrop(299),
        transforms.ToTensor(),
        normalize,
    ]))
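
(For reference, the default evaluation transform in train.py for the other models uses the standard 256/224 pipeline; the sketch below is only for comparison, reusing the same valdir and normalize as above, and assuming the usual 256-pixel resize followed by a 224-pixel center crop:)

dataset_test = torchvision.datasets.ImageFolder(
    valdir,
    transforms.Compose([
        transforms.Resize(256),       # shorter side resized to 256, aspect ratio preserved
        transforms.CenterCrop(224),   # 224x224 center crop (Inception v3 needs 299x299 instead)
        transforms.ToTensor(),
        normalize,
    ]))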
| Model | Top-1 error (in doc) | Top-1 error (by code) | Top-5 error (in doc) | Top-5 error (by code) |
|---|---|---|---|---|
| AlexNet | 43.45 | 43.376 | 20.91 | 20.944 |
| VGG-11 | 30.98 | 31.128 | 11.37 | 11.342 |
| VGG-13 | 30.07 | 30.016 | 10.75 | 10.694 |
| VGG-16 | 28.41 | 28.372 | 9.62 | 9.632 |
| VGG-19 | 27.62 | 27.64 | 9.12 | 9.15 |
| VGG-11 with batch normalization | 29.62 | 29.592 | 10.19 | 10.276 |
| VGG-13 with batch normalization | 28.45 | 28.382 | 9.63 | 9.64 |
| VGG-16 with batch normalization | 26.63 | 26.524 | 8.5 | 8.464 |
| VGG-19 with batch normalization | 25.76 | 25.784 | 8.15 | 8.152 |
| ResNet-18 | 30.24 | 30.356 | 10.92 | 11.018 |
| ResNet-34 | 26.7 | 26.734 | 8.58 | 8.57 |
| ResNet-50 | 23.85 | 23.988 | 7.13 | 7.066 |
| ResNet-101 | 22.63 | 22.686 | 6.44 | 6.444 |
| ResNet-152 | 21.69 | 21.75 | 5.94 | 6.018 |
| SqueezeNet 1.0 | 41.9 | 42 | 19.58 | 19.512 |
| SqueezeNet 1.1 | 41.81 | 41.816 | 19.38 | 19.486 |
| Densenet-121 | 25.35 | 25.528 | 7.83 | 8.026 |
| Densenet-169 | 24 | 24.372 | 7 | 7.19 |
| Densenet-201 | 22.8 | 23.068 | 6.43 | 6.61 |
| Densenet-161 | 22.35 | 22.854 | 6.2 | 6.398 |
| Inception v3 | 22.55 | 22.668 | 6.44 | 6.624 |
| GoogleNet | 30.22 | 30.256 | 10.47 | 10.456 |
| ShuffleNet V2 | 30.64 | 30.598 | 11.68 | 11.626 |
| MobileNet V2 | 28.12 | 28.15 | 9.71 | 9.666 |
| ResNeXt-50-32x4d | 22.38 | 22.372 | 6.3 | 6.32 |
| ResNeXt-101-32x8d | 20.69 | 20.79 | 5.47 | 5.444 |
| Wide ResNet-50-2 | 21.49 | 21.536 | 5.91 | 5.936 |
| Wide ResNet-101-2 | 21.16 | 21.088 | 5.72 | 5.656 |
| MNASNet 1.0 | 26.49 | 26.598 | 8.456 | 8.546 |
| ShuffleNet V2 (x0.5) | (missing) | 39.354 | (missing) | 18.304 |
| MNASNet 0.5 | (missing) | 32.17 | (missing) | 12.544 |
@datumbox
Contributor

@yoshitomo-matsubara Thanks a lot for reporting!

Given that the discrepancies are very small, it's unclear whether this is the result of changes in the code or of differences in setup (GPU count, ImageNet copy, etc.) between our infra and yours. I'll leave the ticket open so that we can reproduce the numbers on our side and confirm.

@yoshitomo-matsubara
Contributor Author

Sure. Just FYI, I used only one GPU and the ILSVRC 2012 validation dataset to get the above numbers.
Hope this helps.

@ain-soph
Contributor

ain-soph commented Dec 30, 2020

Hi, I also ran into this problem recently.
I can reproduce your results with train.py and get exactly the same numbers you posted.

However, when I check the validation results with my own script, I get yet another set of numbers.
Here's a sample script that I think is simpler:

import torch
import torch.utils.data
import torchvision
import torchvision.transforms

root = './data/'    # Need to modify
transform = torchvision.transforms.Compose([
    torchvision.transforms.Resize(256),
    torchvision.transforms.CenterCrop(224),
    torchvision.transforms.ToTensor(),
    torchvision.transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])
dataset = torchvision.datasets.ImageNet(root=root, split='val', transform=transform)
dataloader = torch.utils.data.DataLoader(dataset, batch_size=128, pin_memory=True, num_workers=8)
model = torchvision.models.resnet.resnet18(pretrained=True)
model.cuda()
model.eval()

correct = 0
total = 0
with torch.no_grad():  # no gradients needed for evaluation
    for _input, _target in dataloader:
        _input = _input.to(device='cuda', non_blocking=True)
        _target = _target.to(device='cuda', non_blocking=True)
        _output = model(_input)
        correct += int((_output.argmax(1) == _target).int().sum())
        total += int(_target.size(0))
print(correct / total)  # top-1 accuracy

Note: my actual run used the explicit-size variants of the resize/crop transforms:

torchvision.transforms.Resize((256, 256)),
torchvision.transforms.CenterCrop((224, 224)),

With those, the result for ResNet-18 is 31.066 and for ResNet-101 it is 22.876, even a little worse than the train.py results.

I'm very confused as well. I've checked almost everything suspicious (random seed, AverageMeter / SmoothedValue, batch_size, sampler, transform) but found nothing that affects the result.

So what part of train.py makes the accuracy a little higher? The distributed model?

@yoshitomo-matsubara
Contributor Author

Hi @ain-soph ,

My best guess from your sample code is that the small accuracy difference between yours and mine is caused by `Resize((256, 256))`, as long as you're using the ILSVRC 2012 dataset.

`Resize(256)` in the example code is not equivalent to `Resize((256, 256))` in your sample code; see https://pytorch.org/docs/stable/torchvision/transforms.html#torchvision.transforms.Resize
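
A quick way to see the difference (a minimal illustration, assuming a hypothetical 500x375 PIL image; the exact size doesn't matter):

from PIL import Image
from torchvision import transforms

img = Image.new('RGB', (500, 375))              # hypothetical blank 500x375 image
print(transforms.Resize(256)(img).size)         # (341, 256): shorter side -> 256, aspect ratio preserved
print(transforms.Resize((256, 256))(img).size)  # (256, 256): both sides forced to 256, aspect ratio ignored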

@ain-soph
Contributor

ain-soph commented Dec 30, 2020

> Hi @ain-soph ,
>
> My best guess from your sample code is that the small accuracy difference between yours and mine is caused by `Resize((256, 256))`, as long as you're using the ILSVRC 2012 dataset.
>
> `Resize(256)` in the example code is not equivalent to `Resize((256, 256))` in your sample code; see https://pytorch.org/docs/stable/torchvision/transforms.html#torchvision.transforms.Resize

You are certainly correct! I just validated that. Thanks a lot and happy new year!
My numbers are now exactly the same as yours.

I think the docs may already be out of date, even though I don't know what has changed.

I've tested multiple settings, including multiple GPUs with nn.DataParallel, but I haven't tested DDP yet.

@yoshitomo-matsubara
Contributor Author

Hi @datumbox
Is there any update on these discrepancies between the accuracy in the doc and by code?

> @yoshitomo-matsubara Thanks a lot for reporting!
>
> Given that the discrepancies are very small, it's unclear whether this is the result of changes in the code or of differences in setup (GPU count, ImageNet copy, etc.) between our infra and yours. I'll leave the ticket open so that we can reproduce the numbers on our side and confirm.

@datumbox
Contributor

datumbox commented Feb 6, 2021

@yoshitomo-matsubara Sorry, I did not have time to check this. I'll try to run the two missing models, ShuffleNet V2 (x0.5) and MNASNet 0.5, soon. Could you let me know which of the other models that seem incorrect has the biggest discrepancy?

@yoshitomo-matsubara
Contributor Author

@datumbox Thank you for the response!
For the discrepancies, you would probably want to start with the DenseNet models (acc by code - acc in doc):

  • DenseNet-161: +0.504
  • DenseNet-169: +0.372
  • DenseNet-201: +0.268
  • DenseNet-121: +0.178

@datumbox
Contributor

datumbox commented Feb 8, 2021

@yoshitomo-matsubara Thanks for flagging. I confirmed that there are some discrepancies in the numbers. Especially for DenseNet, the reason is that the models were trained with Lua Torch and later ported to PyTorch (see #116). I've rerun the inference for all the models and you can see the numbers in #3360. Let me know if you see any other issues.

@datumbox datumbox closed this as completed Feb 8, 2021
@datumbox datumbox reopened this Feb 8, 2021
@yoshitomo-matsubara
Contributor Author

Thank you @datumbox for rerunning inference for all the models and updating the doc!
