
Performance on imagenet100 and imagenet1k #1

Closed
cffan opened this issue Jan 29, 2020 · 23 comments
Labels
good first issue Good for newcomers

Comments

cffan commented Jan 29, 2020

Have you tried your implementation on the imagenet100 dataset? I'm getting an accuracy of around 69.0 with the default config (8 GPUs, lr 0.03, bs 256), which is lower than the MoCo implementation in the CMC repo.

bl0 (Owner) commented Jan 30, 2020

Hi, for imagenet100 we got an accuracy of 73+ (on par with the CMC repo) with the default config except for batch size = 128, which matches the CMC repo.

cffan (Author) commented Jan 30, 2020

Could you share the config you used to get 73+? Thanks!

bl0 (Owner) commented Jan 30, 2020

You can just set the batch size per GPU to 128/ngpu, e.g., 16 if you use 8 GPUs.

bl0 (Owner) commented Jan 30, 2020

BTW, I also tried batch size = 256; the accuracy is 70.540, which is likewise lower than the MoCo implementation in the CMC repo.
But with the same config except for batch size = 128, the accuracy is 73+.

cffan (Author) commented Feb 3, 2020

I tried batch size 128 and got a result around 72.3, which is better than before but still slightly worse than your results. Just to make sure I got everything right, here are my commands:

python -m torch.distributed.launch --nproc_per_node=8 \
    train.py \
    --batch-size 16 \
    --exp-name exp_name\
    --data-root data_folder

python -m torch.distributed.launch --nproc_per_node=4 \
    eval.py \
    --exp-name exp_name \
    --model-path output/exp_name/current.pth \
    --batch-size 64 \
    --data-root data_folder

And I'm running PyTorch 1.4.0 and torchvision 0.5.0.

I think the author mentioned that using alpha=0.99 is slightly better than 0.999. Did you notice the same thing?

bl0 (Owner) commented Feb 4, 2020

Hi, sorry for the inconvenience. I eventually found the full config we used, which shows that we use alpha=0.99 instead of 0.999, as the CMC author suggested.

Pre-training:

alpha=0.99, amp=False, aug='CJ', batch_size=32, beta1=0.5, beta2=0.999,
crop=0.2, data_folder='./data/imagenet100', dataset='imagenet100', epochs=240,
exp_name='MoCo/ddp/k_all-bs_128-all_shuffle_bn', learning_rate=0.03,
local_rank=0, lr_decay_epochs=[120, 160, 200], lr_decay_rate=0.1, moco=True,
model='resnet50',
model_folder='./output/imagenet100/MoCo/ddp/k_all-bs_128-all_shuffle_bn//models',
momentum=0.9, nce_k=16384, nce_m=0.5, nce_t=0.07, num_workers=4,
opt_level='O2', print_freq=10, resume='', save_freq=10, softmax=True,
start_epoch=1,
tb_folder='./output/imagenet100/MoCo/ddp/k_all-bs_128-all_shuffle_bn//tensorboard',
tb_freq=500, warm=False, weight_decay=0.0001

Finetuning:

adam=False, amp=False, aug='CJ', batch_size=256, beta1=0.5, beta2=0.999,
bn=False, cosine=False, crop=0.2, data_folder='./data', dataset='imagenet100',
epochs=60, exp_name='MoCo/ddp/k_all-bs_128-all_shuffle_bn', layer=6,
learning_rate=10.0, lr_decay_epochs=[30, 40, 50], lr_decay_rate=0.2,
model='resnet50',
model_path='./output/imagenet100/MoCo/ddp/k_all-bs_128-all_shuffle_bn/models/current.pth',
model_width=1, momentum=0.9, n_label=100, num_workers=24, opt_level='O2',
print_freq=10, resume='',
save_folder='./output/imagenet100/MoCo/ddp/k_all-bs_128-all_shuffle_bn//linear_models',
save_freq=5, start_epoch=1, syncBN=False,
tb_folder='./output/imagenet100/MoCo/ddp/k_all-bs_128-all_shuffle_bn//linear_tensorboard',
tb_freq=500, warm=False, weight_decay=0
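For context, alpha here is the EMA momentum used to update MoCo's key encoder from the query encoder. A minimal sketch of that update (the function and variable names are illustrative, not the repo's actual code):

import torch

@torch.no_grad()
def momentum_update(encoder_q, encoder_k, alpha=0.99):
    # EMA update: the key encoder slowly tracks the query encoder.
    # A smaller alpha (0.99 vs 0.999) lets the key encoder adapt faster.
    for q_param, k_param in zip(encoder_q.parameters(), encoder_k.parameters()):
        k_param.data.mul_(alpha).add_(q_param.data, alpha=1.0 - alpha)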

The figures of ins_loss and test_acc are also attached for reference.

cffan (Author) commented Feb 5, 2020

Thanks, I'll try these configs.

Could you also share the configs to reproduce the results on imagenet1k?

bl0 (Owner) commented Feb 5, 2020

Hi, I have updated the README and added the pre-trained model. You can get the full configs from the checkpoints like this:

import torch

# Load on CPU so no GPU is needed just to inspect the checkpoint.
ckpt = torch.load('model.pth', map_location='cpu')
print(ckpt['opt'])  # the saved training config

BTW, the figures of ins_loss and test_acc for imagenet1k are also attached for reference.


@bl0 bl0 added the good first issue Good for newcomers label Feb 5, 2020
@bl0 bl0 changed the title Performance on imagenet100 Performance on imagenet100 and imagenet1k Feb 5, 2020
cffan (Author) commented Feb 5, 2020

Thanks! I reproduced 73+ results on imagenet100. Will try to run on imagenet1k.

bl0 (Owner) commented Feb 7, 2020

I will close this issue. If you have any questions, feel free to reopen it.

@bl0 bl0 closed this as completed Feb 7, 2020
@bl0 bl0 pinned this issue Feb 16, 2020
bl0 (Owner) commented Mar 1, 2020

BTW, I got Acc@1 78.140% / Acc@5 94.000% on imagenet100 with batch size 512, lr = 0.8, alpha = 0.99, and K = all.

cffan (Author) commented Mar 1, 2020 via email

bl0 (Owner) commented Mar 1, 2020

Yes.

cffan (Author) commented Mar 5, 2020

I tried your large batch size settings and got Acc@1 around 76. Did you change anything for evaluation? How many GPUs did you use?

@bl0 bl0 unpinned this issue Mar 5, 2020
@bl0 bl0 pinned this issue Mar 5, 2020
bl0 (Owner) commented Mar 5, 2020

Hi, sorry for the misleading message. Actually, the 78+ result was obtained with my internal version, which uses cosine learning rate decay with 5 epochs of warmup.
I will release these modifications as soon as possible.
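For reference, a minimal sketch of such a schedule; the function is illustrative rather than the repo's actual implementation (base_lr=0.4 is taken from the script name mentioned below, total_epochs=240 from the config above):

import math

def lr_at_epoch(epoch, base_lr=0.4, warmup_epochs=5, total_epochs=240):
    # Linear warmup up to base_lr over the first warmup_epochs.
    if epoch < warmup_epochs:
        return base_lr * (epoch + 1) / warmup_epochs
    # Cosine decay from base_lr down to 0 over the remaining epochs.
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))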

bl0 (Owner) commented Mar 6, 2020

Hi, I have updated the code and provided a script, scripts/train_eval_imagenet100_baseLR0.4_alpha0.99_crop0.08_k1281166_t0.1_AMPO1.sh, to reproduce the performance.

BTW, today I merged a lot of updates from my internal version, such as the warmup LR scheduler, a logger, and AMP support.

bl0 (Owner) commented Mar 6, 2020

FYI, I have uploaded to OneDrive the checkpoint pretrained on imagenet100 that achieves 78+.

cffan (Author) commented Mar 6, 2020

Is this a typo? The k1281166 in the script name looks like the dataset size of imagenet1k.

bl0 (Owner) commented Mar 6, 2020

Fixed. Thanks for your help.

cffan (Author) commented Mar 7, 2020

Have you tried similar large batch size settings on imagenet1k?

bl0 (Owner) commented Mar 7, 2020

Actually, the key is the large base learning rate.
I use the large batch size just because I use 8 GPUs and don't want the per-GPU batch size to be too small, which may be inefficient. The consequence of the large batch size is linear learning rate scaling and warmup.

For imagenet1k, I also use a large batch size with linear learning rate scaling and warmup, but I have not tuned the base learning rate.
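For concreteness, a sketch of the usual linear scaling rule; the 256-sample reference batch is the common convention (an assumption here, not confirmed by the repo), and the example values just reproduce the numbers quoted earlier in this thread:

def scaled_lr(base_lr, batch_size, reference_batch=256):
    # Linear LR scaling: the learning rate grows in proportion to batch size.
    return base_lr * batch_size / reference_batch

print(scaled_lr(0.4, 512))  # 0.8, matching the batch size 512 / lr 0.8 run above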

bl0 (Owner) commented Mar 7, 2020

From my perspective, imagenet100 is small, so training is not sufficient; that is why the large batch size and small alpha work well.
But for imagenet1k the situation is different, so I don't think a large base learning rate would be much better than the default one.

bastian1209 commented

@bl0
Hi, I have a question about the learning rate for the linear evaluation phase. From the comments above and several other sources, lr=10.0 for ImageNet-100 and lr=30.0 for ImageNet-1K seem to be the standard baselines for MoCo. Does the scale of the CE loss during linear evaluation training seem plausible to you? In my case, the observed CE loss is usually on the order of 1e+2 to 1e+4 while accuracy keeps increasing stably. I would be thankful for your reply!
