
CHANGELOG

This CHANGELOG records changes between different arXiv versions of our paper, along with the version of this codebase that should be used to reproduce the results in the corresponding arXiv version. View changes between code versions on the Releases page.

ArXiv v2 -> v3

Code version: v1.2.

Fixed image captioning results with a modified beam search implementation. The rest of the downstream task results and pre-trained models are unchanged.

ArXiv v1 -> v2

Code version: v1.0 or v1.1.

ArXiv v1 was our ECCV 2020 submission (rejected). ArXiv v2 is our CVPR 2021 submission (accepted). The repository snapshots for these two versions are tagged at v0.9 and v1.0.

While the core motivation and approach are the same, we have made some minor changes to our experiments and evaluation setup. These slightly improve model performance across the board (by decimal points). New models are available in the v1.0 model zoo; links to the old v0.9 models will remain active until June 30, 2021. We encourage you to use the new models!

We have updated the experiment config files for all changes described below.

Experiment Changes

New Feature:

Added a new pre-training task for BERT-style Masked Language Modeling. The pre-trained model is released in the Model Zoo.
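For reference, BERT-style masking can be sketched as follows. This is a minimal sketch, not this codebase's exact implementation: the token ids (`MASK_ID`, `VOCAB_SIZE`) are hypothetical placeholders, and the 80/10/10 split is the standard BERT recipe.

```python
import random

MASK_ID = 103      # hypothetical [MASK] token id
VOCAB_SIZE = 30522 # hypothetical vocabulary size
IGNORE = -100      # label value ignored by the loss

def mask_tokens(token_ids, mask_prob=0.15, rng=random):
    """BERT-style masking: select ~mask_prob of positions; of those,
    80% become [MASK], 10% a random token, 10% stay unchanged."""
    inputs, labels = list(token_ids), [IGNORE] * len(token_ids)
    for i, tok in enumerate(token_ids):
        if rng.random() < mask_prob:
            labels[i] = tok  # the model must predict the original token here
            roll = rng.random()
            if roll < 0.8:
                inputs[i] = MASK_ID
            elif roll < 0.9:
                inputs[i] = rng.randrange(VOCAB_SIZE)
            # else: keep the original token unchanged
    return inputs, labels
```

Positions with label `IGNORE` contribute nothing to the loss, so the model is only trained on the selected positions.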

Pre-training:

  • The only change during pre-training is that we no longer apply weight decay to LayerNorm parameters or to biases in the input embedding and transformer layers. We still apply weight decay to the bias of the output linear layer (before the softmax).

  • Other factors that could affect results:
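The weight-decay grouping described above can be sketched as a name-based filter over parameters. This is a sketch only: the parameter names (`layer_norm`, `output.bias`, etc.) are hypothetical placeholders, not this codebase's actual module names.

```python
def split_decay_params(param_names, decay_output_bias="output.bias"):
    """Partition parameter names into (decay, no_decay) groups:
    no weight decay for LayerNorm parameters and biases, except the
    bias of the output linear layer, which keeps weight decay."""
    decay, no_decay = [], []
    for name in param_names:
        is_bias = name.endswith(".bias")
        is_norm = "layer_norm" in name.lower() or "layernorm" in name.lower()
        if name == decay_output_bias or not (is_bias or is_norm):
            decay.append(name)
        else:
            no_decay.append(name)
    return decay, no_decay
```

Each group would then be passed to the optimizer as a separate parameter group, with `weight_decay` set to the configured value for the first group and 0 for the second.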

Downstream Evaluations:

  1. PASCAL VOC 2007 Linear Classification: [diff]

    • Instead of training linear SVMs on 8192-dimensional average-pooled features from ResNet-50 (7×7×2048 → 2×2×2048), as in Misra et al. (2019), we directly train SVMs on 2048-dimensional global average pooled features, following recent works like SwAV (Caron et al. 2020).
    • We changed the pre-processing: resize the shortest edge to 256 pixels, then take a 224-pixel center crop.
    • These changes improve VOC mAP by 1-2 points across the board and make SVM training faster. Since we select the best checkpoint based on this metric, results on all other downstream tasks also change in ArXiv v2 (but the trends remain the same).
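For reference, the two feature-extraction schemes above differ as follows. This is a minimal NumPy sketch of the pooling step only (not the SVM training), assuming ResNet-50's final (C, H, W) = (2048, 7, 7) feature map.

```python
import numpy as np

def gap_features(fmap):
    """New setup: global average pool a (C, H, W) feature map to a
    C-dim vector (2048-d for ResNet-50)."""
    return fmap.mean(axis=(1, 2))

def adaptive_pooled_features(fmap, out=2):
    """Old setup: adaptive-average-pool to out x out spatial cells
    (7x7 -> 2x2) and flatten, giving out*out*C dims (8192-d for
    ResNet-50). Bin edges follow the usual floor/ceil convention,
    so adjacent bins may overlap when H is not divisible by out."""
    c, h, w = fmap.shape
    pooled = np.empty((c, out, out))
    for i in range(out):
        for j in range(out):
            hs, he = (i * h) // out, ((i + 1) * h + out - 1) // out
            ws, we = (j * w) // out, ((j + 1) * w + out - 1) // out
            pooled[:, i, j] = fmap[:, hs:he, ws:we].mean(axis=(1, 2))
    return pooled.reshape(-1)
```

The SVMs are then trained directly on these vectors; the 2048-d version is both smaller and faster to train on.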
  2. ImageNet Linear Evaluation: [diff]

    • Changed random resized crop scale from (20-100%) to (8-100%) for consistency with evaluations in SSL works like MoCo and SwAV.
    • Used cosine LR decay instead of step decay, following SwAV. This improves accuracy by up to 1%.
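The cosine schedule can be sketched as below. This is a sketch only; `base_lr`, `final_lr`, and the step counts are placeholders, not the actual config values.

```python
import math

def cosine_lr(step, total_steps, base_lr=0.1, final_lr=0.0):
    """Cosine learning-rate decay: starts at base_lr and decays
    smoothly to final_lr, instead of discrete step-decay drops."""
    progress = min(step / total_steps, 1.0)
    return final_lr + 0.5 * (base_lr - final_lr) * (1.0 + math.cos(math.pi * progress))
```

The (8-100%) crop scale above matches torchvision's `RandomResizedCrop` default of `scale=(0.08, 1.0)`, which is what MoCo and SwAV use for this evaluation.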
  3. iNaturalist Fine-tuning: [diff]

    • This evaluation is left unchanged across arXiv versions, but we fixed a typo in the image pre-processing step that was present in the publicly released config.
  4. Detectron2 tasks (COCO and LVIS Instance Segmentation, VOC Detection):

    • Heavily simplified the script. The updated Detectron2 uses a more memory-efficient SyncBatchNorm and supports AMP (automatic mixed precision).
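As a rough illustration of what the updated script relies on, the two features can be sketched with plain PyTorch APIs. Detectron2 wraps these internally; the model and squared-error loss below are placeholders, not the actual detection pipeline.

```python
import torch
from torch import nn

def convert_to_sync_bn(model):
    """Replace BatchNorm layers with SyncBatchNorm, which synchronizes
    batch statistics across GPUs."""
    return nn.SyncBatchNorm.convert_sync_batchnorm(model)

def amp_train_step(model, images, targets, optimizer, scaler):
    """One mixed-precision (AMP) step: forward pass under autocast,
    then scale the loss so small fp16 gradients do not underflow."""
    optimizer.zero_grad()
    with torch.cuda.amp.autocast(enabled=scaler.is_enabled()):
        loss = (model(images) - targets).pow(2).mean()  # placeholder loss
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
    return loss.item()
```

With `GradScaler(enabled=False)` the same step runs in full precision, so one code path serves both modes.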