This repository contains the code of our paper:
When Does Bias Transfer in Transfer Learning?
Hadi Salman*, Saachi Jain*, Andrew Ilyas*, Logan Engstrom*, Eric Wong, Aleksander Madry
Paper - Blog post
@article{salman2022does,
title={When does Bias Transfer in Transfer Learning?},
author={Salman, Hadi and Jain, Saachi and Ilyas, Andrew and Engstrom, Logan and Wong, Eric and Madry, Aleksander},
journal={arXiv preprint arXiv:2207.02842},
year={2022}
}
The major components of our repo are:
- main.py: Entry point of our codebase.
- src/: Contains all our code for running the full transfer pipeline.
- dataset_configs/: Contains the config files that main.py expects. These config files contain the hyperparameters for each transfer task.
Our code relies on our FFCV Library. To install this library, along with other dependencies including PyTorch, follow the instructions below.
conda create -n ffcv python=3.9 cupy pkg-config compilers libjpeg-turbo opencv pytorch torchvision cudatoolkit=11.3 numba -c pytorch -c conda-forge
conda activate ffcv
pip install ffcv
You are now ready to run our code!
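Before launching a long training run, it can help to confirm that the environment actually resolved the main dependencies. The snippet below is a minimal sketch, not part of the repository: the helper name `find_missing_modules` is hypothetical, and the module list is an assumption derived from the conda command above (the `opencv` package is imported as `cv2`).

```python
import importlib.util

# Modules the install commands above are expected to provide.
# Assumption: this list mirrors the conda/pip packages; adjust as needed.
REQUIRED_MODULES = ["torch", "torchvision", "ffcv", "numba", "cupy", "cv2"]


def find_missing_modules(modules):
    """Return the names in `modules` that cannot be located for import."""
    missing = []
    for name in modules:
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


missing = find_missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules were found.")
```

Run it inside the activated `ffcv` environment; any name it reports is a package to reinstall.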
A major component of our paper is training a source model (potentially with some spurious correlation).
- To train a standard ImageNet model:
python main.py \
--config-file dataset_configs/imagenet.yaml \
--training.exp_name clean_imagenet \
--training.outdir $OUTDIR
- To train a spurious ImageNet model, which adds a backdoor to the ImageNet model (a yellow square, a fixed Gaussian pattern, or hats; set --spurious.spurious_type to square, gaussian, or hat accordingly):
python main.py \
--config-file dataset_configs/imagenet.yaml \
--spurious.spurious_type square \
--spurious.spurious_perc 1 \
--training.exp_name spurious_imagenet_square \
--training.outdir $OUTDIR
where $OUTDIR is the output directory of your choice.
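To train the clean model and all three spurious variants in one go, the commands above can be generated programmatically. The following is a sketch under stated assumptions: `source_training_command` is a hypothetical helper (not part of the repository), the experiment names for the `gaussian` and `hat` variants simply follow the `spurious_imagenet_square` pattern, and `/tmp/outdir` is a placeholder output directory. Only flags that appear in the commands above are used.

```python
import shlex

# Values accepted by --spurious.spurious_type, as listed above.
SPURIOUS_TYPES = ("square", "gaussian", "hat")


def source_training_command(outdir, spurious_type=None, spurious_perc=1):
    """Build the argument list for training a source ImageNet model.

    spurious_type=None gives the clean model; otherwise it must be one
    of SPURIOUS_TYPES.
    """
    cmd = ["python", "main.py", "--config-file", "dataset_configs/imagenet.yaml"]
    if spurious_type is None:
        exp_name = "clean_imagenet"
    else:
        if spurious_type not in SPURIOUS_TYPES:
            raise ValueError(f"unknown spurious type: {spurious_type!r}")
        cmd += [
            "--spurious.spurious_type", spurious_type,
            "--spurious.spurious_perc", str(spurious_perc),
        ]
        # Assumption: name each run after its spurious type.
        exp_name = f"spurious_imagenet_{spurious_type}"
    cmd += ["--training.exp_name", exp_name, "--training.outdir", str(outdir)]
    return cmd


# Print one shell-ready command per source model (placeholder outdir).
for spurious_type in (None, *SPURIOUS_TYPES):
    print(shlex.join(source_training_command("/tmp/outdir", spurious_type)))
```

Each returned list can also be passed directly to `subprocess.run(cmd, check=True)` from the repository root.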
Now given a pretrained ImageNet model (either clean or spurious), we can transfer this model to various downstream tasks by running:
python main.py --config-file dataset_configs/${DATASET}.yaml \
--training.exp_name transfer \
--training.outdir $OUTDIR \
--model.transfer $TRANSFER_TYPE \
--model.checkpoint ${CHECKPOINT_PATH}/version_0/checkpoints/checkpoint_latest.pt
where $DATASET is the downstream dataset which we want to transfer to (see next section), $OUTDIR is the output directory of your choice, $TRANSFER_TYPE is either FIXED or FULL, denoting either fixed-feature transfer or full-network transfer, and $CHECKPOINT_PATH is the path of the saved checkpoint of the source ImageNet model from above.
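The transfer command has several interdependent arguments, so a small wrapper that validates them before launching can save a failed run. This is a minimal sketch, assuming a hypothetical helper named `transfer_command`; it uses only the flags and the checkpoint layout shown above, and the paths in the example call are placeholders.

```python
import shlex
from pathlib import Path

# The two transfer modes described above.
TRANSFER_TYPES = ("FIXED", "FULL")


def transfer_command(dataset, outdir, transfer_type, checkpoint_path):
    """Build the argument list for transferring a source model to `dataset`."""
    if transfer_type not in TRANSFER_TYPES:
        raise ValueError(f"transfer_type must be one of {TRANSFER_TYPES}")
    # Checkpoint layout taken from the command above.
    checkpoint = (
        Path(checkpoint_path) / "version_0" / "checkpoints" / "checkpoint_latest.pt"
    )
    return [
        "python", "main.py",
        "--config-file", f"dataset_configs/{dataset}.yaml",
        "--training.exp_name", "transfer",
        "--training.outdir", str(outdir),
        "--model.transfer", transfer_type,
        "--model.checkpoint", checkpoint.as_posix(),
    ]


# Example: fixed-feature transfer to cifar10 (placeholder paths).
print(shlex.join(
    transfer_command("cifar10", "/tmp/outdir", "FIXED", "/path/to/source_model")
))
```

Rejecting an unknown transfer type up front avoids discovering the typo only after the data loaders have been built.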
This code also evaluates the trained model at the end of training to check for any bias. The results (accuracy, predictions, etc.) can be found in $OUTDIR/results.pt.
- aircraft (Download)
- birds (Download)
- caltech101 (Download)
- caltech256 (Download)
- cifar10 (Automatically downloaded when you run the code)
- cifar100 (Automatically downloaded when you run the code)
- flowers (Download)
- food (Download)
- pets (Download)
- stanford_cars (Download)
- SUN397 (Download)
We have created an FFCV version of each of these datasets to enable super fast training. We will make these datasets available soon!
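When sweeping over all downstream datasets, it is useful to check first that a config file exists for each one. The sketch below assumes that each config file is named after its dataset exactly as listed above (for example `dataset_configs/cifar10.yaml`); the actual file names may differ, and `missing_configs` is a hypothetical helper, not part of the repository.

```python
from pathlib import Path

# Downstream datasets listed above.
DOWNSTREAM_DATASETS = [
    "aircraft", "birds", "caltech101", "caltech256", "cifar10", "cifar100",
    "flowers", "food", "pets", "stanford_cars", "SUN397",
]


def missing_configs(config_dir="dataset_configs", datasets=DOWNSTREAM_DATASETS):
    """Return the dataset names that have no <name>.yaml file in config_dir."""
    config_dir = Path(config_dir)
    return [
        name for name in datasets
        if not (config_dir / f"{name}.yaml").is_file()
    ]


# Run from the repository root.
print("Missing config files:", missing_configs() or "none")
```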
Coming soon!