Code for the paper "ProxyDR: Deep Hyperspherical Metric Learning with Distance Ratio-Based Formulation"
- Python 3.8 (for conda environment installation, you can follow the commands in conda_installation.txt)
- PyTorch (http://pytorch.org/) (gpytorch 1.4.1)
- NumPy (version 1.19.5)
- Pandas (version 1.0.5)
- Scikit-learn (version 0.24.2)
- SciPy (version 1.5.0)
- Biopython (version 1.79)
- Json5 (version 0.8.5)
- scikit-bio
- ete3
We used CIFAR-100 from torchvision (https://pytorch.org/vision/stable/datasets.html).
Alternatively, one may download the CIFAR-100 dataset (python version) from https://www.cs.toronto.edu/~kriz/cifar.html.
One can download the NABirds dataset from https://dl.allaboutbirds.org/nabirds. You need to change the path names in nabirds_cls.csv, nabirds_cls2.csv, and nabirds_info.csv so that the images are located at the written paths (you only need to change "DATA_init" to the corresponding folder name in each line).
You need to run Prepare_NABirds.ipynb after properly changing the config.json file as explained in the train section.
You can download the plankton datasets Small microplankton (MicroS), Large microplankton (MicroL), and Mesozooplankton (MesoZ). These datasets should be placed inside a folder named "plankton_data" (you need to create this folder). You need to change the path names in MicroS_cls.csv, MicroS_info.csv, MicroL_cls.csv, MicroL_info.csv, MesoZ_cls.csv, and MesoZ_info.csv so that the images are located at the written paths (you only need to change "DATA_init" to the corresponding folder name in each line; for instance, you might use the command `sed -i 's/DATA_init/Data_path_name/g' MicroS_cls.csv`).
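As a self-contained illustration of that substitution (the file path and replacement directory here are hypothetical, not the repository's actual values), `sed -i` rewrites the DATA_init placeholder in place:

```shell
# Illustrative only: create a sample CSV line containing the DATA_init
# placeholder, then substitute a hypothetical data path in place.
printf 'DATA_init/plankton_data/MicroS/img_0001.png\n' > /tmp/sample_cls.csv
# Use '|' as the sed delimiter so the replacement path may contain '/'.
sed -i 's|DATA_init|/home/user|g' /tmp/sample_cls.csv
cat /tmp/sample_cls.csv
```

Note that `sed -i` with no suffix argument is GNU sed syntax; on BSD/macOS sed the equivalent is `sed -i ''`.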
Before training, in the config.json file, you need to specify where the "nabirds" and "plankton_data" folders are located (DATA_init) and where this repository (ProxyDR) is located (FOLDER_init).
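The exact schema of config.json is defined by the repository; as a hedged sketch, the two entries described above might look like this (both paths are placeholders):

```json
{
    "DATA_init": "/home/user/datasets",
    "FOLDER_init": "/home/user/ProxyDR"
}
```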
To train on the CIFAR-100 dataset, run `python train_cifar100.py --GPU [GPU_NUMBER(S)] --method [METHOD_NAME] --distance [DISTANCE] --use_val --seed [SEED_NUMBER] --[TRAINING_OPTION]`.
To train on the NABirds dataset, run `python train_nabirds.py --GPU [GPU_NUMBER(S)] --method [METHOD_NAME] --distance [DISTANCE] --use_val --seed [SEED_NUMBER] --[TRAINING_OPTION]`.
To train on the plankton datasets, run `python train.py --GPU [GPU_NUMBER(S)] --dataset [DATASET_NAME] --method [METHOD_NAME] --distance [DISTANCE] --size_inform --use_val --seed [SEED_NUMBER] --[TRAINING_OPTION]`.
- Methods ([METHOD_NAME]): Softmax: `softmax`, NormFace: `normface`, ProxyDR (default): `DR`, CORR loss: `--method DR --mds_W --CORR`
- Training options and the corresponding [TRAINING_OPTION] names: Standard: default (without any --[TRAINING_OPTION]), EMA: `--ema`, Dynamic (scale factor): `--dynamic`, MDS (multidimensional scaling): `--mds_W`
For example, to train the NormFace model on the MicroS dataset with the standard option (GPU: 0, seed: 1, Euclidean distance, size information, and validation), run `python train.py --GPU 0 --dataset MicroS --method normface --distance euc --size_inform --seed 1 --use_val`.
For example, to train the ProxyDR model on the MicroS dataset with the MDS and dynamic options (GPU: 0, seed: 1, Euclidean distance, size information, and validation), run `python train.py --GPU 0 --dataset MicroS --method DR --distance euc --size_inform --seed 1 --use_val --mds_W --dynamic`.
For example, to train the CORR model (requires MDS) on the MicroS dataset (GPU: 0, seed: 1, Euclidean distance, size information, and validation), run `python train.py --GPU 0 --dataset MicroS --method DR --distance euc --size_inform --seed 1 --use_val --mds_W --CORR`.
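A wrapper script that bundles such commands might be sketched as the following dry run (it only prints the commands it would launch; the method names, seeds, and dataset here are illustrative, not necessarily what the repository's scripts actually iterate over):

```shell
# Hypothetical dry-run sketch of a wrapper like train_MicroS_whole_models.sh:
# print (rather than execute) one training command per method and seed.
for method in softmax normface DR; do
  for seed in 1 2 3; do
    echo python train.py --GPU 0 --dataset MicroS --method "$method" \
      --distance euc --size_inform --use_val --seed "$seed"
  done
done
```

Replacing `echo python` with `python` would actually run the 9 training jobs sequentially.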
If you want to replicate the experiments, instead of typing each training setting, you can run train_CIFAR100_whole_models.sh, train_NABirds_whole_models.sh, train_MicroS_whole_models.sh, train_MicroL_whole_models.sh, and train_MesoZ_whole_models.sh. (You may want to change the GPU number. Values might differ due to randomness.)
To evaluate CIFAR-100 models, run `python eval_cifar100.py --GPU [GPU_NUMBER(S)] --method [METHOD_NAME] --distance [DISTANCE] --use_val --seed [SEED_NUMBER] --[TRAINING_OPTION]`.
To evaluate NABirds models, run `python eval_nabirds.py --GPU [GPU_NUMBER(S)] --method [METHOD_NAME] --distance [DISTANCE] --use_val --seed [SEED_NUMBER] --[TRAINING_OPTION]`.
To evaluate plankton dataset models, run `python eval_.py --GPU [GPU_NUMBER(S)] --dataset [DATASET_NAME] --method [METHOD_NAME] --distance [DISTANCE] --size_inform --use_val --seed [SEED_NUMBER] --[TRAINING_OPTION]`.
- `--last`: evaluate the model from the last training epoch (probably not the best model)
If you want to replicate the experiments, instead of typing each evaluation setting, you can run eval_CIFAR100_whole_models.sh, eval_NABirds_whole_models.sh, eval_MicroS_whole_models.sh, eval_MicroL_whole_models.sh, and eval_MesoZ_whole_models.sh. (You may want to change the GPU number. Values might differ due to randomness.)
The training and evaluation results will be recorded in `./record/`.
The dynamic option implementation is modified from https://github.com/4uiiurz1/pytorch-adacos/blob/master/metrics.py.
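For reference, the AdaCos-style dynamic scale rule from the linked metrics.py can be sketched as follows. This is a hedged NumPy stand-in with illustrative names (`dynamic_scale`, `prev_scale` are not the repository's actual identifiers); the repository's `--dynamic` code adapts this idea and may differ in detail.

```python
import numpy as np

# Hedged sketch of the AdaCos dynamic scale update: the scale is chosen
# so that exp-scaled non-target logits and the median target angle balance.
def dynamic_scale(logits, labels, prev_scale):
    """Update the scale factor from one batch.

    logits: (B, C) cosine similarities; labels: (B,) integer class ids;
    prev_scale: scale used in the previous iteration.
    """
    theta = np.arccos(np.clip(logits, -1.0, 1.0))
    one_hot = np.eye(logits.shape[1])[labels]
    # B_avg: batch mean of the summed exp-scaled non-target logits
    B = np.where(one_hot < 1, np.exp(prev_scale * logits), 0.0)
    B_avg = B.sum(axis=1).mean()
    # median angle to the target class, capped at pi/4
    theta_med = np.median(theta[one_hot == 1])
    return np.log(B_avg) / np.cos(min(np.pi / 4, theta_med))
```

The cap at pi/4 keeps the denominator well away from zero early in training, when target angles are still large.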