Reproducible PASCAL VOC2012 training with PyTorch-Ignite

This example provides scripts and tools for running reproducible experiments on training neural networks on the PASCAL VOC2012 dataset.

Features:

  • Distributed training with native automatic mixed precision
  • Experiments tracking with ClearML

ClearML Server: TODO: ADD THE LINK

Setup

pip install -r requirements.txt

Docker

Docker users can run the example with one of the following images:

docker pull pytorchignite/vision:latest

or

docker pull pytorchignite/hvd-vision:latest

Then install the remaining requirements as described in the Setup section.

Using Horovod as distributed framework

Horovod is not listed in requirements.txt. Please install it manually by following the official guides, or use the pytorchignite/hvd-vision:latest Docker image.

(Optional) Download Pascal VOC2012 and SBD datasets

Download and extract the datasets:

python main.py download /path/to/datasets

This command downloads and extracts the Pascal VOC2012 and SBD datasets into /path/to/datasets.

Usage

Export the DATASET_PATH environment variable so that it points to the Pascal VOC2012 dataset:

export DATASET_PATH=/path/to/pascal_voc2012
# e.g. export DATASET_PATH=/data/ where VOCdevkit is located

Optionally, if using the SBD dataset, export the SBD_DATASET_PATH environment variable:

export SBD_DATASET_PATH=/path/to/SBD/
# e.g. export SBD_DATASET_PATH=/data/SBD/  where "cls  img  inst  train.txt  train_noval.txt  val.txt" are located
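
Checking both variables before launching can save a failed run. The following is a minimal sketch and not part of the repository: the helper name check_dataset_env is hypothetical, and it assumes only the layout described in the comments above (a VOCdevkit folder under DATASET_PATH, and cls, img, inst, train.txt, train_noval.txt, val.txt under SBD_DATASET_PATH).

```python
import os
from pathlib import Path

# Entries expected under SBD_DATASET_PATH, as listed in the export example above.
SBD_ENTRIES = ("cls", "img", "inst", "train.txt", "train_noval.txt", "val.txt")


def check_dataset_env(env=None):
    """Return a list of human-readable problems; an empty list means the paths look fine.

    Hypothetical helper for illustration only; it is not part of main.py.
    """
    env = os.environ if env is None else env
    problems = []

    # DATASET_PATH is required and should contain the VOCdevkit folder.
    voc_root = env.get("DATASET_PATH")
    if not voc_root:
        problems.append("DATASET_PATH is not set")
    elif not (Path(voc_root) / "VOCdevkit").is_dir():
        problems.append(f"VOCdevkit not found under {voc_root}")

    # SBD is optional, so it is only checked when the variable is present.
    sbd_root = env.get("SBD_DATASET_PATH")
    if sbd_root:
        missing = [name for name in SBD_ENTRIES if not (Path(sbd_root) / name).exists()]
        if missing:
            problems.append(f"missing under {sbd_root}: {', '.join(missing)}")

    return problems
```

Calling check_dataset_env() with no arguments inspects the current environment; passing a plain dict instead makes the check easy to try without exporting anything.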

Training

Single GPU

  • Adjust batch size for your GPU type in the configuration file: configs/baseline_dplv3_resnet101_sbd.py or configs/baseline_dplv3_resnet101.py

Run the following command:

CUDA_VISIBLE_DEVICES=0 python -u main.py training configs/baseline_dplv3_resnet101_sbd.py
# or without SBD 
# CUDA_VISIBLE_DEVICES=0 python -u main.py training configs/baseline_dplv3_resnet101.py

Multiple GPUs

  • Adjust the total batch size for your GPUs in the configuration file: configs/baseline_dplv3_resnet101_sbd.py or configs/baseline_dplv3_resnet101.py

Run the following command:

python -u -m torch.distributed.launch --nproc_per_node=2 --use_env main.py training configs/baseline_dplv3_resnet101_sbd.py
# or without SBD 
# python -u -m torch.distributed.launch --nproc_per_node=2 --use_env main.py training configs/baseline_dplv3_resnet101.py
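
When choosing a total batch size for several GPUs, it can help to see what each process would receive. The sketch below assumes the configured batch size is the total across all processes and is split evenly among them; that even split is an assumption about how the training script distributes data, and per_process_batch_size is a hypothetical name, not a function from the repository.

```python
def per_process_batch_size(total_batch_size: int, nproc_per_node: int, nnodes: int = 1) -> int:
    """Split a total batch size evenly across all launched processes.

    Illustrative only: assumes one process per GPU and an even split.
    """
    world_size = nproc_per_node * nnodes
    if world_size < 1:
        raise ValueError("at least one process is required")
    if total_batch_size % world_size != 0:
        # An uneven split would give processes different batch sizes.
        raise ValueError(
            f"total batch size {total_batch_size} is not divisible by {world_size} processes"
        )
    return total_batch_size // world_size


# With --nproc_per_node=2, a total batch size of 16 gives 8 samples per process.
print(per_process_batch_size(16, nproc_per_node=2))  # prints 8
```

If the per-process value does not fit in GPU memory, lower the total batch size in the configuration file rather than the number of processes.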

Using Horovod as distributed framework

  • Adjust the total batch size for your GPUs in the configuration file: configs/baseline_dplv3_resnet101_sbd.py or configs/baseline_dplv3_resnet101.py

Run the following command:

horovodrun -np=2 python -u main.py training configs/baseline_dplv3_resnet101_sbd.py --backend="horovod"
# or without SBD
# horovodrun -np=2 python -u main.py training configs/baseline_dplv3_resnet101.py --backend="horovod"

Evaluation

Single GPU

CUDA_VISIBLE_DEVICES=0 python -u main.py eval configs/eval_baseline_dplv3_resnet101_sbd.py

Multiple GPUs

python -u -m torch.distributed.launch --nproc_per_node=2 --use_env main.py eval configs/eval_baseline_dplv3_resnet101_sbd.py

Using Horovod as distributed framework

horovodrun -np=2 python -u main.py eval configs/eval_baseline_dplv3_resnet101_sbd.py --backend="horovod"
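
The training and evaluation commands above all follow the same pattern and differ only in the launcher. As a rough sketch for scripting several runs, the hypothetical helper build_command below assembles the same argument lists shown in this README; it is not part of the repository, and it does not set CUDA_VISIBLE_DEVICES, which is assumed to be exported separately where needed.

```python
import shlex


def build_command(task, config, nproc=1, launcher="single"):
    """Build the argv list for one of the launch modes shown in this README.

    task: "training" or "eval"; launcher: "single", "torch" or "horovod".
    Hypothetical helper for illustration only.
    """
    if task not in ("training", "eval"):
        raise ValueError(f"unknown task: {task}")
    script = ["main.py", task, config]
    if launcher == "single":
        return ["python", "-u"] + script
    if launcher == "torch":
        # Native PyTorch distributed launcher, one process per GPU.
        return [
            "python", "-u", "-m", "torch.distributed.launch",
            f"--nproc_per_node={nproc}", "--use_env",
        ] + script
    if launcher == "horovod":
        # Horovod launcher; main.py is told to use the horovod backend.
        return ["horovodrun", f"-np={nproc}", "python", "-u"] + script + ["--backend=horovod"]
    raise ValueError(f"unknown launcher: {launcher}")


cmd = build_command("eval", "configs/eval_baseline_dplv3_resnet101_sbd.py", nproc=2, launcher="torch")
print(shlex.join(cmd))
```

The returned list can be passed directly to subprocess.run, which avoids shell quoting issues when config paths contain spaces.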

Acknowledgements

Training runs were performed using credits provided by AWS for open-source development via NumFOCUS, and on the trainml.ai platform.