Image Classification Codebase

This project provides a codebase for the image classification task, implemented in PyTorch. It does not use any high-level deep learning libraries (such as pytorch-lightning or MMClassification), so it should be easy to follow and modify.

Requirements

The code is tested on python==3.9, pyhocon==0.3.57, torch==1.8.0, torchvision==0.9.0.

Get Started

To get started, train a resnet20 convolutional network on cifar10 with the following command.

Single node, single GPU:

CUDA_VISIBLE_DEVICES=0 python -m entry.run --conf conf/cifar10.conf -o output/cifar10/resnet20

Tip: run CUDA_VISIBLE_DEVICES=0 python -m entry.run --conf conf/resnet50-benchmark.conf -o output/benchmark to check throughput. More details can be found in doc/benchmark.md.

You can use multiple GPUs to accelerate the training with distributed data parallel:

Single node, multiple GPUs:

CUDA_VISIBLE_DEVICES=0,1 python -m entry.run --world-size 2 \
--conf conf/cifar10.conf -o output/cifar10/resnet20

Multiple nodes:

Node 0:

CUDA_VISIBLE_DEVICES=0,1 python -m entry.run --world-size 4 --dist-url \
'tcp://IP_OF_NODE0:FREEPORT' --node-rank 0 --conf conf/cifar10.conf -o output/cifar10/resnet20

Node 1:

CUDA_VISIBLE_DEVICES=0,1 python -m entry.run --world-size 4 --dist-url \
'tcp://IP_OF_NODE0:FREEPORT' --node-rank 1 --conf conf/cifar10.conf -o output/cifar10/resnet20
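
To make the relationship between --world-size and --node-rank concrete, here is a minimal sketch of how global ranks are commonly laid out in distributed data parallel training. It assumes one process per GPU and the same number of GPUs on every node. The names plan_ranks and ProcessInfo are hypothetical and are not taken from this codebase, whose internal rank computation may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessInfo:
    node_rank: int    # value passed as --node-rank
    local_rank: int   # index of the GPU within its node
    global_rank: int  # unique rank across the whole job


def plan_ranks(world_size: int, gpus_per_node: int) -> list[ProcessInfo]:
    """Enumerate every process in a job.

    Assumption: one process per GPU and an equal GPU count on each node.
    """
    if world_size % gpus_per_node != 0:
        raise ValueError("world_size must be a multiple of gpus_per_node")
    n_nodes = world_size // gpus_per_node
    return [
        ProcessInfo(node, local, node * gpus_per_node + local)
        for node in range(n_nodes)
        for local in range(gpus_per_node)
    ]


if __name__ == "__main__":
    # Two nodes with two visible GPUs each, as in the commands above.
    for process in plan_ranks(world_size=4, gpus_per_node=2):
        print(process)
```

Under these assumptions, the two GPUs on node 0 take global ranks 0 and 1, and the two GPUs on node 1 take global ranks 2 and 3.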

Features

This codebase uses HOCON configuration files to store hyperparameters (such as the learning rate and the number of training epochs). There are two ways to modify them:

  1. Modify the configuration file and save it as a new file.

  2. Add -M to the command line to override hyperparameters temporarily.

For example, to set the total number of training epochs to 100 and the learning rate to 0.05, run the following command:

CUDA_VISIBLE_DEVICES=0 python -m entry.run --conf conf/cifar10.conf -o output/cifar10/resnet20 -M max_epochs=100 optimizer.lr=0.05

If you try to modify a non-existent hyperparameter, the code will raise an exception.
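
The override behaviour described above can be sketched in a few lines. This is an illustration only, not the codebase's implementation: it uses a plain nested dict in place of a parsed pyhocon config, the function name apply_overrides is hypothetical, and the default values in the example are made up.

```python
import ast
from copy import deepcopy


def apply_overrides(config: dict, overrides: list[str]) -> dict:
    """Return a copy of `config` with `key.path=value` overrides applied.

    Raises KeyError if an override names a key that is not already present.
    """
    result = deepcopy(config)
    for item in overrides:
        path, sep, raw = item.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {item!r}")
        *parents, leaf = path.split(".")
        node = result
        for name in parents:
            # Every intermediate key must exist and hold a nested section.
            if not isinstance(node.get(name), dict):
                raise KeyError(f"unknown hyperparameter: {path}")
            node = node[name]
        if leaf not in node:
            raise KeyError(f"unknown hyperparameter: {path}")
        try:
            node[leaf] = ast.literal_eval(raw)  # Python literals, e.g. 100 or 0.05
        except (ValueError, SyntaxError):
            node[leaf] = raw  # anything else is kept as a plain string
    return result


if __name__ == "__main__":
    # Hypothetical config contents, for illustration only.
    config = {"max_epochs": 200, "optimizer": {"lr": 0.1, "momentum": 0.9}}
    print(apply_overrides(config, ["max_epochs=100", "optimizer.lr=0.05"]))
```

Rejecting unknown keys rather than silently creating them means a typo such as optimizer.learning_rate fails immediately instead of training with the default value.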

To list all valid hyperparameter names, you can run the following command:

pyhocon -i conf/cifar10.conf -f properties
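
The properties format prints one dotted key per line, and those dotted keys are the hyperparameter names. As a rough sketch of the idea, the function below flattens a nested config into dotted names. It works on a plain dict rather than a real pyhocon config, flatten is a hypothetical helper, and the example values are invented.

```python
def flatten(config: dict, prefix: str = "") -> dict:
    """Map a nested config to {"dotted.key": value} pairs."""
    flat = {}
    for key, value in config.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested sections, extending the dotted prefix.
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


if __name__ == "__main__":
    # Hypothetical config contents, for illustration only.
    config = {"max_epochs": 200, "optimizer": {"lr": 0.1, "momentum": 0.9}}
    for name, value in flatten(config).items():
        print(f"{name} = {value}")
```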

We use NVIDIA DALI to accelerate data preprocessing on ImageNet (enable it with the flag data.use_dali) and the tfrecord format to store ImageNet (create the tfrecords with tools/make_tfrecord.py and enable them with the flag data.use_tfrecord).

Finally, enjoy the code.

Cite

@misc{chen2020image,
  author = {Yaofo Chen},
  title = {Image Classification Codebase},
  year = {2021},
  howpublished = {\url{https://github.com/chenyaofo/image-classification-codebase}}
}