This repository holds the code for Fast-SNN.
- Python 3.8.8
- PyTorch 1.8.1
For training quantized ANNs, we follow the protocol defined in *Additive Powers-of-Two Quantization: An Efficient Non-uniform Discretization for Neural Networks*.
For more details, please refer to APoT_Quantization.
For network architectures, we currently support AlexNet, VGG11 (in 'CIFAR10'), ResNet-20/32/44/56/110 (in 'CIFAR10'), and ResNet-18 (in 'CIFAR10_resnet18'). For AlexNet, VGG11, and ResNet-20/32/44/56/110, we quantize both weights and activations. For ResNet-18, we quantize only activations.
By default, the dataset is expected to be in a 'data' folder at the same level as 'main.py'.
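If the dataset is not present, a minimal sketch using the standard torchvision API to place CIFAR-10 in the expected location (the repo's own data loading may differ):

```python
# Minimal sketch: download CIFAR-10 into ./data, next to main.py.
# Assumes the standard torchvision downloader is acceptable here.
from torchvision import datasets

datasets.CIFAR10(root='./data', train=True, download=True)
datasets.CIFAR10(root='./data', train=False, download=True)
```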
We progressively train full-precision, 4-, 3-, and 2-bit ANN models.
An example to train AlexNet:
```bash
python main.py --arch alex --bit 32 --wd 5e-4
python main.py --arch alex --bit 4 --wd 1e-4 --lr 4e-2 --init result/alex_32bit/model_best.pth.tar
python main.py --arch alex --bit 3 --wd 1e-4 --lr 4e-2 --init result/alex_4bit/model_best.pth.tar
python main.py --arch alex --bit 2 --wd 3e-5 --lr 4e-2 --init result/alex_3bit/model_best.pth.tar
```
The time steps of SNNs are automatically calculated from the activation precision, i.e., T = 2^b - 1 for b-bit activations. By default, we use the signed IF neuron model.
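For illustration, a minimal sketch of one time step of a signed IF neuron, which can emit both positive and negative spikes (a conceptual sketch, not the repo's exact implementation):

```python
import torch

def signed_if_step(v, x, threshold=1.0):
    # Conceptual sketch of a signed IF neuron step, not the repo's exact code.
    # Integrate the input into the membrane potential, emit a +1 spike above
    # +threshold or a -1 spike below -threshold, then subtract the emitted
    # charge (soft reset).
    v = v + x
    spike = (v >= threshold).float() - (v <= -threshold).float()
    v = v - spike * threshold
    return v, spike

# T = 2^b - 1 time steps for b-bit activations: 4 -> 15, 3 -> 7, 2 -> 3.
time_steps = {b: 2 ** b - 1 for b in (4, 3, 2)}
```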
Optional arguments:
--u                   Use the unsigned IF neuron model
Example: AlexNet (SNN) performance with the traditional unsigned IF neuron model. A 3/2-bit ANN is converted to an SNN with T=7/3.
```bash
python snn.py --arch alex --bit 3 -e -u --init result/alex_3bit/model_best.pth.tar
python snn.py --arch alex --bit 2 -e -u --init result/alex_2bit/model_best.pth.tar
```
Example: AlexNet (SNN) performance with the signed IF neuron model. A 3/2-bit ANN is converted to an SNN with T=7/3.
```bash
python snn.py --arch alex --bit 3 -e --init result/alex_3bit/model_best.pth.tar
python snn.py --arch alex --bit 2 -e --init result/alex_2bit/model_best.pth.tar
```
By default, we use the signed IF neuron model during fine-tuning.
Optional arguments:
--num_epochs / -n     Number of epochs to fine-tune each layer (default: 1)
--force               Always update fine-tuned parameters without evaluating on training data
Example: fine-tune converted SNN models.
```bash
python snn_ft.py --arch alex --bit 2 --force --init result/alex_2bit/model_best.pth.tar
python snn_ft.py --arch resnet18 --bit 2 --force --init result/resnet18_2bit/model_best.pth.tar
python snn_ft.py --arch resnet56 --bit 2 -n 8 --init result/resnet56_2bit/model_best.pth.tar
```
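Conceptually, fine-tuning proceeds layer by layer, and --force skips the acceptance check on training data. A sketch of that schedule (the helper callables here are hypothetical, not the repo's actual API; the real control flow lives in snn_ft.py):

```python
def finetune_snn(layers, finetune_layer, train_acc, num_epochs=1, force=False):
    # Hypothetical sketch of the layer-wise fine-tuning schedule.
    # finetune_layer(layer, epochs) updates one layer and returns a snapshot
    # with a rollback() method; train_acc() evaluates on training data.
    for layer in layers:
        baseline = train_acc()
        snapshot = finetune_layer(layer, num_epochs)
        # Without --force, keep the update only if training accuracy holds up.
        if not force and train_acc() < baseline:
            snapshot.rollback()
```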
Model | 3-bit | 2-bit |
---|---|---|
AlexNet | alex_3bit | alex_2bit |
VGG11 | vgg11_3bit | vgg11_2bit |
ResNet20 | resnet20_3bit | resnet20_2bit |
ResNet44 | resnet44_3bit | resnet44_2bit |
ResNet56 | resnet56_3bit | resnet56_2bit |
ResNet18 | resnet18_3bit | resnet18_2bit |
We use distributed data parallel (DDP) for training. Please refer to PyTorch DDP for details.
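A minimal sketch of the per-process setup that torch.distributed.launch expects (generic DDP boilerplate, not necessarily the exact setup in this repo):

```python
import argparse
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

parser = argparse.ArgumentParser()
parser.add_argument('--local_rank', type=int, default=0)  # filled in by torch.distributed.launch
args = parser.parse_args()

torch.cuda.set_device(args.local_rank)
dist.init_process_group(backend='nccl', init_method='env://')

model = torch.nn.Linear(8, 8).cuda()           # placeholder model for the sketch
model = DDP(model, device_ids=[args.local_rank])
```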
To speed up data loading, we replace the vanilla PyTorch dataloader with NVIDIA DALI.
Install the NVIDIA DALI package:
```bash
# for CUDA 10
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist nvidia-dali-cuda100
# for CUDA 11
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist nvidia-dali-cuda110
```
For more details on DALI, please refer to the official NVIDIA DALI Documentation.
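For reference, a minimal DALI training pipeline in the style the commands below rely on (a generic sketch; the actual pipeline is defined in dali_main.py, and the data path follows the commands below):

```python
from nvidia.dali import pipeline_def, fn, types
from nvidia.dali.plugin.pytorch import DALIClassificationIterator

@pipeline_def
def train_pipe(data_dir, crop=224):
    # Read and decode JPEGs (hybrid CPU/GPU), then apply standard
    # ImageNet-style augmentation and normalization.
    jpegs, labels = fn.readers.file(file_root=data_dir, random_shuffle=True, name="Reader")
    images = fn.decoders.image(jpegs, device="mixed")
    images = fn.random_resized_crop(images, size=crop)
    images = fn.crop_mirror_normalize(
        images, dtype=types.FLOAT, output_layout="CHW",
        mean=[0.485 * 255, 0.456 * 255, 0.406 * 255],
        std=[0.229 * 255, 0.224 * 255, 0.225 * 255],
        mirror=fn.random.coin_flip())
    return images, labels

pipe = train_pipe(data_dir='/data/imagenet2012/train',
                  batch_size=256, num_threads=4, device_id=0)
pipe.build()
loader = DALIClassificationIterator(pipe, reader_name="Reader")
```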
For network architectures, we currently support AlexNet and VGG16.
Starting from full-precision pre-trained models provided by TorchVision, we progressively train 4-, 3-, and 2-bit ANN models.
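A minimal sketch of obtaining that full-precision starting point from TorchVision (the checkpoint path and key layout here are assumptions, not necessarily what --init expects):

```python
import torch
import torchvision.models as models

# Fetch the full-precision TorchVision weights (PyTorch 1.8-style API).
fp_model = models.alexnet(pretrained=True)

# Save in a checkpoint-like form; the path and 'state_dict' key layout
# are assumptions for this sketch.
torch.save({'state_dict': fp_model.state_dict()},
           'result/alexnet_32bit/model_best.pth.tar')
```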
An example to train AlexNet:
```bash
python -m torch.distributed.launch --nproc_per_node=4 dali_main.py -a alexnet -b 256 --bit 4 --workers 4 --lr=0.1 --epochs 60 --dali_cpu /data/imagenet2012
python -m torch.distributed.launch --nproc_per_node=4 dali_main.py -a alexnet -b 256 --bit 3 --init result/alexnet_4bit/model_best.pth.tar --workers 4 --lr=0.01 --epochs 60 --dali_cpu /data/imagenet2012
python -m torch.distributed.launch --nproc_per_node=4 dali_main.py -a alexnet -b 256 --bit 2 --init result/alexnet_3bit/model_best.pth.tar --workers 4 --lr=0.01 --epochs 60 --dali_cpu /data/imagenet2012
```
Example: AlexNet (SNN) performance with the traditional unsigned IF neuron model. A 3/2-bit ANN is converted to an SNN with T=7/3.
```bash
python -m torch.distributed.launch --nproc_per_node=4 snn.py -a alexnet -b 256 -e -u --bit 3 --init result/alexnet_3bit/model_best.pth.tar --workers 4 --dali_cpu /data/imagenet2012
python -m torch.distributed.launch --nproc_per_node=4 snn.py -a alexnet -b 256 -e -u --bit 2 --init result/alexnet_2bit/model_best.pth.tar --workers 4 --dali_cpu /data/imagenet2012
```
Example: AlexNet (SNN) performance with the signed IF neuron model. A 3/2-bit ANN is converted to an SNN with T=7/3.
```bash
python -m torch.distributed.launch --nproc_per_node=4 snn.py -a alexnet -b 256 -e --bit 3 --init result/alexnet_3bit/model_best.pth.tar --workers 4 --dali_cpu /data/imagenet2012
python -m torch.distributed.launch --nproc_per_node=4 snn.py -a alexnet -b 256 -e --bit 2 --init result/alexnet_2bit/model_best.pth.tar --workers 4 --dali_cpu /data/imagenet2012
```
By default, we use the signed IF neuron model during fine-tuning.
Example:
```bash
python -m torch.distributed.launch --nproc_per_node=4 snn_ft.py -a alexnet -b 128 --bit 3 -n 8 --init result/alexnet_3bit/model_best.pth.tar --workers 4 --dali_cpu /data/imagenet2012
python -m torch.distributed.launch --nproc_per_node=4 snn_ft.py -a alexnet -b 128 --bit 2 -n 8 --init result/alexnet_2bit/model_best.pth.tar --workers 4 --dali_cpu /data/imagenet2012
```
Model | 3-bit | 2-bit |
---|---|---|
AlexNet | alexnet_3bit | alexnet_2bit |
VGG16 | vgg16_3bit | vgg16_2bit |
We use yolov2-yolov3_PyTorch as the framework for object detection.
For required packages and datasets, please refer to the README in yolov2-yolov3_PyTorch. We also provide a merged README in the 'object detection' folder detailing everything.
We currently support Tiny YOLO and YOLOv2 with a ResNet-34 backbone.
Optional arguments:
--version / -v        Network architecture (available: yolov2_tiny, yolov2_r34)
Example: train Tiny YOLO with activations quantized to 32/4/3/2 bits.
```bash
python -m torch.distributed.launch --nproc_per_node=4 train.py -d voc -v yolov2_tiny -ms --ema --sybn --batch_size 4 --bit 32
python -m torch.distributed.launch --nproc_per_node=4 train.py -d voc -v yolov2_tiny -ms --ema --sybn --batch_size 4 --bit 4 --init CHECKPOINT_PATH
python -m torch.distributed.launch --nproc_per_node=4 train.py -d voc -v yolov2_tiny -ms --ema --sybn --batch_size 4 --bit 3 --init CHECKPOINT_PATH
python -m torch.distributed.launch --nproc_per_node=4 train.py -d voc -v yolov2_tiny -ms --ema --sybn --batch_size 4 --bit 2 --init CHECKPOINT_PATH
```
Optional arguments:
--spike               Evaluate with spikes (as an SNN)
Example: evaluate Tiny YOLO (SNN) with T = 15, 7, and 3 (converted from 4-, 3-, and 2-bit ANNs).
```bash
python eval.py -d voc --cuda -v yolov2_tiny --bit 4 --spike --init CHECKPOINT_PATH
python eval.py -d voc --cuda -v yolov2_tiny --bit 3 --spike --init CHECKPOINT_PATH
python eval.py -d voc --cuda -v yolov2_tiny --bit 2 --spike --init CHECKPOINT_PATH
```
Model | 4-bit | 3-bit | 2-bit |
---|---|---|---|
Tiny YOLO | yolov2_tiny_4bit | yolov2_tiny_3bit | yolov2_tiny_2bit |
YOLOv2 (ResNet-34) | yolov2_r34_4bit | yolov2_r34_3bit | yolov2_r34_2bit |
Example: train Tiny YOLO with activations quantized to 32/4/3/2 bits.
```bash
python -m torch.distributed.launch --nproc_per_node=4 train.py -d coco -v yolov2_tiny --bit 32 -ms --ema --sybn --batch_size 4
python -m torch.distributed.launch --nproc_per_node=4 train.py -d coco -v yolov2_tiny --bit 4 -ms --ema --sybn --batch_size 4 --init CHECKPOINT_PATH
python -m torch.distributed.launch --nproc_per_node=4 train.py -d coco -v yolov2_tiny --bit 3 -ms --ema --sybn --batch_size 4 --init CHECKPOINT_PATH
python -m torch.distributed.launch --nproc_per_node=4 train.py -d coco -v yolov2_tiny --bit 2 -ms --ema --sybn --batch_size 4 --init CHECKPOINT_PATH
```
Example: evaluate Tiny YOLO (SNN) with T = 15, 7, and 3 (converted from 4-, 3-, and 2-bit ANNs).
```bash
python eval.py -d coco-val --cuda -v yolov2_tiny --bit 4 --spike --init CHECKPOINT_PATH
python eval.py -d coco-val --cuda -v yolov2_tiny --bit 3 --spike --init CHECKPOINT_PATH
python eval.py -d coco-val --cuda -v yolov2_tiny --bit 2 --spike --init CHECKPOINT_PATH
```
Model | 4-bit | 3-bit | 2-bit |
---|---|---|---|
Tiny YOLO | yolov2_tiny_4bit | yolov2_tiny_3bit | yolov2_tiny_2bit |
YOLOv2 (ResNet-34) | yolov2_r34_4bit | yolov2_r34_3bit | yolov2_r34_2bit |
We use vedaseg, an open-source semantic segmentation toolbox based on PyTorch, as the framework for semantic segmentation.
For required packages and datasets, please refer to the README in vedaseg. We also provide a merged README in the 'semantic segmentation' folder detailing everything.
We currently support DeepLabv1 (VGG-9) and DeepLabv3 (ResNet-34 + ASPP).
Example: train VGG-9 with activations quantized to 32/4/3/2 bits.
```bash
bash ./tools/dist_train.sh configs/voc_deeplabv1.py "0, 1, 2, 3"
bash ./tools/dist_train.sh configs/voc_deeplabv1_4bit.py "0, 1, 2, 3"
bash ./tools/dist_train.sh configs/voc_deeplabv1_3bit.py "0, 1, 2, 3"
bash ./tools/dist_train.sh configs/voc_deeplabv1_2bit.py "0, 1, 2, 3"
```
Example: train ResNet-34 + ASPP with activations quantized to 32/4/3/2 bits.
```bash
bash ./tools/dist_train.sh configs/voc_deeplabv3.py "0, 1, 2, 3"
bash ./tools/dist_train.sh configs/voc_deeplabv3_4bit.py "0, 1, 2, 3"
bash ./tools/dist_train.sh configs/voc_deeplabv3_3bit.py "0, 1, 2, 3"
bash ./tools/dist_train.sh configs/voc_deeplabv3_2bit.py "0, 1, 2, 3"
```
Example: evaluate VGG-9 (SNN) with T = 15, 7, and 3.
```bash
bash ./tools/dist_test.sh configs/voc_deeplabv1_T15.py './workdir/voc_deeplabv1_4bit/best_mIoU.pth' "0, 1, 2, 3"
bash ./tools/dist_test.sh configs/voc_deeplabv1_T7.py './workdir/voc_deeplabv1_3bit/best_mIoU.pth' "0, 1, 2, 3"
bash ./tools/dist_test.sh configs/voc_deeplabv1_T3.py './workdir/voc_deeplabv1_2bit/best_mIoU.pth' "0, 1, 2, 3"
```
Example: evaluate ResNet-34 + ASPP (SNN) with T = 15, 7, and 3.
```bash
bash ./tools/dist_test.sh configs/voc_deeplabv3_T15.py './workdir/voc_deeplabv3_4bit/best_mIoU.pth' "0, 1, 2, 3"
bash ./tools/dist_test.sh configs/voc_deeplabv3_T7.py './workdir/voc_deeplabv3_3bit/best_mIoU.pth' "0, 1, 2, 3"
bash ./tools/dist_test.sh configs/voc_deeplabv3_T3.py './workdir/voc_deeplabv3_2bit/best_mIoU.pth' "0, 1, 2, 3"
```
Model | 4-bit | 3-bit | 2-bit |
---|---|---|---|
VGG-9 | voc_deeplabv1_4bit | voc_deeplabv1_3bit | voc_deeplabv1_2bit |
ResNet-34 + ASPP | voc_deeplabv3_4bit | voc_deeplabv3_3bit | voc_deeplabv3_2bit |
Example: train VGG-9 with activations quantized to 32/4/3/2 bits.
```bash
bash ./tools/dist_train.sh configs/coco_deeplabv1.py "0, 1, 2, 3, 6, 7"
bash ./tools/dist_train.sh configs/coco_deeplabv1_4bit.py "0, 1, 2, 3, 6, 7"
bash ./tools/dist_train.sh configs/coco_deeplabv1_3bit.py "0, 1, 2, 3"
bash ./tools/dist_train.sh configs/coco_deeplabv1_2bit.py "0, 1, 2, 3"
```
Example: train ResNet-34 + ASPP with activations quantized to 32/4/3/2 bits.
```bash
bash ./tools/dist_train.sh configs/coco_deeplabv3.py "0, 1, 2, 3"
bash ./tools/dist_train.sh configs/coco_deeplabv3_4bit.py "0, 1, 2, 3"
bash ./tools/dist_train.sh configs/coco_deeplabv3_3bit.py "0, 1, 2, 3"
bash ./tools/dist_train.sh configs/coco_deeplabv3_2bit.py "0, 1, 2, 3"
```
Example: evaluate VGG-9 (SNN) with T = 15, 7, and 3.
```bash
bash ./tools/dist_test.sh configs/coco_deeplabv1_T15.py './workdir/coco_deeplabv1_4bit/best_mIoU.pth' "0, 1, 2, 3"
bash ./tools/dist_test.sh configs/coco_deeplabv1_T7.py './workdir/coco_deeplabv1_3bit/best_mIoU.pth' "0, 1, 2, 3"
bash ./tools/dist_test.sh configs/coco_deeplabv1_T3.py './workdir/coco_deeplabv1_2bit/best_mIoU.pth' "0, 1, 2, 3"
```
Example: evaluate ResNet-34 + ASPP (SNN) with T = 15, 7, and 3.
```bash
bash ./tools/dist_test.sh configs/coco_deeplabv3_T15.py './workdir/coco_deeplabv3_4bit/best_mIoU.pth' "0, 1, 2, 3"
bash ./tools/dist_test.sh configs/coco_deeplabv3_T7.py './workdir/coco_deeplabv3_3bit/best_mIoU.pth' "0, 1, 2, 3"
bash ./tools/dist_test.sh configs/coco_deeplabv3_T3.py './workdir/coco_deeplabv3_2bit/best_mIoU.pth' "0, 1, 2, 3"
```
Model | 4-bit | 3-bit | 2-bit |
---|---|---|---|
VGG-9 | coco_deeplabv1_4bit | coco_deeplabv1_3bit | coco_deeplabv1_2bit |
ResNet-34 + ASPP | coco_deeplabv3_4bit | coco_deeplabv3_3bit | coco_deeplabv3_2bit |