Thanks to its architectural optimizations, ICNet is one of the few proposed models that can perform semantic segmentation in real time. Other models such as SegNet are similar in efficiency, but are far less accurate.
It is important to note that the ICNet model implemented in this project does not exactly match the inference model released by the original ICNet author. The primary difference is that the original ICNet uses an unreleased, proprietary version of ResNet50. If you inspect their prototxt, you will notice that the input to their ResNet model has 2 additional Conv layers before the first Average Pool. These layers are not found in the original ResNet50 released by Kaiming He.
To get past this hurdle, the open source implementation of ResNet found in TF Slim was used. Additionally, the ICNet implementation in this project has the second branch stem out from a different layer than the original model. However, these modifications are minor and should not change the performance.
If you wish to avoid the trouble of training multiple times, you can train ICNet directly from classification weights. This is similar to what is done in the PSPNet walkthrough here, and the same general procedure can be followed with ICNet. Note that this will produce worse results than the two-stage procedure.
To replicate the training procedure in the original ICNet paper, multiple steps must be taken. In particular, transfer learning must be done from the baseline PSPNet50 model, and compression must then be applied.
- Stage 0 ~ Pre-train a PSPNet50 model: First, a PSPNet50 model is trained with weights initialized from a dilated ResNet50 model. Using a training procedure similar to the one described in the original paper (a crop size of 768, 120K training iterations and an initial learning rate of 0.01), the PSPNet50 model in this project converged at approximately 74% mIoU.
- Stage 1 ~ Initialize ICNet Branch from PSPNet50: With a base PSPNet50 model trained, the next stage of training can begin by initializing the ICNet quarter resolution branch from the pre-trained PSPNet50 model. This stage used a crop size of 1024, 200K training iterations and an initial learning rate of 0.001. Initializing ICNet from these weights allowed for convergence at accuracies similar to the original ICNet paper.
- Stage 2 ~ Compression and Retraining: Once the base ICNet model is trained, we must prune half of the kernels to achieve the performance of the original paper. This is a process where kernels are removed from each convolutional layer iteratively. After the kernels are pruned, the pruned model must be retrained a final time to recover the accuracy lost during pruning.
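The pruning idea in Stage 2 can be illustrated with a minimal sketch. It assumes kernels are ranked by their L1 norm and that the lowest-ranked ones are dropped in a single pass. Both the ranking criterion and the helper name `prune_kernels` are assumptions made for illustration; they are not taken from this project's compression script, which removes kernels iteratively.

```python
def l1_norm(kernel):
    """Sum of absolute weights, used here as an assumed importance score."""
    return sum(abs(w) for w in kernel)


def prune_kernels(kernels, compression_factor=0.5):
    """Keep the highest-scoring fraction of kernels.

    kernels: list of flat weight lists, one entry per output kernel.
    Returns the kept kernels and their original indices, in original order.
    """
    num_keep = max(1, int(len(kernels) * compression_factor))
    # Rank kernel indices from most to least important.
    ranked = sorted(range(len(kernels)),
                    key=lambda i: l1_norm(kernels[i]),
                    reverse=True)
    keep = sorted(ranked[:num_keep])
    return [kernels[i] for i in keep], keep


# Four toy kernels with two weights each (illustrative values only).
kernels = [[0.1, -0.2], [1.0, 2.0], [0.0, 0.05], [-0.7, 0.9]]
pruned, kept = prune_kernels(kernels, compression_factor=0.5)
print(kept)    # [1, 3]
print(pruned)  # [[1.0, 2.0], [-0.7, 0.9]]
```

In a real network, removing output kernels from one layer also requires slicing the matching input channels of the layer that consumes them, which this sketch leaves out.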
The following instructions provide a step-by-step guide for training ICNet on the Cityscapes dataset. It is assumed you have access to one or two NVIDIA GTX 1080 Ti GPUs or other equivalent graphics cards.
To start, it is assumed the PSPNet walkthrough was followed (as detailed here) to obtain TFRecords for the training set and validation set. You should also have either trained your own PSPNet50 model or downloaded a pre-trained version of the model from the Model Zoo here.
We start with the first stage by training ICNet from PSPNet50 weights. Assuming you will be using the pretrained model, first download and extract the model:
# Download the archive from the Google Drive link then
mkdir -p 0818_pspnet50_1.0_713_resnet_v1
tar -zxvf 0818_pspnet50_1.0_713_resnet_v1.tar.gz -C 0818_pspnet50_1.0_713_resnet_v1
Next, we set up the configuration file. Copy and modify the supplied stage 1 configuration file located at:
configs/two_stage_icnet_1.0_1025_resnet_v1_stage_1.config
It contains the required hyperparameters for training. You must specify your dataset location as usual. You should also modify the following checkpoint field with the location of your PSPNet checkpoint file:
train_config: {
# YOUR PSPNet CHECKPOINT LOCATION
fine_tune_checkpoint: "0818_pspnet50_1.0_713_resnet_v1/model.ckpt"
...
}
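Before launching training, it can help to confirm that the checkpoint prefix in `fine_tune_checkpoint` actually resolves to files on disk. The sketch below assumes the archive holds a TensorFlow V2-format checkpoint, where the prefix `model.ckpt` is backed by a `model.ckpt.index` file and one or more `model.ckpt.data-*` shards. The helper name `checkpoint_prefix_exists` is hypothetical, and if the archive uses a different checkpoint layout the check would need adjusting.

```python
import glob
import os


def checkpoint_prefix_exists(prefix):
    """Return True if a V2-style checkpoint exists for the given prefix."""
    has_index = os.path.isfile(prefix + ".index")
    # glob.escape guards against special characters in the directory name.
    has_data = len(glob.glob(glob.escape(prefix) + ".data-*")) > 0
    return has_index and has_data


# Prefix taken from the config snippet above.
print(checkpoint_prefix_exists("0818_pspnet50_1.0_713_resnet_v1/model.ckpt"))
```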
Using this config file, we can start the first stage training process. As with PSPNet, memory will most likely be limited, so we will train with gradient checkpointing using the train_mem_saving.py script. The following nodes can be used for training ICNet with gradient checkpointing:
SharedFeatureExtractor/resnet_v1_50/block1/unit_3/bottleneck_v1/Relu
SharedFeatureExtractor/resnet_v1_50/block2/unit_4/bottleneck_v1/Relu
SharedFeatureExtractor/resnet_v1_50/block3/unit_6/bottleneck_v1/Relu
SharedFeatureExtractor/resnet_v1_50/block4/unit_3/bottleneck_v1/Relu
FastPSPModule/Conv/Relu6
CascadeFeatureFusion_0/Relu
CascadeFeatureFusion_1/Relu
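To see why choosing a handful of checkpoint nodes saves memory, consider a toy estimate for a plain chain of layers. Without checkpointing, every layer's activation is kept for the backward pass. With checkpointing, only the activations at the chosen nodes are kept, and the layers between two checkpoints are recomputed one segment at a time. The function below is a rough counting model under those assumptions, not a description of how train_mem_saving.py works internally, and the layer counts are made up.

```python
def peak_activations(num_layers, checkpoints):
    """Rough count of activations held at once during backpropagation.

    num_layers: length of the layer chain.
    checkpoints: layer indices (1-based, exclusive of the ends) whose
        outputs are stored.
    """
    if not checkpoints:
        # Every activation is stored for the backward pass.
        return num_layers
    bounds = [0] + sorted(checkpoints) + [num_layers]
    # Only one segment is recomputed and held in memory at a time.
    longest_segment = max(b - a for a, b in zip(bounds, bounds[1:]))
    return len(checkpoints) + longest_segment


print(peak_activations(16, []))          # 16
print(peak_activations(16, [4, 8, 12]))  # 7
```

The saving comes at the cost of extra forward computation, since each segment is evaluated a second time during the backward pass.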
To start a single-GPU training session, make a directory to save checkpoints to. For example, we make a directory /tmp/icnet_1.0_1025_resnet_v1_stage_1_TRAIN. Then start training on the first GPU by running:
# The checkpointing nodes are supplied in the script by default for ICNet
python train_mem_saving.py \
--config_path configs/two_stage_icnet_1.0_1025_resnet_v1_stage_1_CUSTOMIZED.config \
--logdir /tmp/icnet_1.0_1025_resnet_v1_stage_1_TRAIN \
--test_image_summaries \
--log_memory
To see evaluation results during training, create another directory at /tmp/icnet_1.0_1025_resnet_v1_stage_1_EVAL and then run the evaluation script (targeting another GPU) with:
# set CUDA_VISIBLE_DEVICES to another GPU
python eval.py \
--config_path configs/two_stage_icnet_1.0_1025_resnet_v1_stage_1_CUSTOMIZED.config \
--train_dir /tmp/icnet_1.0_1025_resnet_v1_stage_1_TRAIN \
--eval_dir /tmp/icnet_1.0_1025_resnet_v1_stage_1_EVAL \
--verbose # will log mIoU accuracy
Once training has finished, use TensorBoard to find the checkpoint with the highest mIoU. We will use this checkpoint for training the second stage. To open TensorBoard, run:
tensorboard --logdir /tmp/icnet_1.0_1025_resnet_v1_stage_1_EVAL
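If you prefer to pick the best checkpoint programmatically, the selection reduces to taking the maximum over (step, mIoU) pairs. The sketch below assumes you have already collected those pairs, for example by noting them from the evaluation log. The helper name `best_step` and the numbers are hypothetical and do not come from an actual training run.

```python
def best_step(eval_records):
    """Return the (global_step, miou) pair with the highest mIoU.

    eval_records: iterable of (global_step, miou) tuples.
    """
    return max(eval_records, key=lambda record: record[1])


# Made-up evaluation results for illustration only.
records = [(10000, 0.612), (20000, 0.655), (30000, 0.649)]
step, miou = best_step(records)
print(step, miou)  # 20000 0.655
```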
We will create a directory for storing our compressed model at /tmp/icnet_1.0_1025_resnet_v1_stage_1_COMPRESS. Before running compression, we must first export an inference graph without training nodes. To do this, first run the export script with:
# Replace the --trained_checkpoint value with your best model from the previous run
python export.py \
--input_shape 1024,2048,3 \
--pad_to_shape 1025,2049 \
--config_path configs/two_stage_icnet_1.0_1025_resnet_v1_stage_1_CUSTOMIZED.config \
--trained_checkpoint tmp/test_cleanup_icnet/model.ckpt-XXX-XXX \
--output_dir /tmp/icnet_1.0_1025_resnet_v1_stage_1_COMPRESS
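The `--input_shape` and `--pad_to_shape` values in the export command differ by one pixel in each spatial dimension. The short sketch below simply computes that difference. The helper name `padding_amounts` is hypothetical, and where export.py places the padding (for example bottom and right) is not stated here, so the sketch only reports the amounts.

```python
def padding_amounts(input_shape, pad_to_shape):
    """Return (rows, columns) of padding needed to reach pad_to_shape.

    input_shape: (height, width, channels) of the input image.
    pad_to_shape: (height, width) the image is padded to.
    """
    height, width = input_shape[:2]
    pad_h = pad_to_shape[0] - height
    pad_w = pad_to_shape[1] - width
    if pad_h < 0 or pad_w < 0:
        raise ValueError("pad_to_shape must not be smaller than input_shape")
    return pad_h, pad_w


# Values from the export command above.
print(padding_amounts((1024, 2048, 3), (1025, 2049)))  # (1, 1)
```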
Then, using the output inference graph from the export script, run the compression script with
# --input_checkpoint is the output from the export script
# --compression_factor 0.5 compresses the model to half
# --interactive is optional and visualizes the kernels being removed
python compress.py \
--prune_config configs/compression/icnet_resnet_v1_pruner_v2.config \
--input_checkpoint /tmp/icnet_1.0_1025_resnet_v1_stage_1_COMPRESS/model.ckpt \
--output_dir /tmp/icnet_1.0_1025_resnet_v1_stage_1_COMPRESS \
--compression_factor 0.5 \
--interactive
Now that we have a compressed model, we can retrain. As with the first stage config, copy and modify the supplied stage 2 configuration file located at:
configs/two_stage_icnet_1.0_1025_resnet_v1_stage_2.config
It contains the required hyperparameters for training. You should modify the same fields as before, making sure to point to the compressed model for initialization. Notice that in this config, the filter_scale field is set to 0.5 instead of 1.0. Then run training and evaluation as before, which will produce your final model.
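To get a feel for what a filter_scale of 0.5 means for model size, the sketch below assumes the setting multiplies the channel depth of each convolution. Under that assumption, a layer whose input and output depths are both halved keeps about a quarter of its weights. The helper names and the 256-channel example are hypothetical, and how this project applies filter_scale to each individual layer is not covered here.

```python
def scaled_depth(depth, filter_scale):
    """Channel depth after applying the scale, never below 1."""
    return max(1, int(depth * filter_scale))


def conv_params(kernel_size, in_depth, out_depth):
    """Weight count of a square 2-D convolution, ignoring biases."""
    return kernel_size * kernel_size * in_depth * out_depth


full = conv_params(3, 256, 256)
half = conv_params(3, scaled_depth(256, 0.5), scaled_depth(256, 0.5))
print(full, half, half / full)  # 589824 147456 0.25
```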