We provide a systematic study of the calibration of semantic segmentation models and propose a simple yet effective approach, selective scaling. This repository releases the source code for selective scaling. For common questions, please check or open an issue.
- Install Python 3.8 and PyTorch 1.11.
- Install MMSegmentation 0.25.0 from OpenMMLab. Supporting packages such as mmdet 2.25.0 and mmcv-full 1.5.2 may also be needed.
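To verify the environment, a quick sanity check such as the following can be run (it only assumes the packages above are installed):

```python
# Check that the expected dependency versions are installed.
import torch
import mmcv
import mmseg

print(torch.__version__)  # expected: 1.11.x
print(mmcv.__version__)   # expected: 1.5.2
print(mmseg.__version__)  # expected: 0.25.0
```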
| Benchmarks | Models | Calibrators |
|---|---|---|
| ADE20K | Segmenter | Temperature Scaling |
| COCO-164K | SegFormer | Logistic Scaling |
| BDD100K | Knet-DeepLab | Dirichlet Scaling |
| DAVIS2016 | Knet-SWIN | Local Temperature Scaling |
| SPACENET-7 | ConvNeXt-V1 | Meta-Cal |
| BraTS-2017 | Ensembling | |
| SYNTHIA | | |
We conducted ablation studies by varying the misprediction detection accuracy, integrating with different existing calibrators, and examining calibration errors across image regions.
The general implementation framework is as follows:
- Split the validation images into calibrator training/validation/testing sets.
- Label every pixel according to its predictive correctness (correct vs. mispredicted).
- Train a binary misprediction classifier on the training set and tune its hyperparameters on the validation set. Each training pair consists of a pixel's predictive probability vector and its correctness label (a minimal training sketch is given after this list).
- Evaluate on the testing set. Guided by the classifier's prediction, a separate scaling is applied to the logits (before Softmax): correctly predicted pixels are scaled with temperature 1 (i.e., no scaling), while mispredicted pixels are scaled with a larger temperature T2. The more accurate the classifier, the more aggressive T2 can be (e.g., 1e10); when the classifier's accuracy is moderate, a temperature around 2 is suggested (see the scaling sketch below).
- Note that pixels with ignore labels (e.g., 255 or -1) are excluded from all sets. General settings: batch size 20 (i.e., 20 pixel-wise probability vectors), the default AdamW optimizer with weight decay 1e-6, and 40 training epochs, with the best-validated checkpoint selected for evaluation. The training/validation/testing split is produced by random shuffling, so results may vary slightly across runs.
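The following is a minimal sketch of the classifier training step described above. The MLP architecture, function name, and tensor preparation are illustrative assumptions, not the exact design used in the paper; the optimizer (AdamW, weight decay 1e-6), batch size (20), epoch count (40), and best-validated selection follow the settings listed above.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def train_misprediction_classifier(probs, labels, val_probs, val_labels, epochs=40):
    """probs:  (N, C) float tensor of per-pixel softmax probability vectors.
    labels: (N,) long tensor, 1 = correctly predicted pixel, 0 = misprediction."""
    num_classes = probs.shape[1]
    # A small MLP as the binary classifier (an assumption for illustration).
    model = nn.Sequential(
        nn.Linear(num_classes, 64),
        nn.ReLU(),
        nn.Linear(64, 2),
    )
    loader = DataLoader(TensorDataset(probs, labels), batch_size=20, shuffle=True)
    optimizer = torch.optim.AdamW(model.parameters(), weight_decay=1e-6)
    criterion = nn.CrossEntropyLoss()

    best_acc, best_state = 0.0, None
    for _ in range(epochs):
        model.train()
        for x, y in loader:
            optimizer.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()
            optimizer.step()
        # Keep the checkpoint with the best validation accuracy.
        model.eval()
        with torch.no_grad():
            acc = (model(val_probs).argmax(1) == val_labels).float().mean().item()
        if acc >= best_acc:
            best_acc = acc
            best_state = {k: v.clone() for k, v in model.state_dict().items()}
    model.load_state_dict(best_state)
    return model
```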
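And a minimal sketch of selective scaling at test time, assuming `classifier` is the trained detector from the sketch above and `logits` are per-pixel pre-Softmax logits; the function name and tensor shapes are illustrative:

```python
import torch

def selective_scaling(logits, classifier, t2=2.0):
    """Scale logits per pixel: temperature 1 for pixels the classifier deems
    correct, temperature t2 (> 1) for predicted mispredictions.

    logits: (N, C) pre-Softmax logits, one row per pixel.
    t2: use a large value (e.g., 1e10) when the classifier is highly accurate,
        around 2 when its accuracy is moderate.
    """
    probs = torch.softmax(logits, dim=1)
    with torch.no_grad():
        # 1 = predicted correct, 0 = predicted misprediction.
        pred_correct = classifier(probs).argmax(dim=1)
    temperature = torch.where(
        pred_correct == 1,
        torch.ones_like(pred_correct, dtype=logits.dtype),
        torch.full_like(pred_correct, t2, dtype=logits.dtype),
    )
    # Divide each pixel's logits by its temperature before the final Softmax;
    # a very large t2 pushes mispredicted pixels toward a uniform distribution.
    return torch.softmax(logits / temperature.unsqueeze(1), dim=1)
```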
```bibtex
@article{wang2022calibrating,
  title={On Calibrating Semantic Segmentation Models: Analyses and An Algorithm},
  author={Wang, Dongdong and Gong, Boqing and Wang, Liqiang},
  journal={arXiv preprint arXiv:2212.12053},
  year={2022}
}
```