Merge branch 'facebookresearch-master'

Ricardozzf · Jun 25, 2019 · ed234f5
2 parents f7f2656 + 25b2bd8
commit ed234f5
Showing 69 changed files with 7,922 additions and 231 deletions.
diff --git a/.gitignore b/.gitignore
@@ -25,6 +25,13 @@ dist/
 # Pycharm editor settings
 .idea
 
+# vscode editor settings
+.vscode
+
+# MacOS
+.DS_Store
+
 # project dirs
 /datasets
 /models
+/output
diff --git a/INSTALL.md b/INSTALL.md
@@ -7,7 +7,7 @@
 - yacs
 - matplotlib
 - GCC >= 4.9
-- (optional) OpenCV for the webcam demo
+- OpenCV
 
 
 ### Option 1: Step-by-step installation
@@ -24,7 +24,7 @@ conda activate maskrcnn_benchmark
 conda install ipython
 
 # maskrcnn_benchmark and coco api dependencies
-pip install ninja yacs cython matplotlib tqdm
+pip install ninja yacs cython matplotlib tqdm opencv-python
 
 # follow PyTorch installation in https://pytorch.org/get-started/locally/
 # we give the instructions for CUDA 9.0
@@ -38,6 +38,12 @@ git clone https://github.com/cocodataset/cocoapi.git
 cd cocoapi/PythonAPI
 python setup.py build_ext install
 
+# install apex
+cd $INSTALL_DIR
+git clone https://github.com/NVIDIA/apex.git
+cd apex
+python setup.py install --cuda_ext --cpp_ext
+
 # install PyTorch Detection
 cd $INSTALL_DIR
 git clone https://github.com/facebookresearch/maskrcnn-benchmark.git
@@ -55,7 +61,58 @@ unset INSTALL_DIR
 # or if you are on macOS
 # MACOSX_DEPLOYMENT_TARGET=10.9 CC=clang CXX=clang++ python setup.py build develop
 ```
+#### Windows 10 
+```bash
+# open a cmd prompt and change to the desired installation directory,
+# referred to from now on as INSTALL_DIR
+conda create --name maskrcnn_benchmark
+conda activate maskrcnn_benchmark
+
+# this installs the right pip and dependencies for the fresh python
+conda install ipython
 
+# maskrcnn_benchmark and coco api dependencies
+pip install ninja yacs cython matplotlib tqdm opencv-python
+
+# follow PyTorch installation in https://pytorch.org/get-started/locally/
+# we give the instructions for CUDA 9.0
+# Important: check the CUDA version installed on your computer by running this command in cmd:
+nvcc --version
+conda install -c pytorch pytorch-nightly torchvision cudatoolkit=9.0
+
+git clone https://github.com/cocodataset/cocoapi.git
+
+    # To prevent an installation error, do the following after cloning cocoapi:
+    # using the file explorer, navigate to cocoapi\PythonAPI\setup.py and change line 14 from:
+    # extra_compile_args=['-Wno-cpp', '-Wno-unused-function', '-std=c99'],
+    # to
+    # extra_compile_args={'gcc': ['/Qstd=c99']},
+    # Based on https://github.com/cocodataset/cocoapi/issues/51
+
+cd cocoapi/PythonAPI
+python setup.py build_ext install
+
+# navigate back to INSTALL_DIR
+cd ..
+cd .. 
+# install apex
+
+git clone https://github.com/NVIDIA/apex.git
+cd apex
+python setup.py install --cuda_ext --cpp_ext
+# navigate back to INSTALL_DIR
+cd .. 
+# install PyTorch Detection
+
+git clone https://github.com/Idolized22/maskrcnn-benchmark.git
+cd maskrcnn-benchmark
+
+# the following will install the lib with
+# symbolic links, so that you can modify
+# the files if you want and won't need to
+# re-build it
+python setup.py build develop
+```
 ### Option 2: Docker Image (Requires CUDA, Linux only)
 
 Build image with defaults (`CUDA=9.0`, `CUDNN=7`, `FORCE_CUDA=1`):
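
The Windows steps above ask the reader to check the installed CUDA version with `nvcc --version` before picking a `cudatoolkit` build. The following is a minimal sketch of how that check could be scripted. The helper names `parse_nvcc_release` and `installed_cuda_release` are hypothetical, and the regular expression assumes the output contains the usual `release X.Y` phrase.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(nvcc_output):
    """Return the CUDA release string (such as "9.0") found in the text, or None."""
    # Assumption: nvcc reports its version with a phrase like "release 9.0".
    match = re.search(r"release\s+(\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


def installed_cuda_release():
    """Run `nvcc --version` when nvcc is on PATH; return None if it is missing or fails."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None
    result = subprocess.run([nvcc, "--version"], capture_output=True, text=True)
    if result.returncode != 0:
        return None
    return parse_nvcc_release(result.stdout)
```

Calling `installed_cuda_release()` before the `conda install` step gives the value to pass as `cudatoolkit=...`; a `None` result suggests the CUDA toolkit is not installed or not on `PATH`.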

diff --git a/README.md b/README.md
@@ -10,6 +10,7 @@ creating detection and segmentation models using PyTorch 1.0.
 - **Very fast**: up to **2x** faster than [Detectron](https://github.com/facebookresearch/Detectron) and **30%** faster than [mmdetection](https://github.com/open-mmlab/mmdetection) during training. See [MODEL_ZOO.md](MODEL_ZOO.md) for more details.
 - **Memory efficient:** uses roughly 500MB less GPU memory than mmdetection during training
 - **Multi-GPU training and inference**
+- **Mixed precision training:** trains faster with less GPU memory on [NVIDIA tensor cores](https://developer.nvidia.com/tensor-cores).
 - **Batched inference:** can perform inference using multiple images per batch per GPU
 - **CPU support for inference:** runs on CPU in inference time. See our [webcam demo](demo) for an example
 - Provides pre-trained models for almost all reference Mask R-CNN and Faster R-CNN configurations with 1x schedule.
@@ -129,7 +130,7 @@ you'll also need to change the learning rate, the number of iterations and the l
 
 Here is an example for Mask R-CNN R-50 FPN with the 1x schedule:
 ```bash
-python tools/train_net.py --config-file "configs/e2e_mask_rcnn_R_50_FPN_1x.yaml" SOLVER.IMS_PER_BATCH 2 SOLVER.BASE_LR 0.0025 SOLVER.MAX_ITER 720000 SOLVER.STEPS "(480000, 640000)" TEST.IMS_PER_BATCH 1
+python tools/train_net.py --config-file "configs/e2e_mask_rcnn_R_50_FPN_1x.yaml" SOLVER.IMS_PER_BATCH 2 SOLVER.BASE_LR 0.0025 SOLVER.MAX_ITER 720000 SOLVER.STEPS "(480000, 640000)" TEST.IMS_PER_BATCH 1 MODEL.RPN.FPN_POST_NMS_TOP_N_TRAIN 2000
 ```
 This follows the [scheduling rules from Detectron.](https://github.com/facebookresearch/Detectron/blob/master/configs/getting_started/tutorial_1gpu_e2e_faster_rcnn_R-50-FPN.yaml#L14-L30)
 Note that we have multiplied the number of iterations by 8x (as well as the learning rate schedules),
@@ -138,6 +139,7 @@ and we have divided the learning rate by 8x.
 We also changed the batch size during testing, but that is generally not necessary because testing
 requires much less memory than training.
 
+Furthermore, we set `MODEL.RPN.FPN_POST_NMS_TOP_N_TRAIN 2000` because, in the default training setup, proposals are selected per batch rather than per image. The value is calculated as **1000 x images-per-gpu**. Here we have 2 images per GPU, so we set it to 1000 x 2 = 2000. With 8 images per GPU, the value should be set to 8000. Note that this does not apply if `MODEL.RPN.FPN_POST_NMS_PER_BATCH` is set to `False` during training. See [#672](https://github.com/facebookresearch/maskrcnn-benchmark/issues/672) for more details.
 
 ### Multi-GPU training
 We use internally `torch.distributed.launch` in order to launch
@@ -147,8 +149,26 @@ process will only use a single GPU.
 
 ```bash
 export NGPUS=8
-python -m torch.distributed.launch --nproc_per_node=$NGPUS /path_to_maskrcnn_benchmark/tools/train_net.py --config-file "path/to/config/file.yaml"
+python -m torch.distributed.launch --nproc_per_node=$NGPUS /path_to_maskrcnn_benchmark/tools/train_net.py --config-file "path/to/config/file.yaml" MODEL.RPN.FPN_POST_NMS_TOP_N_TRAIN images_per_gpu x 1000
 ```
+Note that `MODEL.RPN.FPN_POST_NMS_TOP_N_TRAIN` should be set following the rule described for single-GPU training.
+
+### Mixed precision training
+We currently use [APEX](https://github.com/NVIDIA/apex) to add [Automatic Mixed Precision](https://developer.nvidia.com/automatic-mixed-precision) support. To enable it, run single-GPU or multi-GPU training as described above and set `DTYPE "float16"`.
+
+```bash
+export NGPUS=8
+python -m torch.distributed.launch --nproc_per_node=$NGPUS /path_to_maskrcnn_benchmark/tools/train_net.py --config-file "path/to/config/file.yaml" MODEL.RPN.FPN_POST_NMS_TOP_N_TRAIN images_per_gpu x 1000 DTYPE "float16"
+```
+If you want more verbose logging, set `AMP_VERBOSE True`. See [Mixed Precision Training guide](https://docs.nvidia.com/deeplearning/sdk/mixed-precision-training/index.html) for more details.
+
+## Evaluation
+You can test your model directly on a single GPU or on multiple GPUs. Here is an example for Mask R-CNN R-50 FPN with the 1x schedule on 8 GPUs:
+```bash
+export NGPUS=8
+python -m torch.distributed.launch --nproc_per_node=$NGPUS /path_to_maskrcnn_benchmark/tools/test_net.py --config-file "configs/e2e_mask_rcnn_R_50_FPN_1x.yaml" TEST.IMS_PER_BATCH 16
+```
+To calculate mAP for each class, you can simply modify a few lines in [coco_eval.py](https://github.com/facebookresearch/maskrcnn-benchmark/blob/master/maskrcnn_benchmark/data/datasets/evaluation/coco/coco_eval.py). See [#524](https://github.com/facebookresearch/maskrcnn-benchmark/issues/524#issuecomment-475118810) for more details.
 
 ## Abstractions
 For more information on some of the main abstractions in our implementation, see [ABSTRACTIONS.md](ABSTRACTIONS.md).
@@ -243,8 +263,9 @@ note = {Accessed: [Insert date here]}
 - [RetinaMask: Learning to predict masks improves state-of-the-art single-shot detection for free](https://arxiv.org/abs/1901.03353). 
   Cheng-Yang Fu, Mykhailo Shvets, and Alexander C. Berg.
   Tech report, arXiv,1901.03353.
-
-
+- [FCOS: Fully Convolutional One-Stage Object Detection](https://arxiv.org/abs/1904.01355).
+  Zhi Tian, Chunhua Shen, Hao Chen and Tong He.
+  Tech report, arXiv,1904.01355. [[code](https://github.com/tianzhi0549/FCOS)]
 
 ## License
 

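
The README changes above describe two rules for changing the batch size: scale the learning rate and iteration counts linearly, and set `MODEL.RPN.FPN_POST_NMS_TOP_N_TRAIN` to 1000 x images-per-gpu. A minimal sketch of that arithmetic follows. The function name `scale_schedule` is hypothetical, and the reference values (16 images per batch, learning rate 0.02, 90000 iterations, steps at 60000 and 80000) are assumptions inferred by undoing the 8x scaling in the single-GPU example, not values read from a config file.

```python
def scale_schedule(ims_per_batch, num_gpus,
                   ref_ims_per_batch=16, ref_base_lr=0.02,
                   ref_max_iter=90000, ref_steps=(60000, 80000)):
    """Return command-line overrides for a given global batch size and GPU count."""
    if ims_per_batch % num_gpus != 0:
        raise ValueError("ims_per_batch must be divisible by num_gpus")
    # Linear scaling: a batch 8x smaller means an 8x lower learning rate
    # and an 8x longer schedule.
    factor = ims_per_batch / ref_ims_per_batch
    images_per_gpu = ims_per_batch // num_gpus
    return {
        "SOLVER.IMS_PER_BATCH": ims_per_batch,
        "SOLVER.BASE_LR": ref_base_lr * factor,
        "SOLVER.MAX_ITER": int(round(ref_max_iter / factor)),
        "SOLVER.STEPS": tuple(int(round(s / factor)) for s in ref_steps),
        # Only relevant while MODEL.RPN.FPN_POST_NMS_PER_BATCH is left enabled.
        "MODEL.RPN.FPN_POST_NMS_TOP_N_TRAIN": 1000 * images_per_gpu,
    }


overrides = scale_schedule(ims_per_batch=2, num_gpus=1)
```

With `ims_per_batch=2` and `num_gpus=1`, the sketch yields the same overrides as the single-GPU command shown in the diff.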
diff --git a/configs/cityscapes/README.md b/configs/cityscapes/README.md
@@ -0,0 +1,217 @@
+### Paper 
+1 [mask-rcnn](https://arxiv.org/pdf/1703.06870.pdf)  
+
+
+### Dataset
+1 [cityscapesScripts](https://github.com/mcordts/cityscapesScripts)  
+
+
+### Performance (from paper)
+|      case    | training data | im/gpu | mask AP[val] | mask AP [test] | mask AP50 [test] |
+|--------------|:-------------:|:------:|:------------:|:--------------:|-----------------:|
+|   R-50-FPN   | fine          |   8/8  |    31.5      | 26.2           | 49.9             |
+|   R-50-FPN   | fine + COCO   |   8/8  |    36.4      | 32.0           | 58.1             |
+
+
+### Note (from paper)
+We apply our Mask R-CNN models with the ResNet-FPN-50 backbone; we found the 101-layer counterpart performs similarly due to the small dataset size. We train with image scale (shorter side) randomly sampled from [800, 1024], which reduces overfitting; inference is on a single scale of 1024 pixels. We use a mini-batch size of 1 image per GPU (so 8 on 8 GPUs) and train the model for 24k iterations, starting from a learning rate of 0.01 and reducing it to 0.001 at 18k iterations. It takes ∼4 hours of training on a single 8-GPU machine under this setting.  
+
+
+### Implementation (for fine-tuning from a COCO-trained model)
+Step 1: download the model trained on the COCO dataset from the [model zoo](https://download.pytorch.org/models/maskrcnn/e2e_mask_rcnn_R_50_FPN_1x.pth)  
+Step 2: perform model surgery on the trained model as shown below and use the result as the `pretrained model` for fine-tuning:    
+```python
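+import torch
+from torch import nn
+from maskrcnn_benchmark.layers import Conv2d
+
+
+def clip_weights_from_pretrain_of_coco_to_cityscapes(f, out_file):
+	"""Remap the COCO predictor heads of checkpoint `f` to the Cityscapes classes and save the result to `out_file`."""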
+	# COCO categories for pretty print
+	COCO_CATEGORIES = [
+	    "__background__",
+	    "person",
+	    "bicycle",
+	    "car",
+	    "motorcycle",
+	    "airplane",
+	    "bus",
+	    "train",
+	    "truck",
+	    "boat",
+	    "traffic light",
+	    "fire hydrant",
+	    "stop sign",
+	    "parking meter",
+	    "bench",
+	    "bird",
+	    "cat",
+	    "dog",
+	    "horse",
+	    "sheep",
+	    "cow",
+	    "elephant",
+	    "bear",
+	    "zebra",
+	    "giraffe",
+	    "backpack",
+	    "umbrella",
+	    "handbag",
+	    "tie",
+	    "suitcase",
+	    "frisbee",
+	    "skis",
+	    "snowboard",
+	    "sports ball",
+	    "kite",
+	    "baseball bat",
+	    "baseball glove",
+	    "skateboard",
+	    "surfboard",
+	    "tennis racket",
+	    "bottle",
+	    "wine glass",
+	    "cup",
+	    "fork",
+	    "knife",
+	    "spoon",
+	    "bowl",
+	    "banana",
+	    "apple",
+	    "sandwich",
+	    "orange",
+	    "broccoli",
+	    "carrot",
+	    "hot dog",
+	    "pizza",
+	    "donut",
+	    "cake",
+	    "chair",
+	    "couch",
+	    "potted plant",
+	    "bed",
+	    "dining table",
+	    "toilet",
+	    "tv",
+	    "laptop",
+	    "mouse",
+	    "remote",
+	    "keyboard",
+	    "cell phone",
+	    "microwave",
+	    "oven",
+	    "toaster",
+	    "sink",
+	    "refrigerator",
+	    "book",
+	    "clock",
+	    "vase",
+	    "scissors",
+	    "teddy bear",
+	    "hair drier",
+	    "toothbrush",
+	]
+	# Cityscapes of fine categories for pretty print
+	CITYSCAPES_FINE_CATEGORIES = [
+	    "__background__",
+	    "person",
+	    "rider",
+	    "car",
+	    "truck",
+	    "bus",
+	    "train",
+	    "motorcycle",
+	    "bicycle",
+	]
+	coco_cats = COCO_CATEGORIES
+	cityscapes_cats = CITYSCAPES_FINE_CATEGORIES
+	coco_cats_to_inds = dict(zip(coco_cats, range(len(coco_cats))))
+	cityscapes_cats_to_inds = dict(
+		zip(cityscapes_cats, range(len(cityscapes_cats)))
+	)
+
+	checkpoint = torch.load(f)
+	m = checkpoint['model']
+
+	weight_names = {
+		"cls_score": "module.roi_heads.box.predictor.cls_score.weight", 
+		"bbox_pred": "module.roi_heads.box.predictor.bbox_pred.weight", 
+		"mask_fcn_logits": "module.roi_heads.mask.predictor.mask_fcn_logits.weight", 
+	}
+	bias_names = {
+		"cls_score": "module.roi_heads.box.predictor.cls_score.bias",
+		"bbox_pred": "module.roi_heads.box.predictor.bbox_pred.bias", 
+		"mask_fcn_logits": "module.roi_heads.mask.predictor.mask_fcn_logits.bias",
+	}
+
+	representation_size = m[weight_names["cls_score"]].size(1)
+	cls_score = nn.Linear(representation_size, len(cityscapes_cats))
+	nn.init.normal_(cls_score.weight, std=0.01)
+	nn.init.constant_(cls_score.bias, 0)
+
+	representation_size = m[weight_names["bbox_pred"]].size(1)
+	class_agnostic = m[weight_names["bbox_pred"]].size(0) != len(coco_cats) * 4
+	num_bbox_reg_classes = 2 if class_agnostic else len(cityscapes_cats)
+	bbox_pred = nn.Linear(representation_size, num_bbox_reg_classes * 4)
+	nn.init.normal_(bbox_pred.weight, std=0.001)
+	nn.init.constant_(bbox_pred.bias, 0)
+
+	dim_reduced = m[weight_names["mask_fcn_logits"]].size(1)
+	mask_fcn_logits = Conv2d(dim_reduced, len(cityscapes_cats), 1, 1, 0)
+	nn.init.constant_(mask_fcn_logits.bias, 0)
+	nn.init.kaiming_normal_(
+		mask_fcn_logits.weight, mode="fan_out", nonlinearity="relu"
+	)
+
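+	def _copy_weight(src_weight, dst_weight):
+		# dst_weight is a freshly initialised nn.Parameter, so write into it
+		# under no_grad to avoid an in-place autograd error on a leaf tensor
+		with torch.no_grad():
+			for ix, cat in enumerate(cityscapes_cats):
+				if cat not in coco_cats:
+					continue
+				jx = coco_cats_to_inds[cat]
+				dst_weight[ix] = src_weight[jx]
+		return dst_weight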
+
+	def _copy_bias(src_bias, dst_bias, class_agnostic=False):
+		if class_agnostic:
+			return dst_bias
+		return _copy_weight(src_bias, dst_bias)
+
+	m[weight_names["cls_score"]] = _copy_weight(
+		m[weight_names["cls_score"]], cls_score.weight
+	)
+	m[weight_names["bbox_pred"]] = _copy_weight(
+		m[weight_names["bbox_pred"]], bbox_pred.weight
+	)
+	m[weight_names["mask_fcn_logits"]] = _copy_weight(
+		m[weight_names["mask_fcn_logits"]], mask_fcn_logits.weight
+	)
+
+	m[bias_names["cls_score"]] = _copy_bias(
+		m[bias_names["cls_score"]], cls_score.bias
+	)
+	m[bias_names["bbox_pred"]] = _copy_bias(
+		m[bias_names["bbox_pred"]], bbox_pred.bias, class_agnostic
+	)
+	m[bias_names["mask_fcn_logits"]] = _copy_bias(
+		m[bias_names["mask_fcn_logits"]], mask_fcn_logits.bias
+	)
+
+	print("f: {}\nout_file: {}".format(f, out_file))
+	torch.save(m, out_file)
+```
+Step 3: modify the `MODEL.WEIGHT`, `INPUT` and `SOLVER` settings in the `yaml` file, like this:  
+```
+MODEL:
+  WEIGHT: "xxx.pth" # the model you saved with the code above
+
+INPUT:
+  MIN_SIZE_TRAIN: (800, 832, 864, 896, 928, 960, 992, 1024, 1024)
+  MAX_SIZE_TRAIN: 2048
+  MIN_SIZE_TEST: 1024
+  MAX_SIZE_TEST: 2048
+
+SOLVER:
+  BASE_LR: 0.01
+  IMS_PER_BATCH: 8
+  WEIGHT_DECAY: 0.0001
+  STEPS: (3000,)
+  MAX_ITER: 4000
+```
+Step 4: train the model.  
+
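
The model surgery in Step 2 depends on mapping each Cityscapes class index to the index of the same-named COCO class. The sketch below isolates that mapping in plain Python so it can be inspected without loading a checkpoint. The names `category_index_map` and `box_row_map` are hypothetical, `COCO_INDEX` lists only the COCO entries that Cityscapes shares (indices taken from the category list above), and `box_row_map` assumes a class-specific box regressor that stores four consecutive rows per class, which the source does not state.

```python
# COCO indices of the categories that also exist in Cityscapes.
COCO_INDEX = {
    "__background__": 0, "person": 1, "bicycle": 2, "car": 3,
    "motorcycle": 4, "bus": 6, "train": 7, "truck": 8,
}
CITYSCAPES_FINE_CATEGORIES = [
    "__background__", "person", "rider", "car", "truck",
    "bus", "train", "motorcycle", "bicycle",
]


def category_index_map(dst_categories, src_index):
    """Map destination class index -> source class index for shared class names."""
    return {
        dst_ix: src_index[name]
        for dst_ix, name in enumerate(dst_categories)
        if name in src_index
    }


def box_row_map(index_map, rows_per_class=4):
    """Expand a class map to row indices, assuming rows_per_class rows per class."""
    rows = {}
    for dst_ix, src_ix in index_map.items():
        for k in range(rows_per_class):
            rows[dst_ix * rows_per_class + k] = src_ix * rows_per_class + k
    return rows


mapping = category_index_map(CITYSCAPES_FINE_CATEGORIES, COCO_INDEX)
```

Here `rider` has no COCO counterpart, so its index is absent from `mapping` and its predictor rows keep their fresh initialisation. If the four-rows-per-class assumption holds, copying box-regression weights row by row with the class map alone would not line up the rows, and a mapping like `box_row_map(mapping)` would be needed instead.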