SqueezeNet implementation for Faster RCNN #345

Open
mengzhangjian opened this issue Sep 19, 2016 · 20 comments

@mengzhangjian

mengzhangjian commented Sep 19, 2016

@siddharthm83 I am trying to use SqueezeNet with Faster R-CNN. The following is my training prototxt. I modified the roi_pool layer to pooled_w: 13, pooled_h: 13, and the training process runs without errors, but the detection results are very bad. Can anyone help me solve this?
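
For context, the downsampling path in the net below is what fixes the 1/16 scale used later in the RPN and roi_pool5 layers (a quick bookkeeping note, assuming the layers exactly as posted):

# conv1 (stride 2) -> pool1 (stride 2) -> pool4 (stride 2) -> pool8 (stride 2)
# total stride = 2 * 2 * 2 * 2 = 16
# => 'feat_stride': 16 in the RPN python layers, spatial_scale: 0.0625 (= 1/16) in roi_pool5
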
name: "VGG_ILSVRC_16_layers"
layer {
name: 'input-data'
type: 'Python'
top: 'data'
top: 'im_info'
top: 'gt_boxes'
python_param {
module: 'roi_data_layer.layer'
layer: 'RoIDataLayer'
param_str: "'num_classes': 2"
}
}

layer {
name: "conv1"
type: "Convolution"
bottom: "data"
top: "conv1"
convolution_param {
num_output: 96
kernel_size: 7
stride: 2
weight_filler {
type: "xavier"
}
}
}
layer {
name: "relu_conv1"
type: "ReLU"
bottom: "conv1"
top: "conv1"
}
layer {
name: "pool1"
type: "Pooling"
bottom: "conv1"
top: "pool1"
pooling_param {
pool: MAX
kernel_size: 3
stride: 2
}
}
layer {
name: "fire2/squeeze1x1"
type: "Convolution"
bottom: "pool1"
top: "fire2/squeeze1x1"
convolution_param {
num_output: 16
kernel_size: 1
weight_filler {
type: "xavier"
}
}
}
layer {
name: "fire2/relu_squeeze1x1"
type: "ReLU"
bottom: "fire2/squeeze1x1"
top: "fire2/squeeze1x1"
}
layer {
name: "fire2/expand1x1"
type: "Convolution"
bottom: "fire2/squeeze1x1"
top: "fire2/expand1x1"
convolution_param {
num_output: 64
kernel_size: 1
weight_filler {
type: "xavier"
}
}
}
layer {
name: "fire2/relu_expand1x1"
type: "ReLU"
bottom: "fire2/expand1x1"
top: "fire2/expand1x1"
}
layer {
name: "fire2/expand3x3"
type: "Convolution"
bottom: "fire2/squeeze1x1"
top: "fire2/expand3x3"
convolution_param {
num_output: 64
pad: 1
kernel_size: 3
weight_filler {
type: "xavier"
}
}
}
layer {
name: "fire2/relu_expand3x3"
type: "ReLU"
bottom: "fire2/expand3x3"
top: "fire2/expand3x3"
}
layer {
name: "fire2/concat"
type: "Concat"
bottom: "fire2/expand1x1"
bottom: "fire2/expand3x3"
top: "fire2/concat"
}
layer {
name: "fire3/squeeze1x1"
type: "Convolution"
bottom: "fire2/concat"
top: "fire3/squeeze1x1"
convolution_param {
num_output: 16
kernel_size: 1
weight_filler {
type: "xavier"
}
}
}
layer {
name: "fire3/relu_squeeze1x1"
type: "ReLU"
bottom: "fire3/squeeze1x1"
top: "fire3/squeeze1x1"
}
layer {
name: "fire3/expand1x1"
type: "Convolution"
bottom: "fire3/squeeze1x1"
top: "fire3/expand1x1"
convolution_param {
num_output: 64
kernel_size: 1
weight_filler {
type: "xavier"
}
}
}
layer {
name: "fire3/relu_expand1x1"
type: "ReLU"
bottom: "fire3/expand1x1"
top: "fire3/expand1x1"
}
layer {
name: "fire3/expand3x3"
type: "Convolution"
bottom: "fire3/squeeze1x1"
top: "fire3/expand3x3"
convolution_param {
num_output: 64
pad: 1
kernel_size: 3
weight_filler {
type: "xavier"
}
}
}
layer {
name: "fire3/relu_expand3x3"
type: "ReLU"
bottom: "fire3/expand3x3"
top: "fire3/expand3x3"
}
layer {
name: "fire3/concat"
type: "Concat"
bottom: "fire3/expand1x1"
bottom: "fire3/expand3x3"
top: "fire3/concat"
}
layer {
name: "fire4/squeeze1x1"
type: "Convolution"
bottom: "fire3/concat"
top: "fire4/squeeze1x1"
convolution_param {
num_output: 32
kernel_size: 1
weight_filler {
type: "xavier"
}
}
}
layer {
name: "fire4/relu_squeeze1x1"
type: "ReLU"
bottom: "fire4/squeeze1x1"
top: "fire4/squeeze1x1"
}
layer {
name: "fire4/expand1x1"
type: "Convolution"
bottom: "fire4/squeeze1x1"
top: "fire4/expand1x1"
convolution_param {
num_output: 128
kernel_size: 1
weight_filler {
type: "xavier"
}
}
}
layer {
name: "fire4/relu_expand1x1"
type: "ReLU"
bottom: "fire4/expand1x1"
top: "fire4/expand1x1"
}
layer {
name: "fire4/expand3x3"
type: "Convolution"
bottom: "fire4/squeeze1x1"
top: "fire4/expand3x3"
convolution_param {
num_output: 128
pad: 1
kernel_size: 3
weight_filler {
type: "xavier"
}
}
}
layer {
name: "fire4/relu_expand3x3"
type: "ReLU"
bottom: "fire4/expand3x3"
top: "fire4/expand3x3"
}
layer {
name: "fire4/concat"
type: "Concat"
bottom: "fire4/expand1x1"
bottom: "fire4/expand3x3"
top: "fire4/concat"
}
layer {
name: "pool4"
type: "Pooling"
bottom: "fire4/concat"
top: "pool4"
pooling_param {
pool: MAX
kernel_size: 3
stride: 2
}
}
layer {
name: "fire5/squeeze1x1"
type: "Convolution"
bottom: "pool4"
top: "fire5/squeeze1x1"
convolution_param {
num_output: 32
kernel_size: 1
weight_filler {
type: "xavier"
}
}
}
layer {
name: "fire5/relu_squeeze1x1"
type: "ReLU"
bottom: "fire5/squeeze1x1"
top: "fire5/squeeze1x1"
}
layer {
name: "fire5/expand1x1"
type: "Convolution"
bottom: "fire5/squeeze1x1"
top: "fire5/expand1x1"
convolution_param {
num_output: 128
kernel_size: 1
weight_filler {
type: "xavier"
}
}
}
layer {
name: "fire5/relu_expand1x1"
type: "ReLU"
bottom: "fire5/expand1x1"
top: "fire5/expand1x1"
}
layer {
name: "fire5/expand3x3"
type: "Convolution"
bottom: "fire5/squeeze1x1"
top: "fire5/expand3x3"
convolution_param {
num_output: 128
pad: 1
kernel_size: 3
weight_filler {
type: "xavier"
}
}
}
layer {
name: "fire5/relu_expand3x3"
type: "ReLU"
bottom: "fire5/expand3x3"
top: "fire5/expand3x3"
}
layer {
name: "fire5/concat"
type: "Concat"
bottom: "fire5/expand1x1"
bottom: "fire5/expand3x3"
top: "fire5/concat"
}
layer {
name: "fire6/squeeze1x1"
type: "Convolution"
bottom: "fire5/concat"
top: "fire6/squeeze1x1"
convolution_param {
num_output: 48
kernel_size: 1
weight_filler {
type: "xavier"
}
}
}
layer {
name: "fire6/relu_squeeze1x1"
type: "ReLU"
bottom: "fire6/squeeze1x1"
top: "fire6/squeeze1x1"
}
layer {
name: "fire6/expand1x1"
type: "Convolution"
bottom: "fire6/squeeze1x1"
top: "fire6/expand1x1"
convolution_param {
num_output: 192
kernel_size: 1
weight_filler {
type: "xavier"
}
}
}
layer {
name: "fire6/relu_expand1x1"
type: "ReLU"
bottom: "fire6/expand1x1"
top: "fire6/expand1x1"
}
layer {
name: "fire6/expand3x3"
type: "Convolution"
bottom: "fire6/squeeze1x1"
top: "fire6/expand3x3"
convolution_param {
num_output: 192
pad: 1
kernel_size: 3
weight_filler {
type: "xavier"
}
}
}
layer {
name: "fire6/relu_expand3x3"
type: "ReLU"
bottom: "fire6/expand3x3"
top: "fire6/expand3x3"
}
layer {
name: "fire6/concat"
type: "Concat"
bottom: "fire6/expand1x1"
bottom: "fire6/expand3x3"
top: "fire6/concat"
}
layer {
name: "fire7/squeeze1x1"
type: "Convolution"
bottom: "fire6/concat"
top: "fire7/squeeze1x1"
convolution_param {
num_output: 48
kernel_size: 1
weight_filler {
type: "xavier"
}
}
}
layer {
name: "fire7/relu_squeeze1x1"
type: "ReLU"
bottom: "fire7/squeeze1x1"
top: "fire7/squeeze1x1"
}
layer {
name: "fire7/expand1x1"
type: "Convolution"
bottom: "fire7/squeeze1x1"
top: "fire7/expand1x1"
convolution_param {
num_output: 192
kernel_size: 1
weight_filler {
type: "xavier"
}
}
}
layer {
name: "fire7/relu_expand1x1"
type: "ReLU"
bottom: "fire7/expand1x1"
top: "fire7/expand1x1"
}
layer {
name: "fire7/expand3x3"
type: "Convolution"
bottom: "fire7/squeeze1x1"
top: "fire7/expand3x3"
convolution_param {
num_output: 192
pad: 1
kernel_size: 3
weight_filler {
type: "xavier"
}
}
}
layer {
name: "fire7/relu_expand3x3"
type: "ReLU"
bottom: "fire7/expand3x3"
top: "fire7/expand3x3"
}
layer {
name: "fire7/concat"
type: "Concat"
bottom: "fire7/expand1x1"
bottom: "fire7/expand3x3"
top: "fire7/concat"
}
layer {
name: "fire8/squeeze1x1"
type: "Convolution"
bottom: "fire7/concat"
top: "fire8/squeeze1x1"
convolution_param {
num_output: 64
kernel_size: 1
weight_filler {
type: "xavier"
}
}
}
layer {
name: "fire8/relu_squeeze1x1"
type: "ReLU"
bottom: "fire8/squeeze1x1"
top: "fire8/squeeze1x1"
}
layer {
name: "fire8/expand1x1"
type: "Convolution"
bottom: "fire8/squeeze1x1"
top: "fire8/expand1x1"
convolution_param {
num_output: 256
kernel_size: 1
weight_filler {
type: "xavier"
}
}
}
layer {
name: "fire8/relu_expand1x1"
type: "ReLU"
bottom: "fire8/expand1x1"
top: "fire8/expand1x1"
}
layer {
name: "fire8/expand3x3"
type: "Convolution"
bottom: "fire8/squeeze1x1"
top: "fire8/expand3x3"
convolution_param {
num_output: 256
pad: 1
kernel_size: 3
weight_filler {
type: "xavier"
}
}
}
layer {
name: "fire8/relu_expand3x3"
type: "ReLU"
bottom: "fire8/expand3x3"
top: "fire8/expand3x3"
}
layer {
name: "fire8/concat"
type: "Concat"
bottom: "fire8/expand1x1"
bottom: "fire8/expand3x3"
top: "fire8/concat"
}
layer {
name: "pool8"
type: "Pooling"
bottom: "fire8/concat"
top: "pool8"
pooling_param {
pool: MAX
kernel_size: 3
stride: 2
}
}
layer {
name: "fire9/squeeze1x1"
type: "Convolution"
bottom: "pool8"
top: "fire9/squeeze1x1"
convolution_param {
num_output: 64
kernel_size: 1
weight_filler {
type: "xavier"
}
}
}
layer {
name: "fire9/relu_squeeze1x1"
type: "ReLU"
bottom: "fire9/squeeze1x1"
top: "fire9/squeeze1x1"
}
layer {
name: "fire9/expand1x1"
type: "Convolution"
bottom: "fire9/squeeze1x1"
top: "fire9/expand1x1"
convolution_param {
num_output: 256
kernel_size: 1
weight_filler {
type: "xavier"
}
}
}
layer {
name: "fire9/relu_expand1x1"
type: "ReLU"
bottom: "fire9/expand1x1"
top: "fire9/expand1x1"
}
layer {
name: "fire9/expand3x3"
type: "Convolution"
bottom: "fire9/squeeze1x1"
top: "fire9/expand3x3"
convolution_param {
num_output: 256
pad: 1
kernel_size: 3
weight_filler {
type: "xavier"
}
}
}
layer {
name: "fire9/relu_expand3x3"
type: "ReLU"
bottom: "fire9/expand3x3"
top: "fire9/expand3x3"
}
layer {
name: "fire9/concat"
type: "Concat"
bottom: "fire9/expand1x1"
bottom: "fire9/expand3x3"
top: "fire9/concat"
}
layer {
name: "drop9"
type: "Dropout"
bottom: "fire9/concat"
top: "fire9/concat"
dropout_param {
dropout_ratio: 0.5
}
}
layer {
name: "conv10"
type: "Convolution"
bottom: "fire9/concat"
top: "conv10"
convolution_param {
num_output: 1000
pad: 1
kernel_size: 1
weight_filler {
type: "gaussian"
mean: 0.0
std: 0.01
}
}
}
layer {
name: "relu_conv10"
type: "ReLU"
bottom: "conv10"
top: "conv10"
}

#========= RPN ============

layer {
name: "rpn_conv/3x3"
type: "Convolution"
bottom: "conv10"
top: "rpn/output"
param { lr_mult: 1.0 }
param { lr_mult: 2.0 }
convolution_param {
num_output: 512
kernel_size: 3 pad: 1 stride: 1
weight_filler { type: "gaussian" std: 0.01 }
bias_filler { type: "constant" value: 0 }
}
}
layer {
name: "rpn_relu/3x3"
type: "ReLU"
bottom: "rpn/output"
top: "rpn/output"
}

layer {
name: "rpn_cls_score"
type: "Convolution"
bottom: "rpn/output"
top: "rpn_cls_score"
param { lr_mult: 1.0 }
param { lr_mult: 2.0 }
convolution_param {
num_output: 18 # 2(bg/fg) * 9(anchors)
kernel_size: 1 pad: 0 stride: 1
weight_filler { type: "gaussian" std: 0.01 }
bias_filler { type: "constant" value: 0 }
}
}

layer {
name: "rpn_bbox_pred"
type: "Convolution"
bottom: "rpn/output"
top: "rpn_bbox_pred"
param { lr_mult: 1.0 }
param { lr_mult: 2.0 }
convolution_param {
num_output: 36 # 4 * 9(anchors)
kernel_size: 1 pad: 0 stride: 1
weight_filler { type: "gaussian" std: 0.01 }
bias_filler { type: "constant" value: 0 }
}
}

layer {
bottom: "rpn_cls_score"
top: "rpn_cls_score_reshape"
name: "rpn_cls_score_reshape"
type: "Reshape"
reshape_param { shape { dim: 0 dim: 2 dim: -1 dim: 0 } }
}

layer {
name: 'rpn-data'
type: 'Python'
bottom: 'rpn_cls_score'
bottom: 'gt_boxes'
bottom: 'im_info'
bottom: 'data'
top: 'rpn_labels'
top: 'rpn_bbox_targets'
top: 'rpn_bbox_inside_weights'
top: 'rpn_bbox_outside_weights'
python_param {
module: 'rpn.anchor_target_layer'
layer: 'AnchorTargetLayer'
param_str: "'feat_stride': 16"
}
}

layer {
name: "rpn_loss_cls"
type: "SoftmaxWithLoss"
bottom: "rpn_cls_score_reshape"
bottom: "rpn_labels"
propagate_down: 1
propagate_down: 0
top: "rpn_cls_loss"
loss_weight: 1
loss_param {
ignore_label: -1
normalize: true
}
}

layer {
name: "rpn_loss_bbox"
type: "SmoothL1Loss"
bottom: "rpn_bbox_pred"
bottom: "rpn_bbox_targets"
bottom: 'rpn_bbox_inside_weights'
bottom: 'rpn_bbox_outside_weights'
top: "rpn_loss_bbox"
loss_weight: 1
smooth_l1_loss_param { sigma: 3.0 }
}

#========= RoI Proposal ============

layer {
name: "rpn_cls_prob"
type: "Softmax"
bottom: "rpn_cls_score_reshape"
top: "rpn_cls_prob"
}

layer {
name: 'rpn_cls_prob_reshape'
type: 'Reshape'
bottom: 'rpn_cls_prob'
top: 'rpn_cls_prob_reshape'
reshape_param { shape { dim: 0 dim: 18 dim: -1 dim: 0 } }
}

layer {
name: 'proposal'
type: 'Python'
bottom: 'rpn_cls_prob_reshape'
bottom: 'rpn_bbox_pred'
bottom: 'im_info'
top: 'rpn_rois'
top: 'rpn_scores'
python_param {
module: 'rpn.proposal_layer'
layer: 'ProposalLayer'
param_str: "'feat_stride': 16"
}
}
layer {
name: 'roi-data'
type: 'Python'
bottom: 'rpn_rois'
bottom: 'gt_boxes'
top: 'rois'
top: 'labels'
top: 'bbox_targets'
top: 'bbox_inside_weights'
top: 'bbox_outside_weights'
python_param {
module: 'rpn.proposal_target_layer'
layer: 'ProposalTargetLayer'
param_str: "'num_classes': 2"
}
}

#========= RCNN ============

layer {
name: "roi_pool5"
type: "ROIPooling"
bottom: "conv10"
bottom: "rois"
top: "pool10"
roi_pooling_param {
pooled_w: 13
pooled_h: 13
spatial_scale: 0.0625 # 1/16
}
}

layer {
name: "cls_score"
type: "InnerProduct"
bottom: "pool10"
top: "cls_score"
param {
lr_mult: 1
}
param {
lr_mult: 2
}
inner_product_param {
num_output: 2
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "bbox_pred"
type: "InnerProduct"
bottom: "pool10"
top: "bbox_pred"
param {
lr_mult: 1
}
param {
lr_mult: 2
}
inner_product_param {
num_output: 8
weight_filler {
type: "gaussian"
std: 0.001
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "loss_cls"
type: "SoftmaxWithLoss"
bottom: "cls_score"
bottom: "labels"
propagate_down: 1
propagate_down: 0
top: "loss_cls"
loss_weight: 1
}
layer {
name: "loss_bbox"
type: "SmoothL1Loss"
bottom: "bbox_pred"
bottom: "bbox_targets"
bottom: "bbox_inside_weights"
bottom: "bbox_outside_weights"
top: "loss_bbox"
loss_weight: 1
}

@huvers

huvers commented Sep 23, 2016

Please update if you get this working. Thanks.

@mengzhangjian

@huvers I have tried many variations of the net architecture, but none of them were effective. Have you figured it out?

@huvers

huvers commented Sep 27, 2016

@mengzhangjian - Not yet, I'm still working on it too.

I'll put up a $100 bounty through PayPal to whoever can get this working :)

@karaspd

karaspd commented Sep 28, 2016

@mengzhangjian I am working on this right now. What results did you achieve, and on which dataset?

@mengzhangjian

@karaspd I use it for face detection on the WIDER FACE dataset; with the above config, I got nothing.

@karaspd

karaspd commented Sep 29, 2016

@mengzhangjian that is a little bit strange. I am training and testing on the KITTI dataset with its 3 object difficulty levels. I used SqueezeNet v1.1 and placed the roi-pool layer after fire9 with pooled_w: 7, pooled_h: 7. As a result, I see fairly good results on easy objects (81%) but not so good on moderate and hard objects. One issue with this network is that it detects lots of false-positive bounding boxes.
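
For reference, a sketch of that placement (blob names are assumed from the fire-module naming used above, not karaspd's actual file; in SqueezeNet v1.1 the fire9 output is likewise at 1/16 of the input resolution):

layer {
  name: "roi_pool"
  type: "ROIPooling"
  bottom: "fire9/concat"   # last fire module output, stride 16
  bottom: "rois"
  top: "roi_pool"
  roi_pooling_param {
    pooled_w: 7
    pooled_h: 7
    spatial_scale: 0.0625  # 1/16
  }
}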

@huvers

huvers commented Sep 29, 2016

@karaspd Would you mind sharing your train.prototxt for SqueezeNet v1.1?

@huvers

huvers commented Sep 29, 2016

@karaspd Hrmmm, that wouldn't run in my case (I modified it for 5 classes). How did you come up with 120 outputs for "rpn_cls_score"?

@karaspd

karaspd commented Sep 29, 2016

If you want to use 60 anchors, you need to change the scales and aspect ratios in the RPN code. However, you can ignore that and just use 9 anchors as before to test whether you still have a problem: change 120 to 18 and 240 to 36.
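
The head widths are pure anchor arithmetic; for reference (assuming the stock anchor generation in lib/rpn/generate_anchors.py unless you edit it):

# num_anchors = num_scales * num_aspect_ratios
#  9 anchors (default 3 scales * 3 ratios):
#    rpn_cls_score  num_output = 2 * 9  = 18
#    rpn_bbox_pred  num_output = 4 * 9  = 36
#  60 anchors (custom scales/ratios):
#    rpn_cls_score  num_output = 2 * 60 = 120
#    rpn_bbox_pred  num_output = 4 * 60 = 240
# the rpn_cls_prob_reshape dim (18 above) must also equal 2 * num_anchors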

@huvers

huvers commented Sep 29, 2016

@karaspd Yes, I've tried changing them back to the usual 18 and 36 (as well as the rpn_cls_prob_reshape layer back to 18), but I'm still seeing no convergence in learning, and an error on the validation set:

File "/home/huvers/PycharmProjects/py-faster-rcnn/tools/../lib/datasets/voc_eval.py", line 148, in voc_eval
BB = BB[sorted_ind, :]
IndexError: too many indices for array


The same data works just fine training with VGG16.

Regardless, thanks for your help.

train.txt
test.txt

@karaspd

karaspd commented Sep 29, 2016

I have seen this error before. Are you sure you are using the right pre-trained network, SqueezeNet v1.1?

@huvers

huvers commented Sep 29, 2016

Yes, it is SqueezeNet v1.1.

So I discovered it was actually the learning rate that was causing the issues. Originally lr = 0.001, which was apparently too high. Reducing it to lr = 0.0001 led to convergence, and the previous error disappeared. I'm currently running a large number of iterations to see how well it ultimately performs on my custom dataset.
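
For anyone hitting the same divergence, a minimal solver sketch with the reduced rate (the path and schedule values are placeholders, not the exact solver used here; only base_lr is the point):

# solver.prototxt -- sketch
train_net: "models/squeezenet_v1.1/train.prototxt"   # placeholder path
base_lr: 0.0001       # 0.001 diverged for this net; 1e-4 converged
lr_policy: "step"
gamma: 0.1
stepsize: 50000
momentum: 0.9
weight_decay: 0.0005
# py-faster-rcnn snapshots from Python, so standard solver snapshotting stays off
snapshot: 0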

@karaspd

karaspd commented Sep 29, 2016

That's great, let me know about your results.

@mengzhangjian

@karaspd Thank you for your contribution. I will test it on my face data and report the result.

@karaspd

karaspd commented Oct 5, 2016

Hi @mengzhangjian @huvers, how is the training going? Did you get any interesting results?

@huvers

huvers commented Oct 5, 2016

Hi @karaspd

My results haven't been very good so far. On my custom data (5 classes) VGG16 achieves mAP = 0.80, while SqueezeNet v1.1 can only reach mAP = 0.55. I have to turn the confidence threshold down very low for testing, and get a large number of false positives as a result.

I've experimented with pooled_w, pooled_h = 13, which did not work well (mAP = 0.38), as well as with adjusting the number of RPN convolution features, without success.

Have you had better success?

@karaspd

karaspd commented Oct 6, 2016

Hi @huvers,

I am getting a large number of false positives, same as you, and the average mAP I got on the KITTI dataset is 0.64.

@mengzhangjian

@karaspd Sorry for the delay; I was on holiday for a few days. I tested it on my face data and the detection results are good; I will post concrete results later. Thanks.

@absorbguo

Recently, I have been working on integrating SqueezeNet and Faster R-CNN. After checking your model, @mengzhangjian, I noticed that you didn't add an extra fully connected layer after the roi-pooling layer; would this decrease the detection performance?
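
For illustration, such a layer could be spliced in like this (a sketch against the blob names in the config above; the name fc6_det and the 512-d size are assumptions, not tested values), with cls_score and bbox_pred then taking fc6_det as their bottom instead of pool10:

layer {
  name: "fc6_det"
  type: "InnerProduct"
  bottom: "pool10"
  top: "fc6_det"
  inner_product_param {
    num_output: 512
    weight_filler { type: "gaussian" std: 0.005 }
    bias_filler { type: "constant" value: 0.1 }
  }
}
layer {
  name: "relu_fc6_det"
  type: "ReLU"
  bottom: "fc6_det"
  top: "fc6_det"
}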

@durveshpathak

Hi @absorbguo @mengzhangjian, were you finally able to integrate SqueezeNet with Faster R-CNN? Is there a place where I can look at the prototxt files?
