Sparse convolutional neural networks #4328

wenwei202 · 2016-06-18T06:30:47Z

Anyone has interest to utilize the sparsity to accelerate DNNs?

I am working on the fork https://github.com/wenwei202/caffe/tree/scnn and currently, on average, achieve ~5x CPU and ~3x GPU layer-wise speedups of convolutional layers in AlexNet by off-the-shelf GEMM (after ~2% top1 accuracy loss).

http://papers.nips.cc/paper/6504-learning-structured-sparsity-in-deep-neural-networks.pdf

jpiabrantes · 2016-06-20T10:36:55Z

@wenwei202 could you explain a bit further how to use your fork? Any example? I have convolution layers where 90% of the weights are zero if I use your version of caffe the computations will automatically take advantage of this sparsity? If I use a dense matrix will the computations be slower or will it use the normal way of computing? Thanks for sharing your work 👍

wenwei202 · 2016-06-21T06:20:18Z

@jpiabrantes You can use conv_mode in each conv layer to indicate which method be utilized to do the computation.
e.g.
layer {
name: "conv2"
type: "Convolution"
bottom: "norm1"
top: "conv2"
convolution_param {
num_output: 256
pad: 2
kernel_size: 5
group: 2
conv_mode: LOWERED_CSRMM # sparse weight matrix in CSR format * lowered feature maps
# conv_mode: LOWERED_GEMM # default original matrix multiplication
}
}

Thanks

jpiabrantes · 2016-06-24T23:40:59Z

I just tested on the Lenet network for the MNIST example. I was able to achieve the following sparse layers:

conv1 is 75.4 percent sparse
conv2 is 94.7 percent sparse
ip1 is 74.5 percent sparse
ip2 is 89.5 percent sparse

I used conv_mode: LOWERED_CSRMM and connectivity_mode: DISCONNECTED_GRPWISE. I used the GPU and the sparse network was not faster. Sometimes it was even slower, my batchsize is 1.

wenwei202 · 2016-06-25T00:00:02Z

@jpiabrantes in CPU mode, you need to use mkl. LOWERED_CSRMM is only implemented by mkl sparse blas since sparseblas is not supported by openblas and atlas.

jpiabrantes · 2016-06-25T00:04:05Z

@wenwei202 I used the GPU mode.

wenwei202 · 2016-06-25T00:04:59Z

@jpiabrantes it is normal to achieve very limited 'speedup' in GPU even you have sparsity higher than 90%. Because GPU is high-parallelism, and irregular sparse pattern will impact the performance. I am working on structured sparsity to achieve speedup in GPU.

ghost · 2016-06-29T05:24:53Z

@wenwei202 I am not able to complete compilation. 'make runtest' fails.

wenwei202 · 2016-06-29T05:57:46Z

@Rupeshd @wenwei202
When make runtest, use atlas instead of mkl ( seems mkl has some problems to pass some testcases) and export following variables if you have more than one GPU:

export CUDA_VISIBLE_DEVICES=0 # use one GPU

To stabilize the sparsity during training, I zero out weights whose absolute values are smaller than 0.0001 after each weight updating. So, the precision of RMSPropSolverTest may not be enough to pass the test. You can comment the following code if you do not want to zero out (but it is recommended during training to stabilize the sparsity).

template <typename Dtype>
void Net<Dtype>::Update() {
  for (int i = 0; i < learnable_params_.size(); ++i) {
    learnable_params_[i]->Update();
    learnable_params_[i]->Zerout(); //comment this if you do not want to zerout.
  }
}

The only failed (crashed) test case is "TYPED_TEST(ConvolutionLayerTest, Test0DConvolution)" of https://github.com/wenwei202/caffe/blob/scnn/src/caffe/test/test_convolution_layer.cpp#L311.
And, I don't know why. If your guys can figure out, that would be great. Temporarily, I commented codes with in and passed all other test cases. Test0DConvolution was not used for usual 2D or 3D convolution, so it might not be a concern.

Hope this helps.

-Wei

zhaishengfu · 2016-07-13T08:34:21Z

@wenwei202
Hello, I think you have implemented Liu.s CVPR Sparse Convolution Neural Network. But in your fork of caffe_scnn(https://github.com/wenwei202/caffe/tree/scnn), I can't find any procedure to implement that(I know you implemented group_lasso and so on, but how can your code implement methods described in Liu's Paper?? Can you give me a simple tutorial??)
Thank you in advance.

zhaishengfu · 2016-07-13T08:38:19Z

@wenwei202
Besides, I can see you wrote 'models/eilab_reference_sparsenet/deploy_scnn.prototxt' and so on in some python files, but i can't find anyone of them.How can I generate them or where can i find them??

wenwei202 · 2016-07-14T04:17:28Z

@zhaishengfu The implementation was abandoned. Hardly it can achieve good speedup unless the sparse weights were hardcoded in the source code as the paper did. I didn't try hardcoding weights but you are free to try if you have interest. What the paper did was to convert each conv layer to three small layers. You can use this to generate the equivalent net prototxt and this to generate the corresponding decomposed caffemodel. But the code is deprecated.

zhaishengfu · 2016-07-15T08:15:15Z

@wenwei202 Thank you for your reply. But i don't understand your meaning of 'hardcoded'. I didn't see words describing about it in the paper. According to my understanding, you can get speed-up as long as your network is sparse and you implemented methods of sparse-dense matrix multiplication described in the paper. Am i wrong???

wenwei202 · 2016-07-16T01:51:27Z

@zhaishengfu Please refer to section 4 in the paper, like "Therefore, the location of non-zero elements are known and can be encoded directly in the compiled multiplication code." The duplication of that work was abandoned because of that tricky scheme. Our speedup is achieved by structured sparsity to overcome the irregular memory access pattern suffered from random distribution of sparse weights in the memory space. Hopefully, we can release our related paper soon.

zhaishengfu · 2016-07-16T01:58:52Z

@wenwei202 Thank you very much. Really looking forward to your paper. Can you let me know when you realease your paper??(or can you tell me the name of your paper??)

wenwei202 · 2016-08-15T05:51:16Z

Hi @zhaishengfu @jpiabrantes @Rupeshd @pluskid @sergeyk , our paper related to this caffe fork is just accepted by NIPS 2016. You are welcome to contribute, in case you still have interest in sparse convolutional neural networks. [paper] [Github code ]

zhaishengfu · 2016-08-15T10:51:18Z

@wenwei202 Thank you very much!! I will read it carefully!! I really enjoy your contribution to this fork

zhaishengfu · 2016-08-15T13:16:54Z

@wenwei202 hello, i have seen your paper and code roughly. is the code same with your original code?? i did't see any difference(or may be i should see more carefully)
besides, i have used your original code to train my model(regression problem). it is useful but i lose some accuracy. and if i set the learning rate to >10^-5, it will go to "nan". so i can only set it to small number and the convergence is very slow...

wenwei202 · 2016-08-25T21:40:10Z

@zhaishengfu Please use the scnn fork, and I have updated tutorial. Help that will help.

zhaishengfu · 2016-08-26T00:59:12Z

@wenwei202 ok, indeed i have used your code already. i used all of your related parameter to generate my prototxt as following. I see that you don't use tensor decomposition.
layer {
name: "conv1_1"
type: "Convolution"
bottom: "image"
top: "conv1_1"
param {
lr_mult: 1
decay_mult: 1
breadth_decay_mult: 1.0
kernel_shape_decay_mult: 1.0
block_group_lasso {
xdimen: 9
ydimen: 64
block_decay_mult: 1.0
}
regularization_type: "L1"
}
param {
lr_mult: 1
decay_mult: 1
breadth_decay_mult: 0.0
kernel_shape_decay_mult: 0.0
regularization_type: "L1"
}
connectivity_mode: DISCONNECTED_ELTWISE
convolution_param {
num_output: 64
bias_term: true
pad: 1
kernel_size: 3
group: 1
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
}
}
}

hiyijian · 2016-09-01T03:24:27Z

For the setting:
block_group_lasso {
xdimen: 9
ydimen: 64
block_decay_mult: 1.0
}
what's the meaning of 964? Dose it mean that it will reserve 964 group of weights, and zero-out others?

zhaishengfu · 2016-09-01T03:44:09Z

@hiyijian the code says clearly, the xdimen and ydimen represents the column and row dimension respectively. For example, if you have A rows and ydimen is B, then you will have A/B groups and in each group you will use regularization

hiyijian · 2016-09-01T03:57:03Z

Thanks. Clear now.
Is there any guide to set proper xdimen and ydimen in order to achieve better performance in accuracy and speed?

zhaishengfu · 2016-09-01T06:48:39Z

@hiyijian Indeed i also want to know the answer. In my trial of traning(my problem is not classification but regression), when the sparsity gets about >60%, the accuracy will decrease apprently. I think the configuration of xdimen and ydimen is related to your network and question. Maybe you can set the configuration as the paper says(such as xdimen is equal to the columns of your convolution kernel and ydimen is equal to the rows of your convolutional kernel).

hiyijian · 2016-09-01T07:27:27Z

Thank you @zhaishengfu
Maybe the network could be fine-tuned without SSL to regain the accuracy as paper report. I will have a try

wenwei202 · 2016-09-01T16:13:35Z

@zhaishengfu @hiyijian The setups of xdimen and ydimen are based on what kinds of structure sparsity you want. For example, if weight matrix with many all-zero columns are expected, then xdimen = 1 and ydimen = the number of rows. For the trade off between accuracy and sparsity, pls train nn without ssl first to get the baseline, then train it ssl, and finally finetune it without ssl. Make sure your training converges well at every phase.

hiyijian · 2016-09-02T01:30:04Z

Thanks @wenwei202 . It very helps.
You introduce 5 ways for group lasso:
1、filter-wise and channle-wise
2、shape-wise
3、depth-wise
4、2D-filter-wise
5、filter-wise and shape-wise

Would you like to make it more clear : How to put them into practice respectively via xdimen/ydimen control ?

hiyijian · 2016-09-02T02:28:05Z

Say we have a typical conv layer with nfilter* nchannel * nHeight * nWidth = 128 * 64 * 3 * 3
1、filter-wise and channle-wise: xdimen = 9 and ydimen >= 1
2 、shape-wise: xdimen != 9 and ydimen = 0
3、depth-wise: no idea
4、2D-filter-wise: xdimen = 9 and ydimem = 1
5、filter-wise and shape-wise: xdimen != 9 and ydimen >= 1

Did I do anything obivoius stupid?

gsrivas4 · 2017-07-22T06:23:00Z

Thanks for the explanation. I get the intuition now and can relate with the theory in paper. However, I am not sure about the line in the code above
t = tf.reduce_sum(t, axis=axis)
should not this be, t = tf.reduce_sum(t, axis=(1,2,3))
as for a row regularization, we should be taking the squared sum of all the weights in a filter. Please, let me know where I am going wrong in my interpretation.

Regarding the issue, it is coming during make test, if I resolve it, I will post my fix here.

wenwei202 · 2017-07-22T06:29:35Z

@srivastavag89 you are right, you should reduce along three axes if it is a 3D filter. My error.

hyunjaelee410 · 2017-07-22T06:31:50Z

@wenwei202
Thanks for your kind answer!
As a matter of fact, there was a slight problem with my configuration.

I would like to ask few more questions if you don't mind.

I've read your ICCV 2017 paper(nice work by the way), and would like to apply it with SSL.
However, SSD, which is also implemented in caffe has many newly-implemented layers.
Do you think the best of applying your work to SSD is to analyse your code then merging it with SSD code? or do you have any other idea??
SSD uses feature extractor net (VGG as a default) pre-trained with ImageNet, then fine-tune with VOC with newly added layers at the back. Do you think the network will converge well only with fine-tuning procedure with VOC?(with pre-trained feature extractor that SSL has been applied) or do you think pre-trained network should be re-trained with SSL then applied to SSD?
I've read your comment about implementing your work in TensorFlow. However, I have found that it doesn't support cuSPARSE at the moment but simple sparse matrix acceleration(https://www.tensorflow.org/api_docs/python/tf/sparse_tensor_dense_matmul).
Do you think the performance will improve in TensorFlow without support of cuSPARSE?
I have run tests on ConvNet with cifar-10 and came up with results as below.

Baseline (GEMM) : 1657.76 ms
Baseline (CSRMM) : 1594.99 ms
After SSL (GEMM) : 1638.37 ms
After SSL (CSRMM) : 1361.54 ms
After Fine-tuning(GEMM) : 1678.58 ms
After Fine-tuning(CSRMM) : 1183.33 ms

and Sparsities of each case are

Baseline : (0.430833, 0.655312, 0.659883)
After SSL : (0.603333, 0.845117, 0.659688)
After Fine-tuning : (0.624167, 0.860391, 0.708887)
-in conv1, conv2, conv3 order

What I'm interested is that inference speed has changed a lot after fine-tuning even though there wasn't dramatic difference of sparsity. From what I understand, fine tuning is about regaining accuracy. Is it naturally to get such a gain?

Sorry for so much question. I wish I could get your intuition before applying to detection.
I hope I could share nice result of applying your work to SSD.
Thanks a lot!

wenwei202 · 2017-07-22T07:03:44Z

@HyunJaeLee2

merging is a good way since everything is trackable;
It is a safe way to first use SSL to get a sparsified VggNet and then fine-tune the back by VOC, but it is also doable or even better to train the whole net using SSL and VOC. SSL is a kind of regularization, and if it would not overfit, then you may get higher sparsity because VOC is a smaller dataset and the model can be compressed more? This is just my thought, the best way is to try both ways.
cuSPARSE is still slow if you checked the figure here. I recommend LOWERED_CCNMM. TensorFlow is good for training, for inference, you may need to optimize your inference code based on the sparsity you obtained.
Finetuning is for accuracy regaining while maintaining the sparsity trained by SSL. It can get ~1.0-2.0% accuracy back. I do not know how you measured the speed, but the code is not the best one for final deployment because some code hacking is required.

gsrivas4 · 2017-08-09T19:55:50Z

@wenwei202 Thanks for your work.
I am trying to implement structured sparsity in Thenao/Lasagne. I have very similar network architecture as yours. But when I run the test, the sparsity for filters and channels change from 0% to 100% within an epoch. If you can suggest or give some pointers based on your experience where I could be going wrong in my implementation. Also, I wanted to confirm regarding the hyperparamters, did you use the filter and channel sparsity hyperparameters, as .003, as given on your github.

wenwei202 · 2017-08-09T20:17:45Z

@srivastavag89 may be the hyper-parameter of structured sparsity regularization is too large such that it only optimizes the sparsity?
The hyper-parameter depends on the network architecture and dataset. It is easier to tune the hyper-parameter by retraining the trained DNNs , instead of training from scratch.
One good hyper-parameter to start from is the one such that the value of cross-entropy and the value of regularization are close. Another good value is the one of L2 weight decay.

gsrivas4 · 2017-08-09T20:42:24Z

Thanks for the reply. I will try taking a trained network and then retrain it adding structured sparsity to it this time. Also, I will try hyperparameters as per your suggestions.
Another suggestion I needed, would it be fine if I start with another network in theano which gives similar baseline accuracy without SSL applied to it. But this network is not same but similar to your baseline network with convolution, max pooling and batch normalization layers. Would you suggest trying with this network or constructing a network exactly same as yours? Just trying to get your views, if there is anything specific in the network you took or you would expect SSL to work on other networks as well,

wenwei202 · 2017-08-09T20:53:47Z

@srivastavag89 SSL is a universal method, it would work for versatile networks not just for those in the paper.

zlheos · 2017-09-14T10:14:36Z

@wenwei202 I try to train SSL with CPU mode
but there has a problem " "Deprecated in CPU mode: breadth and kernel shape decay (use block group decay instead)""
I'm appreciated you could answer me！

wenwei202 · 2017-09-14T13:26:52Z

@zlheos Please use block_group_lasso in net protobuf and block_group_decay in solver protobuf instead. You may check the tutorial here.

wenwei202 · 2017-09-18T00:29:46Z

FYI, Structured Sparsity Learning (SSL) approach is now also implemented in TensorFlow (code). We also extend and advance SSL to Recurrent Neural Networks to reduce the hidden dimension of LSTMs, i.e., learning the number of hidden/cell states/neurons in RNNs. Missing details (e.g. training method) are included in Section 3.2 here.

m8w · 2017-09-18T02:16:31Z

Theory
begin Comments
IF I use fft(Fast Fourier Transformations) in exponential convolutions for video filters, the system implementation breaks crashes. While I wait for the solutions to these video filters and their stream applications by finding non-identity type third-order differential derivatives in some Banach Lp singular integral operators with non-convolution type manifolds I have found it is easier with Gauge systems and solitions yet neural network systems of a gaugian types might only be theory at this point.
end Comment

When going beyond the theories and applications we can simplify all kinds of identities into one type or another.

joechen24 · 2017-10-04T03:29:55Z

@wenwei202 Great job on this paper! It is very impressive! I wonder where I can find your prototxt for training the alexnet for imagenet (the one shown in paper)? Because of the need of my experiment, I would like to make sure that I am using the correct method to compress the network in the correct way (which should reproduce your results).
Much appreciated!

wenwei202 · 2017-10-09T19:37:32Z

@Jarvistonychen Took a while to look around the logs, and find the the hyperparameter of 0.0005 for entry 5 in table 4.

joechen24 · 2017-10-11T18:08:10Z

@wenwei202 Thanks a lot for spending time answering my question! Since there is only column sparsity in entry 5 of table 4, I think 0.0005 is for kernel_shape_decay? If I also wanna do breadth_decay, is 0.0005 also the right value to use?
Thanks again!

wenwei202 · 2017-10-11T18:14:37Z

@Jarvistonychen Yes, you may start from there.

ananddb90 · 2017-10-18T12:37:16Z

@wenwei202
for conv layer with dimension = nfilter x nchannel x nheight x nwidth (64x128x3x3)
how can I combine filter wise and channel wise sparsity on one prototxt of ResNet?
Do I have to write
block_group_lasso{ #filterwise
xdim: 128x3x3
ydim: 1
}
block_group_lasso{ #channelwise
xdim: 3x3
ydim: 64
}

wenwei202 · 2017-10-19T02:43:19Z

Correct. More precisely:

block_group_lasso{ 
xdim: 1152
ydim: 1
}
block_group_lasso{ 
xdim: 9
ydim: 64
}

ananddb90 · 2017-10-24T14:12:45Z

@wenwei202
In the log file, what is the meaning of

I1024 15:28:53.767140 10494 sgd_solver.cpp:120] Element Sparsity %:
I1024 15:28:53.776428 10494 sgd_solver.cpp:130] Column Sparsity %:
I1024 15:28:53.777498 10494 sgd_solver.cpp:139] Row Sparsity %:
I1024 15:28:53.777539 10494 sgd_solver.cpp:153] Block Sparsity %:

I1024 15:29:05.849658 10494 solver.cpp:231] Iteration 4420, loss = 0.494514
I1024 15:29:05.849678 10494 solver.cpp:247] Train net output #0: accuarcy = 0.920354
I1024 15:29:05.849684 10494 solver.cpp:247] Train net output #1: loss_bbox = 0.142928 (* 1 = 0.142928 loss)
I1024 15:29:05.849689 10494 solver.cpp:247] Train net output #2: loss_cls = 0.347726 (* 1 = 0.347726 loss)
I1024 15:29:05.849691 10494 solver.cpp:247] Train net output #3: rpn_cls_loss = 0.0584648 (* 1 = 0.0584648 loss)
I1024 15:29:05.849695 10494 solver.cpp:247] Train net output #4: rpn_loss_bbox = 0.0126774 (* 1 = 0.0126774 loss)
I1024 15:29:05.849699 10494 sgd_solver.cpp:106] Iteration 4420, lr = 0.0001
I1024 15:29:06.022259 10494 sgd_solver.cpp:120] Element Sparsity %:
0.28699 100 0 0 0 0 0 1.58691 0 1.17188 0 0 0 0.366211 0 0 0 0 0 0.496419 0 0 0 0 0 1.55029 0 1.17188 0 0 0 1.82495 0 0 0 0 0 0.311957 0 0 0 0 0 0.57373 0 0 0 0 0 7.91092 0 0 0 0 0 7.32727 0 0 0 0 0.78125 6.30493 0 0 0 0 0 16.4474 0 0 0 0 0 11.7126 0 0.78125 0 0 0 6.78982 0 0 0 0 0 21.0297 0 0 0 0.195312 0 7.89948 0 0 0 0 0 7.76062 0 0 0 0 0 6.10775 0 0 0 0 0 8.77342 0 0 0 0 0 11.3316 0 0 0 0 0 6.27865 0 0 0 0 0 8.37975 0 0 0 0 0 8.17313 0 0 0 0 0 7.22427 0 0 0 0 0 5.82258 0 0 0 0 0 7.35607 0.146484 0 0 0 0 7.47519 0 0 0 0 0 5.42183 0 0 0 0 0 7.55234 0.0488281 0 0 0 0 3.44215 21.0938 2.3112 0 10.9375 19.4444 3.55883 14.9414 3.04452 28.5714 77.5074 87.5
I1024 15:29:06.031704 10494 sgd_solver.cpp:130] Column Sparsity %:
0 100 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.17188 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.5625 0 0 0 0 0 0.78125 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.976562 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.0976562 0
I1024 15:29:06.032747 10494 sgd_solver.cpp:139] Row Sparsity %:
0 100 0 0 0 0 0 0 0 1.17188 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.17188 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2.34375 0 0 0 0 0 0 0 0 0 0 0.78125 0 0 0 0 0 0 7.42188 0 0 0 0 0 0.78125 0 0.78125 0 0 0 0 0 0 0 0 0 12.6953 0 0 0 0.195312 0 0.390625 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.292969 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.0976562 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.146484 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.0488281 0 0 0 0 0 21.0938 0 0 0 19.4444 0 14.9414 0 28.5714 27.8061 87.5
I1024 15:29:06.032781 10494 sgd_solver.cpp:153] Block Sparsity %:

and how to calculate/know percentage of sparsity in model before and after training ?

wenwei202 · 2017-10-24T15:09:23Z

@ananddb90 During training, you will see some sparsity statistics. The sparsity is shown in the order of layers, and in each layer, in the order of weights and then biases. Basically, it plots sparsity for all parameter blobs in caffe, like parameters for a batch normalization layer. We usually care only about the sparsity of weights.

The "Element Sparsity" is the percentage of zeros. "Block Sparsity" is the percentage of all-zero blocks if you used block_group_lasso. Others are pretty self explained. Thanks.

ananddb90 · 2017-10-25T07:37:27Z

@wenwei202
thank you for your reply
but I couldn't understand your first paragraph. How can I get sparsity in the order of layers, and then layer specific weight/bias since in my log file, I just get my training loss, element, column, row sparsity.

Also, I am using block_group_lasso but I am not getting any output in my log file.

Train_scnn.prototxt
layer {
bottom: "res2b"
top: "res3a_branch1"
name: "res3a_branch1"
type: "Convolution"
convolution_param {
num_output: 512
kernel_size: 1
pad: 0
stride: 2
bias_term: false
}
param {
lr_mult: 1.0
block_group_lasso { # Filter-wise structured sparsity
xdimen: 256 # CxWxH
ydimen: 1 # the block size along the y (row) dimension
block_decay_mult: 1.0 # the local multiplier of weight decay (block_group_decay) by group lasso regularization
}
block_group_lasso { # Channel-wise structured sparsity
xdimen: 1 # CxWxH
ydimen: 512 # the block size along the y (row) dimension
block_decay_mult: 1.0 # the local multiplier of weight decay (block_group_decay) by group lasso regularization
}
}
}

solver.prototxt
#base_lr: 0.001
base_lr: 0.0001 #scnn_lr = 0.1*base_lr
lr_policy: "step"
gamma: 0.1
stepsize: 80000
display: 20
momentum: 0.9

weight_decay: 0.0005

#kernel_shape_decay: 0.0
#breadth_decay: 0.0
block_group_decay: 0.001 #0.007

snapshot: 10000
snapshot_prefix: "rfcn_scnn"

iter_size: 2
debug_info: true

wenwei202 · 2017-10-25T14:59:44Z

@ananddb90 here is how sparsity displayed. Reading the code may be the best way. Alternatively, you may use pycaffe to analyze the trained model.

AllenFenglei · 2018-06-15T08:53:12Z

@jpiabrantes
Hello! I'm a newbie in caffe. May I ask how to get the sparsity in each layer? eg.
conv1 is 75.4 percent sparse
conv2 is 94.7 percent sparse
ip1 is 74.5 percent sparse
ip2 is 89.5 percent spars

Demohai · 2018-09-30T03:51:33Z

@wenwei202 read your paper and watch your blogs recently. I am also studying for a master's degree at Beihang University，very admired you and your research results. I have a question that hope you can help me. In your SSL paper, you want to learn structed sparsity through setting breadth_decay, kernel_shape_decay or block_group_decay, but as you said below, during SSL, zeros in a row or column can still go back to nonzeros if they get a large update by the gradients of the cross entropy. Then, after fine-tuning the SSL, the weights in a row , column or block may not be all zeros, there are some nonzero values at any place, so not the structed sparsity. I don't know am I right?

wenwei202 · 2018-09-30T18:39:18Z

Hello @Demohai , answers may differ for different stages. In a learning stage of structured sparsity using group Lasso, zeros can go back only when those weights are very important since group Lasso regularization enforces them to zeros; in a fine-tuning stage after group Lasso, we simply fixed zeros in all-zero groups/rows/columns and retrained remaining ones, so that structured sparsity were kept. Thanks! :)

Demohai · 2018-10-06T11:08:20Z

hello @wenwei202 I find each time I run the program ,under the caffe root dictionary, there are some weights files for each layer, what are they used for and which section of the source code product them?

wenwei202 · 2018-10-06T15:05:06Z

It's just used to analyze the sparsity pattern. The code generating the weights is here and here.

Demohai · 2018-10-07T01:05:08Z

Thanks a lot !!!

Demohai · 2018-10-08T03:07:58Z

hello @wenwei202 bother you again. When to deploy the fine-tuned SSL network, we have several conv mode for choice, I want to know what the lowered tensors and lowered feature maps are as the following picture shows.

wenwei202 · 2018-10-20T02:02:33Z

@Demohai sorry for late reply. The feature maps are lowered to a matrix for matrix multiplication, and this is how LOWERED comes. Please refer to this post for details.

shelhamer added the speed-up label Aug 16, 2016

wenwei202 closed this as completed Sep 1, 2016

wenwei202 reopened this Sep 1, 2016

Sparse convolutional neural networks #4328

Sparse convolutional neural networks #4328

Comments

wenwei202 commented Jun 18, 2016 • edited Loading

jpiabrantes commented Jun 20, 2016 • edited Loading

wenwei202 commented Jun 21, 2016

jpiabrantes commented Jun 24, 2016 • edited Loading

wenwei202 commented Jun 25, 2016 • edited Loading

jpiabrantes commented Jun 25, 2016

wenwei202 commented Jun 25, 2016

ghost commented Jun 29, 2016

wenwei202 commented Jun 29, 2016 • edited Loading

zhaishengfu commented Jul 13, 2016

zhaishengfu commented Jul 13, 2016

wenwei202 commented Jul 14, 2016

zhaishengfu commented Jul 15, 2016

wenwei202 commented Jul 16, 2016

zhaishengfu commented Jul 16, 2016

wenwei202 commented Aug 15, 2016 • edited Loading

zhaishengfu commented Aug 15, 2016 • edited Loading

zhaishengfu commented Aug 15, 2016

wenwei202 commented Aug 25, 2016

zhaishengfu commented Aug 26, 2016 • edited Loading

hiyijian commented Sep 1, 2016 • edited Loading

zhaishengfu commented Sep 1, 2016

hiyijian commented Sep 1, 2016 • edited Loading

zhaishengfu commented Sep 1, 2016

hiyijian commented Sep 1, 2016

wenwei202 commented Sep 1, 2016

hiyijian commented Sep 2, 2016

hiyijian commented Sep 2, 2016 • edited Loading

gsrivas4 commented Jul 22, 2017

wenwei202 commented Jul 22, 2017

hyunjaelee410 commented Jul 22, 2017

wenwei202 commented Jul 22, 2017 • edited Loading

gsrivas4 commented Aug 9, 2017

wenwei202 commented Aug 9, 2017

gsrivas4 commented Aug 9, 2017

wenwei202 commented Aug 9, 2017

zlheos commented Sep 14, 2017

wenwei202 commented Sep 14, 2017

wenwei202 commented Sep 18, 2017 • edited Loading

m8w commented Sep 18, 2017

joechen24 commented Oct 4, 2017

wenwei202 commented Oct 9, 2017

joechen24 commented Oct 11, 2017

wenwei202 commented Oct 11, 2017

ananddb90 commented Oct 18, 2017

wenwei202 commented Oct 19, 2017

ananddb90 commented Oct 24, 2017

wenwei202 commented Oct 24, 2017 • edited Loading

ananddb90 commented Oct 25, 2017 • edited Loading

wenwei202 commented Oct 25, 2017

AllenFenglei commented Jun 15, 2018

Demohai commented Sep 30, 2018

wenwei202 commented Sep 30, 2018 • edited Loading

Demohai commented Oct 6, 2018

wenwei202 commented Oct 6, 2018

Demohai commented Oct 7, 2018

Demohai commented Oct 8, 2018

wenwei202 commented Oct 20, 2018

wenwei202 commented Jun 18, 2016 •

edited

Loading

jpiabrantes commented Jun 20, 2016 •

edited

Loading

jpiabrantes commented Jun 24, 2016 •

edited

Loading

wenwei202 commented Jun 25, 2016 •

edited

Loading

wenwei202 commented Jun 29, 2016 •

edited

Loading

wenwei202 commented Aug 15, 2016 •

edited

Loading

zhaishengfu commented Aug 15, 2016 •

edited

Loading

zhaishengfu commented Aug 26, 2016 •

edited

Loading

hiyijian commented Sep 1, 2016 •

edited

Loading

hiyijian commented Sep 1, 2016 •

edited

Loading

hiyijian commented Sep 2, 2016 •

edited

Loading

wenwei202 commented Jul 22, 2017 •

edited

Loading

wenwei202 commented Sep 18, 2017 •

edited

Loading

wenwei202 commented Oct 24, 2017 •

edited

Loading

ananddb90 commented Oct 25, 2017 •

edited

Loading

wenwei202 commented Sep 30, 2018 •

edited

Loading