Finetuning out-of-memory and lack of output #682

Closed · htzheng opened this issue Jul 13, 2014 · 12 comments
@htzheng commented Jul 13, 2014

Hi, I plan to apply the pretrained ImageNet model to a 2-class classification task, so I need to modify the fc8 layer and then finetune the network. I am following shelhamer's suggestion in #186.

Here is what I do:

  1. prepare the input data
  2. change the data layer and the fc8 layer in imagenet_train.prototxt and imagenet_val.prototxt (see the prototxt sketch below)
  3. type finetune_net imagenet_solver.prototxt caffe_reference_imagenet_model in the terminal (caffe_reference_imagenet_model is the 244MB pretrained model file)

After that, the terminal does not respond for a long time and produces no output.

I'm new to Caffe; could someone tell me how to finetune an existing model? Thanks for any reply!
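
For context, the fc8 edit in step 2 usually amounts to giving the layer a new name, so that the 1000-way pretrained weights are not copied into it, and setting num_output to the new number of classes. A minimal sketch, assuming a CaffeNet-style prototxt; the name fc8_2class is my own placeholder, and the newer layer syntax is shown (older Caffe releases write layers { type: INNER_PRODUCT } instead):

layer {
  name: "fc8_2class"        # renamed so the pretrained fc8 weights are not copied in
  type: "InnerProduct"
  bottom: "fc7"
  top: "fc8_2class"
  inner_product_param {
    num_output: 2           # two classes instead of the original 1000
  }
}

Any loss or accuracy layer that used fc8 as a bottom should point at fc8_2class as well.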

@htzheng (Author) commented Jul 13, 2014

Update: in step (3), finetune_net imagenet_solver.prototxt caffe_reference_imagenet_model shows this:

F0713 21:28:27.059324 3532 syncedmem.cpp:47] Check failed: error == cudaSuccess (2 vs. 0) out of memory
*** Check failure stack trace: ***
@ 0x7fb1ce8da9fd google::LogMessage::Fail()
@ 0x7fb1ce8dc89d google::LogMessage::SendToLog()
@ 0x7fb1ce8da5ec google::LogMessage::Flush()
@ 0x7fb1ce8dd1be google::LogMessageFatal::~LogMessageFatal()
@ 0x447284 caffe::SyncedMemory::mutable_gpu_data()
@ 0x43c9d2 caffe::Blob<>::mutable_gpu_diff()
@ 0x4934f2 caffe::InnerProductLayer<>::Backward_gpu()
@ 0x42e403 caffe::Net<>::Backward()
@ 0x445bd7 caffe::Solver<>::Solve()
@ 0x40a0c8 main
@ 0x7fb1cbf56ec5 (unknown)
@ 0x40be37 (unknown)
Aborted (core dumped)

Is there any clue?

@sguada (Contributor) commented Jul 13, 2014

The message is clear: you are out of memory on your GPU card, so you will need to reduce the batch_size.

Sergio

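For anyone else hitting this: batch_size is set in the data layers of the train/val prototxts (imagenet_train.prototxt / imagenet_val.prototxt here), not in the solver. A rough sketch of the relevant layer in the newer prototxt syntax, assuming an LMDB-backed data layer; the source path and the value 32 are placeholders to adjust for your setup:

layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  data_param {
    source: "path/to/your_train_lmdb"   # placeholder path
    backend: LMDB
    batch_size: 32                      # lower this until the net fits in GPU memory
  }
}

Activation memory grows roughly linearly with batch_size, so keep halving it until the out-of-memory check stops firing.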

@htzheng (Author) commented Jul 14, 2014

@sguada Thank you! I changed to a smaller batch_size and the problem is solved.

There is a tiny issue with finetune_net.bin on Ubuntu: LOG(INFO) does not print anything to the terminal. I changed LOG(INFO) to std::cout, and the finetuning code works well.

@shelhamer (Member)

Happy that you figured it out. For logging, do

GLOG_logtostderr=1 finetune_net imagenet_solver.prototxt caffe_reference_imagenet_model

shelhamer changed the title from "Fine tuning issue" to "Finetuning out-of-memory and lack of output" on Jul 14, 2014
@Prasanna1991

I am trying to implement the DeepFace model. The memory required for data is 49724060 for the test net (with the batch size finally down to 1) and 49724060 for the train net (again with batch size 1), which makes the total memory required around 94 MB. But I am still getting the 'out of memory' error. My GPU is an NVIDIA GeForce GT 650M. Checking the GPU status with nvidia-smi -q, I can see that the total memory (FB) is 2047 MiB and the free memory is 1646 MiB. Can anyone point out what I am missing?

@ToruHironaka

I got the same GPU error when I trained the ImageNet example. Reducing the batch size from 256 to 4 in train_val.prototxt fixed my GPU memory shortage, but I then had second thoughts and changed solver_mode from GPU to CPU, because my GPU has only 512MB of memory (it is an old MacBook Pro). CPU mode seems to train without the problem. I understand GPU mode is meant for massively parallel computing, but I only have a few PCs and laptops and I am just learning Caffe for my studies. Should I always use GPU mode? GPU mode accelerates computation, but that may not matter much while I am still learning. Still, I will have to build a new PC for training with Caffe, so please advise me on what type of GPU is good enough for it. I am thinking of buying an NVIDIA GeForce GTX 960 with 2GB of memory; I heard a GPU with 3GB or more of memory is sufficient for Caffe.
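
For reference, the CPU/GPU switch mentioned above lives in the solver prototxt rather than in train_val.prototxt. A minimal sketch, assuming a standard solver file; the net path is a placeholder and the usual learning-rate, momentum, and snapshot settings are omitted:

net: "models/your_model/train_val.prototxt"   # placeholder path to the train/val prototxt
solver_mode: CPU                              # or GPU; CPU avoids the GPU out-of-memory abort at the cost of speed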

@cervantes-loves-ai

When I try to install fast-rcnn, I get an error like this. How do I solve it?

Loaded network /home/rvlab/Music/fast-rcnn/data/fast_rcnn_models/vgg16_fast_rcnn_iter_40000.caffemodel

Demo for data/demo/000004.jpg
F0718 22:09:35.547049 13693 syncedmem.cpp:51] Check failed: error == cudaSuccess (2 vs. 0)  out of memory
*** Check failure stack trace: ***
Aborted (core dumped)

@monajalal

I get this error when running the following:
jalal@klein:~/computer_vision/py-faster-rcnn$ ./tools/demo.py

I0823 20:13:31.522610 40008 layer_factory.hpp:77] Creating layer relu5_1
I0823 20:13:31.522629 40008 net.cpp:106] Creating Layer relu5_1
I0823 20:13:31.522644 40008 net.cpp:454] relu5_1 <- conv5_1
I0823 20:13:31.522662 40008 net.cpp:397] relu5_1 -> conv5_1 (in-place)
I0823 20:13:31.522843 40008 net.cpp:150] Setting up relu5_1
I0823 20:13:31.522869 40008 net.cpp:157] Top shape: 1 512 14 14 (100352)
I0823 20:13:31.522883 40008 net.cpp:165] Memory required for data: 112795648
I0823 20:13:31.522891 40008 layer_factory.hpp:77] Creating layer conv5_2
I0823 20:13:31.522902 40008 net.cpp:106] Creating Layer conv5_2
I0823 20:13:31.522909 40008 net.cpp:454] conv5_2 <- conv5_1
I0823 20:13:31.522922 40008 net.cpp:411] conv5_2 -> conv5_2
I0823 20:13:31.529803 40008 net.cpp:150] Setting up conv5_2
I0823 20:13:31.529841 40008 net.cpp:157] Top shape: 1 512 14 14 (100352)
I0823 20:13:31.529849 40008 net.cpp:165] Memory required for data: 113197056
I0823 20:13:31.529868 40008 layer_factory.hpp:77] Creating layer relu5_2
I0823 20:13:31.529887 40008 net.cpp:106] Creating Layer relu5_2
I0823 20:13:31.529903 40008 net.cpp:454] relu5_2 <- conv5_2
I0823 20:13:31.529920 40008 net.cpp:397] relu5_2 -> conv5_2 (in-place)
F0823 20:13:31.530177 40008 cudnn_relu_layer.cpp:13] Check failed: status == CUDNN_STATUS_SUCCESS (4 vs. 0)  CUDNN_STATUS_INTERNAL_ERROR
*** Check failure stack trace: ***
Aborted (core dumped)

@sguada How should I reduce the batch size? In which file? Can you show an example?

@jodusan commented Oct 11, 2016

@monajalal Have you figured out which file it was? :D

@onkarganjewar

@Dulex123 @monajalal If you're using py-faster-rcnn, you can change the batch_size in lib/fast_rcnn/config.py.

Also, I would suggest you take a look at this issue. Hope this helps!

@wsz912 commented Mar 30, 2017

Using a smaller batch_size will work.

@satyakesav

I changed my batch_size from 128 to 32, but it still fails. Do we need to rebuild after changing the config file?
