LevelDB memory consumption problem (out of files) #13
Status: Closed

Comments
- Symptom of the same LevelDB number-of-open-files issue as #38.
- Thanks for the update and suggested fix.
Original report

When running Caffe on the ImageNet data, I observed that memory usage (as seen via the top command) rises inexorably to almost 100%. With batchsize=256 this happens within around 2,500 iterations. With batchsize=100, training is faster, but by around 5,000 iterations memory consumption again reaches almost 100%. At that point training slows down dramatically and the loss stops changing at all; I suspect the slowdown is due to thrashing. I am wondering whether there is a memory leak, or something in Caffe that unintentionally allocates more and more memory at each iteration.

The same behavior occurs on MNIST, although that dataset is much smaller, so training can actually complete without problems.

I ran the MNIST training under valgrind with --leak-check=full, and some memory leaks were indeed reported. These could be benign if the amount of leaked memory is constant, but if it scales with the number of batches it could explain the ever-increasing memory consumption.

Any idea what could be the problem?
Update (12/13/2013): The problem may be in LevelDB. I was able to make training work by modifying src/caffe/layers/data_layer.cpp to set options.max_open_files = 100; the default is 1000, and keeping that many table files open used too much memory on the machine I was using. I also wonder whether memory use could be reduced further by setting ReadOptions::fill_cache = false, since Caffe scans over the whole training set.
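For illustration, here is a minimal C++ sketch of the LevelDB settings described above. It is not the actual Caffe data-layer code; the database path and variable names are placeholders. The max_open_files and fill_cache fields are part of LevelDB's public Options/ReadOptions API.

```cpp
// Minimal sketch, not the actual src/caffe/layers/data_layer.cpp code:
// open a training LevelDB with a smaller open-file budget and scan it
// without populating the block cache, as suggested in the update above.
#include <cassert>
#include "leveldb/db.h"

int main() {
  leveldb::Options options;
  options.create_if_missing = false;
  options.max_open_files = 100;  // LevelDB default is 1000; lower it to cap memory use

  leveldb::DB* db = nullptr;
  // "ilsvrc12_train_leveldb" is an illustrative path, not Caffe's default.
  leveldb::Status status = leveldb::DB::Open(options, "ilsvrc12_train_leveldb", &db);
  assert(status.ok());

  leveldb::ReadOptions read_options;
  read_options.fill_cache = false;  // a one-pass sequential scan gains nothing from the block cache

  // Iterate over the whole training set once, as the data layer's scan does.
  leveldb::Iterator* it = db->NewIterator(read_options);
  for (it->SeekToFirst(); it->Valid(); it->Next()) {
    // it->key() / it->value() would be consumed by the data layer here.
  }
  delete it;
  delete db;
  return 0;
}
```

Lowering max_open_files limits how many table files LevelDB keeps open at once (each holds index data in memory), and disabling fill_cache avoids filling the block cache during a sequential scan over the whole dataset.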