LevelDB memory consumption problem (out of files) #13
Status: Closed

Comments
- Symptom of the same LevelDB number-of-open-files issue as #38.
- Thanks for the update and suggested fix.
Original report

When running Caffe on the ImageNet data, I observed that memory usage (as seen via the top command) rises inexorably to almost 100%. With batchsize=256 this happens within around 2,500 iterations. With batchsize=100, training is faster, but by around 5,000 iterations memory consumption again reaches almost 100%. At that point training slows down dramatically and the loss stops changing at all; I suspect the slowdown is due to thrashing. I am wondering whether there is a memory leak, or something in Caffe that unintentionally allocates more and more memory at each iteration.

The same behavior occurs on MNIST, although that dataset is much smaller, so training can actually complete without problems.

I ran the MNIST training under valgrind with --leak-check=full, and some memory leaks were indeed reported. These could be benign if the amount of leaked memory is constant, but if it scales with the number of batches it could explain the ever-increasing memory consumption.

Any idea what could be the problem?
Update (12/13/2013): The problem may be in LevelDB. I was able to make training work by modifying src/caffe/layers/data_layer.cpp to set options.max_open_files = 100; the default is 1000, and keeping that many table files open used too much memory on the machine I was using. I also wonder whether memory use could be reduced further by setting ReadOptions::fill_cache = false, since Caffe scans over the whole training set.
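For illustration, here is a minimal C++ sketch of the LevelDB settings described above. It is not the actual Caffe data-layer code; the database path and variable names are placeholders. The max_open_files and fill_cache fields are part of LevelDB's public Options/ReadOptions API.

```cpp
// Minimal sketch, not the actual src/caffe/layers/data_layer.cpp code:
// open a training LevelDB with a smaller open-file budget and scan it
// without populating the block cache, as suggested in the update above.
#include <cassert>
#include "leveldb/db.h"

int main() {
  leveldb::Options options;
  options.create_if_missing = false;
  options.max_open_files = 100;  // LevelDB default is 1000; lower it to cap memory use

  leveldb::DB* db = nullptr;
  // "ilsvrc12_train_leveldb" is an illustrative path, not Caffe's default.
  leveldb::Status status = leveldb::DB::Open(options, "ilsvrc12_train_leveldb", &db);
  assert(status.ok());

  leveldb::ReadOptions read_options;
  read_options.fill_cache = false;  // a one-pass sequential scan gains nothing from the block cache

  // Iterate over the whole training set once, as the data layer's scan does.
  leveldb::Iterator* it = db->NewIterator(read_options);
  for (it->SeekToFirst(); it->Valid(); it->Next()) {
    // it->key() / it->value() would be consumed by the data layer here.
  }
  delete it;
  delete db;
  return 0;
}
```

Lowering max_open_files limits how many table files LevelDB keeps open at once (each holds index data in memory), and disabling fill_cache avoids filling the block cache during a sequential scan over the whole dataset.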