
Possible Memory Leak #195

Open
Cpruce opened this issue Oct 9, 2018 · 7 comments

Cpruce commented Oct 9, 2018

Please make sure that the boxes below are checked before you submit your issue.
If your issue is an implementation question, please ask your question on StackOverflow or on the Keras Slack channel instead of opening a GitHub issue.

Thank you!

  • [x] Check that you are up-to-date with the master branch of Keras. You can update with:
    pip install git+git://github.com/keras-team/keras.git --upgrade --no-deps

  • [x] Check that your version of TensorFlow is up-to-date. The installation instructions can be found here.

  • [x] Provide a link to a GitHub Gist of a Python script that can reproduce your issue (or just copy the script here if it is short).

Please see:

https://discuss.mxnet.io/t/possible-memory-leak/1973

roywei added the training label Oct 9, 2018

roywei commented Oct 9, 2018

@Cpruce Thanks for the issue. I am looking into this; it is possibly caused by the use of the foreach operator.

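For reference, here is a minimal sketch of MXNet's foreach operator (mx.nd.contrib.foreach, available since MXNet 1.3), the op suspected above. The step function and shapes are illustrative only, not taken from keras-mxnet:

import mxnet as mx

# foreach scans `step` over axis 0 of `data`, threading the state through
# each iteration, similar to an unrolled RNN loop.
def step(data, states):
    new_state = states[0] + data          # running sum across time steps
    return new_state, [new_state]

data = mx.nd.arange(8).reshape((4, 2))    # 4 time steps, 2 features
init_states = [mx.nd.zeros((2,))]
outputs, final_states = mx.nd.contrib.foreach(step, data, init_states)
print(outputs.shape)                      # (4, 2): one output per time step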

Cpruce commented Oct 9, 2018

@roywei Thanks for looking into this


roywei commented Oct 11, 2018

@Cpruce I was able to narrow the memory leak down to the validation step after each epoch. For now, removing validation during model.fit() resolves it, and using model.evaluate(test_data, test_label) to do validation at the end works fine.
We are using the bucketing module in keras-mxnet; switching buckets between training and validation may have caused the memory leak in the foreach operator. Need to take another look at that.
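A minimal sketch of this workaround, assuming a Keras model compiled with an accuracy metric (model, x_train, test_data, etc. are placeholders):

# Train without validation_data to avoid the per-epoch leak ...
history = model.fit(x_train, y_train,
                    epochs=epochs,
                    batch_size=batch_size,
                    verbose=2)

# ... and validate once at the end instead of after every epoch.
loss, acc = model.evaluate(test_data, test_label, batch_size=batch_size)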


Cpruce commented Oct 11, 2018

@roywei Awesome, thanks! I'll try it out soon 👍


roywei commented Oct 22, 2018

For now, removing the validation dataset resolves the memory leak issue,
using the following command for training:

history = model1.fit(x_train, y_train,
                     epochs=epochs,
                     batch_size=batch_size,
                     callbacks=[reduce_lr],
                     verbose=2)

Need to investigate how to re-enable the validation stage.

julioasotodv commented

I can confirm that the memory leak happens with mxnet-mkl 1.3.1 under Linux when running imdb_bidirectional_lstm.py from the examples folder (which uses a validation set).

MandarGogate commented

There is no memory leak when mxnet-cu90mkl==1.2.1 is used. However, mxnet-cu90mkl==1.3.1 throws an error when validation data is used.
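If this blocks you, pinning the older build reported above should avoid the leak until the regression is fixed; adjust the package name to your CUDA/MKL variant:

pip install mxnet-cu90mkl==1.2.1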
