
Possible Memory Leak #195

Open
Cpruce opened this issue Oct 9, 2018 · 7 comments

Cpruce commented Oct 9, 2018

Please make sure that the boxes below are checked before you submit your issue.
If your issue is an implementation question, please ask your question on StackOverflow or on the Keras Slack channel instead of opening a GitHub issue.

Thank you!

  • [x] Check that you are up-to-date with the master branch of Keras. You can update with:
    pip install git+git://github.com/keras-team/keras.git --upgrade --no-deps

  • [x] Check that your version of TensorFlow is up-to-date. The installation instructions can be found here.

  • [x] Provide a link to a GitHub Gist of a Python script that can reproduce your issue (or just copy the script here if it is short).

Please see:

https://discuss.mxnet.io/t/possible-memory-leak/1973

roywei added the training label Oct 9, 2018

roywei commented Oct 9, 2018

@Cpruce Thanks for the issue. I am looking into this; it is possibly caused by the use of the foreach operator.

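For reference, here is a minimal sketch of MXNet's foreach operator (mx.nd.contrib.foreach, available since MXNet 1.3), the op suspected above. The step function and shapes are illustrative only, not taken from keras-mxnet:

import mxnet as mx

# foreach scans `step` over axis 0 of `data`, threading the state through
# each iteration, similar to an unrolled RNN loop.
def step(data, states):
    new_state = states[0] + data          # running sum across time steps
    return new_state, [new_state]

data = mx.nd.arange(8).reshape((4, 2))    # 4 time steps, 2 features
init_states = [mx.nd.zeros((2,))]
outputs, final_states = mx.nd.contrib.foreach(step, data, init_states)
print(outputs.shape)                      # (4, 2): one output per time step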

Cpruce commented Oct 9, 2018

@roywei Thanks for looking into this


roywei commented Oct 11, 2018

@Cpruce I was able to narrow the memory leak down to the validation step after each epoch. For now, removing validation during model.fit() resolves it, and using model.evaluate(test_data, test_label) to do validation at the end works fine.
We are using the bucketing module in keras-mxnet; switching buckets between training and validation may have caused the memory leak in the foreach operator. Need to take another look at that.
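A minimal sketch of this workaround, assuming a Keras model compiled with an accuracy metric (model, x_train, test_data, etc. are placeholders):

# Train without validation_data to avoid the per-epoch leak ...
history = model.fit(x_train, y_train,
                    epochs=epochs,
                    batch_size=batch_size,
                    verbose=2)

# ... and validate once at the end instead of after every epoch.
loss, acc = model.evaluate(test_data, test_label, batch_size=batch_size)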


Cpruce commented Oct 11, 2018

@roywei Awesome, thanks! I'll try it out soon 👍


roywei commented Oct 22, 2018

For now, removing the validation dataset resolves the memory leak issue,
using the following command for training:

history = model1.fit(x_train, y_train,
                     epochs=epochs,
                     batch_size=batch_size,
                     callbacks=[reduce_lr],
                     verbose=2)

Need to investigate how to re-enable the validation stage.

julioasotodv commented

I can confirm that the memory leak happens with mxnet-mkl 1.3.1 under Linux when running imdb_bidirectional_lstm.py from the examples folder (which uses a validation set).

MandarGogate commented

There is no memory leak when mxnet-cu90mkl==1.2.1 is used. However, mxnet-cu90mkl==1.3.1 throws an error when validation data is used.
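If this blocks you, pinning the older build reported above should avoid the leak until the regression is fixed; adjust the package name to your CUDA/MKL variant:

pip install mxnet-cu90mkl==1.2.1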
