Solve memory inefficiency in RNNs #16174

atmguille · 2022-03-04T22:16:46Z

As discussed with @mattdangerw, I am opening this PR to fix the memory inefficiency in RNNs reported in #16113.

google-cla · 2022-03-04T22:16:50Z

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

For more information, open the CLA check for this pull request.

fchollet

Thanks for the PR!

keras/backend.py

fchollet

LGTM, thanks

fchollet

There are test failures, please take a look: https://source.cloud.google.com/results/invocations/919aaccc-7dd1-4d0e-9912-0596f18e60c1/targets/keras%2Fgithub%2Fubuntu%2Fgpu%2Fpresubmit/log

atmguille · 2022-03-07T23:00:55Z

@fchollet I just saw the test log. I think the failure comes from the following:
LSTM and GRU use different functions on CPU and GPU (the other RNN layers don't). The CPU version calls backend.rnn (the function I updated), but the GPU version calls tf.raw_ops.CudnnRNN. I updated the CPU version to call the new version of backend.rnn correctly, but the GPU version still returns the outputs for every timestep, because tf.raw_ops.CudnnRNN always returns the full array.

The error happens in the tests that combine CPU and GPU environments.

I see three options to solve this:

1. Deactivate the new functionality in LSTM and GRU. As mentioned in the original issue (#16113), the layers most affected by the current behavior are convolutional RNNs, for which the update works perfectly.
2. Although tf.raw_ops.CudnnRNN returns all outputs, the GPU method can still return outputs=[last_output]. The memory advantage will not be available on GPU for LSTM and GRU, but the return format from CPU and GPU would be consistent.
3. Update tf.raw_ops.CudnnRNN in a similar way to backend.rnn. However, I have not been able to find the source code of that function in the tensorflow repository.

I think the ideal solution would be 3. For the time being, I would go for 2. Let me know what you think!
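As a rough illustration of option 2, here is a toy sketch in plain Python. It is not the actual Keras or cuDNN code: `toy_rnn`, `toy_fused_rnn`, `running_sum`, and the `return_all_outputs` flag are hypothetical names chosen only for this example. The idea is that the fused path still computes every timestep, but trims what it returns so both paths hand back the same format.

```python
def toy_rnn(step, inputs, initial_state, return_all_outputs=True):
    """Toy stand-in for a CPU recurrent loop (hypothetical, not backend.rnn)."""
    state = initial_state
    outputs = []
    last_output = None
    for x in inputs:
        last_output, state = step(x, state)
        if return_all_outputs:
            # Only keep per-timestep outputs when the caller asks for them.
            outputs.append(last_output)
    if not return_all_outputs:
        # Memory-saving format: a one-element sequence holding the last output.
        outputs = [last_output]
    return last_output, outputs, state


def toy_fused_rnn(step, inputs, initial_state, return_all_outputs=True):
    """Toy stand-in for a fused kernel that always produces every timestep."""
    # The fused kernel cannot skip timesteps, so the full sequence is computed.
    last_output, all_outputs, state = toy_rnn(step, inputs, initial_state, True)
    if not return_all_outputs:
        # Option 2: no memory is saved here, but the return format
        # now matches the CPU path.
        all_outputs = [last_output]
    return last_output, all_outputs, state


def running_sum(x, state):
    """Example step function: output and new state are the running total."""
    new_state = state + x
    return new_state, new_state


cpu_result = toy_rnn(running_sum, [1, 2, 3], 0, return_all_outputs=False)
gpu_result = toy_fused_rnn(running_sum, [1, 2, 3], 0, return_all_outputs=False)
print(cpu_result == gpu_result)
```

With this shape of fix, code that consumes the result does not need to know which path ran, which is what the mixed CPU and GPU tests appear to depend on.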

PS: I had run the tests locally with no errors. I have run them again and still found no errors, as can be seen in this notebook.

fchollet · 2022-03-08T21:10:03Z

Let's do option 2. Option 3 is not realistically doable, given that the computation is delegated to cuDNN.

atmguille · 2022-03-09T01:02:53Z

I believe it should be fixed now. I have run all the tests locally and there are no errors. However, as I mentioned in my last comment, there were no errors locally before this fix either. Let's see.

Thanks again for your time @fchollet

fchollet

LGTM, thanks!

atmguille · 2022-03-09T19:18:47Z

@fchollet I saw that the tests you requested passed, but the tests that have just run did not.
The log shows the following:

Could this be caused by non-deterministic floating-point errors? That would explain why your tests passed and these did not. Should we try running them again? Let me know how I can help.

Edit: I have checked the code of the failing test, and apparently it has nothing to do with my updates. The model being tested is this one

fchollet · 2022-03-10T17:42:55Z

I think this is possibly a flaky test. Let's try rerunning the tests.

atmguille · 2022-03-10T18:55:48Z

The tests have passed! I am not sure if you need to approve my last review request, as the PR is "Approved by Reviewer" but it also shows that there is 1 pending reviewer.

Anyway, thanks a lot for your time! It was a good learning experience as my first contribution @fchollet

Solve memory inefficiency in RNNs

3fe7a48

google-ml-butler bot added the size:M label Mar 4, 2022

google-ml-butler bot assigned gbaned Mar 4, 2022

Update other calls to backend.rnn

aada299

fchollet reviewed Mar 6, 2022

atmguille added 2 commits March 6, 2022 01:49

Update docstring

72721ce

Update other docstrings

b7e75ed

gbaned requested a review from fchollet March 7, 2022 11:17

google-ml-butler bot added the keras-team-review-pending label Mar 7, 2022

fchollet approved these changes Mar 7, 2022

google-ml-butler bot added the kokoro:force-run and ready to pull labels Mar 7, 2022

kokoro-team removed the kokoro:force-run label Mar 7, 2022

fchollet removed the ready to pull and keras-team-review-pending labels Mar 7, 2022

fchollet requested changes Mar 7, 2022

Update GRU and LSTM GPU functions for correct output format

6d882bb

Update output shape in docstrings

88c26e2

atmguille requested a review from fchollet March 9, 2022 01:06

google-ml-butler bot added the keras-team-review-pending label Mar 9, 2022

fchollet added the kokoro:force-run label Mar 9, 2022

kokoro-team removed the kokoro:force-run label Mar 9, 2022

fchollet approved these changes Mar 9, 2022

google-ml-butler bot added the kokoro:force-run and ready to pull labels Mar 9, 2022

fchollet removed the keras-team-review-pending label Mar 9, 2022

kokoro-team removed the kokoro:force-run label Mar 9, 2022

atmguille requested a review from fchollet March 9, 2022 19:29

google-ml-butler bot added the keras-team-review-pending label Mar 9, 2022

fchollet added the kokoro:force-run label and removed the keras-team-review-pending label Mar 10, 2022

kokoro-team removed the kokoro:force-run label Mar 10, 2022

copybara-service bot merged commit 10995e7 into keras-team:master Mar 15, 2022
