Solve memory inefficiency in RNNs #16174

Merged
merged 6 commits into from
Mar 15, 2022

atmguille
Contributor

@atmguille atmguille commented Mar 4, 2022

As discussed with @mattdangerw, I am opening this PR to fix the memory inefficiency in RNNs reported in #16113.

@google-cla

google-cla bot commented Mar 4, 2022

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

For more information, open the CLA check for this pull request.

Member

@fchollet fchollet left a comment

Thanks for the PR!

keras/backend.py (outdated, resolved)
keras/backend.py (outdated, resolved)
keras/backend.py (outdated, resolved)
keras/backend.py (resolved)
@gbaned gbaned requested a review from fchollet March 7, 2022 11:17
@google-ml-butler google-ml-butler bot added the keras-team-review-pending Pending review by a Keras team member. label Mar 7, 2022
Member

@fchollet fchollet left a comment

LGTM, thanks

@google-ml-butler google-ml-butler bot added kokoro:force-run ready to pull Ready to be merged into the codebase labels Mar 7, 2022
@fchollet fchollet removed ready to pull Ready to be merged into the codebase keras-team-review-pending Pending review by a Keras team member. labels Mar 7, 2022
Member

@fchollet fchollet left a comment

@atmguille
Contributor Author

atmguille commented Mar 7, 2022

@fchollet I just saw the test log. I think the failure comes from the following:
LSTM and GRU use different implementations on CPU and GPU (the other RNN layers do not). The CPU version calls backend.rnn (the function I updated), but the GPU version calls tf.raw_ops.CudnnRNN. I updated the CPU version to call the new backend.rnn correctly, but the GPU version still returns the full outputs, because tf.raw_ops.CudnnRNN always returns the full array.

The error happens in the tests that combine CPU and GPU environments.
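
For readers following along, the behavior described above can be sketched in plain NumPy. This is a toy illustration, not the actual backend.rnn code: the names toy_rnn, tanh_step, and the keep_all_outputs flag are hypothetical and chosen only for this example. It shows a recurrent loop that either stores every per-step output or keeps only the last one.

```python
import numpy as np


def tanh_step(x_t, state):
    """Hypothetical step function: returns (output, new_state)."""
    new_state = np.tanh(x_t + state)
    return new_state, new_state


def toy_rnn(step_fn, inputs, initial_state, keep_all_outputs=True):
    """Run step_fn over the time axis of a non-empty (time, features) array.

    keep_all_outputs is an illustrative flag, not a Keras argument.
    """
    state = initial_state
    outputs = []
    last_output = None
    for x_t in inputs:
        last_output, state = step_fn(x_t, state)
        if keep_all_outputs:
            # Memory grows linearly with the number of timesteps.
            outputs.append(last_output)
    if not keep_all_outputs:
        # Only one timestep is ever stored.
        outputs = [last_output]
    return last_output, np.stack(outputs), state


inputs = np.ones((5, 3))
_, all_outputs, _ = toy_rnn(tanh_step, inputs, np.zeros(3))
_, only_last, _ = toy_rnn(tanh_step, inputs, np.zeros(3), keep_all_outputs=False)
print(all_outputs.shape, only_last.shape)
```

In both modes the returned outputs keep a leading time axis, so callers see the same structure whether one timestep or all of them were stored.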

I see three options to solve this:

  1. Deactivate the new functionality in LSTM and GRU. As mentioned in the original issue (Memory inefficiency in RNNs, #16113), the layers most affected by the current behavior are convolutional RNNs, for which the update works perfectly.
  2. Although tf.raw_ops.CudnnRNN returns the full outputs, the GPU method can still return outputs=[last_output]. The memory advantage will not be available on GPU for LSTM and GRU, but the return format of the CPU and GPU paths would be consistent.
  3. Update tf.raw_ops.CudnnRNN in a similar way to backend.rnn. However, I have not been able to find the source code of that function in the TensorFlow repository.

I think the ideal solution would be 3. For the time being, I would go for 2. Let me know what you think!
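
A rough sketch of what option 2 amounts to, again in plain NumPy rather than the real GPU code path: fused_kernel_stub is a hypothetical stand-in for a kernel that always produces every timestep (it does not call cuDNN), and gpu_path and keep_all_outputs are illustrative names only. The wrapper slices the full result so that the return format matches a path that keeps only the last output.

```python
import numpy as np


def fused_kernel_stub(inputs):
    """Hypothetical stand-in for a fused kernel that returns all timesteps."""
    return np.cumsum(inputs, axis=0)


def gpu_path(inputs, keep_all_outputs=True):
    """Return (last_output, outputs) in a format consistent with the CPU path."""
    # The full sequence is materialized regardless of the flag,
    # so this path gains no memory advantage.
    full = fused_kernel_stub(inputs)
    last_output = full[-1]
    # Slicing with [-1:] keeps the time axis, giving shape (1, features).
    outputs = full if keep_all_outputs else full[-1:]
    return last_output, outputs


x = np.arange(12.0).reshape(4, 3)
last, outputs = gpu_path(x, keep_all_outputs=False)
print(outputs.shape)
```

The trade-off is the one stated in option 2: the format is consistent across devices, but the memory saving only exists where the loop itself can skip storing intermediate outputs.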

PS: I had run the tests locally with no errors. I have re-run them and again found no errors, as can be seen in this notebook.

@fchollet
Member

fchollet commented Mar 8, 2022

Let's do option 2. Option 3 is not realistically doable given that the computation is delegated to cuDNN.

@atmguille
Contributor Author

I believe it should be fixed now. I have run all the tests locally and there are no errors. However, as I mentioned in my last comment, there were no errors before this fix either. Let's see.

Thanks again for your time @fchollet

@atmguille atmguille requested a review from fchollet March 9, 2022 01:06
@google-ml-butler google-ml-butler bot added the keras-team-review-pending Pending review by a Keras team member. label Mar 9, 2022
Member

@fchollet fchollet left a comment

LGTM, thanks!

@google-ml-butler google-ml-butler bot added kokoro:force-run ready to pull Ready to be merged into the codebase labels Mar 9, 2022
@fchollet fchollet removed the keras-team-review-pending Pending review by a Keras team member. label Mar 9, 2022
@atmguille
Contributor Author

atmguille commented Mar 9, 2022

@fchollet I saw that the tests you requested passed, but the tests that have just run did not.
The log shows the following:
[image: test log]
Could this be floating-point error that is not deterministic? That would explain why your tests passed and these did not. Should we try running them again? Let me know how I can help.

Edit: I have checked the code of the failing test, and it apparently has nothing to do with my updates. The model being tested is this one.
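
On the floating-point hypothesis raised in this comment, a small standard-library example shows why results can differ between runs when the order of operations changes, for instance across devices or parallel reductions. It is a general illustration of floating-point non-associativity and does not reproduce the failing test, whose cause is not established in this thread.

```python
import math

a, b, c = 0.1, 0.2, 0.3

# Floating-point addition is not associative: grouping changes the rounding.
left = (a + b) + c
right = a + (b + c)

print(left == right)               # False
print(math.isclose(left, right))   # True

# Tests that compare floats usually need a tolerance rather than exact equality.
```

If a test asserts exact equality, or uses a tolerance that is too tight, differences of this size can make it pass in one environment and fail in another.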

@atmguille atmguille requested a review from fchollet March 9, 2022 19:29
@google-ml-butler google-ml-butler bot added the keras-team-review-pending Pending review by a Keras team member. label Mar 9, 2022
@fchollet
Member

I think this is possibly a flaky test. Let's try rerunning the tests.

@fchollet fchollet added kokoro:force-run and removed keras-team-review-pending Pending review by a Keras team member. labels Mar 10, 2022
@atmguille
Contributor Author

The tests have passed! I am not sure if you need to approve my last review request, as the PR is "Approved by Reviewer" but it also shows that there is 1 pending reviewer.

Anyway, thanks a lot for your time! It was a good learning experience as my first contribution @fchollet

@copybara-service copybara-service bot merged commit 10995e7 into keras-team:master Mar 15, 2022
Labels
ready to pull Ready to be merged into the codebase size:M
Development

Successfully merging this pull request may close these issues.

Memory inefficiency in RNNs
4 participants