reclaiming memory for inference #897
Comments
I think the things you mentioned make sense. We also need to make sure that freeing (or not allocating) the memory for those training-related parts happens automatically when we are in inference mode. I mean that the user shouldn't need to specifically call a function like free_optimizer_and_scheduler to free that memory, but should have an easy way of switching modes, like eval mode in PyTorch.
I agree! That would be nice indeed. But torch's …
Yes, that can be a viable option, as we eventually have to control the checkpointing through deepspeed if we want to switch seamlessly between these two modes. I think we can hide all the operations needed to switch between inference and training modes, so that to the user it still feels like flipping a flag on and off.
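To make that concrete, here is a rough sketch of what such a switch could look like from the user's side. This is purely illustrative and not an existing DeepSpeed API; it just mirrors PyTorch's `train()`/`eval()` convention, and the memory behavior described in the comments is the proposal, not what the engine does today.

```python
# Purely illustrative sketch -- the memory-freeing behavior noted next to each
# call is the proposed behavior, not current DeepSpeed behavior.
import torch
import deepspeed

model = torch.nn.Linear(1024, 1024)
ds_config = {
    "train_batch_size": 8,
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
}
engine, optimizer, _, lr_scheduler = deepspeed.initialize(
    model=model, model_parameters=model.parameters(), config=ds_config
)

# ... training loop ...

engine.eval()   # hypothetically: would also free/offload optimizer & scheduler state
# ... run inference on the slimmed-down engine ...
engine.train()  # hypothetically: would reallocate/restore the training state
```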
yes, please!
While #896 solves the leak problem, ideally we should also have a new method that frees all optimizer/scheduler-related parts to pave the way for inference. In some environments like Google Colab, general RAM is very scarce, so every bit counts.
Here is one way to approach this:
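For example, at the call site once training has finished (a rough sketch; the method name is the one proposed below and does not exist yet):

```python
import gc

# engine is the DeepSpeedEngine returned by deepspeed.initialize()
# ... training is done, only inference from here on ...
engine.free_optimizer_and_scheduler()  # proposed new method, sketched below
gc.collect()  # encourage Python to return the now-unreferenced objects' memory
```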
with a new deepspeed method:
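A minimal sketch of what that method could look like, assuming the engine keeps its training-only state in its `optimizer` and `lr_scheduler` attributes; the exact set of things worth releasing would need to be checked against the engine code:

```python
# Hypothetical new method for DeepSpeedEngine -- a sketch, not a final implementation.
def free_optimizer_and_scheduler(self):
    """Drop references to training-only state so its memory can be reclaimed
    before running inference."""
    self.optimizer = None
    self.lr_scheduler = None
```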
That way, after training is done, the lion's share of the general RAM used by deepspeed is reclaimed. There are probably other bits to clean up manually to reclaim even more.
Let me know if this sounds good to you and I will make another PR with this feature. We can extend it in the future, if need be, to cover other things that benefit inference.
Thank you.
@jeffra, @RezaYazdaniAminabadi