CUDA Memory Overflow in Jacobian Computation #1058
We've been planning a feature to let users control the "vectorization" factor of the jacobian computation (#680). At one extreme, one can compute the jacobian row-by-row. At the other extreme, we can use vmap to turn the for-loop into a vectorized computation for more performance (at the cost of using more peak memory). So there is a performance <-> memory tradeoff here. Today, `functorch.jacrev` sits at the fully vectorized extreme, which is why it has the highest peak memory usage.
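As a rough illustration of that tradeoff, here is a minimal sketch (not functorch's internal implementation, and the helper name `jacobian_chunked` is made up for this example): `chunk_size=1` corresponds to the row-by-row extreme, while a very large `chunk_size` approaches the fully vectorized behavior of `jacrev` today.

```python
import torch
from functorch import vjp, vmap

def jacobian_chunked(f, x, chunk_size=1):
    """Reverse-mode Jacobian of f at x, computed chunk_size rows at a time.

    Sketch only: assumes f takes a single tensor argument.
    """
    y, vjp_fn = vjp(f, x)
    # One standard-basis cotangent per output element.
    eye = torch.eye(y.numel(), dtype=x.dtype, device=x.device)
    rows = []
    for basis in eye.split(chunk_size):
        # vmap vectorizes only `chunk_size` VJPs at once, bounding peak memory.
        rows.append(vmap(vjp_fn)(basis)[0])
    return torch.cat(rows).reshape(*y.shape, *x.shape)
```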
If the output size is much greater than the input size, then it's likely that `jacfwd` will work better here.
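For context, a small sketch of what that swap looks like; the function `f` below is a made-up stand-in with far more outputs than inputs, not the `ResidualFunctional.residual` from this issue.

```python
import torch
from functorch import jacfwd, jacrev

def f(x):
    # Toy function: 1001 inputs -> 10010 outputs (output >> input).
    return torch.sin(x).repeat(10)

x = torch.randn(1001, dtype=torch.float64)

J_rev = jacrev(f)(x)   # reverse mode: one VJP per output element
J_fwd = jacfwd(f)(x)   # forward mode: one JVP per input element
assert torch.allclose(J_rev, J_fwd)
```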
Yes, I have tried `jacfwd`. I understand that forward-mode autodiff is faster than reverse-mode if the input size is smaller than the output size. But is forward-mode also more memory efficient?
Yeah, forward-mode should also be more memory efficient here, since it does not need to save intermediates for a backward pass.
I have a general question about automatic differentiation. I have a code base that computes the Jacobian of the above function manually (derive the math expression of the Jacobian and type it into the code), and the manual differentiation does not have memory issues on a 24 GB GPU. Theoretically, does automatic differentiation have to cost more memory than manual differentiation when computing the Jacobian of a vector function? It looks like automatic differentiation needs to store all intermediate matrices and therefore might consume more memory.
It depends on what exactly the manual differentiation is. But yes, reverse-mode AD needs to store intermediates, and this will increase the memory usage.
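As a toy illustration of that point (this is an assumed example, not the `ResidualFunctional` from this issue): for `f(x) = tanh(A @ x)` the Jacobian has a closed form, so the manual version only materializes the result, while `jacrev` also has to keep the intermediates recorded during the forward pass.

```python
import torch
from functorch import jacrev

A = torch.randn(10000, 1001, dtype=torch.float64)
x = torch.randn(1001, dtype=torch.float64)

def f(x):
    return torch.tanh(A @ x)

# Manual Jacobian: d tanh(Ax)/dx = diag(1 - tanh(Ax)^2) @ A, built directly.
J_manual = (1 - torch.tanh(A @ x) ** 2).unsqueeze(1) * A

# Reverse-mode AD: the forward pass saves intermediates (here A @ x) so that
# the vmapped backward passes can be replayed, raising peak memory.
J_auto = jacrev(f)(x)
assert torch.allclose(J_manual, J_auto)
```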
Hi,
I implemented a Jacobian computation using functorch, but encountered a memory overflow issue.

The function that I want to differentiate is `ResidualFunctional.residual`. I'd like to compute the Jacobian of this function w.r.t. its first argument `inputs`. The output of `ResidualFunctional.residual` is a tensor of size (10000,) and `inputs` is a tensor of size (1001,). Thus, the Jacobian is 10000 by 1001, which takes about 74 MB in double precision. However, `functorch.jacrev` ran into a memory overflow error on a 24 GB GPU. The error message is shown below. I am wondering why functorch takes so much memory in reverse-mode autodiff, and whether there is a solution to this issue. Below is a working example that reproduces this issue.
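The author's actual reproduction script and the CUDA out-of-memory traceback are not included in this extract. Purely as a stand-in, the sketch below mirrors the shapes described above: a hypothetical `residual` mapping a (1001,) tensor to a (10000,) tensor, differentiated with `functorch.jacrev` on the GPU. It will not necessarily reproduce the out-of-memory error, since the real `ResidualFunctional` is built on GPyTorch and is more involved.

```python
import torch
from functorch import jacrev

device = "cuda"
W = torch.randn(10000, 1001, dtype=torch.float64, device=device)
target = torch.randn(10000, dtype=torch.float64, device=device)

def residual(inputs):
    # Hypothetical stand-in for ResidualFunctional.residual: (1001,) -> (10000,)
    return torch.tanh(W @ inputs) - target

inputs = torch.randn(1001, dtype=torch.float64, device=device)
# The result is only 10000 x 1001 (~74 MB in float64), but the vmapped
# reverse-mode pass can peak far above that.
jacobian = jacrev(residual)(inputs)
```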
CUDA 11.4
FuncTorch 1.13.0
PyTorch 1.13.0
GPyTorch 1.9.0
Thanks!