In my experience, asdl works well for differentiable Jacobian computation. However, I would hope that curvlinops eventually offers such functionality as well, since asdl is not actively maintained and, specifically, the differentiability aspect is only available in branches that have not been merged. It is also no longer present in asdfghjkl, where differentiability broke for a reason I couldn't figure out.
**Computation with a for-loop (for large K, when differentiability is not needed)**

Compute `JSJT(v)` for each standard basis vector and `stack` the results into a `(k, k)` matrix.
This is more memory efficient than computing J(x) S J(x)^T explicitly from J(x) and S, since we only store a (k, 1) or (p, 1) tensor at a time.
```python
func_var = torch.stack([JSJT(v).detach() for v in I])
```

If we only care about the diagonal of `func_var`:

```python
func_var = torch.stack([JSJT(v).detach()[i] for (i, v) in enumerate(I)])
```
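The snippets above assume a `JSJT` matrix-vector product and an identity basis `I`. Here is a minimal, self-contained sketch of how these could be set up with `torch.func`; the toy model, the dense placeholder `S`, and the flattening helper are assumptions for illustration, and in practice S would typically come from a curvature library as a linear operator:

```python
import torch
from torch.func import functional_call, jvp, vjp

# Hypothetical setup: small model, single input x, dense parameter covariance S.
torch.manual_seed(0)
model = torch.nn.Linear(5, 3)            # k = 3 outputs
x = torch.randn(1, 5)
params = dict(model.named_parameters())
p = sum(v.numel() for v in params.values())
S = torch.eye(p)                          # placeholder for the posterior covariance

def f(flat_params):
    # Rebuild the parameter dict from a flat vector and evaluate the model on x.
    out, offset = {}, 0
    for name, val in params.items():
        out[name] = flat_params[offset:offset + val.numel()].view_as(val)
        offset += val.numel()
    return functional_call(model, out, (x,)).squeeze(0)   # shape (k,)

flat = torch.cat([v.detach().reshape(-1) for v in params.values()])

def JSJT(v):
    # J(x) S J(x)^T v via one VJP (J^T v), one matvec with S, and one JVP (J u).
    _, vjp_fn = vjp(f, flat)
    (jt_v,) = vjp_fn(v)                   # (p,)  = J(x)^T v
    u = S @ jt_v                          # (p,)  = S J(x)^T v
    _, j_u = jvp(f, (flat,), (u,))        # (k,)  = J(x) S J(x)^T v
    return j_u

k = f(flat).shape[0]
I = torch.eye(k)                          # standard basis vectors e_1, ..., e_k
func_var = torch.stack([JSJT(v).detach() for v in I])   # (k, k), one column at a time
```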
**Sampling**
Sampling f(x) can be done cheaply (see Laurence paper).
For LLMs this might be better, because we don't really care about the explicit J(x) S J(x)^T, but only about the resulting \int softmax(f(x)) N(f(x) | f_\theta(x), J(x) S J(x)^T) df(x).
I.e., the cost now scales with the number of samples instead of with K.
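A minimal sketch of such a sampling-based estimate, continuing the setup from the sketch above (it reuses `f`, `flat`, `S`, and `p`; the sample count `M` and the dense Cholesky factor are illustrative assumptions, and a structured S would admit a much cheaper square root):

```python
import torch
from torch.func import jvp

# Instead of forming J(x) S J(x)^T, draw f_m ~ N(f_theta(x), J S J^T) via
# f_m = f_theta(x) + J(x) S^{1/2} eps_m with eps_m ~ N(0, I_p),
# i.e. one JVP per sample and never a (k, k) matrix.
M = 32                                  # number of Monte Carlo samples (assumption)
S_sqrt = torch.linalg.cholesky(S)       # placeholder square root of S
f_mean = f(flat)                        # f_theta(x), shape (k,)

samples = []
for _ in range(M):
    eps = torch.randn(p)
    _, tangent_out = jvp(f, (flat,), (S_sqrt @ eps,))   # J(x) S^{1/2} eps
    samples.append(f_mean + tangent_out)

# Monte Carlo estimate of \int softmax(f(x)) N(f(x) | f_theta(x), J S J^T) df(x)
pred = torch.softmax(torch.stack(samples), dim=-1).mean(dim=0)
```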
**Further things**
- All of the above can be optimized further, depending on the form of S.
- `vmap` (see the sketch below)
- We need a Jacobian backend that is:
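A sketch of what the `vmap` variant could look like, again reusing `f`, `flat`, `S`, and `I` from the first sketch; it batches the J(x)^T v, S-matvec, and J(x) u steps over all k basis vectors at once instead of looping in Python (no particular backend API is assumed here):

```python
import torch
from torch.func import vmap, vjp, jvp

# Batched version of the for-loop above (reuses f, flat, S, I).
_, vjp_fn = vjp(f, flat)
JtI = vmap(lambda v: vjp_fn(v)[0])(I)                   # (k, p): rows are J(x)^T e_i
U = JtI @ S.T                                           # (k, p): rows are S J(x)^T e_i (S symmetric)
func_var = vmap(lambda u: jvp(f, (flat,), (u,))[1])(U)  # (k, k) = J(x) S J(x)^T
func_var_diag = torch.diagonal(func_var)                # if only the diagonal is needed
```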