How to write Jacobian and Hessian functions stripped of ForwardDiff #865
It would be better if the gradient in #747 used […]. However, if re-writing the […]:

The answer to 1. is that reverse-mode AD works well on (say) 100 -> 1 dimensional functions, i.e. (large) vector -> scalar. But the gradient being differentiated to make the Hessian is a 100 -> 100 dimensional function: its output is the same size as its input. There the ideal complexity of forward- and reverse-mode Jacobians should be similar (I think), but in practice forward mode is easier and more efficient.

Zygote actually has its own forward mode, but it is relatively young and much less well tested, so `hessian` still uses ForwardDiff, which is the opposite: mature, but with weaker complex-number support. (Although complex numbers in ForwardDiff also only started working recently, I think.) But if neither of these supports FFT, then using the reverse-mode Jacobian is probably the shortest path to getting this working at all.
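The forward-over-reverse combination described above can be sketched as follows. This is my own illustration of the idea, not Zygote's internal code; the function name `hessian_fwd_over_rev` is made up for the example:

```julia
using Zygote, ForwardDiff, LinearAlgebra

# Forward-over-reverse: the inner Zygote.gradient is reverse mode (good for
# vector -> scalar), and ForwardDiff pushes dual numbers through it to take
# the Jacobian of the n -> n gradient function, one sweep per input dimension.
hessian_fwd_over_rev(f, x) =
    ForwardDiff.jacobian(y -> Zygote.gradient(f, y)[1], x)

f(v) = sum(abs2, v) / 2       # gradient is v itself, so the Hessian is I

H = hessian_fwd_over_rev(f, ones(3))   # H ≈ Matrix(I, 3, 3)
```

A reverse-over-reverse version would replace the outer `ForwardDiff.jacobian` with another Zygote differentiation, which, as the comment says, tends to hit far more missing rules in practice.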
Thank you for your help, Michael. I'm not sure how to replace the matrix with […].
I also precomputed the basis vectors e_i in a matrix, to avoid mutating e_i.
I'm still getting the same error:
I had in mind something like this, but I don't know how much this change matters:
I think this is the same error you have above. The gradient of indexing works by filling in an array, and attempting to differentiate that doesn't work at all. It might not be super-hard to fix by adding some […]. But I'm not sure your original function should involve indexing at all. It may involve other things which don't work, but FFT itself has a gradient which is made of other FFT functions, so perhaps that can work?
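To illustrate the point that FFT itself differentiates fine when the objective avoids indexing and mutation, here is a small check. The toy objective `g` is my own example, and its gradient can be verified by hand via Parseval's theorem:

```julia
using Zygote, FFTW

# Non-mutating, non-indexing objective built from fft.
g(x) = sum(abs2, fft(x))

# Parseval's theorem for the unnormalized DFT (FFTW's convention):
#   sum(abs2, fft(x)) == length(x) * sum(abs2, x)
# so the gradient should be 2 * length(x) * x.
x = [0.1, 0.2, 0.3, 0.4]
grad = Zygote.gradient(g, x)[1]   # expect ≈ 8 .* x here, since length(x) == 4
```

This only exercises first derivatives, of course; the thread's problem is specifically the second-derivative pass.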
Thank you so much for the detailed help! You are right that I shouldn't use indexing in the toy objective function; your function removed the mutating-array error. However, when I tried a more fitting toy function,
I'm getting a new error:
I got the same error on my original objective function, so this toy function is probably representative of my problem. I couldn't find anything related to this error. Do you happen to know what might cause it? Thank you!
I don't have great suggestions, except trying variations to narrow things down. Some FFT things work; weirdly, when I add broadcasting (with just a scalar, but not with a literal) it breaks. Also, […]
I would be curious whether you get the same error with […].
That makes sense. It looks like, because there is a lot of elementwise matrix multiplication in my function, the […]. Do you think this route is worth pursuing? Or should I just stick to writing methods for dual numbers instead?
At this point I think you should try the dual numbers! That's certainly solvable, whereas making second derivatives work here (reverse over reverse) may be a bottomless pit of hard problems.
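The dual-number route rests on the FFT being linear: the transform of a dual vector is just the transform of the values plus the transform of the partials. A sketch of what such a method might look like, for a single dual direction only; `fft_dual` is my own hypothetical helper, not an existing ForwardDiff or FFTW method:

```julia
using FFTW, ForwardDiff
using ForwardDiff: Dual, value, partials

# FFT is linear, so fft(value + ε * partial) == fft(value) + ε * fft(partial).
# Sketch for duals carrying exactly one partial (one derivative direction).
function fft_dual(x::AbstractVector{<:Dual{T,V,1}}) where {T,V}
    vals  = fft(value.(x))          # FFT of the primal values (complex)
    parts = fft(partials.(x, 1))    # FFT of the single partial, by linearity
    # Repack as Complex{Dual}: real and imaginary parts each carry a dual.
    return map(vals, parts) do v, p
        complex(Dual{T}(real(v), real(p)), Dual{T}(imag(v), imag(p)))
    end
end
```

A quick consistency check: seeding the first coordinate and differentiating `sum(abs2, fft(x))` should give `2 * length(x) * x[1]` by Parseval's theorem. A full solution would need to handle arbitrary numbers of partials and the other FFT variants, but the linearity trick is the whole idea.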
890: Add `jacobian`, at last? r=CarloLucibello a=mcabbott

This adds a Jacobian function. Compared to #747 this one:

* has tests
* accepts multiple arguments
* shouldn't fail if `back` returns `nothing`
* inserts `vec` in a few more places
* has a method which accepts implicit `Params`, and returns `Grads` (this was the only part I actually needed in real life)
* now works on the GPU too!

Compared to #414 this one:

* always inserts `vec`, never makes higher-dimensional arrays
* runs on current Zygote
* has tests.

Compared to #235 this one:

* doesn't try to provide a numerical Jacobian
* doesn't alter testing infrastructure
* doesn't provide `jacobian!`, nor any code for structured matrices.

This does not address #564's concerns about functions which return a tuple of arrays. Only functions returning an array (or a scalar) are permitted. Similar considerations might give sensible Jacobians when the argument of a function is a tuple, or some other struct, but for now these are handled by putting up a giant warning sign.

Nothing in the file `utils.jl` seems to have any tests at all, so I also added tests for `hessian`. And, while I was there, made `hessian` actually accept a real number like its docstring promises (hence this closes #891). And I made a version that is reverse-over-reverse, using this `jacobian`, which works less well of course but may as well exist to test things (see for example #865). Ideally there would be a pure-Zygote version using its own forward mode, but I didn't write that.

Fixes #51, fixes #98, fixes #413. Closes #747.

Co-authored-by: Michael Abbott <me@escbook>
Co-authored-by: Michael Abbott <[email protected]>
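For readers landing here later, the API added by that PR can be exercised roughly like this (a sketch against a Zygote version that includes #890; return values are as the PR describes, one Jacobian per argument):

```julia
using Zygote

W = [1.0 2.0; 3.0 4.0]
f(x) = W * x

# jacobian returns a tuple with one Jacobian per argument, dims vec'ed.
# For a linear map the Jacobian is just the matrix itself.
J, = Zygote.jacobian(f, [1.0, 1.0])   # J ≈ W

# hessian (forward-over-reverse) also accepts a plain real number now;
# the second derivative of x^4 at x = 2 is 12x^2 = 48.
h = Zygote.hessian(x -> x^4, 2.0)
```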
ForwardDiff doesn't seem to support FFT or complex types, but Zygote.gradient does. Per this discussion,
https://discourse.julialang.org/t/forwarddiff-and-zygote-cannot-automatically-differentiate-ad-function-from-c-n-to-r-that-uses-fft/52440/4?u=tholdem
I was made aware that Zygote.hessian uses ForwardDiff. This prevents Zygote from taking the Hessian of my objective function, which involves FFT and complex numbers. I don't have much experience with these things and I'm new to Julia, so I don't really know if I can pull off writing a Hessian function myself using Zygote only. I was wondering: 1. why would Zygote call ForwardDiff instead of its own functions? 2. is there more guidance or a resource on how to write a Hessian AD function that calls Zygote only and supports complex numbers and FFT? Thank you so much for your help.
For example, this code doesn't work because Zygote doesn't support mutating arrays, and it's unclear whether it handles complex numbers.
There seems to be a workaround for mutating arrays, https://github.com/rakeshvar/Zygote-Mutating-Arrays-WorkAround.jl, but it is much harder to find the gradient of the mutating steps in the code above, so it would be very daunting to work around it myself.
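Besides that package, Zygote itself ships an escape hatch for array mutation: write into a `Zygote.Buffer` and freeze it with `copy` before returning. The toy function `fill_twice` below is my own illustration of the pattern:

```julia
using Zygote

# Build an array step by step without triggering the mutation error:
# writes go into a Buffer, which copy() turns back into an ordinary
# (differentiable) array.
function fill_twice(x)
    buf = Zygote.Buffer(x)        # same shape and eltype as x, mutable
    for i in eachindex(x)
        buf[i] = 2 * x[i]
    end
    return sum(copy(buf))
end

Zygote.gradient(fill_twice, [1.0, 2.0, 3.0])[1]   # ≈ [2.0, 2.0, 2.0]
```

Whether this helps with second derivatives through FFT code is another matter, but it removes the "mutating arrays is not supported" error for patterns like the one above.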