workaround for more complicated gradient #1

Closed
tholdem opened this issue Dec 31, 2020 · 6 comments

@tholdem

tholdem commented Dec 31, 2020

Hello,

Thank you for this workaround; it was very helpful for my understanding of the same bug. However, I'm trying to write an AD Hessian function using only Zygote, to avoid bugs I ran into with ForwardDiff. Unfortunately, the function is a bit more complicated than matrix multiplication, and I'm not sure whether deriving the gradient by hand is feasible.

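using Zygote, LinearAlgebra   # LinearAlgebra provides the identity `I`

function jacobian(f, x)
    y, back = Zygote.pullback(f, x)
    k = length(y)
    n = length(x)
    J = Matrix{eltype(y)}(undef, k, n)
    e_mat = Matrix(I, k, k)              # columns are one-hot seed vectors
    @inbounds for i = 1:k
        J[i, :] = back(e_mat[:, i])[1]   # row i of the Jacobian from one pullback
    end
    (J,)
end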

hessian(f, x) = jacobian(x -> gradient(f, x)[1], x)

I got the following error:

Mutating arrays is not supported

    error(::String)@error.jl:33
    (::Zygote.var"#364#365")(::Nothing)@array.jl:58
    (::Zygote.var"#2245#back#366"{Zygote.var"#364#365"})(::Nothing)@adjoint.jl:59
    (::Zygote.var"#150#151"{Zygote.var"#2245#back#366"{Zygote.var"#364#365"},Tuple{Tuple{Nothing,Nothing},Tuple{Nothing}}})(::Nothing)@lib.jl:191
    (::Zygote.var"#1693#back#152"{Zygote.var"#150#151"{Zygote.var"#2245#back#366"{Zygote.var"#364#365"},Tuple{Tuple{Nothing,Nothing},Tuple{Nothing}}}})(::Nothing)@adjoint.jl:59
    #[email protected]:38[inlined]
    (::typeof(∂(λ)))(::Tuple{Array{Bool,2},Nothing})@interface2.jl:0
    #2209#[email protected]:59[inlined]
    (::typeof(∂(λ)))(::Tuple{Nothing,Array{Bool,2},Nothing})@interface2.jl:0
    [email protected]:15[inlined]
    (::typeof(∂(λ)))(::Tuple{Nothing,Array{Bool,2},Nothing})@interface2.jl:0
    F@Other: 1[inlined]
    (::typeof(∂(λ)))(::Tuple{Nothing,Array{Bool,1}})@interface2.jl:0
    #[email protected]:40[inlined]
    (::typeof(∂(λ)))(::Tuple{Array{Bool,1}})@interface2.jl:0
    [email protected]:49[inlined]
    (::typeof(∂(gradient)))(::Tuple{Array{Bool,1}})@interface2.jl:0
    #1@Local: 1[inlined]
    (::typeof(∂(#1)))(::Array{Bool,1})@interface2.jl:0
    (::Zygote.var"#41#42"{typeof(∂(#1))})(::Array{Bool,1})@interface.jl:40
    jacobian(::Function, ::Array{Float64,2})@Other: 8
    top-level scope@Local: 1[inlined]

Because I don't know what the gradient of the back function returned by Zygote.pullback is, I cannot think of a way to build the Jacobian without mutating an array. I was wondering if you might have any insights on this. Thank you so much, and happy new year!
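For illustration, here is a minimal sketch of assembling the Jacobian without writing into a preallocated matrix. The helper name `jacobian_nomut` and the test function are assumptions made for this example, not code from the thread. Note that the stack trace above appears to raise the error inside the call to `back` (that is, while differentiating through `gradient`), so removing the mutation in the loop may not by itself resolve that error.

```julia
using Zygote

# Hypothetical helper (not from the thread): build the Jacobian row by row
# without assigning into a preallocated matrix.
function jacobian_nomut(f, x)
    y, back = Zygote.pullback(f, x)
    k = length(y)
    rows = map(1:k) do i
        # One-hot cotangent with the same shape as y.
        seed = reshape(Float64.((1:k) .== i), size(y))
        # One pullback gives one row of the Jacobian, stored as a 1×n matrix.
        permutedims(vec(back(seed)[1]))
    end
    reduce(vcat, rows)   # stack the rows into a k×n matrix
end

# For a linear map x -> A*x the Jacobian is A itself.
A = [0.0 3.0 2.0; 1.0 0.0 3.0; 2.0 1.0 0.0]
J = jacobian_nomut(x -> A * x, [0.0, 1.0, 2.0])
```

Unlike the original, this sketch returns the matrix directly rather than a one-element tuple.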

@rakeshvar
Owner

There should be a much simpler way of doing this.
Look at the hessian function in Zygote itself.

@tholdem
Author

tholdem commented Jan 1, 2021

Sorry if I wasn't clear. I'm trying to avoid Zygote.hessian because it relies on ForwardDiff rather than being purely Zygote, which gives me lots of difficult bugs that would not occur with a purely Zygote Hessian function.

@rakeshvar
Owner

Hmmm... you are right. I did not know that.
The straightforward way to do this would be to define h(x) as in the example below, but that errors out. This seems to be a fundamental limitation of Zygote that cannot be surmounted by the trick you are using above; if it could be, they would not be using ForwardDiff in the first place. Maybe we can raise a ticket there or on Discourse and see.

> n = 3
> A = reshape(0:(n^2-1), n, n) .% (n+1)
3×3 Array{Int64,2}:
 0  3  2
 1  0  3
 2  1  0

> H = 2*A'*A
3×3 Array{Int64,2}:
 10   4   6
  4  20  12
  6  12  26

> x1 = collect(0:(n-1))
3-element Array{Int64,1}:
 0
 1
 2

> f(x) = sum(abs2, A*x)
> g(x) = Zygote.gradient(x_ -> f(x_), x)[1]
> h(x) = Zygote.gradient(x_ -> g(x_), x)[1]
> f(x1)
86

> [g(x1) f'(x1)]
3×2 Array{Int64,2}:
 16  16
 44  44
 64  64

> [Zygote.hessian(f, x1) H]
3×6 Array{Int64,2}:
 10   4   6  10   4   6
  4  20  12   4  20  12
  6  12  26   6  12  26

> h(x1)
ERROR: BoundsError
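For reference, the matrix H = 2*A'*A used as the check in the session above follows from writing f as a quadratic form. A short derivation, assuming A is a real matrix as in the example:

```latex
\begin{aligned}
f(x) &= \sum_i (Ax)_i^2 = \lVert Ax \rVert_2^2 = x^\top A^\top A\, x, \\
\nabla f(x) &= 2 A^\top A\, x, \\
\nabla^2 f(x) &= 2 A^\top A .
\end{aligned}
```

With x1 = (0, 1, 2) this gives A*x1 = (7, 6, 1), so f(x1) = 49 + 36 + 1 = 86 and the gradient is 2*A'*(7, 6, 1) = (16, 44, 64), matching the values printed in the session.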

@tholdem
Author

tholdem commented Jan 1, 2021

That makes sense. I've already opened an issue here FluxML/Zygote.jl#865 but I will give Discourse a try too. Thank you for your help!

@rakeshvar
Owner

I already started a discussion on Discourse here. I did not think through some obvious things before posting, but it led to long and very insightful discussions. 🙂

@tholdem
Author

tholdem commented Jan 2, 2021

I really appreciate your help! Yes, I actually saw this earlier and was fascinated by the technical details. In short, ForwardDiff is the right tool for Zygote.hessian; I just need to figure out a way to make it compatible with my function. Definitely worth the time to dig into more. Cheers!
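As a rough illustration of the forward-over-reverse idea discussed in this thread, the sketch below takes the ForwardDiff Jacobian of a Zygote gradient. The helper name `hessian_fwd_over_rev` is hypothetical, and the sketch assumes ForwardDiff is installed in the active environment. It is not a copy of Zygote's own implementation, and it reuses the small example matrix from the earlier comment.

```julia
using Zygote, ForwardDiff

# Same example as earlier in the thread, written with Float64 entries.
A = Float64[0 3 2; 1 0 3; 2 1 0]
f(x) = sum(abs2, A * x)

# Hypothetical helper: the inner gradient uses reverse mode (Zygote),
# the outer Jacobian uses forward mode (ForwardDiff).
hessian_fwd_over_rev(f, x) = ForwardDiff.jacobian(z -> Zygote.gradient(f, z)[1], x)

x = [0.0, 1.0, 2.0]
H = hessian_fwd_over_rev(f, x)
```

For this quadratic f the result can be compared against 2*A'*A. Functions that restrict their argument types (for example to Vector{Float64}) would need loosening to accept ForwardDiff's dual numbers.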
