Regression in comprehension support #1290

Open
ChrisRackauckas opened this issue Aug 20, 2022 · 12 comments
Labels
bug Something isn't working

Comments

@ChrisRackauckas
Member

MWE:

using Zygote

function loss_adjoint(p)
    prediction = p.*rand(2,100)
    prediction = [prediction[:, i] for i in axes(prediction, 2)]
    sum(sum.(prediction))
end

Zygote.gradient(loss_adjoint, ones(2))
ERROR: MethodError: no method matching +(::Base.RefValue{Any}, ::NamedTuple{(:contents,), Tuple{Matrix{Float64}}})
Closest candidates are:
  +(::Any, ::Any, ::Any, ::Any...) at operators.jl:591
  +(::ChainRulesCore.Tangent{P}, ::P) where P at C:\Users\accou\.julia\packages\ChainRulesCore\ctmSK\src\tangent_arithmetic.jl:146
  +(::VectorizationBase.CartesianVIndex, ::Any) at C:\Users\accou\.julia\packages\VectorizationBase\oCgEJ\src\cartesianvindex.jl:67
  ...
Stacktrace:
 [1] accum(x::Base.RefValue{Any}, y::NamedTuple{(:contents,), Tuple{Matrix{Float64}}})
   @ Zygote C:\Users\accou\.julia\packages\Zygote\xGkZ5\src\lib\lib.jl:17
 [2] accum(x::Base.RefValue{Any}, y::NamedTuple{(:contents,), Tuple{Matrix{Float64}}}, zs::Nothing)
   @ Zygote C:\Users\accou\.julia\packages\Zygote\xGkZ5\src\lib\lib.jl:22
 [3] Pullback
   @ c:\Users\accou\OneDrive\Computer\Desktop\test.jl:144 [inlined]
 [4] (::typeof(∂(loss_adjoint)))(Δ::Float64)
   @ Zygote C:\Users\accou\.julia\packages\Zygote\xGkZ5\src\compiler\interface2.jl:0
 [5] (::Zygote.var"#60#61"{typeof(∂(loss_adjoint))})(Δ::Float64)
   @ Zygote C:\Users\accou\.julia\packages\Zygote\xGkZ5\src\compiler\interface.jl:45
 [6] gradient(f::Function, args::Vector{Float64})
   @ Zygote C:\Users\accou\.julia\packages\Zygote\xGkZ5\src\compiler\interface.jl:97
 [7] top-level scope
   @ c:\Users\accou\OneDrive\Computer\Desktop\test.jl:149
@mcabbott
Member

You say regression, but don't say on which version this is known to work.

It appears to be one of the cases where re-using a variable name (for something with a different type) confuses Zygote. This is (IMO) a bad idea for humans too, and avoiding it avoids the problem:

julia> function loss_adjoint2(p)
           prediction = p.*rand(2,100)
           prediction2 = [prediction[:, i] for i in axes(prediction, 2)]
           sum(sum.(prediction2))
       end
loss_adjoint2 (generic function with 1 method)

julia> Zygote.gradient(loss_adjoint2, ones(2))
([52.379754440153576, 53.055517322087056],)

(Also, eachcol will be more efficient.)
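
For reference, a minimal sketch of the eachcol variant (loss_adjoint_cols is a made-up name, and whether a given Zygote version differentiates through eachcol is not verified here):

function loss_adjoint_cols(p)
    prediction = p .* rand(2, 100)
    sum(sum.(eachcol(prediction)))  # column views instead of copied slices
end

Zygote.gradient(loss_adjoint_cols, ones(2))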

@mcabbott mcabbott added the bug Something isn't working label Aug 20, 2022
@ToucheSir
Member

ToucheSir commented Aug 20, 2022

Testing in a fresh env with just Zygote, this works on 0.6.43. No other packages were changed between tests. Looking into things now...

@mcabbott
Member

mcabbott commented Aug 20, 2022

That's more concerning. I got as far as trying 0.6.0... Note that even there it works harder internally when the name is re-used, but it does not accumulate a wrong answer:

(@v1.5) pkg> add Zygote@0.6.0

julia> function loss_adjoint1(p)  # re-uses name like the question, but has well-defined answer
           prediction = p.*ones(2,100)
           prediction = [prediction[:, i] for i in axes(prediction, 2)]
           sum(sum.(prediction))
       end
loss_adjoint1 (generic function with 1 method)

julia> function loss_adjoint3(p)  # does not re-use name, 3x faster, same answer
           prediction = p.*ones(2,100)
           prediction3 = [prediction[:, i] for i in axes(prediction, 2)]
           sum(sum.(prediction3))
       end
loss_adjoint3 (generic function with 1 method)

julia> @btime Zygote.gradient(loss_adjoint1, ones(2))
  643.416 μs (3483 allocations: 464.59 KiB)
([100.0, 100.0],)

julia> @btime Zygote.gradient(loss_adjoint3, ones(2))
  121.750 μs (1218 allocations: 415.53 KiB)
([100.0, 100.0],)

@ToucheSir
Member

ToucheSir commented Aug 20, 2022

Reduced:

using Zygote

function loss_adjoint(p)
  prediction = 2p
  boxed_fn(i) = prediction^i
  # Trigger https://github.com/JuliaLang/julia/issues/15276
  prediction = boxed_fn(2)
  return prediction
end

Zygote.gradient(loss_adjoint, 1.0)

The error is coming from attempted accumulation of gradients for the inner closure (which contains a Core.Box). One is the expected (; prediction=Ref{Any}((; contents=...))), but the other has an unwrapped inner NamedTuple (; prediction=(; contents=...)).
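
For illustration, the clash can be reproduced directly with Zygote.accum (a sketch against the affected version, 0.6.44, with placeholder values):

using Zygote

# One tangent keeps Zygote's mutable-struct wrapper, the other had it stripped
# by _project. There is no accum method for this pairing, so it falls through
# to the generic x + y fallback (lib.jl:17) and throws the MethodError above.
Zygote.accum(Ref{Any}((contents = 1.0,)), (contents = 4.0,))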

With some logging in @adjoint literal_getproperty, we can get a clearer picture of what's going on:

┌ Info: getfield fwd
│   x = (::var"#boxed_fn#1") (generic function with 1 method)
│   f = :prediction
└   val = Core.Box(2.0)
┌ Info: getfield fwd
│   x = Core.Box(2.0)
│   f = :contents
└   val = 2.0
┌ Info: getfield fwd
│   x = Core.Box(4.0)
│   f = :contents
└   val = 4.0
┌ Info: getfield back mut
│   x = Core.Box(4.0)
│   Δ = 1.0
└   dx = Base.RefValue{Any}((contents = 1.0,))
┌ Info: getfield back mut
│   x = Core.Box(4.0)
│   Δ = 4.0
└   dx = Base.RefValue{Any}((contents = 4.0,))
┌ Info: getfield back imm
│   x = (::var"#boxed_fn#1") (generic function with 1 method)
│   Δ = Base.RefValue{Any}((contents = 4.0,))
│   dx = (prediction = Base.RefValue{Any}((contents = 4.0,)),)
└   _project(x, dx) = (prediction = (contents = 4.0,),)

_project does the recursive unwrapping of Refs via

z2d(dx::Ref, primal) = z2d(dx[], primal) # mutable structs

Why did this work before? Prior to #1248, Zygote was simply accumulating the Boxed closure field as an implicit param (even when implicit params weren't being used) and returning nothing. This caused bugs when differentiating wrt. inner mutable types, e.g.:

julia> gradient(nt -> 2*nt.a.x, (; a=Ref(1.0)))
(nothing,)           # 0.6.43
((a = (x = 2.0,),),) # 0.6.44

Thinking about fixes, I feel _project round-tripping through CRC may be too lossy for this case. Perhaps we should have dedicated Z-Z projection machinery for mutable structs?

@axsk

axsk commented Aug 23, 2022

I don't know whether my problem is the same, but it persists on 0.6.44 and 0.6.43.

function logvar(prob; ps_=prob.p, n=100)
    sum(msolve(prob, ps=ps_) for i in 1:n)
end

function dlogvar(prob, n=100)
    Zygote.gradient(ps_ -> logvar(prob; ps_=ps_, n=n), prob.p)[1]
end

Here msolve essentially solves a Neural SDE and in dlogvar I want to differentiate wrt. the Lux model parameters ps.
I cannot make out any reuse of names (even added the extra _ to make sure), but there is also something wrong with accum and NamedTuples:

MethodError: no method matching +(::Tuple{}, ::NamedTuple{(), Tuple{}})
Closest candidates are:
  +(::Any, ::Any, ::Any, ::Any...) at operators.jl:591
  +(::VectorizationBase.CartesianVIndex, ::Any) at ~/.julia/packages/VectorizationBase/oCgEJ/src/cartesianvindex.jl:67
  +(::ChainRulesCore.Tangent{P}, ::P) where P at ~/.julia/packages/ChainRulesCore/ctmSK/src/tangent_arithmetic.jl:146
  ...
Stacktrace:
  [1] accum(x::Tuple{}, y::NamedTuple{(), Tuple{}})
    @ Zygote ~/.julia/packages/Zygote/DkIUK/src/lib/lib.jl:17
  [2] macro expansion
    @ ~/.julia/packages/Zygote/DkIUK/src/lib/lib.jl:27 [inlined]
  [3] accum(x::NamedTuple{(:data, :itr), Tuple{Tuple{}, Nothing}}, y::NamedTuple{(:data, :itr), Tuple{NamedTuple{(), Tuple{}}, Nothing}})
    @ Zygote ~/.julia/packages/Zygote/DkIUK/src/lib/lib.jl:27
  [4] macro expansion
    @ ~/.julia/packages/Zygote/DkIUK/src/lib/lib.jl:27 [inlined]
  [5] accum(x::NamedTuple{(:f, :g, :u0, :tspan, :p, :noise, :kwargs, :noise_rate_prototype, :seed), Tuple{Nothing, Nothing, Vector{Float64}, Nothing, Nothing, Nothing, NamedTuple{(:data, :itr), Tuple{Tuple{}, Nothing}}, Nothing, Nothing}}, y::NamedTuple{(:f, :g, :u0, :tspan, :p, :noise, :kwargs, :noise_rate_prototype, :seed), Tuple{Nothing, Nothing, Vector{Float64}, Nothing, Nothing, Nothing, NamedTuple{(:data, :itr), Tuple{NamedTuple{(), Tuple{}}, Nothing}}, Nothing, Nothing}})
    @ Zygote ~/.julia/packages/Zygote/DkIUK/src/lib/lib.jl:27
  [6] macro expansion
    @ ~/.julia/packages/Zygote/DkIUK/src/lib/lib.jl:27 [inlined]
  [7] accum(x::NamedTuple{(:ps, :prob), Tuple{Vector{Float32}, NamedTuple{(:f, :g, :u0, :tspan, :p, :noise, :kwargs, :noise_rate_prototype, :seed), Tuple{Nothing, Nothing, Vector{Float64}, Nothing, Nothing, Nothing, NamedTuple{(:data, :itr), Tuple{Tuple{}, Nothing}}, Nothing, Nothing}}}}, y::NamedTuple{(:ps, :prob), Tuple{Vector{Float32}, NamedTuple{(:f, :g, :u0, :tspan, :p, :noise, :kwargs, :noise_rate_prototype, :seed), Tuple{Nothing, Nothing, Vector{Float64}, Nothing, Nothing, Nothing, NamedTuple{(:data, :itr), Tuple{NamedTuple{(), Tuple{}}, Nothing}}, Nothing, Nothing}}}})
    @ Zygote ~/.julia/packages/Zygote/DkIUK/src/lib/lib.jl:27
  [8] macro expansion
    @ ~/.julia/packages/Zygote/DkIUK/src/lib/lib.jl:27 [inlined]
  [9] accum(x::NamedTuple{(:f, :rf), Tuple{NamedTuple{(:ps, :prob), Tuple{Vector{Float32}, NamedTuple{(:f, :g, :u0, :tspan, :p, :noise, :kwargs, :noise_rate_prototype, :seed), Tuple{Nothing, Nothing, Vector{Float64}, Nothing, Nothing, Nothing, NamedTuple{(:data, :itr), Tuple{Tuple{}, Nothing}}, Nothing, Nothing}}}}, Nothing}}, y::NamedTuple{(:f, :rf), Tuple{NamedTuple{(:ps, :prob), Tuple{Vector{Float32}, NamedTuple{(:f, :g, :u0, :tspan, :p, :noise, :kwargs, :noise_rate_prototype, :seed), Tuple{Nothing, Nothing, Vector{Float64}, Nothing, Nothing, Nothing, NamedTuple{(:data, :itr), Tuple{NamedTuple{(), Tuple{}}, Nothing}}, Nothing, Nothing}}}}, Nothing}})
    @ Zygote ~/.julia/packages/Zygote/DkIUK/src/lib/lib.jl:27
 [10] Pullback
    @ ./reduce.jl:62 [inlined]
 [11] (::typeof((_foldl_impl)))(Δ::Float64)
    @ Zygote ~/.julia/packages/Zygote/DkIUK/src/compiler/interface2.jl:0

It works for n=2.
Swapping the comprehension for a loop doesn't help either:

function logvar(prob; ps_=prob.p, n=100)
    #sum(msolve(prob, ps=ps_) for i in 1:n)
    x = 0.
    for i in 1:n
        x+=msolve(prob, ps=ps_)
    end
    x
end
MethodError: no method matching +(::Tuple{}, ::NamedTuple{(), Tuple{}})
Closest candidates are:
  +(::Any, ::Any, ::Any, ::Any...) at operators.jl:591
  +(::VectorizationBase.CartesianVIndex, ::Any) at ~/.julia/packages/VectorizationBase/oCgEJ/src/cartesianvindex.jl:67
  +(::ChainRulesCore.Tangent{P}, ::P) where P at ~/.julia/packages/ChainRulesCore/ctmSK/src/tangent_arithmetic.jl:146
  ...
Stacktrace:
  [1] accum(x::Tuple{}, y::NamedTuple{(), Tuple{}})
    @ Zygote ~/.julia/packages/Zygote/xGkZ5/src/lib/lib.jl:17
  [2] macro expansion
    @ ~/.julia/packages/Zygote/xGkZ5/src/lib/lib.jl:27 [inlined]
  [3] accum(x::NamedTuple{(:data, :itr), Tuple{Tuple{}, Nothing}}, y::NamedTuple{(:data, :itr), Tuple{NamedTuple{(), Tuple{}}, Nothing}})
    @ Zygote ~/.julia/packages/Zygote/xGkZ5/src/lib/lib.jl:27
  [4] macro expansion
    @ ~/.julia/packages/Zygote/xGkZ5/src/lib/lib.jl:27 [inlined]
  [5] accum(x::NamedTuple{(:f, :g, :u0, :tspan, :p, :noise, :kwargs, :noise_rate_prototype, :seed), Tuple{Nothing, Nothing, Vector{Float64}, Nothing, Nothing, Nothing, NamedTuple{(:data, :itr), Tuple{Tuple{}, Nothing}}, Nothing, Nothing}}, y::NamedTuple{(:f, :g, :u0, :tspan, :p, :noise, :kwargs, :noise_rate_prototype, :seed), Tuple{Nothing, Nothing, Vector{Float64}, Nothing, Nothing, Nothing, NamedTuple{(:data, :itr), Tuple{NamedTuple{(), Tuple{}}, Nothing}}, Nothing, Nothing}})
    @ Zygote ~/.julia/packages/Zygote/xGkZ5/src/lib/lib.jl:27
  [6] Pullback
    @ ~/code/OptImpSampling.jl/src/logvar.jl:87 [inlined]

@mcabbott
Member

@axsk this looks unrelated to me. This line

[3] accum(x::NamedTuple{(:data, :itr), Tuple{Tuple{}, Nothing}}, y::NamedTuple{(:data, :itr), Tuple{NamedTuple{(), Tuple{}}, Nothing}})

says that something is producing both (), nothing and (;), nothing as gradients for the same object, which has field names :data, :itr. That shouldn't be possible, but might be an easy bug to introduce by something like having two f(x::T...) methods and calling f() somewhere. If you have a MWE I'm sure this can be hunted down.

@axsk

axsk commented Aug 23, 2022

In that case I am sorry for the noise, and thank you for your suggestions. I checked my code (not a true MWE, but only 80 lines) for them but couldn't find anything of the sort :(

@ToucheSir
Member

StochasticDiffEq, SciMLSensitivity and Lux are a touch over 80 lines :P. You may have more luck rewriting the sum over a generator as sum(_ -> msolve(prob, ps=ps_), 1:n). Usually this is both faster and less brittle.
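
For example, a sketch of that rewrite (msolve and prob as in the snippet above):

function logvar(prob; ps_=prob.p, n=100)
    sum(_ -> msolve(prob, ps=ps_), 1:n)
end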

@ToucheSir
Member

Just so #1290 (comment) doesn't get buried, here is a summary of the issue and a proposal for how to address it.

For many rules, Zygote uses _project to try to harmonize input and differential types; that round-trips through CRC.ProjectTo. One consequence is that _project recursively unwraps Zygote's own representation for mutable structural tangents (Ref{Any}(NamedTuple(...))) into just the wrapped NamedTuple. This makes a lot of sense at the user-facing interface, but when it happens between rules it can produce scenarios where one path unwraps immutabletangent(mutabletangent(...)) -> immutabletangent(immutabletangent(...)) while another path still carries the Ref-wrapped tangent, and accumulating the two fails (as in the MethodError above).

Why was this not showing up before? Simply put, Zygote was throwing away a lot of gradients of mutable types because they took the same code path as implicit params (shoved into the cache and summarily ignored). That doesn't seem to have led to major issues, but it is incorrect if you wanted those gradients at any point. 0.6.44 fixed this by not hitting the implicit-param path unless one specifically asks for it, but that turned up this unforeseen edge case in _project.

What can be done? I think the most straightforward solution is to have separate projection routines for what is used at the user <-> AD boundary (i.e. pullback) and what is used internally. Alternative ideas are very much welcome too. The only constraint I can think of is that gradient must never return Ref{Any} at the top-level or nested inside the grads unless the original argument had a Ref there as well.
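
To make that concrete, a purely hypothetical sketch of what a separate internal projection could look like inside Zygote (_project_internal is a made-up name, not existing API; _project is the current helper discussed above):

# Hypothetical sketch, not Zygote source. The user-facing pullback would keep
# calling _project (the CRC round-trip, so gradient never returns Ref{Any}),
# while rule-to-rule code would call a variant that preserves the mutable-
# tangent wrapper so accum always sees matching shapes.
_project_internal(x, dx) = _project(x, dx)
_project_internal(x, dx::Base.RefValue) = dx   # keep Ref-wrapped tangents intact between rules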

@ChrisRackauckas
Member Author

Is #1304 the same?

@ToucheSir
Member

ToucheSir commented Sep 9, 2022

Most likely, yes. Since that doesn't involve Core.Box though, it's hard to say for certain without a MWE.

@DhairyaLGandhi
Member

DhairyaLGandhi commented Sep 9, 2022 via email
