Regression in comprehension support #1290
You say regression, but not on which version this is known to work. It appears to be one of the cases where re-using a variable name (for something with a different type) confuses Zygote. This is (IMO) a bad idea for humans too, and avoiding it avoids the problem:

```julia
julia> function loss_adjoint2(p)
           prediction = p .* rand(2, 100)
           prediction2 = [prediction[:, i] for i in axes(prediction, 2)]
           sum(sum.(prediction2))
       end
loss_adjoint2 (generic function with 1 method)

julia> Zygote.gradient(loss_adjoint2, ones(2))
([52.379754440153576, 53.055517322087056],)
```
Testing in a fresh env with just Zygote, this works on 0.6.43. No other packages were changed between tests. Looking into things now...
That's more concerning. I got as far as trying 0.6.0... note that internally it's still working harder when the names are confusing, but does not accumulate a wrong answer:
Reduced:

```julia
using Zygote

function loss_adjoint(p)
    prediction = 2p
    boxed_fn(i) = prediction^i
    # Trigger https://github.com/JuliaLang/julia/issues/15276
    prediction = boxed_fn(2)
    return prediction
end

Zygote.gradient(loss_adjoint, 1.0)
```

The error is coming from attempted accumulation of gradients for the inner closure (which contains a `Core.Box`). With some logging added to the `getfield` adjoints:

```
┌ Info: getfield fwd
│   x = (::var"#boxed_fn#1") (generic function with 1 method)
│   f = :prediction
└   val = Core.Box(2.0)
┌ Info: getfield fwd
│   x = Core.Box(2.0)
│   f = :contents
└   val = 2.0
┌ Info: getfield fwd
│   x = Core.Box(4.0)
│   f = :contents
└   val = 4.0
┌ Info: getfield back mut
│   x = Core.Box(4.0)
│   Δ = 1.0
└   dx = Base.RefValue{Any}((contents = 1.0,))
┌ Info: getfield back mut
│   x = Core.Box(4.0)
│   Δ = 4.0
└   dx = Base.RefValue{Any}((contents = 4.0,))
┌ Info: getfield back imm
│   x = (::var"#boxed_fn#1") (generic function with 1 method)
│   Δ = Base.RefValue{Any}((contents = 4.0,))
│   dx = (prediction = Base.RefValue{Any}((contents = 4.0,)),)
└   _project(x, dx) = (prediction = (contents = 4.0,),)
```
See Zygote.jl/src/compiler/chainrules.jl, line 347 (at commit 328eb4d).
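For background on the `Core.Box` in the log: Julia boxes a captured variable whenever it is reassigned after the closure is created (JuliaLang/julia#15276). A minimal illustration, independent of Zygote (`f` and `g` are hypothetical names for this sketch):

```julia
function f()
    x = 1
    g() = x      # g captures x
    x = 2        # reassignment after capture forces x into a Core.Box
    return g
end

fieldtype(typeof(f()), :x)  # Core.Box
```

This is why avoiding name reuse, as suggested above, also sidesteps the boxed code path entirely.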
Why did this work before? Prior to #1248, gradients for mutable objects like `Ref` went down the implicit-parameter path and were effectively dropped:

```julia
julia> gradient(nt -> 2*nt.a.x, (; a=Ref(1.0)))
(nothing,)              # 0.6.43
((a = (x = 2.0,),),)    # 0.6.44
```

Thinking about fixes, I feel
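For context, `Zygote.accum` is the internal function that merges two gradients for the same value: structural tangents merge field-by-field, and `nothing` acts as a neutral element. A rough sketch of the well-behaved cases (the bug in this issue arises when mismatched representations reach it):

```julia
using Zygote: accum

accum((a = 1.0,), (a = 2.0,))   # NamedTuples merge field-by-field: (a = 3.0,)
accum(nothing, (a = 2.0,))      # nothing is a neutral element: (a = 2.0,)
accum(1.0, 2.0)                 # plain numbers just add: 3.0
```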
I don't know whether my problem is the same, but it persists on 0.6.44 and 0.6.43.

```julia
function logvar(prob; ps_=prob.p, n=100)
    sum(msolve(prob, ps=ps_) for i in 1:n)
end

function dlogvar(prob, n=100)
    Zygote.gradient(ps_ -> logvar(prob; ps=ps_, n=n), prob.p)[1]
end
```

Here the gradient call fails with:

```
MethodError: no method matching +(::Tuple{}, ::NamedTuple{(), Tuple{}})
Closest candidates are:
  +(::Any, ::Any, ::Any, ::Any...) at operators.jl:591
  +(::VectorizationBase.CartesianVIndex, ::Any) at ~/.julia/packages/VectorizationBase/oCgEJ/src/cartesianvindex.jl:67
  +(::ChainRulesCore.Tangent{P}, ::P) where P at ~/.julia/packages/ChainRulesCore/ctmSK/src/tangent_arithmetic.jl:146
  ...
Stacktrace:
  [1] accum(x::Tuple{}, y::NamedTuple{(), Tuple{}})
    @ Zygote ~/.julia/packages/Zygote/DkIUK/src/lib/lib.jl:17
  [2] macro expansion
    @ ~/.julia/packages/Zygote/DkIUK/src/lib/lib.jl:27 [inlined]
  [3] accum(x::NamedTuple{(:data, :itr), Tuple{Tuple{}, Nothing}}, y::NamedTuple{(:data, :itr), Tuple{NamedTuple{(), Tuple{}}, Nothing}})
    @ Zygote ~/.julia/packages/Zygote/DkIUK/src/lib/lib.jl:27
  [4] macro expansion
    @ ~/.julia/packages/Zygote/DkIUK/src/lib/lib.jl:27 [inlined]
  [5] accum(x::NamedTuple{(:f, :g, :u0, :tspan, :p, :noise, :kwargs, :noise_rate_prototype, :seed), Tuple{Nothing, Nothing, Vector{Float64}, Nothing, Nothing, Nothing, NamedTuple{(:data, :itr), Tuple{Tuple{}, Nothing}}, Nothing, Nothing}}, y::NamedTuple{(:f, :g, :u0, :tspan, :p, :noise, :kwargs, :noise_rate_prototype, :seed), Tuple{Nothing, Nothing, Vector{Float64}, Nothing, Nothing, Nothing, NamedTuple{(:data, :itr), Tuple{NamedTuple{(), Tuple{}}, Nothing}}, Nothing, Nothing}})
    @ Zygote ~/.julia/packages/Zygote/DkIUK/src/lib/lib.jl:27
  [6] macro expansion
    @ ~/.julia/packages/Zygote/DkIUK/src/lib/lib.jl:27 [inlined]
  [7] accum(x::NamedTuple{(:ps, :prob), Tuple{Vector{Float32}, NamedTuple{(:f, :g, :u0, :tspan, :p, :noise, :kwargs, :noise_rate_prototype, :seed), Tuple{Nothing, Nothing, Vector{Float64}, Nothing, Nothing, Nothing, NamedTuple{(:data, :itr), Tuple{Tuple{}, Nothing}}, Nothing, Nothing}}}}, y::NamedTuple{(:ps, :prob), Tuple{Vector{Float32}, NamedTuple{(:f, :g, :u0, :tspan, :p, :noise, :kwargs, :noise_rate_prototype, :seed), Tuple{Nothing, Nothing, Vector{Float64}, Nothing, Nothing, Nothing, NamedTuple{(:data, :itr), Tuple{NamedTuple{(), Tuple{}}, Nothing}}, Nothing, Nothing}}}})
    @ Zygote ~/.julia/packages/Zygote/DkIUK/src/lib/lib.jl:27
  [8] macro expansion
    @ ~/.julia/packages/Zygote/DkIUK/src/lib/lib.jl:27 [inlined]
  [9] accum(x::NamedTuple{(:f, :rf), Tuple{NamedTuple{(:ps, :prob), Tuple{Vector{Float32}, NamedTuple{(:f, :g, :u0, :tspan, :p, :noise, :kwargs, :noise_rate_prototype, :seed), Tuple{Nothing, Nothing, Vector{Float64}, Nothing, Nothing, Nothing, NamedTuple{(:data, :itr), Tuple{Tuple{}, Nothing}}, Nothing, Nothing}}}}, Nothing}}, y::NamedTuple{(:f, :rf), Tuple{NamedTuple{(:ps, :prob), Tuple{Vector{Float32}, NamedTuple{(:f, :g, :u0, :tspan, :p, :noise, :kwargs, :noise_rate_prototype, :seed), Tuple{Nothing, Nothing, Vector{Float64}, Nothing, Nothing, Nothing, NamedTuple{(:data, :itr), Tuple{NamedTuple{(), Tuple{}}, Nothing}}, Nothing, Nothing}}}}, Nothing}})
    @ Zygote ~/.julia/packages/Zygote/DkIUK/src/lib/lib.jl:27
 [10] Pullback
    @ ./reduce.jl:62 [inlined]
 [11] (::typeof(∂(_foldl_impl)))(Δ::Float64)
    @ Zygote ~/.julia/packages/Zygote/DkIUK/src/compiler/interface2.jl:0
```

It works for:

```julia
function logvar(prob; ps_=prob.p, n=100)
    #sum(msolve(prob, ps=ps_) for i in 1:n)
    x = 0.0
    for i in 1:n
        x += msolve(prob, ps=ps_)
    end
    x
end
```

For comparison, the same error from a different Zygote install:

```
MethodError: no method matching +(::Tuple{}, ::NamedTuple{(), Tuple{}})
Closest candidates are:
  +(::Any, ::Any, ::Any, ::Any...) at operators.jl:591
  +(::VectorizationBase.CartesianVIndex, ::Any) at ~/.julia/packages/VectorizationBase/oCgEJ/src/cartesianvindex.jl:67
  +(::ChainRulesCore.Tangent{P}, ::P) where P at ~/.julia/packages/ChainRulesCore/ctmSK/src/tangent_arithmetic.jl:146
  ...
Stacktrace:
 [1] accum(x::Tuple{}, y::NamedTuple{(), Tuple{}})
   @ Zygote ~/.julia/packages/Zygote/xGkZ5/src/lib/lib.jl:17
 [2] macro expansion
   @ ~/.julia/packages/Zygote/xGkZ5/src/lib/lib.jl:27 [inlined]
 [3] accum(x::NamedTuple{(:data, :itr), Tuple{Tuple{}, Nothing}}, y::NamedTuple{(:data, :itr), Tuple{NamedTuple{(), Tuple{}}, Nothing}})
   @ Zygote ~/.julia/packages/Zygote/xGkZ5/src/lib/lib.jl:27
 [4] macro expansion
   @ ~/.julia/packages/Zygote/xGkZ5/src/lib/lib.jl:27 [inlined]
 [5] accum(x::NamedTuple{(:f, :g, :u0, :tspan, :p, :noise, :kwargs, :noise_rate_prototype, :seed), Tuple{Nothing, Nothing, Vector{Float64}, Nothing, Nothing, Nothing, NamedTuple{(:data, :itr), Tuple{Tuple{}, Nothing}}, Nothing, Nothing}}, y::NamedTuple{(:f, :g, :u0, :tspan, :p, :noise, :kwargs, :noise_rate_prototype, :seed), Tuple{Nothing, Nothing, Vector{Float64}, Nothing, Nothing, Nothing, NamedTuple{(:data, :itr), Tuple{NamedTuple{(), Tuple{}}, Nothing}}, Nothing, Nothing}})
   @ Zygote ~/.julia/packages/Zygote/xGkZ5/src/lib/lib.jl:27
 [6] Pullback
   @ ~/code/OptImpSampling.jl/src/logvar.jl:87 [inlined]
```
@aksk this looks unrelated to me. The stack-trace line `accum(x::Tuple{}, y::NamedTuple{(), Tuple{}})` says that something is producing both a `Tuple{}` and a `NamedTuple{(), Tuple{}}` gradient for the same value.
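To see why that fails: the two empty containers are different types, and Julia defines no `+` method between them. A quick check in plain Julia, no Zygote needed:

```julia
x = ()            # an empty Tuple, of type Tuple{}
y = NamedTuple()  # an empty NamedTuple, of type NamedTuple{(), Tuple{}}

typeof(x) == typeof(y)   # false: distinct types
# x + y  # would throw: MethodError: no method matching +(::Tuple{}, ::NamedTuple{(), Tuple{}})
```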
In that case I am sorry for the noise, and thank you for your suggestions. I checked my code (not a true MWE, but only 80 lines) for them but couldn't find anything of the like :(
StochasticDiffEq, SciMLSensitivity and Lux are a touch over 80 lines :P. You may have more luck rewriting the sum over a generator to
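One plausible rewrite along those lines (my illustration, not from the thread; `msolve` here is a simple stand-in for the function in the comments above): passing a function to `sum` directly, instead of a generator, avoids differentiating through the generator's `data`/`itr` machinery seen in the stack traces.

```julia
using Zygote

msolve(p) = sum(abs2, p)                   # stand-in for the thread's msolve (hypothetical)
logvar(p; n=3) = sum(_ -> msolve(p), 1:n)  # sum(f, itr) instead of sum(f(...) for i in itr)

Zygote.gradient(p -> logvar(p), [1.0, 2.0])
```

For this stand-in, the result should be `n .* 2p`, i.e. `([6.0, 12.0],)` with `n = 3`.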
Just so #1290 (comment) doesn't get buried, here is a summary of the issue and a proposal for how to address it. For many rules, Zygote uses

Why was this not showing up before? Simply speaking, Zygote was throwing away a lot of gradients of mutable types because they took the same code path as implicit params (shoved in the cache and summarily ignored). That doesn't seem to have led to major issues, but it is incorrect if you wanted said gradients at any point. 0.6.44 fixed this by not hitting the implicit-param path unless one specifically asks for it, but it turned up this unforeseen edge case in

What can be done? I think the most straightforward solution is to have separate projection routines for what is used at the user <-> AD boundary (i.e.
Is #1304 the same?
Most likely, yes. Since that doesn't involve
If the motivation is to unify the return types to the user, then we can unwrap refs as a last pass. The "implicit params code paths" would be something to investigate; do you have specific instances of those in mind? Adding projection primitives is one way, but it's not clear whether they solve the issue. It would be better to have conversion rules that Julia can use instead.
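A minimal sketch of the "unwrap refs as a last pass" idea (my illustration only; `unwrap_refs` is a hypothetical name, not Zygote's actual implementation): internal accumulation could keep mutable `RefValue` tangents, and a final recursive pass at the user-facing boundary would convert them to plain structural tangents before returning.

```julia
# Hypothetical post-processing pass at the user <-> AD boundary.
unwrap_refs(x) = x
unwrap_refs(r::Base.RefValue) = unwrap_refs(r[])   # RefValue{Any}((contents = 4.0,)) -> (contents = 4.0,)
unwrap_refs(nt::NamedTuple) = map(unwrap_refs, nt)
unwrap_refs(t::Tuple) = map(unwrap_refs, t)
```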
MWE: