Regression in comprehension support #1290
You say regression, but not on which version this is known to work. It appears to be one of the cases where re-using a variable name (for something with a different type) confuses Zygote. This is (IMO) a bad idea for humans too, and avoiding it avoids the problem:

```julia
julia> function loss_adjoint2(p)
           prediction = p .* rand(2, 100)
           prediction2 = [prediction[:, i] for i in axes(prediction, 2)]
           sum(sum.(prediction2))
       end
loss_adjoint2 (generic function with 1 method)

julia> Zygote.gradient(loss_adjoint2, ones(2))
([52.379754440153576, 53.055517322087056],)
```
Testing in a fresh env with just Zygote, this works on 0.6.43. No other packages were changed between tests. Looking into things now...
That's more concerning. I got as far as trying 0.6.0... note that internally it's still working harder when the names are confusing, but does not accumulate a wrong answer:
Reduced:

```julia
using Zygote

function loss_adjoint(p)
    prediction = 2p
    boxed_fn(i) = prediction^i
    # Trigger https://github.com/JuliaLang/julia/issues/15276
    prediction = boxed_fn(2)
    return prediction
end

Zygote.gradient(loss_adjoint, 1.0)
```

The error is coming from attempted accumulation of gradients for the inner closure (which contains a `Core.Box`). With some logging added to the `getfield` adjoints:

```
┌ Info: getfield fwd
│   x = (::var"#boxed_fn#1") (generic function with 1 method)
│   f = :prediction
└   val = Core.Box(2.0)
┌ Info: getfield fwd
│   x = Core.Box(2.0)
│   f = :contents
└   val = 2.0
┌ Info: getfield fwd
│   x = Core.Box(4.0)
│   f = :contents
└   val = 4.0
┌ Info: getfield back mut
│   x = Core.Box(4.0)
│   Δ = 1.0
└   dx = Base.RefValue{Any}((contents = 1.0,))
┌ Info: getfield back mut
│   x = Core.Box(4.0)
│   Δ = 4.0
└   dx = Base.RefValue{Any}((contents = 4.0,))
┌ Info: getfield back imm
│   x = (::var"#boxed_fn#1") (generic function with 1 method)
│   Δ = Base.RefValue{Any}((contents = 4.0,))
│   dx = (prediction = Base.RefValue{Any}((contents = 4.0,)),)
└   _project(x, dx) = (prediction = (contents = 4.0,),)
```
See Zygote.jl/src/compiler/chainrules.jl, line 347 (at commit 328eb4d).
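For background on the `Core.Box` in the log: Julia boxes a captured variable whenever it is reassigned after the closure is created (JuliaLang/julia#15276). A minimal illustration, independent of Zygote (`f` and `g` are hypothetical names for this sketch):

```julia
function f()
    x = 1
    g() = x      # g captures x
    x = 2        # reassignment after capture forces x into a Core.Box
    return g
end

fieldtype(typeof(f()), :x)  # Core.Box
```

This is why avoiding name reuse, as suggested above, also sidesteps the boxed code path entirely.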
Why did this work before? Prior to #1248, gradients for mutable objects like `Ref` went down the implicit-parameter path and were effectively dropped:

```julia
julia> gradient(nt -> 2*nt.a.x, (; a=Ref(1.0)))
(nothing,)              # 0.6.43
((a = (x = 2.0,),),)    # 0.6.44
```

Thinking about fixes, I feel
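For context, `Zygote.accum` is the internal function that merges two gradients for the same value: structural tangents merge field-by-field, and `nothing` acts as a neutral element. A rough sketch of the well-behaved cases (the bug in this issue arises when mismatched representations reach it):

```julia
using Zygote: accum

accum((a = 1.0,), (a = 2.0,))   # NamedTuples merge field-by-field: (a = 3.0,)
accum(nothing, (a = 2.0,))      # nothing is a neutral element: (a = 2.0,)
accum(1.0, 2.0)                 # plain numbers just add: 3.0
```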
I don't know whether my problem is the same, but it persists on 0.6.44 and 0.6.43.

```julia
function logvar(prob; ps_=prob.p, n=100)
    sum(msolve(prob, ps=ps_) for i in 1:n)
end

function dlogvar(prob, n=100)
    Zygote.gradient(ps_ -> logvar(prob; ps=ps_, n=n), prob.p)[1]
end
```

Here the gradient call fails with:

```
MethodError: no method matching +(::Tuple{}, ::NamedTuple{(), Tuple{}})
Closest candidates are:
  +(::Any, ::Any, ::Any, ::Any...) at operators.jl:591
  +(::VectorizationBase.CartesianVIndex, ::Any) at ~/.julia/packages/VectorizationBase/oCgEJ/src/cartesianvindex.jl:67
  +(::ChainRulesCore.Tangent{P}, ::P) where P at ~/.julia/packages/ChainRulesCore/ctmSK/src/tangent_arithmetic.jl:146
  ...
Stacktrace:
  [1] accum(x::Tuple{}, y::NamedTuple{(), Tuple{}})
    @ Zygote ~/.julia/packages/Zygote/DkIUK/src/lib/lib.jl:17
  [2] macro expansion
    @ ~/.julia/packages/Zygote/DkIUK/src/lib/lib.jl:27 [inlined]
  [3] accum(x::NamedTuple{(:data, :itr), Tuple{Tuple{}, Nothing}}, y::NamedTuple{(:data, :itr), Tuple{NamedTuple{(), Tuple{}}, Nothing}})
    @ Zygote ~/.julia/packages/Zygote/DkIUK/src/lib/lib.jl:27
  [4] macro expansion
    @ ~/.julia/packages/Zygote/DkIUK/src/lib/lib.jl:27 [inlined]
  [5] accum(x::NamedTuple{(:f, :g, :u0, :tspan, :p, :noise, :kwargs, :noise_rate_prototype, :seed), Tuple{Nothing, Nothing, Vector{Float64}, Nothing, Nothing, Nothing, NamedTuple{(:data, :itr), Tuple{Tuple{}, Nothing}}, Nothing, Nothing}}, y::NamedTuple{(:f, :g, :u0, :tspan, :p, :noise, :kwargs, :noise_rate_prototype, :seed), Tuple{Nothing, Nothing, Vector{Float64}, Nothing, Nothing, Nothing, NamedTuple{(:data, :itr), Tuple{NamedTuple{(), Tuple{}}, Nothing}}, Nothing, Nothing}})
    @ Zygote ~/.julia/packages/Zygote/DkIUK/src/lib/lib.jl:27
  [6] macro expansion
    @ ~/.julia/packages/Zygote/DkIUK/src/lib/lib.jl:27 [inlined]
  [7] accum(x::NamedTuple{(:ps, :prob), Tuple{Vector{Float32}, NamedTuple{(:f, :g, :u0, :tspan, :p, :noise, :kwargs, :noise_rate_prototype, :seed), Tuple{Nothing, Nothing, Vector{Float64}, Nothing, Nothing, Nothing, NamedTuple{(:data, :itr), Tuple{Tuple{}, Nothing}}, Nothing, Nothing}}}}, y::NamedTuple{(:ps, :prob), Tuple{Vector{Float32}, NamedTuple{(:f, :g, :u0, :tspan, :p, :noise, :kwargs, :noise_rate_prototype, :seed), Tuple{Nothing, Nothing, Vector{Float64}, Nothing, Nothing, Nothing, NamedTuple{(:data, :itr), Tuple{NamedTuple{(), Tuple{}}, Nothing}}, Nothing, Nothing}}}})
    @ Zygote ~/.julia/packages/Zygote/DkIUK/src/lib/lib.jl:27
  [8] macro expansion
    @ ~/.julia/packages/Zygote/DkIUK/src/lib/lib.jl:27 [inlined]
  [9] accum(x::NamedTuple{(:f, :rf), Tuple{NamedTuple{(:ps, :prob), Tuple{Vector{Float32}, NamedTuple{(:f, :g, :u0, :tspan, :p, :noise, :kwargs, :noise_rate_prototype, :seed), Tuple{Nothing, Nothing, Vector{Float64}, Nothing, Nothing, Nothing, NamedTuple{(:data, :itr), Tuple{Tuple{}, Nothing}}, Nothing, Nothing}}}}, Nothing}}, y::NamedTuple{(:f, :rf), Tuple{NamedTuple{(:ps, :prob), Tuple{Vector{Float32}, NamedTuple{(:f, :g, :u0, :tspan, :p, :noise, :kwargs, :noise_rate_prototype, :seed), Tuple{Nothing, Nothing, Vector{Float64}, Nothing, Nothing, Nothing, NamedTuple{(:data, :itr), Tuple{NamedTuple{(), Tuple{}}, Nothing}}, Nothing, Nothing}}}}, Nothing}})
    @ Zygote ~/.julia/packages/Zygote/DkIUK/src/lib/lib.jl:27
 [10] Pullback
    @ ./reduce.jl:62 [inlined]
 [11] (::typeof(∂(_foldl_impl)))(Δ::Float64)
    @ Zygote ~/.julia/packages/Zygote/DkIUK/src/compiler/interface2.jl:0
```

It works for:

```julia
function logvar(prob; ps_=prob.p, n=100)
    #sum(msolve(prob, ps=ps_) for i in 1:n)
    x = 0.0
    for i in 1:n
        x += msolve(prob, ps=ps_)
    end
    x
end
```

For comparison, the same error from a different Zygote install:

```
MethodError: no method matching +(::Tuple{}, ::NamedTuple{(), Tuple{}})
Closest candidates are:
  +(::Any, ::Any, ::Any, ::Any...) at operators.jl:591
  +(::VectorizationBase.CartesianVIndex, ::Any) at ~/.julia/packages/VectorizationBase/oCgEJ/src/cartesianvindex.jl:67
  +(::ChainRulesCore.Tangent{P}, ::P) where P at ~/.julia/packages/ChainRulesCore/ctmSK/src/tangent_arithmetic.jl:146
  ...
Stacktrace:
 [1] accum(x::Tuple{}, y::NamedTuple{(), Tuple{}})
   @ Zygote ~/.julia/packages/Zygote/xGkZ5/src/lib/lib.jl:17
 [2] macro expansion
   @ ~/.julia/packages/Zygote/xGkZ5/src/lib/lib.jl:27 [inlined]
 [3] accum(x::NamedTuple{(:data, :itr), Tuple{Tuple{}, Nothing}}, y::NamedTuple{(:data, :itr), Tuple{NamedTuple{(), Tuple{}}, Nothing}})
   @ Zygote ~/.julia/packages/Zygote/xGkZ5/src/lib/lib.jl:27
 [4] macro expansion
   @ ~/.julia/packages/Zygote/xGkZ5/src/lib/lib.jl:27 [inlined]
 [5] accum(x::NamedTuple{(:f, :g, :u0, :tspan, :p, :noise, :kwargs, :noise_rate_prototype, :seed), Tuple{Nothing, Nothing, Vector{Float64}, Nothing, Nothing, Nothing, NamedTuple{(:data, :itr), Tuple{Tuple{}, Nothing}}, Nothing, Nothing}}, y::NamedTuple{(:f, :g, :u0, :tspan, :p, :noise, :kwargs, :noise_rate_prototype, :seed), Tuple{Nothing, Nothing, Vector{Float64}, Nothing, Nothing, Nothing, NamedTuple{(:data, :itr), Tuple{NamedTuple{(), Tuple{}}, Nothing}}, Nothing, Nothing}})
   @ Zygote ~/.julia/packages/Zygote/xGkZ5/src/lib/lib.jl:27
 [6] Pullback
   @ ~/code/OptImpSampling.jl/src/logvar.jl:87 [inlined]
```
@aksk this looks unrelated to me. The stack-trace line `accum(x::Tuple{}, y::NamedTuple{(), Tuple{}})` says that something is producing both a `Tuple{}` and a `NamedTuple{(), Tuple{}}` gradient for the same value.
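To see why that fails: the two empty containers are different types, and Julia defines no `+` method between them. A quick check in plain Julia, no Zygote needed:

```julia
x = ()            # an empty Tuple, of type Tuple{}
y = NamedTuple()  # an empty NamedTuple, of type NamedTuple{(), Tuple{}}

typeof(x) == typeof(y)   # false: distinct types
# x + y  # would throw: MethodError: no method matching +(::Tuple{}, ::NamedTuple{(), Tuple{}})
```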
In that case I am sorry for the noise, and thank you for your suggestions. I checked my code (not a true MWE, but only 80 lines) for them but couldn't find anything of the like :(
StochasticDiffEq, SciMLSensitivity and Lux are a touch over 80 lines :P. You may have more luck rewriting the sum over a generator to
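One plausible rewrite along those lines (my illustration, not from the thread; `msolve` here is a simple stand-in for the function in the comments above): passing a function to `sum` directly, instead of a generator, avoids differentiating through the generator's `data`/`itr` machinery seen in the stack traces.

```julia
using Zygote

msolve(p) = sum(abs2, p)                   # stand-in for the thread's msolve (hypothetical)
logvar(p; n=3) = sum(_ -> msolve(p), 1:n)  # sum(f, itr) instead of sum(f(...) for i in itr)

Zygote.gradient(p -> logvar(p), [1.0, 2.0])
```

For this stand-in, the result should be `n .* 2p`, i.e. `([6.0, 12.0],)` with `n = 3`.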
Just so #1290 (comment) doesn't get buried, here is a summary of the issue and a proposal for how to address it. For many rules, Zygote uses

Why was this not showing up before? Simply speaking, Zygote was throwing away a lot of gradients of mutable types because they took the same code path as implicit params (shoved in the cache and summarily ignored). That doesn't seem to have led to major issues, but it is incorrect if you wanted said gradients at any point. 0.6.44 fixed this by not hitting the implicit-param path unless one specifically asks for it, but it turned up this unforeseen edge case in

What can be done? I think the most straightforward solution is to have separate projection routines for what is used at the user <-> AD boundary (i.e.
Is #1304 the same?
Most likely, yes. Since that doesn't involve
If the motivation is to unify the return types to the user, then we can unwrap refs as a last pass. The "implicit params code paths" would be something to investigate; do you have specific instances of those in mind? Adding projection primitives is one way, but it's not clear whether they solve the issue. It would be better to have conversion rules that Julia can use instead.
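A minimal sketch of the "unwrap refs as a last pass" idea (my illustration only; `unwrap_refs` is a hypothetical name, not Zygote's actual implementation): internal accumulation could keep mutable `RefValue` tangents, and a final recursive pass at the user-facing boundary would convert them to plain structural tangents before returning.

```julia
# Hypothetical post-processing pass at the user <-> AD boundary.
unwrap_refs(x) = x
unwrap_refs(r::Base.RefValue) = unwrap_refs(r[])   # RefValue{Any}((contents = 4.0,)) -> (contents = 4.0,)
unwrap_refs(nt::NamedTuple) = map(unwrap_refs, nt)
unwrap_refs(t::Tuple) = map(unwrap_refs, t)
```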
MWE: