real(z::AbstractZero) #581
Some input types always have I guess next most common is if the output is such a type, like But some operations on some inputs or outputs that would normally have a tangent do not have a tangent For example
The point of having both is that, while they act very similarly in practice, they represent totally different situations. If you ever end up performing gradient descent by adding a value of
Yet
In the wild, it's not true that the gradient of an
julia> using Diffractor, ChainRulesCore
julia> gradient(i -> 1, 2)
(ZeroTangent(),)
julia> gradient(i -> (1,2,3)[i], 2)
(NoTangent(),)
julia> Diffractor.∂⃖{1}()(i -> (1,2,3)[i], 2)[2](1) # including the derivative w.r.t the function
(ZeroTangent(), NoTangent())
julia> using Yota
julia> grad((i,j) -> j, 2, 3) # likewise
(3, (ZeroTangent(), ZeroTangent(), 1))
julia> grad(i -> (1,2,3)[i], 2)
(2, (ZeroTangent(), NoTangent()))
and of course But notice that the derivative with respect to pure functions above is also I realise that (at least for real numbers) there are some you can check with finite differencing and some you will immediately get an error. But I wonder what's actually gained by representing this difference in code... besides missing method errors and type instabilities?
One more thought. If
julia> NoTangent() + ZeroTangent() # NoTangent wins, more information
NoTangent()
julia> gradient((x, i) -> x[i] * i, [1.0, 2.0], 2) # NoTangent loses, yet we cannot perturb i
([0.0, 2.0], 2.0)
julia> NoTangent() + 2.0 # here's why
2.0
Perhaps there could be a design where this distinction is strongly enforced, but we don't have one. Instead, NoTangent seems to be a cosmetic thing only, for what types a rule produces. But the behaviour of a rule when it receives any zero, it seems, is meant to be identical.
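The idea that downstream code should treat every zero the same way can be pictured with a single method on the shared supertype `AbstractZero`. This is only a minimal sketch, assuming ChainRulesCore is installed; `sgd_step` and the step size `η` are hypothetical names for illustration, not part of any package.

```julia
using ChainRulesCore

# Hypothetical update helper: any structural zero (NoTangent or ZeroTangent)
# leaves the parameter unchanged, so the caller never needs to tell them apart.
sgd_step(x, dx::AbstractZero; η=0.1) = x

# Ordinary tangents take a plain gradient-descent step.
sgd_step(x, dx; η=0.1) = x .- η .* dx

sgd_step([1.0, 2.0], [0.0, 2.0])  # only the second entry moves
sgd_step(2, NoTangent())          # integer index is returned unchanged
sgd_step(2, ZeroTangent())        # same result for the other zero
```

Dispatching on the supertype keeps the two zeros interchangeable at the point of use, whatever a rule chose to return.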
Codecov Report
@@ Coverage Diff @@
## main #581 +/- ##
=======================================
Coverage 93.18% 93.18%
=======================================
Files 15 15
Lines 895 895
=======================================
Hits 834 834
Misses 61 61
It used to, but we removed it, which I think was a mistake.
It was.
Agreed.
We don't have one because we failed.
Am not sure that "make that failure worse" is a good take here. Given two plausible, consistent, designs, of which you prefer A to B, something which is 10% away from A may actually be much worse than either A or B. The "failed" / "10%" state has real downsides:
Both of these could be solved by ripping out the distinction & having exactly one
The one reason to leave it in the "failed" state (besides inertia) is that there's a plan to correct it in 2.0.
Oh the discussion in the PR got way off topic, we should merge this
This came up in https://github.com/JuliaDiff/Diffractor.jl/pull/88/files
Not sure I understand what the logic of this file is, as to which functions accept what kind of zero. Nor what's gained by having (at least) two.
FAQ entry: https://juliadiff.org/ChainRulesCore.jl/stable/FAQ.html#faq_abstract_zero This seems to imply that you cannot tell from the type of the primal whether it will always only have ZeroTangent or NoTangent. It depends on how it is being used. Since the function producing it may not know how it will be used next, does that imply that all pullbacks must accept both equally?
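One way to read "all pullbacks must accept both equally" is a pullback with a method on `AbstractZero` that passes any zero cotangent straight through. The sketch below assumes ChainRulesCore is installed; `double` is a hypothetical function invented for the example, and this is one possible convention, not necessarily what the package prescribes.

```julia
using ChainRulesCore

# Hypothetical primal function, used only for illustration.
double(x) = 2x

function ChainRulesCore.rrule(::typeof(double), x)
    y = double(x)
    # Either kind of zero cotangent is forwarded unchanged.
    double_pullback(ȳ::AbstractZero) = (NoTangent(), ȳ)
    # Any other cotangent is scaled by the derivative of `double`.
    double_pullback(ȳ) = (NoTangent(), 2 * unthunk(ȳ))
    return y, double_pullback
end

y, back = rrule(double, 3.0)
back(1.0)            # (NoTangent(), 2.0)
back(ZeroTangent())  # (NoTangent(), ZeroTangent())
back(NoTangent())    # (NoTangent(), NoTangent())
```

The first element of each result is the tangent with respect to the function `double` itself, which has no fields to perturb.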