Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

real(z::AbstractZero) #581

Closed
wants to merge 3 commits into from
Closed

real(z::AbstractZero) #581

wants to merge 3 commits into from

Conversation

mcabbott
Copy link
Member

@mcabbott mcabbott commented Sep 1, 2022

This came up in https://github.com/JuliaDiff/Diffractor.jl/pull/88/files

Not sure I understand what the logic of this file is, as to which functions accept what kind of zero. Nor what's gained by having (at least) two.

FAQ entry: https://juliadiff.org/ChainRulesCore.jl/stable/FAQ.html#faq_abstract_zero This seems to imply that you cannot tell from the type of the primal whether it will always only have ZeroTangent or NoTangent. It depends on how it is being used. Since the function producing it may not know how it will be used next, does that imply that all pullbacks must accept both equally?

@oxinabox
Copy link
Member

oxinabox commented Sep 1, 2022

Some input types always have NoTangent.
This is by far the most common source of NoTangent.
Like Integer and String.

I guess next most common is if the output is such a type, like string

But some operations on some inputs or outputs that would normally have a tangent do not have a tangent

For example

f(x) = isinteger(x) ? x : error("must be an integer")

The point of having both is while they act very similar in practice they reresent totally different situations.
To use an analogy: one represents being on flat ground (vs a hill), the other represents being in space where ground is meaningless.

If you ever end up performing gradient dencent by adding a value of NoTangent to your primal, then somewhere your code (or your AD) is wrong.

@mcabbott
Copy link
Member Author

mcabbott commented Sep 1, 2022

Yet NoTangent() + 1.0 returns no error, so it won't help you find such a mistake. Nor does ZeroTangent() + NoTangent(), which can only occur if one primal has both.

test_rrule does flag many Union{NoTangent, ZeroTangent} type instabilities, but I've yet to track one down. Instead they just result in not testing for other instabilities.

In the wild, it's not true that the gradient of an Int is always one kind of zero:

julia> using Diffractor, ChainRulesCore

julia> gradient(i -> 1, 2)
(ZeroTangent(),)

julia> gradient(i -> (1,2,3)[i], 2)
(NoTangent(),)

julia> Diffractor.∂⃖{1}()(i -> (1,2,3)[i], 2)[2](1)  # including the derivative w.r.t the function
(ZeroTangent(), NoTangent())

julia> using Yota

julia> grad((i,j) -> j, 2, 3)  # likewise
(3, (ZeroTangent(), ZeroTangent(), 1))

julia> grad(i -> (1,2,3)[i], 2)
(2, (ZeroTangent(), NoTangent()))

and of course Zygote.wrap_chainrules_input(nothing) goes for ZeroTangent(). Maybe allowing integers to be differentiable was a mistake, without which this would be clear?

But notice that the derivative with respect to pure functions above is also ZeroTangent. Is this wrong? Would Tangent{#NN}(;) also be wrong?

I realise that (at least for real numbers) there are some you can check with finite differencing and some you will immediately get an error. But I wonder what's actually gained by representing this difference in code... besides missing method errors, and type instabilities?

@mcabbott
Copy link
Member Author

mcabbott commented Sep 1, 2022

One more thought. If NoTangent really encoded that this thing could never ever be perturbed, then it should win over other (mistaken) information, e.g. from a rule defined on some less specific type like Real. But this does not happen:

julia> NoTangent() + ZeroTangent()  # NoTangent wins, more information
NoTangent()

julia> gradient((x, i) -> x[i] * i, [1.0, 2.0], 2)  # NoTangent loses, yet we cannot perturb i
([0.0, 2.0], 2.0)

julia> NoTangent() + 2.0  # here's why
2.0

Perhaps there could be a design where this distinction is strongly enforced, but we don't have one. Instead, NoTangent seems to be a cosmetic thing only, for what types a rule produces. But the behaviour of a rule when it receives any zero, it seems, is meant to be identical.

@codecov-commenter
Copy link

codecov-commenter commented Sep 1, 2022

Codecov Report

Merging #581 (97775f5) into main (aba2fcb) will not change coverage.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##             main     #581   +/-   ##
=======================================
  Coverage   93.18%   93.18%           
=======================================
  Files          15       15           
  Lines         895      895           
=======================================
  Hits          834      834           
  Misses         61       61           
Impacted Files Coverage Δ
src/tangent_arithmetic.jl 96.47% <100.00%> (ø)

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@oxinabox
Copy link
Member

oxinabox commented Dec 8, 2022

Yet NoTangent() + 1.0 returns no error, so it won't help you find such a mistake. Nor does ZeroTangent() + NoTangent(), which can only occur if one primal has both.

It used to. but we removed it. Which I think was a mistake.
Still you can check for that error condition manually.

Maybe allowing integers to be differentiable was a mistake, without which this would be clear?

It was.

One more thought. If NoTangent really encoded that this thing could never ever be perturbed, then it should win over other (mistaken) information,

Agree'd.

Perhaps there could be a design where this distinction is strongly enforced, but we don't have one.

We don't have one because we failed.
But lets not make that failure worse?

@mcabbott
Copy link
Member Author

mcabbott commented Dec 8, 2022

Am not sure that "make that failure worse" is a good take here. Given two plausible, consistent, designs, of which you prefer A to B, something which is 10% away from A may actually be much worse than either A or B.

The "failed" / "10%" state has real downsides:

  • It makes readers jump through arbitrary hoops, and wonder if they are crazy, or the docs are crazy, or what's going on.
  • It introduces real type instabilities in some pullbacks, which give Union{NoTangenet, ZeroTangent}. I don't know that these hurt performance, but they do mean that regression testing is disabled for these.

Both of these could be solved by ripping out the distinction & having exactly one Zero. That may move to a design which isn't the one you'd prefer, given a clean slate. But it is a design that's available, without breaking anything, and it would let us delete quite a lot of code.

The one reason to leave it in the "failed" state (besides inertia) is that there's a plan to correct it in 2.0.

Copy link
Member

@oxinabox oxinabox left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh the discussion in the PR got way off topic, we should merge this

@mcabbott mcabbott closed this Mar 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants