real(z::AbstractZero) #581

mcabbott · 2022-09-01T17:33:21Z

This came up in https://github.com/JuliaDiff/Diffractor.jl/pull/88/files

Not sure I understand what the logic of this file is, as to which functions accept what kind of zero. Nor what's gained by having (at least) two.

FAQ entry: https://juliadiff.org/ChainRulesCore.jl/stable/FAQ.html#faq_abstract_zero

This seems to imply that you cannot tell from the type of the primal whether it will only ever have ZeroTangent or only ever have NoTangent; it depends on how the primal is being used. Since the function producing it may not know how it will be used next, does that imply that all pullbacks must accept both equally?

oxinabox · 2022-09-01T19:02:35Z

Some input types always have NoTangent, like Integer and String.
This is by far the most common source of NoTangent.

I guess the next most common is when the output is such a type, like String.

But some operations on inputs or outputs that would normally have a tangent do not have one.

For example:

f(x) = isinteger(x) ? x : error("must be an integer")

The point of having both is that, while they act very similarly in practice, they represent totally different situations.
To use an analogy: one represents being on flat ground (vs a hill), the other represents being in space, where ground is meaningless.

If you ever end up performing gradient descent by adding a value of NoTangent to your primal, then somewhere your code (or your AD) is wrong.
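
One way to catch that mistake is to check for it explicitly in the update step. The following is only a minimal sketch: the zero types are toy stand-ins defined here (not the ChainRulesCore definitions), and `sgd_step` is a hypothetical helper invented for illustration.

```julia
# Toy stand-ins for illustration only; these are NOT the ChainRulesCore definitions.
abstract type AbstractZero end
struct ZeroTangent <: AbstractZero end   # flat ground: the derivative exists and is zero
struct NoTangent <: AbstractZero end     # no ground: perturbing the primal is meaningless

# Hypothetical update helper: x is the primal, dx its gradient, η the step size.
sgd_step(x, dx, η) = x - η * dx

# A zero gradient leaves the primal unchanged.
sgd_step(x, ::ZeroTangent, η) = x

# Updating with NoTangent signals a bug somewhere upstream, so fail loudly.
sgd_step(x, ::NoTangent, η) =
    throw(ArgumentError("tried to update $(repr(x)) with NoTangent; this primal is not differentiable"))
```

With these definitions, `sgd_step(1.0, ZeroTangent(), 0.1)` returns the primal untouched, while `sgd_step(1.0, NoTangent(), 0.1)` throws an `ArgumentError` instead of silently continuing.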

mcabbott · 2022-09-01T19:38:14Z

Yet NoTangent() + 1.0 returns no error, so it won't help you find such a mistake. Nor does ZeroTangent() + NoTangent(), which can only occur if one primal has both.

test_rrule does flag many Union{NoTangent, ZeroTangent} type instabilities, but I've yet to track one down. Instead they just result in not testing for other instabilities.

In the wild, it's not true that the gradient of an Int is always one kind of zero:

julia> using Diffractor, ChainRulesCore

julia> gradient(i -> 1, 2)
(ZeroTangent(),)

julia> gradient(i -> (1,2,3)[i], 2)
(NoTangent(),)

julia> Diffractor.∂⃖{1}()(i -> (1,2,3)[i], 2)[2](1)  # including the derivative w.r.t the function
(ZeroTangent(), NoTangent())

julia> using Yota

julia> grad((i,j) -> j, 2, 3)  # likewise
(3, (ZeroTangent(), ZeroTangent(), 1))

julia> grad(i -> (1,2,3)[i], 2)
(2, (ZeroTangent(), NoTangent()))

and of course Zygote.wrap_chainrules_input(nothing) goes for ZeroTangent(). Maybe allowing integers to be differentiable was a mistake, without which this would be clear?

But notice that the derivative with respect to pure functions above is also ZeroTangent. Is this wrong? Would Tangent{#NN}(;) also be wrong?

I realise that (at least for real numbers) there are some zeros you can check with finite differencing, and some where you will immediately get an error. But I wonder what's actually gained by representing this difference in code... besides missing method errors, and type instabilities?

mcabbott · 2022-09-01T20:07:38Z

One more thought. If NoTangent really encoded that this thing could never ever be perturbed, then it should win over other (mistaken) information, e.g. from a rule defined on some less specific type like Real. But this does not happen:

julia> NoTangent() + ZeroTangent()  # NoTangent wins, more information
NoTangent()

julia> gradient((x, i) -> x[i] * i, [1.0, 2.0], 2)  # NoTangent loses, yet we cannot perturb i
([0.0, 2.0], 2.0)

julia> NoTangent() + 2.0  # here's why
2.0

Perhaps there could be a design where this distinction is strongly enforced, but we don't have one. Instead, NoTangent seems to be a cosmetic thing only, for what types a rule produces. But the behaviour of a rule when it receives any zero, it seems, is meant to be identical.
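
To make the "strongly enforced" alternative concrete, here is a rough sketch of an accumulation rule in which NoTangent is absorbing, so it always wins over other information. Everything in it is an assumption for illustration: the zero types are toy stand-ins rather than the ChainRulesCore ones, and `accum` is a hypothetical function, not an existing API.

```julia
# Toy stand-ins for illustration only; these are NOT the ChainRulesCore definitions.
abstract type AbstractZero end
struct ZeroTangent <: AbstractZero end
struct NoTangent <: AbstractZero end

# Hypothetical accumulation of two gradient contributions.
accum(a, b) = a + b

# ZeroTangent is the additive identity: the other contribution passes through.
accum(::ZeroTangent, b) = b
accum(a, ::ZeroTangent) = a
accum(::ZeroTangent, ::ZeroTangent) = ZeroTangent()

# NoTangent is absorbing: once the primal is declared non-differentiable,
# no other contribution can override that.
accum(::NoTangent, b) = NoTangent()
accum(a, ::NoTangent) = NoTangent()
accum(::NoTangent, ::NoTangent) = NoTangent()

# Explicit methods for the mixed cases, which also resolve dispatch ambiguities.
accum(::NoTangent, ::ZeroTangent) = NoTangent()
accum(::ZeroTangent, ::NoTangent) = NoTangent()
```

Under such a rule, `accum(NoTangent(), 2.0)` gives `NoTangent()` rather than `2.0`, so a contribution from a rule written for a less specific type such as Real could not leak into the gradient of an index.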

codecov-commenter · 2022-09-01T20:21:42Z

Codecov Report

Merging #581 (97775f5) into main (aba2fcb) will not change coverage.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##             main     #581   +/-   ##
=======================================
  Coverage   93.18%   93.18%           
=======================================
  Files          15       15           
  Lines         895      895           
=======================================
  Hits          834      834           
  Misses         61       61

Impacted Files	Coverage Δ
src/tangent_arithmetic.jl	`96.47% <100.00%> (ø)`


oxinabox · 2022-12-08T21:38:39Z

> Yet NoTangent() + 1.0 returns no error, so it won't help you find such a mistake. Nor does ZeroTangent() + NoTangent(), which can only occur if one primal has both.

It used to, but we removed it, which I think was a mistake.
Still, you can check for that error condition manually.

> Maybe allowing integers to be differentiable was a mistake, without which this would be clear?

It was.

> One more thought. If NoTangent really encoded that this thing could never ever be perturbed, then it should win over other (mistaken) information,

Agreed.

> Perhaps there could be a design where this distinction is strongly enforced, but we don't have one.

We don't have one because we failed.
But let's not make that failure worse?

mcabbott · 2022-12-08T23:25:20Z

Am not sure that "make that failure worse" is a good take here. Given two plausible, consistent, designs, of which you prefer A to B, something which is 10% away from A may actually be much worse than either A or B.

The "failed" / "10%" state has real downsides:

- It makes readers jump through arbitrary hoops, and wonder if they are crazy, or the docs are crazy, or what's going on.
- It introduces real type instabilities in some pullbacks, which give Union{NoTangent, ZeroTangent}. I don't know that these hurt performance, but they do mean that regression testing is disabled for these.

Both of these could be solved by ripping out the distinction & having exactly one Zero. That may move to a design which isn't the one you'd prefer, given a clean slate. But it is a design that's available, without breaking anything, and it would let us delete quite a lot of code.

The one reason to leave it in the "failed" state (besides inertia) is that there's a plan to correct it in 2.0.

oxinabox

Oh the discussion in the PR got way off topic, we should merge this

mcabbott added 2 commits September 1, 2022 13:29

real(z::AbstractZero)

f78f241

1.15.4

7abec81

mcabbott mentioned this pull request Sep 1, 2022

Accumulate NamedTuple + Tangent JuliaDiff/Diffractor.jl#88

Merged

don't delete imag

97775f5

mcabbott mentioned this pull request Sep 1, 2022

Collapse zeros for Tuple & Ref tangents #565

Merged

oxinabox approved these changes Sep 29, 2023

mcabbott closed this Mar 27, 2024
