Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revert "Remove number / vector (#44358)" #49915

Merged
merged 3 commits into from
Jul 28, 2023

Conversation

oxinabox
Copy link
Contributor

This reverts commit 24d5268.
undoing #44358

I propose backporting this PR to bring the operation back in 1.9.1, and holding off #44358 til julia 2.0 where matching changes can be made as a whole set. cc: @antoine-levitt

#44358 was originally made with the assumption that noone was using this feature except in error.
Which was checked using PkgEval.
This was incorrect, and the check for some reason or another missed or skipped over ChainRules.jl.
This broke ChainRules.jl for anyone using reverse mode AD to go through / or \.

The use in ChainRules.jl is intentional and is required to achieve generality between different types.
Quoting @sethaxen on slack

This seems to be a direct consequence of how c = transpose(a)*b for vectors a and b produces a scalar, while c = permutedims(a)*b produces a vector. I'd argue the first case is convenient but wrong, but if we accept it, then since c/b is just the linear solve for a, then /(::Number, AbstractVector{<:Number}) should be supported.

The trivial fix to make ChainRules work again in 1.9.0 (JuliaDiff/ChainRules.jl#718) is uglier, less performant and less numerically good.

I think we do not need to discuss if the feature in #44358 is desirable or not in this PR.
Merely if it was too breaking or not.

@antoine-levitt
Copy link
Contributor

I don't get what particular method of the chain rule for / and \ was broken by the PR? Is it just vector \ number or something more?

@oxinabox
Copy link
Contributor Author

oxinabox commented May 22, 2023

I could have it wrong but,
It's code which depending on operation being AD'd is either: Vector / Number or AbstractArray / Adjoint{<:Vector}

The linked PR has tests which you can try to see tgd primal (forward) code that it broke.

Here is an example of something that broke. vector/vector which returns a scalar (but matrix/vector would return a vector and so the code needs to be generic to both)

Julia Master (and 1.9)

julia> y, pb = rrule(/, [1.0, 2.0], [1.0, 2.0]);

julia> y
2×2 adjoint(::Matrix{Float64}) with eltype Float64:
 0.2  0.4
 0.4  0.8

julia> unthunk.(pb(y))
ERROR: MethodError: no method matching /(::Float64, ::Vector{Float64})

Julia 1.6 (and 1.8)

julia> using ChainRules, ChainRulesCore

julia> y, pb = rrule(/, [1.0, 2.0], [1.0, 2.0]);

julia> y
2×2 adjoint(::Matrix{Float64}) with eltype Float64:
 0.2  0.4
 0.4  0.8

julia> unthunk.(pb(y))
(NoTangent(), [0.20000000000000004, 0.4000000000000001], [-0.20000000000000007, -0.40000000000000013])

@antoine-levitt
Copy link
Contributor

OK, so indeed the PR broke the chain rule for vector / vector and vector \ vector, which are both legitimate 1.9 operations (though I'd love to get rid of them in the 2.0 timeframe). Note that both these operations are highly unusual, so it's relatively unlikely to have actually broken people's code. I've only seen vector \ vector used once (in the pkgeval run for the PR, in a stats package that used it for linear regression without offset, which is very cute), and I'm pretty sure nobody ever used vector / vector, except as a typo for vector ./ vector.

This is because the rule is

            Y = A \ B
            B̄ = A' \ Ȳ
            Ā = -B̄ * Y'

When A and B are vectors, Y is a scalar, so the second line hits vector' \ number, which hits number / vector. This is indeed because vector'vector is a scalar. An easy fix is to change the chainrule to cast Y to a vector if it is a scalar (or to compute the second line with an explicit pseudoinverse in the vector case, which it's going to do anyway, but not in the matrix case).

It's a bit concerning that pkgeval didn't catch this, maybe it got into chainrules after the pkgeval run? I agree that removing the method was breaking, and I was a bit on the fence about merging it in a pre-2.0 release. Fixing the massive footgun outweighted the esoteric breakage, though. Is it acceptable to just do the quick fix above to chainrules?

@oxinabox
Copy link
Contributor Author

oxinabox commented May 22, 2023

It's a bit concerning that pkgeval didn't catch this, maybe it got into chainrules after the pkgeval run?

Its been in ChainRules for over 3 years.

@oxinabox oxinabox closed this May 22, 2023
@oxinabox oxinabox reopened this May 22, 2023
@antoine-levitt
Copy link
Contributor

The pkgeval run did catch this, but then returned a timeout and so it was not reported: https://s3.amazonaws.com/julialang-reports/nanosoldier/pkgeval/by_hash/62300a8_vs_a5dfbb4/ChainRules.primary.log. I'll open a separate issue for maybe making pkgeval report failures even if there's a subsequent timeout

@oscardssmith oscardssmith added the triage This should be discussed on a triage call label May 22, 2023
@oscardssmith
Copy link
Member

oscardssmith commented May 22, 2023

adding to triage because this will be controversial, but I do think we do actually have to merge this (even though I hate it).

@antoine-levitt
Copy link
Contributor

antoine-levitt commented May 22, 2023

I won't be on triage, but my opinion is that this is a relatively minor breakage: it only affects the chain rule for vector / vector and vector \ vector. Basically nobody uses these methods, so I doubt the chainrule issue is actually bothering people (apart from fixing the CR CI, of course). An easy fix is Ȳ isa AbstractArray || Ȳ = [Ȳ] in the chain rule (untested...), which is not super pretty but is liveable I think, and will hopefully go away in 2.0.

Reverting it now is a bit weird (it'd make for weird 1.10 release notes). We knew it was breaking when merging it, so this issue doesn't change the calculus that was made at the time. But I'm not super clear on the policy wrt breaking stuff, so...

@KristofferC
Copy link
Member

There is still no evidence that anyone actually uses this so I agree with @antoine-levitt that nothing has changed since the last discussion was had on this.

@oxinabox
Copy link
Contributor Author

oxinabox commented May 23, 2023

There is still no evidence that anyone actually uses this

The deleted code is literally used in the ChainRules.jl code. That is a package that exists.
You can't say noone uses it when I just pointed to a package that uses it.
Are you saying noone uses that function that uses the deleted code?

Cos I am pretty sure they do, since it is the rrule for AbstractArray/AbstractArray.
Though they might not use the particular case of it that errors which is I think just vector/vector
but the fact that someone went and wrote in the first place and wrote tests for it suggests someone at one point did care about that use case...
(Good chance it was somewhere in Invenia's internal code base)

It's one thing to say "Do minor breaking thing that noone uses" based on PkgEval
Its another thing to say "Its ok to break something, uses the thing it breaks" based not on PkgEval but just suspecting.
For a start its second order, by that logic we could keep applying it recursively and say its ok to break whatever.
For a second its weaker proof.

The calculus has changed from when this was decided due to the fact that this break was missed when running PkgEval due to it timing out and not reporting the error.

We do vaguely like to say that we take SemVer seriously.
By just deleting methods willy-nilly we worsen julias reputation as being unstable and immature.


Reverting it now is a bit weird (it'd make for weird 1.10 release notes).

This is a perfectly valid thing to do by semver.
When you mistakenly break semver, undoing it is a patch release.
The release notes would say: "Restore mistakenly deleted ..."

@antoine-levitt
Copy link
Contributor

Though they might not use the particular case of it that errors which is I think just vector/vector

Also vector \ vector probably. But yes, that's the point: I don't expect anybody ever hit that code path (but I don't know)

but the fact that someone went and wrote in the first place and wrote tests for it suggests someone at one point did care about that use case...

I'd imagine this is a case of overzealous testing and there's no actual use case for this, but again I don't know.

I think the point is that as far as breakage go, this one is pretty minor. It's breaking a massive footgun (number / vector) that is, I'm pretty sure, never used except in error. In doing so, it unintentionally also broke the chain rule for another feature (vector \ vector) which is less of a footgun (because nobody reaches for \ unless they mean linalg) and does have its (rare) uses. It broke the chain rule in a way that is easy to fix, with a one-line workaround that will go away in 2.0. I'm not defending the breakage itself: if breakage is forbidden under any circumstance, then this PR should never have been merged and let's revert. But if there's some leeway, then the calculus hasn't fundamentally changed since the PR was merged, and so it shouldn't be reverted.

@KristofferC
Copy link
Member

KristofferC commented May 23, 2023

The deleted code is literally used in the ChainRules.jl code. That is a package that exists.
You can't say noone uses it when I just pointed to a package that uses it.

If a package does:

for method in get_all_julia_methods()
    wrapper_expr = provide_wrapper_for_method(method)
    @eval wrapper_expr
end

I wouldn't say that this is a "usage" of every julia method. ChainRules tries to provide a chain rule for every Julia function but that doesn't on its own constitute a usage (even though its tests will break if any of those methods are removed). What is interesting is the actual code that ends up calling this chain rule.

@oxinabox
Copy link
Contributor Author

but it isn't the method for number/vector that breaks.
It is the method for vector/vector which breaks.
And that method was written by someone who clearly knew the old behavour of number/vector since it replies upon it.

@PallHaraldsson
Copy link
Contributor

I support reverting this (any) feature if/since used in a package.

the assumption that noone was using this feature except in error.

Can't we then deprecate it? Even if it has a few/one users.

The use in ChainRules.jl is intentional and is required to achieve generality between different types.

I don't know too much about this/vector math, but there is some vector math alternative code, that should be used? Is using deprecated bad for speed?

I agree that removing the method was breaking, and I was a bit on the fence about merging it in a pre-2.0 release.

I.e. this was thought "technically breaking", and all such gets documented in NEWS, so I support at least the documenting. This was basically API, not syntax, but in either case, we likely need to be strict to not ever break. Except in 2.0, and I see this as just one more argument we should get to that sooner rather than later.

@oxinabox
Copy link
Contributor Author

oxinabox commented May 24, 2023

Yes, deprecating it would be fine.

I.e. this was thought "technically breaking", and all such gets documented in NEWS, so I support at least the documenting.

It was in the NEWS.md, I just didn't spot it.
You can find it in the HISTORY.md now
(and I had been ignoring nightly failures, because I am not great. And got busy moving countries.)

@KristofferC
Copy link
Member

There seems to be no use of this in the ecosystem (defining use as a call to this method that is not just testing the method itself). And the method that is getting added back is the extremely confusing

julia> 1 / [1.0, 2.0, 3.0]
1×3 transpose(::Vector{Float64}) with eltype Float64:
 0.0714286  0.142857  0.214286

So why add it back again? No one has missed this method in over a year!

@PallHaraldsson
Copy link
Contributor

This was only removed in 1.9 (according to NEWS), not all live on master, so only been out of Julia a few days officially? Maybe we can get away not having it in, see if we get (more) feedback before 1.9.1?

@oscardssmith
Copy link
Member

Triage agrees that this was breaking. However this method is somewhat sketchy and easy to misuse (as determined by the original pkgeval). As such, we believe that this functionality should be re-added for 1.9.1, but it should be deprecated (in 1.9.1) and slated for re-removal in 2.0.

@oscardssmith oscardssmith removed the triage This should be discussed on a triage call label May 25, 2023
@KristofferC
Copy link
Member

KristofferC commented May 25, 2023

The breakage of this was never in question, it was deemed seldomly enough used and Co fusing enough that the breakage was worth it. And if the method gets deprecated ChainRules will have to update to not use it anyway but the confusing behavior is still present. Worst of both worlds?

I mean the only reason this is discussed is because @oxinabox does not like the workaround for ChainRules due to the removal of this but a deprecation causes the exact same workload. So seems pointless to do reintroduce the confusing behavior (which again has not been mentioned by anyone for a year).

@oscardssmith
Copy link
Member

oscardssmith commented May 25, 2023

I think the key point is that

it was deemed seldomly enough used

only happened do to a pkgeval bug. had we seen that chainrules had an explicit test that this worked, we wouldn't have merged the pr. it is a good target for removal since it goes against the spirit of #4774, and likely to be a typo, but removal is actually breaking a real package that was using an exported function correctly .

deprecating will mean that chainrules keeps working, and sets the stage for us to properly remove this in the future.

@oxinabox
Copy link
Contributor Author

oxinabox commented Jun 2, 2023

Can we schedule this in to be talked about at Triage on the 22rd?
I can't ever make triage calls cos it's 2am for me. But @sethaxen can make the one on the 22rd and can articulate this argument for why we should do this.

@oscardssmith
Copy link
Member

oscardssmith commented Jun 2, 2023

I think triage already agrees that this was breaking and should be fixed

@ViralBShah
Copy link
Member

Does this mean we should merge and potentially backport to 1.9.2?

@oscardssmith
Copy link
Member

@ViralBShah yes.

@oxinabox
Copy link
Contributor Author

oxinabox commented Jul 28, 2023

Can we add a Forget-me-not label?
Or merge this?

@LilithHafner LilithHafner added the backport 1.9 Change should be backported to release-1.9 label Jul 28, 2023
@LilithHafner LilithHafner merged commit 503d5b4 into JuliaLang:master Jul 28, 2023
2 checks passed
@LilithHafner LilithHafner added the revert This reverts a previously merged PR. label Jul 28, 2023
@LilithHafner
Copy link
Member

Triage thought we should do this, in part because the original PR's pkgeval looks pretty bad (IIRC triage looked at a sample of the reported breakage and found several to be real). We can always re-triage and re-land the original PR.

@antoine-levitt
Copy link
Contributor

Triage thought we should do this, in part because the original PR's pkgeval looks pretty bad (IIRC triage looked at a sample of the reported breakage and found several to be real).

Really? I did look at those pretty closely, in both that pkgeval and that of #40758, and I didn't see any that were caused by this PR, just segfaults. The only "real" breakage I've seen is ChainRules, discussed extensively here.

I can either try to reland this directly, or make a PR to deprecate it, just tell me which.

@LilithHafner
Copy link
Member

LilithHafner commented Aug 1, 2023

I just now manually reviewed 9/174 of the packages that failed tests only after #44358. Of these, I found one real breakage

MethodError: no method matching /(::Float64, ::Vector{Float64}) from ApproximateGPs

This usage is via ChainRules.jl, and its existence serves as a counterexample to the (false) prevailing notion in this thread that no non-ChainRules.jl package depends on ChainRules.jl's usage of this method.

Triage recommended deprecation, so a PR to deprecate is likely to succeed, but @oxinabox might object?

@antoine-levitt
Copy link
Contributor

antoine-levitt commented Aug 1, 2023

Nice! I wonder why my screening did not flag it, I should have been more thorough.

This usage is via ChainRules.jl, and its existence serves as a counterexample to the (false) prevailing notion in this thread that no non-ChainRules.jl package depends on ChainRules.jl's usage of this method.

This is likely fixed by JuliaDiff/ChainRules.jl#718. My argument was not that nobody uses vector / vector and vector \ vector (which was indeed broken in chainrules as an unintended consequence, and is now fixed by JuliaDiff/ChainRules.jl#718), but that nobody uses (correctly) real / vector and vector \ real.

If deprecation is the way to go, can we go all the way and deprecate vector / vector and vector \ vector while we're at it?

KristofferC pushed a commit that referenced this pull request Aug 10, 2023
@KristofferC KristofferC mentioned this pull request Aug 10, 2023
35 tasks
KristofferC added a commit that referenced this pull request Aug 18, 2023
Backported PRs:
- [x] #47782 <!-- Generalize Bool parse method to AbstractString -->
- [x] #48634 <!-- Remove unused "deps" mechanism in internal sorting
keywords [NFC] -->
- [x] #49931 <!-- Lock finalizers' lists at exit -->
- [x] #50064 <!-- Fix numbered prompt with input only with comment -->
- [x] #50474 <!-- docs: Fix a `!!! note` which was miscapitalized -->
- [x] #50516 <!-- Fix visibility of assert on GCC12/13 -->
- [x] #50635 <!-- `versioninfo()`: include build info and unofficial
warning -->
- [x] #49915 <!-- Revert "Remove number / vector (#44358)" -->
- [x] #50781 <!-- fix `bit_map!` with aliasing -->
- [x] #50845 <!-- fix #50438, use default pool for at-threads -->
- [x] #49031 <!-- Update inference.md -->
- [x] #50289 <!-- Initialize prev_nold and nold in gc_reset_page -->
- [x] #50559 <!-- Expand kwcall lowering positional default check to
vararg -->
- [x] #49582 <!-- Update HISTORY.md for `DelimitedFiles` -->
- [x] #50341 <!-- invokelatest docs should say not exported before 1.9
-->
- [x] #50525 <!-- only check that values are finite in `generic_lufact`
when `check=true` -->
- [x] #50444 <!-- Optimize getfield lowering to avoid boxing in some
cases -->
- [x] #50523 <!-- Avoid generic call in most cases for getproperty -->
- [x] #50860 <!-- Add `Base.get_extension` to docs/API -->
- [x] #50164 <!-- codegen: handle dead code with unsafe_store of FCA
pointers -->
- [x] #50568 <!-- `Array(::AbstractRange)` should return an `Array` -->
- [x] #50871 <!-- macOS: Don't inspect dead threadtls during exception
handling. -->

Need manual backport:
- [ ] #48542 <!-- Add docs on task-specific buffering using
multithreading -->
- [ ] #50591 <!-- build: fix various makefile bugs -->


Non-merged PRs with backport label:
- [ ] #50842 <!-- Avoid race conditions with recursive rm -->
- [ ] #50823 <!-- Make ranges more robust with unsigned indexes. -->
- [ ] #50663 <!-- Fix Expr(:loopinfo) codegen -->
- [ ] #49716 <!-- Update varinfo() docstring signature -->
- [ ] #49713 <!-- prevent REPL from erroring in numbered mode in some
situations -->
- [ ] #49573 <!-- Implement jl_cpu_pause on PPC64 -->
- [ ] #48726 <!-- fix macro expansion of property destructuring -->
- [ ] #48642 <!-- Use gc alloc instead of alloc typed in lowering -->
- [ ] #48183 <!-- Don't use pkgimage for package if any includes fall in
tracked path for coverage or alloc tracking -->
- [ ] #48050 <!-- improve `--heap-size-hint` arg handling -->
- [ ] #47615 <!-- Allow threadsafe access to buffer of type inference
profiling trees -->
@KristofferC KristofferC removed the backport 1.9 Change should be backported to release-1.9 label Aug 18, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
revert This reverts a previously merged PR.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants