Improved type stability with explicit params #1248

ToucheSir · 2022-06-23T02:38:16Z

We can disable accumulating (implicit) parameters to the gradient cache in explicit mode. This can dramatically improve type stability because accum_param will return a Union{Nothing, [grad type]} otherwise.

One impact of this PR is that taking gradients of functions with both implicit and explicit parameters (i.e. calling pullback twice) may involve some additional compilation. However, given that we're trying to move users off of using implicit params anyhow, I see it as a small price to pay for being friendlier to the compiler.

Benchmarking TTFG on the MWE in #1126, modified to use explicit params:

julia> @time loss_grad(model, lr_images)
 85.027100 seconds (61.36 M allocations: 3.216 GiB, 1.17% gc time, 99.98% compilation time) # 0.6.40
 59.024238 seconds (60.72 M allocations: 3.174 GiB, 1.69% gc time, 99.98% compilation time) # this PR

Closes #1243.

ToucheSir · 2022-06-24T04:23:33Z

Ok, now we're back to the known failures on Nightly and downstream NeuralPDE. Molly one appears to be intermittent (minor numerical error) and showed up recently in PRs from a few days ago, but that should be investigated separately.

One unexpected find while working on this PR is that Zygote and downstream packages were calling pullback with the wrong argument order. This worked by sheer coincidence, but I've added a warning so that it can be rectified before we disallow it in the next breaking release.

src/compiler/interface.jl

mcabbott · 2022-06-26T01:03:34Z

This doesn't sound crazy, but FWIW I do not see the same speedup:

julia>@time gradient(loss, lr_images);
 51.949997 seconds (59.92 M allocations: 3.781 GiB, 4.14% gc time, 99.92% compilation time)  # tagged
 54.520161 seconds (60.06 M allocations: 3.801 GiB, 2.68% gc time, 99.94% compilation time)  # this PR

(Julia master, M1 mac.)

ToucheSir · 2022-06-26T01:30:59Z

That may be because loss is closing over a global model. My version passes both as args.

mcabbott · 2022-06-26T02:39:57Z

Avoiding globals makes a surprisingly large difference here:

julia> @time gradient((m, x) -> sum(m(x)), model, lr_images);
 35.009302 seconds (58.93 M allocations: 3.719 GiB, 8.44% gc time, 99.92% compilation time)  # tagged
 34.581913 seconds (59.08 M allocations: 3.738 GiB, 2.97% gc time, 99.93% compilation time)  # this PR

ToucheSir · 2022-06-26T03:49:27Z

Perhaps a newer Julia version helps close the gap, but I'm consistently seeing this ~15s difference on 1.7.3.

This is the full script I'm using:

using Flux

channels = 4

function resblock(channels)
    return SkipConnection(Chain(
        Conv((3, 3), channels => channels, pad=1),
        Conv((3, 3), channels => channels, pad=1),
    ), +)
end

model = Chain(
    SkipConnection(
        Chain(
            resblock(channels),
            resblock(channels),
            resblock(channels),
            resblock(channels),
            resblock(channels),

            resblock(channels),
            resblock(channels),
            resblock(channels),
            resblock(channels),
            resblock(channels),

            resblock(channels),
            resblock(channels),
            resblock(channels),
            resblock(channels),
            resblock(channels),
        ),
    +),
    AdaptiveMeanPool((1, 1))
)

@show typeof(model)

loss(m, x) = sum(m(x))

lr_images = randn(Float32, 2, 2, channels, 1)
@time loss(model, lr_images)
@time loss(model, lr_images)

loss_grad(m, x) = gradient(m -> loss(m, x), m)
# This gives the same numbers:
# loss_grad(m, x) = gradient((m, x) -> loss(m, x), m, x)

@time loss_grad(model, lr_images)
@time loss_grad(model, lr_images)

We can disable accumulating (implicit) parameters to the gradient cache in explicit mode. This can dramatically improve type stability because `accum_param` will return a `Union{Nothing, [grad type]}` otherwise.

src/compiler/interface.jl

Co-authored-by: Michael Abbott <[email protected]>

mcabbott

Let's do it.

Should be marked as closing #1243?

ToucheSir force-pushed the bc/no-cache-context branch 3 times, most recently from 6ccf008 to 7540bd6 Compare June 24, 2022 02:53

CarloLucibello reviewed Jun 24, 2022

View reviewed changes

src/compiler/interface.jl Show resolved Hide resolved

ToucheSir mentioned this pull request Jun 26, 2022

Improve type stability of ProjectTo(non-numeric array). JuliaDiff/ChainRulesCore.jl#557

Open

mcabbott added the performance label Jul 4, 2022

ToucheSir added 2 commits July 30, 2022 16:58

Improved type stability with explicit params

be5b47f

We can disable accumulating (implicit) parameters to the gradient cache in explicit mode. This can dramatically improve type stability because `accum_param` will return a `Union{Nothing, [grad type]}` otherwise.

basic comment for Context{I}

e9a6075

ToucheSir force-pushed the bc/no-cache-context branch from 8db910e to e9a6075 Compare July 30, 2022 23:58

ToucheSir mentioned this pull request Jul 31, 2022

Zygote hangs when taking explicit gradients of NaiveGAFlux model #1243

Closed

DhairyaLGandhi reviewed Jul 31, 2022

View reviewed changes

src/compiler/interface.jl Show resolved Hide resolved

Add accum_param tests

3433cdd

Co-authored-by: Michael Abbott <[email protected]>

ToucheSir requested a review from mcabbott July 31, 2022 17:22

mcabbott approved these changes Aug 1, 2022

View reviewed changes

ToucheSir merged commit 5c80f55 into master Aug 1, 2022

ToucheSir deleted the bc/no-cache-context branch August 1, 2022 14:29

ToucheSir mentioned this pull request Aug 9, 2022

Dictionary indexing failure inside closure and structs #717

Closed

ToucheSir mentioned this pull request Aug 20, 2022

Regression in comprehension support #1290

Open

mcabbott mentioned this pull request Nov 21, 2022

Narrower version of @non_differentiable params FluxML/Flux.jl#2118

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improved type stability with explicit params #1248

Improved type stability with explicit params #1248

ToucheSir commented Jun 23, 2022 •

edited

Loading

ToucheSir commented Jun 24, 2022 •

edited

Loading

mcabbott commented Jun 26, 2022

ToucheSir commented Jun 26, 2022

mcabbott commented Jun 26, 2022

ToucheSir commented Jun 26, 2022

mcabbott left a comment •

edited

Loading

Improved type stability with explicit params #1248

Improved type stability with explicit params #1248

Conversation

ToucheSir commented Jun 23, 2022 • edited Loading

ToucheSir commented Jun 24, 2022 • edited Loading

mcabbott commented Jun 26, 2022

ToucheSir commented Jun 26, 2022

mcabbott commented Jun 26, 2022

ToucheSir commented Jun 26, 2022

mcabbott left a comment • edited Loading

Choose a reason for hiding this comment

ToucheSir commented Jun 23, 2022 •

edited

Loading

ToucheSir commented Jun 24, 2022 •

edited

Loading

mcabbott left a comment •

edited

Loading