Perf: Specialize `fn` in `rmap` and `rmaptype` #80
Conversation
Also avoid use of `map` on `NamedTuple`s as it doesn't specialize `f`.
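For context, here is a minimal sketch of the two ideas behind this PR; the function names and bodies are illustrative, not the actual ClimaCore diff:

```julia
# Before: Julia's heuristics avoid specializing on a Function-typed
# argument that is only passed through to another call, and `map` on a
# NamedTuple goes through a path that does not specialize `f`.
rmap_before(fn, X::NamedTuple) = map(fn, X)

# After: the `fn::F ... where {F}` type parameter forces one
# specialization per function type, and the NamedTuple is rebuilt
# from a plain-`Tuple` map instead.
rmap_after(fn::F, X::NamedTuple{names}) where {F, names} =
    NamedTuple{names}(map(fn, Tuple(X)))
```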
Since this is low level and called quite often, can we start to add some microbenchmarks to show that this is an actual improvement and doesn't regress in the future with different compiler heuristics?
This would be nice, but I'm not sure how to replicate the problem with a microbenchmark. The profile below demonstrates the problem:

[profile screenshot]

However, the compiler is handling most of the code snippets I've tried properly; I haven't found a simple way to make it give up as above yet. E.g.:

```julia
julia> @code_warntype map(x -> x^3, (x1 = 1, x2 = 2))
Variables
#self#::Core.Const(map)
f::Core.Const(var"#35#36"())
nt::NamedTuple{(:x1, :x2), Tuple{Int64, Int64}}
nts::Tuple{}
Body::NamedTuple{(:x1, :x2), Tuple{Int64, Int64}}
1 ─ %1 = Core.tuple(nt)::Tuple{NamedTuple{(:x1, :x2), Tuple{Int64, Int64}}}
│ %2 = Core._apply_iterate(Base.iterate, Base.same_names, %1, nts)::Core.Const(true)
│ %3 = !%2::Core.Const(false)
└── goto #3 if not %3
2 ─ Core.Const(:(Base.ArgumentError("Named tuple names do not match.")))
└── Core.Const(:(Base.throw(%5)))
3 ┄ %7 = Core.apply_type(Base.NamedTuple, $(Expr(:static_parameter, 1)))::Core.Const(NamedTuple{(:x1, :x2), T} where T<:Tuple)
│ %8 = Core.tuple(f)::Core.Const((var"#35#36"(),))
│ %9 = Core.tuple(nt)::Tuple{NamedTuple{(:x1, :x2), Tuple{Int64, Int64}}}
│ %10 = Core._apply_iterate(Base.iterate, Core.tuple, %9, nts)::Tuple{NamedTuple{(:x1, :x2), Tuple{Int64, Int64}}}
│ %11 = Base.map(Base.Tuple, %10)::Tuple{Tuple{Int64, Int64}}
│ %12 = Core._apply_iterate(Base.iterate, Base.map, %8, %11)::Tuple{Int64, Int64}
│ %13 = (%7)(%12)::NamedTuple{(:x1, :x2), Tuple{Int64, Int64}}
└── return %13
```

I'm open to ideas.
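For what it's worth, a microbenchmark along the requested lines might look like the sketch below (using BenchmarkTools.jl; the invocation is illustrative, and as noted above, small literal cases like this tend to be compiled well, so it may not reproduce the dispatch overhead seen in the profile):

```julia
using BenchmarkTools

# `$` interpolates the NamedTuple so construction cost is excluded
# from the timing; the double parentheses are needed to interpolate
# a NamedTuple literal.
@btime map(x -> x^3, $((x1 = 1, x2 = 2)))
```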
I see; the issue I have is that there are really only two significant changes (lines 22 and 24 in the diff); the others are just syntactic and don't do anything. It might be better to just force-inline this chain of method calls to get somewhat predictable performance / behavior from the compiler; that would be a better solution IMO.
My understanding is that specializing the function (with the `fn::F ... where {F}` type parameter) addresses the specialization heuristic, but inlining is a separate decision: compiler inlining decisions use a cost function (see JuliaLang/julia#27857) and can only be forced if the calling signature is concrete (see JuliaLang/julia#29258). I'm not sure what that means for something like `rmap`.

Edit: I'm not seeing the issue with this fix. What do you not like about it?
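For illustration (not part of the PR's diff, and `rmap_leaf` is a hypothetical name), the hint under discussion would look something like this; per the linked issues it biases the compiler's cost model rather than guaranteeing inlining:

```julia
# `@inline` nudges the inlining cost model; combined with the `::F`
# type parameter, the call is both specialized and cheap to inline.
@inline rmap_leaf(fn::F, x) where {F} = fn(x)
```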
There is no issue with it; it just seems very finicky to try to trigger inlining with the specialization changes. You can't look at `@code_typed` either, because it always specializes, so it might be misleading. Since this is such a low-level perf issue (and will be called everywhere), it would be nice to just come up with a way to enforce (to the degree possible) that these methods won't be a surprise performance trap.
@vchuravy: thoughts please?
Also summoning @aviatesk. I went looking through the compiler to find the place where this decision is made. Broadly speaking, these problems come from the recursion-detection heuristics within type inference, which limit how far inference will follow recursive calls like `rmap`.
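To make the recursion concrete, a minimal shape of the function under discussion might look like the following; this is a sketch inferred from the JETTest trace further down, not the actual ClimaCore source:

```julia
# A recursive higher-order map over nested NamedTuples: exactly the
# pattern where inference's recursion-limiting heuristic can give up
# and widen types unless `fn` is specialized.
rmap(fn::F, x) where {F} = fn(x)
rmap(fn::F, X::NamedTuple{names}) where {F, names} =
    NamedTuple{names}(map(x -> rmap(fn, x), Tuple(X)))
```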
Thanks for the info @vchuravy.

Makes sense (given the general move towards better compile times), but the compiler docs might need to be updated if this is increasingly the case. I guess my point would be to make this optimization more robust / obvious by just forcing inlining of this call chain.
Okay, I will tell you what I know ;)

As @vchuravy said, these problems relate to the recursion-detection heuristic within type inference. Rather than relying on `@code_warntype`, I suggest using Cthulhu.jl or JETTest.jl, whichever you like; either will find some optimization failures happening within the previous implementations of `rmap`.

If you use Cthulhu, you will see some callsites are annotated as `< limited >`:

```julia
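# How one would reach this view with Cthulhu (the exact entry call is
# an assumption on my part, not quoted from the thread):
#   julia> using Cthulhu
#   julia> @descend rmap(x -> x^3, (y = (x1 = 1, x2 = 2),))
#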
# after descending into a bit of madness ...
│ ─ %-1 = invoke map(::var"#15#16"{var"#19#20"},::Tuple{NamedTuple{(:x1, :x2), Tuple{Int64, Int64}}})::Tuple{NamedTuple{(:x1, :x2), Tuple{Int64, Int64}}}
1 ─ nothing::Core.Const(nothing)
│ @ tuple.jl:220 within `map`
│ %2 = Base.getindex(t, 1)::NamedTuple{(:x1, :x2), Tuple{Int64, Int64}}
│ %3 = (f)(%2)::NamedTuple{(:x1, :x2), Tuple{Int64, Int64}}
│ %4 = Core.tuple(%3)::Tuple{NamedTuple{(:x1, :x2), Tuple{Int64, Int64}}}
└── return %4
Select a call to descend into or ↩ to ascend. [q]uit. [b]ookmark.
Toggles: [o]ptimize, [w]arn, [v]erbose printing for warntype code, [d]ebuginfo, [i]nlining costs, [s]yntax highlight for Source/LLVM/Native.
Show: [S]ource code, [A]ST, [L]LVM IR, [N]ative code
Actions: [E]dit source code, [R]evise and redisplay
Advanced: dump [P]arams cache.
%2 = < constprop > getindex(::Tuple{NamedTuple{(:x1, :x2), Tuple{Int64, Int64}}},::Core.Const(1))::NamedTuple{(:x1, :x2), Tuple{Int64, Int64}}
• %3 = < limited > #15(::NamedTuple{(:x1, :x2), Tuple{Int64, Int64}})::NamedTuple{(:x1, :x2), Tuple{Int64, Int64}}
↩
```

I recommend you use JETTest, because it's automated and can be used for your test suite:

```julia
# before this PR
julia> @test_nodispatch rmap(x -> x^3, (y = (x1 = 1, x2 = 2),))
Dispatch Test Failed at REPL[24]:1
Expression: #= REPL[24]:1 =# JETTest.@test_nodispatch rmap((x->begin
x ^ 3
end), (y = (x1 = 1, x2 = 2),))
═════ 4 possible errors found ═════
┌ @ REPL[17]:1 Main.map(#15, X)
│┌ @ namedtuple.jl:208 Base.map(Core.tuple(f), Base.map(Base.Tuple, Core.tuple(Core.tuple(nt), nts...))...)
││┌ @ tuple.jl:220 f(Base.getindex(t, 1))
│││┌ @ REPL[17]:1 Main.rmap(Core.getfield(#self#, :fn), x)
││││┌ @ REPL[17]:1 Main.map(#15, X)
│││││┌ @ namedtuple.jl:204 map(::var"#15#16"{var"#23#24"}, ::NamedTuple{(:x1, :x2), Tuple{Int64, Int64}})
││││││ failed to optimize: map(::var"#15#16"{var"#23#24"}, ::NamedTuple{(:x1, :x2), Tuple{Int64, Int64}})
│││││└─────────────────────
││││┌ @ REPL[17]:1 rmap(::var"#23#24", ::NamedTuple{(:x1, :x2), Tuple{Int64, Int64}})
│││││ failed to optimize: rmap(::var"#23#24", ::NamedTuple{(:x1, :x2), Tuple{Int64, Int64}})
││││└──────────────
│││┌ @ REPL[17]:1 (::var"#15#16"{var"#23#24"})(::NamedTuple{(:x1, :x2), Tuple{Int64, Int64}})
││││ failed to optimize: (::var"#15#16"{var"#23#24"})(::NamedTuple{(:x1, :x2), Tuple{Int64, Int64}})
│││└──────────────
││┌ @ tuple.jl:220 map(::var"#15#16"{var"#23#24"}, ::Tuple{NamedTuple{(:x1, :x2), Tuple{Int64, Int64}}})
│││ failed to optimize: map(::var"#15#16"{var"#23#24"}, ::Tuple{NamedTuple{(:x1, :x2), Tuple{Int64, Int64}}})
││└────────────────
ERROR: There was an error during testing
```

```julia
# after this PR
julia> @test_nodispatch rmap(x -> x^3, (y = (x1 = 1, x2 = 2),))
Test Passed
Expression: #= REPL[31]:1 =# JETTest.@test_nodispatch rmap((x->begin
x ^ 3
end), (y = (x1 = 1, x2 = 2),))
```

In JETTest, `@test_nodispatch` tests that no runtime dispatch (or other optimization failure) occurs anywhere within the given call, which is what makes it useful for catching regressions like this.
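As a sketch of how this could slot into a package test suite (the `@testset` wrapper is my addition, not from the thread):

```julia
using Test, JETTest

# Fails if any call in the `rmap` call graph hits runtime dispatch,
# guarding against future compiler-heuristic regressions.
@testset "rmap avoids runtime dispatch" begin
    @test_nodispatch rmap(x -> x^3, (y = (x1 = 1, x2 = 2),))
end
```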
And as @jakebolewski explained above, the optimization benefits of this PR do not come from the specialization annotation alone; they seem to come from the other functional changes, i.e. the conversion applied to the outputs of the recursive calls of `rmap`.
Awesome write-up @aviatesk, thanks for that breakdown and for the demonstration of JETTest.jl (which I didn't know about)! It's definitely something we can and will integrate into the ClimaCore test suite; this will be hugely useful for spotting regressions. Adding JETTest checks is probably more useful than microbenchmarks at this stage.
Force-pushed from dff978d to cf42eb6
Thanks @aviatesk: JETTest is very nice!
bors r+