Call-site splatting optimization #13359
dispatch / codegen is doing a bit of bait-and-switch and using significantly less memory for the regular function, since it has compiled one version for …
If erasing those mistakes would narrow this gap, that alone would be a big advance. There are a few cases where you need specialization, but we might get that through #11242.
Very interesting: I just discovered that there are circumstances where things aren't so pessimistic. I've long thought that a great milestone would be to come up with an implementation of `sub2ind` that doesn't require a `@generated` function. However, with aggressive use of `@inline`, it seems we may already be there:

```julia
@inline sb2ind(dims, I::Tuple) = sb2ind(dims, I...)
@inline sb2ind(dims, I::Integer...) = _sb2ind(dims, 1, 1, I...)
_sb2ind(::Tuple{}, dimsprod, indx) = indx
@inline function _sb2ind(dims, dimsprod, indx, i::Integer, I::Integer...)
    d = dims[1]
    _sb2ind(Base.tail(dims), dimsprod*d, indx+(i-1)*dimsprod, I...)
end

# Here's what we're benchmarking against. Let's make sure there's no splatting penalty.
@generated function Base.sub2ind{N}(dims, I::CartesianIndex{N})
    iexprs = [:(I[$d]) for d = 1:N]
    meta = Expr(:meta, :inline)
    quote
        $meta
        sub2ind(dims, $(iexprs...))
    end
end

function run_sub2ind(dims)
    s = 0
    for I in CartesianRange(dims)
        s += sub2ind(dims, I)
    end
    s
end

function run_sb2ind(dims)
    s = 0
    for I in CartesianRange(dims)
        s += sb2ind(dims, I.I)
    end
    s
end

dims = (100,100,50)
run_sub2ind(dims)
run_sb2ind(dims)
println("Warm up @time:")
@time 1
println("sub2ind (@generated functions):")
@time run_sub2ind(dims)
println("sb2ind (lispy-version):")
@time run_sb2ind(dims)
```

Results on my machine:

```
julia> include("sb2ind.jl")
Warm up @time:
  0.000001 seconds (3 allocations: 144 bytes)
sub2ind (@generated functions):
  0.001094 seconds (5 allocations: 176 bytes)
sb2ind (lispy-version):
  0.001093 seconds (5 allocations: 176 bytes)
125000250000
```

Note that you can't get rid of any of these `@inline` annotations. This is really good news! This could actually shake things up a lot, if it generalized to more cases (like the one that started this post). CC @Jutho, @mbauman.
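To see why the lispy version can match the `@generated` one, it may help to trace how the recursion unrolls in the 3-d case. This is a hand trace of the definitions above, not output from any tool:

```julia
# sb2ind((100,100,50), (i,j,k)) splats the tuple, then each @inline call to
# _sb2ind peels one dimension off the front with Base.tail:
#   _sb2ind((100,100,50), 1,      1,                           i, j, k)
#   _sb2ind((100,50),     100,    i,                           j, k)
#   _sb2ind((50,),        10000,  i + (j-1)*100,               k)
#   _sb2ind((),           500000, i + (j-1)*100 + (k-1)*10000)
# The base case returns indx. With every step inlined, the compiler sees
# straight-line integer arithmetic and never heap-allocates an argument tuple.
```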
I thought we also had a non-…
I was surprised to see the generated definition start allocating more quickly, though?
Is this a dup of #5402?
This is about call-site splatting; that one seems focused on declaration-site splatting.
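For concreteness, a minimal sketch of the distinction (the function names are just illustrative):

```julia
# Declaration-site splatting (#5402): the *method signature* collects varargs.
f(args...) = length(args)

# Call-site splatting (this issue): an existing container is spread into
# the arguments of a call.
t = (1, 2, 3)
f(t...)    # the tuple t is splatted at the call site
```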
@Jutho, I hadn't remembered that we'd gone up to such high dimensionality (see #10337 (comment)); but that's a complicated thread, so I may be misreading it. I suspect this is just the `MAX` constants defined in inference.jl. (Dratted tabbing and key bindings messed up the original version of this.)
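If I'm remembering the relevant constants correctly (treat the names and the behavior sketched here as assumptions), these are fixed cutoffs such as `MAX_TUPLETYPE_LEN` in `base/inference.jl`, past which inference widens a vararg tuple type instead of specializing on it:

```julia
# Illustrative only: below the cutoff, the argument tuple type is inferred
# precisely (e.g. NTuple{3,Int}); past it, inference may widen toward
# Tuple{Vararg{Int}}, and splatting performance degrades accordingly.
g(xs...) = length(xs)
g(ntuple(identity, 3)...)    # small arity: precise tuple type
g(ntuple(identity, 20)...)   # high arity: likely hits the widening cutoff
```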
I can't find the link, but I recall that @eschnett discovered another example, which I seem to remember was something like this:

```julia
julia> @inline myadd(x,y) = x+y
myadd (generic function with 1 method)

julia> @inline myadd(x,y,z,a...) = myadd(x+y,z,a...)
myadd (generic function with 2 methods)

julia> @code_native myadd(1,2)
	.text
Filename: none
Source line: 1
	pushq	%rbp
	movq	%rsp, %rbp
Source line: 1
	addq	%rsi, %rdi
	movq	%rdi, %rax
	popq	%rbp
	ret

julia> @code_native myadd(1,2,3)
	.text
Filename: none
Source line: 1
	pushq	%rbp
	movq	%rsp, %rbp
Source line: 1
	pushq	%rbx
	subq	$24, %rsp
	movq	$2, -32(%rbp)
	movabsq	$jl_pgcstack, %rbx
	movq	(%rbx), %rax
	movq	%rax, -24(%rbp)
	leaq	-32(%rbp), %rax
	movq	%rax, (%rbx)
	movq	$0, -16(%rbp)
	movq	(%rsi), %rcx
	movq	8(%rsi), %rax
	movq	(%rax), %rdi
Source line: 1
	movabsq	$jl_box_int64, %rax
	addq	(%rcx), %rdi
Source line: 1
	movq	16(%rsi), %rcx
Source line: 1
	addq	(%rcx), %rdi
	callq	*%rax
	movq	-24(%rbp), %rcx
	movq	%rcx, (%rbx)
	addq	$24, %rsp
	popq	%rbx
	popq	%rbp
	ret

julia> myadd3(x,y,z) = (x+y)+z
myadd3 (generic function with 1 method)

julia> @code_native myadd3(1,2,3)
	.text
Filename: none
Source line: 1
	pushq	%rbp
	movq	%rsp, %rbp
Source line: 1
	addq	%rsi, %rdi
	leaq	(%rdi,%rdx), %rax
	popq	%rbp
	ret
```

Interestingly, `@code_typed` shows essentially the same inlined body for both, so the difference only shows up at codegen:

```julia
julia> @code_typed myadd3(1,2,3)
1-element Array{Any,1}:
 :($(Expr(:lambda, Any[:x,:y,:z], Any[Any[Any[:x,Int64,0],Any[:y,Int64,0],Any[:z,Int64,0]],Any[],Any[],Any[]], :(begin  # none, line 1:
        return (Base.box)(Base.Int,(Base.add_int)((Base.box)(Base.Int,(Base.add_int)(x::Int64,y::Int64)),z::Int64))
    end::Int64))))

julia> @code_typed myadd(1,2,3)
1-element Array{Any,1}:
 :($(Expr(:lambda, Any[:x,:y,:z,:(a::Any...)], Any[Any[Any[:x,Int64,0],Any[:y,Int64,0],Any[:z,Int64,0],Any[:a,Tuple{},0]],Any[],Any[],Any[]], :(begin
        $(Expr(:meta, :inline)) # none, line 1:
        return (Base.box)(Base.Int,(Base.add_int)((Base.box)(Base.Int,(Base.add_int)(x::Int64,y::Int64)),z::Int64))
    end::Int64))))
```
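A quick run-time check of the same penalty; a sketch, with the caveat that exact allocation counts depend on machine and Julia version:

```julia
# Warm up first so @allocated doesn't count compilation:
myadd(1, 2, 3); myadd3(1, 2, 3)

@allocated myadd(1, 2, 3)    # nonzero here: the jl_box_int64 call in the asm above
@allocated myadd3(1, 2, 3)   # zero: no varargs, no boxing
```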
I added a version of Tim's benchmark to BaseBenchmarks.
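For anyone who wants to reproduce it locally without pulling in BaseBenchmarks, a sketch using BenchmarkTools (assuming the `run_sub2ind`/`run_sb2ind` definitions from the earlier comment are loaded):

```julia
using BenchmarkTools  # Pkg.add("BenchmarkTools")

dims = (100, 100, 50)
@btime run_sub2ind($dims)   # @generated path
@btime run_sb2ind($dims)    # lispy @inline path
```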
Splatting has a known penalty, but today I looked into it a bit more carefully, and I wonder if there's an easy fix for part of the problem. For those who don't like `@generated` functions, this might be a good opportunity to reduce their numbers, since I think my major use for them now is to avoid the splatting penalty. First the demo:
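(The demo code itself didn't survive in this copy of the thread; the sketch below is a guess at its shape, keeping the `call_bar2a`/`call_bar2b` names mentioned later, but the definitions are assumptions.)

```julia
bar(x, y, z) = x + y + z

# No splatting: index the tuple explicitly.
call_bar2a(args) = bar(args[1], args[2], args[3])

# Call-site splatting: spread the tuple into the call.
call_bar2b(args) = bar(args...)

function runit(f, args, n)
    s = 0
    for i = 1:n
        s += f(args)
    end
    s
end

args = (1, 2, 3)
runit(call_bar2a, args, 1); runit(call_bar2b, args, 1)  # warm up
@time runit(call_bar2a, args, 10^6)
@time runit(call_bar2b, args, 10^6)
```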
Results:
You can see there's an absolutely enormous, deadly penalty for any kind of splatting. Since I write a lot of code that has to work in arbitrary dimensions, it's a major contributor to why I tend to write so many `@generated` functions.

Now here's the fun part: look at the difference in `@code_typed` for `call_bar2a` and `call_bar2b` (originally shown as a screenshot so you can see the colors; the image isn't reproduced here). I think the only difference is `top(getfield)` vs `Base.getfield`. (EDIT: I deleted the line number annotations to reduce the size of this diff.)