-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
make A[:] = x::Number fall back to fill! #13459
Conversation
I don't think this should be necessary. From what I can see, the performance difference is caused by #13301 (although it seems to be a store to the GC frame this time....). LLVM IR below, for whatever reason there's a julia> Base.code_llvm_raw(Base._unsafe_batchsetindex!, Tuple{Base.LinearFast, Matrix{Float64}, Base.Repeated{Int64}, Colon})
define %jl_value_t* @"julia__unsafe_batchsetindex!_22134"(%jl_value_t*, %jl_value_t**, i32) {
top:
call void @llvm.dbg.value(metadata %jl_value_t** %1, i64 0, metadata !2, metadata !14), !dbg !15
call void @llvm.dbg.value(metadata %jl_value_t** %1, i64 0, metadata !16, metadata !17), !dbg !15
call void @llvm.dbg.value(metadata %jl_value_t** %1, i64 0, metadata !18, metadata !19), !dbg !15
%3 = alloca [3 x %jl_value_t*], align 8, !dbg !15
%.sub = getelementptr inbounds [3 x %jl_value_t*], [3 x %jl_value_t*]* %3, i64 0, i64 0
%4 = getelementptr [3 x %jl_value_t*], [3 x %jl_value_t*]* %3, i64 0, i64 2, !dbg !15
%5 = bitcast [3 x %jl_value_t*]* %3 to i64*, !dbg !15
store i64 2, i64* %5, align 8, !dbg !15
%6 = getelementptr [3 x %jl_value_t*], [3 x %jl_value_t*]* %3, i64 0, i64 1, !dbg !15
%7 = load i64, i64* bitcast (%jl_value_t*** @jl_pgcstack to i64*), align 8, !dbg !15
%8 = bitcast %jl_value_t** %6 to i64*, !dbg !15
store i64 %7, i64* %8, align 8, !dbg !15
store %jl_value_t** %.sub, %jl_value_t*** @jl_pgcstack, align 8, !dbg !15
store %jl_value_t* null, %jl_value_t** %4, align 8, !dbg !15
%9 = icmp eq i32 %2, 3, !dbg !20
br i1 %9, label %fail, label %pass, !dbg !20
fail: ; preds = %top
%10 = getelementptr %jl_value_t*, %jl_value_t** %1, i64 3, !dbg !20
call void @jl_bounds_error_tuple_int(%jl_value_t** %10, i64 0, i64 1), !dbg !20
unreachable, !dbg !20
pass: ; preds = %top
%11 = getelementptr %jl_value_t*, %jl_value_t** %1, i64 1, !dbg !15
%12 = load %jl_value_t*, %jl_value_t** %11, align 8, !dbg !15
%13 = getelementptr inbounds %jl_value_t, %jl_value_t* %12, i64 1, !dbg !15
%14 = bitcast %jl_value_t* %13 to i64*, !dbg !15
%15 = load i64, i64* %14, align 8, !dbg !15, !tbaa !21
call void @llvm.dbg.value(metadata i64 1, i64 0, metadata !22, metadata !24), !dbg !15
call void @llvm.dbg.value(metadata i64 1, i64 0, metadata !25, metadata !24), !dbg !15
call void @llvm.dbg.value(metadata i64 1, i64 0, metadata !26, metadata !24), !dbg !15
%16 = icmp slt i64 %15, 1, !dbg !27
br i1 %16, label %L.4, label %L.preheader, !dbg !27
L.preheader: ; preds = %pass
%17 = bitcast %jl_value_t* %12 to double**, !dbg !15
%18 = load double*, double** %17, align 8, !dbg !15, !tbaa !32
%19 = getelementptr %jl_value_t*, %jl_value_t** %1, i64 2, !dbg !15
%20 = bitcast %jl_value_t** %19 to i64**, !dbg !15
%21 = load i64*, i64** %20, align 8, !dbg !15
br label %L, !dbg !27
L: ; preds = %L, %L.preheader
%"#s2.0" = phi i64 [ %22, %L ], [ 1, %L.preheader ]
%22 = add i64 %"#s2.0", 1, !dbg !27
call void @llvm.dbg.value(metadata i64 %"#s2.0", i64 0, metadata !35, metadata !24), !dbg !15
call void @llvm.dbg.value(metadata i64 %22, i64 0, metadata !26, metadata !24), !dbg !15
call void @llvm.dbg.value(metadata i64 %"#s2.0", i64 0, metadata !36, metadata !24), !dbg !15
%23 = load i64, i64* %21, align 16, !dbg !37, !tbaa !38
call void @llvm.dbg.value(metadata i64 1, i64 0, metadata !39, metadata !24), !dbg !15
call void @llvm.dbg.value(metadata i64 %23, i64 0, metadata !40, metadata !24), !dbg !15
call void @llvm.dbg.value(metadata i64 2, i64 0, metadata !39, metadata !24), !dbg !15
call void @llvm.dbg.value(metadata i64 3, i64 0, metadata !39, metadata !24), !dbg !15
%24 = add i64 %"#s2.0", -1, !dbg !41
%25 = sitofp i64 %23 to double, !dbg !41
%26 = getelementptr double, double* %18, i64 %24, !dbg !41
store double %25, double* %26, align 8, !dbg !41, !tbaa !42
store %jl_value_t* %12, %jl_value_t** %4, align 8, !dbg !41
%27 = icmp eq i64 %"#s2.0", %15, !dbg !43
br i1 %27, label %L.4.loopexit, label %L, !dbg !43
L.4.loopexit: ; preds = %L
br label %L.4, !dbg !45
L.4: ; preds = %L.4.loopexit, %pass
%28 = load i64, i64* %8, align 8, !dbg !45
store i64 %28, i64* bitcast (%jl_value_t*** @jl_pgcstack to i64*), align 8, !dbg !45
ret %jl_value_t* %12, !dbg !45
} |
OK I can see this is happening on release 0.4 as well so it's not #13301. The AST is a little bit strange and it seems to generate a temporary variable that is never used (which causes the GC frame store) For $(Expr(:boundscheck, false))
_var0 = (Base.arrayset)(A::Array{Float64,2},(Base.box)(Float64,(Base.sitofp)(Float64,v::Int64)::Any)::Float64,offset_0::Int64)::Array{Float64,2}
goto 26
_var0 = $(Expr(:boundscheck, :(Base.pop)))
26:
_var0::Array{Float64,2} # cartesian.jl, line 34: For $(Expr(:boundscheck, false))
(Base.arrayset)(a::Array{Float64,2},x::Float64,i::Int64)::Array{Float64,2}
$(Expr(:boundscheck, :(Base.pop))) Note that both functions uses a loop and inbound assignment but seems that inlining of |
Close in favour of #13463? |
Thanks for doing this, @KristofferC! I went with the more general solution in #13461 for now, but this was a great to get that ball rolling. |
I still think this is a good idea. As long as we have a specialized Moreover, |
You're right, I agree this is still a good idea.
Perhaps a PR that introduces a |
@mbauman mentioned this in https://groups.google.com/forum/#!topic/julia-users/ub08AnpIutw
A[:]
previously fell back to the AbstractArray function.Before there was a performance difference: