make A[:] = x::Number fall back to fill! #13459

Closed
wants to merge 1 commit

Conversation

KristofferC
Member

@mbauman mentioned this in https://groups.google.com/forum/#!topic/julia-users/ub08AnpIutw

A[:] = x previously fell back to the generic AbstractArray setindex! method.

Before this change there was a performance difference:

julia> @time fill!(A, 0.);
  0.007387 seconds (4 allocations: 160 bytes)

julia> @time A[:] = 0.
  0.010574 seconds (5 allocations: 176 bytes)
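
The change itself amounts to something like the following one-liner (a sketch only; the exact signature in the commit may differ):

# A[:] = x lowers to setindex!(A, x, Colon()), so forwarding that case
# to the specialized fill! avoids the generic batch-setindex path.
setindex!(A::Array, x::Number, ::Colon) = fill!(A, x)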

@yuyichao
Contributor

yuyichao commented Oct 5, 2015

I don't think this should be necessary. From what I can see, the performance difference is caused by #13301 (although it seems to be a store to the GC frame this time...). LLVM IR below; for whatever reason there's a store %jl_value_t* %12, %jl_value_t** %4, align 8, !dbg !41 in the loop that is storing a constant value to the GC frame.

julia> Base.code_llvm_raw(Base._unsafe_batchsetindex!, Tuple{Base.LinearFast, Matrix{Float64}, Base.Repeated{Int64}, Colon})

define %jl_value_t* @"julia__unsafe_batchsetindex!_22134"(%jl_value_t*, %jl_value_t**, i32) {
top:                                                                              
  call void @llvm.dbg.value(metadata %jl_value_t** %1, i64 0, metadata !2, metadata !14), !dbg !15
  call void @llvm.dbg.value(metadata %jl_value_t** %1, i64 0, metadata !16, metadata !17), !dbg !15
  call void @llvm.dbg.value(metadata %jl_value_t** %1, i64 0, metadata !18, metadata !19), !dbg !15
  %3 = alloca [3 x %jl_value_t*], align 8, !dbg !15                               
  %.sub = getelementptr inbounds [3 x %jl_value_t*], [3 x %jl_value_t*]* %3, i64 0, i64 0
  %4 = getelementptr [3 x %jl_value_t*], [3 x %jl_value_t*]* %3, i64 0, i64 2, !dbg !15
  %5 = bitcast [3 x %jl_value_t*]* %3 to i64*, !dbg !15                           
  store i64 2, i64* %5, align 8, !dbg !15                                         
  %6 = getelementptr [3 x %jl_value_t*], [3 x %jl_value_t*]* %3, i64 0, i64 1, !dbg !15
  %7 = load i64, i64* bitcast (%jl_value_t*** @jl_pgcstack to i64*), align 8, !dbg !15
  %8 = bitcast %jl_value_t** %6 to i64*, !dbg !15                                 
  store i64 %7, i64* %8, align 8, !dbg !15                                        
  store %jl_value_t** %.sub, %jl_value_t*** @jl_pgcstack, align 8, !dbg !15       
  store %jl_value_t* null, %jl_value_t** %4, align 8, !dbg !15                    
  %9 = icmp eq i32 %2, 3, !dbg !20
  br i1 %9, label %fail, label %pass, !dbg !20

fail:                                             ; preds = %top
  %10 = getelementptr %jl_value_t*, %jl_value_t** %1, i64 3, !dbg !20
  call void @jl_bounds_error_tuple_int(%jl_value_t** %10, i64 0, i64 1), !dbg !20
  unreachable, !dbg !20

pass:                                             ; preds = %top
  %11 = getelementptr %jl_value_t*, %jl_value_t** %1, i64 1, !dbg !15
  %12 = load %jl_value_t*, %jl_value_t** %11, align 8, !dbg !15
  %13 = getelementptr inbounds %jl_value_t, %jl_value_t* %12, i64 1, !dbg !15
  %14 = bitcast %jl_value_t* %13 to i64*, !dbg !15
  %15 = load i64, i64* %14, align 8, !dbg !15, !tbaa !21
  call void @llvm.dbg.value(metadata i64 1, i64 0, metadata !22, metadata !24), !dbg !15
  call void @llvm.dbg.value(metadata i64 1, i64 0, metadata !25, metadata !24), !dbg !15
  call void @llvm.dbg.value(metadata i64 1, i64 0, metadata !26, metadata !24), !dbg !15
  %16 = icmp slt i64 %15, 1, !dbg !27
  br i1 %16, label %L.4, label %L.preheader, !dbg !27

L.preheader:                                      ; preds = %pass
  %17 = bitcast %jl_value_t* %12 to double**, !dbg !15
  %18 = load double*, double** %17, align 8, !dbg !15, !tbaa !32
  %19 = getelementptr %jl_value_t*, %jl_value_t** %1, i64 2, !dbg !15
  %20 = bitcast %jl_value_t** %19 to i64**, !dbg !15
  %21 = load i64*, i64** %20, align 8, !dbg !15
  br label %L, !dbg !27

L:                                                ; preds = %L, %L.preheader
  %"#s2.0" = phi i64 [ %22, %L ], [ 1, %L.preheader ]
  %22 = add i64 %"#s2.0", 1, !dbg !27
  call void @llvm.dbg.value(metadata i64 %"#s2.0", i64 0, metadata !35, metadata !24), !dbg !15
  call void @llvm.dbg.value(metadata i64 %22, i64 0, metadata !26, metadata !24), !dbg !15
  call void @llvm.dbg.value(metadata i64 %"#s2.0", i64 0, metadata !36, metadata !24), !dbg !15
  %23 = load i64, i64* %21, align 16, !dbg !37, !tbaa !38
  call void @llvm.dbg.value(metadata i64 1, i64 0, metadata !39, metadata !24), !dbg !15
  call void @llvm.dbg.value(metadata i64 %23, i64 0, metadata !40, metadata !24), !dbg !15
  call void @llvm.dbg.value(metadata i64 2, i64 0, metadata !39, metadata !24), !dbg !15
  call void @llvm.dbg.value(metadata i64 3, i64 0, metadata !39, metadata !24), !dbg !15
  %24 = add i64 %"#s2.0", -1, !dbg !41
  %25 = sitofp i64 %23 to double, !dbg !41
  %26 = getelementptr double, double* %18, i64 %24, !dbg !41
  store double %25, double* %26, align 8, !dbg !41, !tbaa !42
  store %jl_value_t* %12, %jl_value_t** %4, align 8, !dbg !41
  %27 = icmp eq i64 %"#s2.0", %15, !dbg !43
  br i1 %27, label %L.4.loopexit, label %L, !dbg !43

L.4.loopexit:                                     ; preds = %L
  br label %L.4, !dbg !45

L.4:                                              ; preds = %L.4.loopexit, %pass
  %28 = load i64, i64* %8, align 8, !dbg !45
  store i64 %28, i64* bitcast (%jl_value_t*** @jl_pgcstack to i64*), align 8, !dbg !45
  ret %jl_value_t* %12, !dbg !45
}
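
For anyone reproducing this: the raw, metadata-heavy listing above comes from the unexported Base.code_llvm_raw; the exported @code_llvm macro gives a similar view for a concrete call. A usage sketch (output omitted):

julia> A = zeros(1000, 1000);

julia> @code_llvm setindex!(A, 0.0, Colon())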

@yuyichao
Contributor

yuyichao commented Oct 5, 2015

OK, I can see this is happening on release 0.4 as well, so it's not #13301.

The AST is a little bit strange, and it seems to generate a temporary variable that is never used (which causes the GC frame store).

For _unsafe_batchsetindex!

      $(Expr(:boundscheck, false))
      _var0 = (Base.arrayset)(A::Array{Float64,2},(Base.box)(Float64,(Base.sitofp)(Float64,v::Int64)::Any)::Float64,offset_0::Int64)::Array{Float64,2}
      goto 26
      _var0 = $(Expr(:boundscheck, :(Base.pop)))
      26: 
      _var0::Array{Float64,2} # cartesian.jl, line 34:

For fill!

      $(Expr(:boundscheck, false))
      (Base.arrayset)(a::Array{Float64,2},x::Float64,i::Int64)::Array{Float64,2}
      $(Expr(:boundscheck, :(Base.pop)))

Note that both functions use a loop and an inbounds assignment, but it seems that inlining of unsafe_setindex! confuses the compiler. We could probably work around this, but maybe we should just fix the compiler/type inference/inliner?
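
For comparison, a loop written in the style of fill! (a simplified sketch, not the exact Base definition) never binds the result of the assignment, so there is nothing left over for the GC frame to root:

# Simplified fill!-style loop (illustrative name, not Base's fill!): the value
# returned by the underlying arrayset is simply discarded instead of being
# bound to a temporary.
function myfill!(a::Array{Float64}, x::Float64)
    @inbounds for i in 1:length(a)
        a[i] = x
    end
    return a
end

The batch setindex! path, by contrast, binds the arrayset result to _var0, and that unused binding is what ends up stored into the GC frame.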

@KristofferC
Member Author

Close in favour of #13463?

@yuyichao
Contributor

yuyichao commented Oct 6, 2015

Not sure, maybe this can be replaced by #13461 if we don't have #13359.

@mbauman
Member

mbauman commented Oct 7, 2015

Thanks for doing this, @KristofferC! I went with the more general solution in #13461 for now, but this was a great way to get that ball rolling.

@mbauman mbauman closed this Oct 7, 2015
@KristofferC KristofferC deleted the kc/colon_num branch October 8, 2015 10:27
@stevengj
Member

stevengj commented May 10, 2016

I still think this is a good idea. As long as we have a specialized fill! function, it is really strange not to use it for A[:] = x, which is the most natural way for users to do a fill! operation. This is especially true since, in the common case A[:] = 0, I think fill! calls specialized memset code.

Moreover, A[:] = x is currently quite slow (it does a heap allocation), as mentioned in #16302, although this will hopefully be fixed. It's rather unexpected for A[:] = x to perform differently from fill!.
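
As an aside, a zero fill of a Float64 array can in principle be expressed as a raw memset, because the bit pattern of 0.0 is all zero bytes. A minimal sketch (an illustrative helper, not a claim about what Base.fill! actually does):

# Illustrative only: zero a Float64 array with C memset via ccall.
# ccall converts and roots the Array when passed for a Ptr argument.
function zero_fill!(A::Array{Float64})
    ccall(:memset, Ptr{Cvoid}, (Ptr{Float64}, Cint, Csize_t), A, 0, sizeof(A))
    return A
end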

@mbauman
Member

mbauman commented May 10, 2016

You're right, I agree this is still a good idea.

fill! is also defined for all AbstractArrays in a sensible way, so we can do this for everyone (and not just Array). Unfortunately, the most sensible place to do this is immediately after we incur the splatting penalty.

Perhaps a PR that introduces a @generated function to work around the lack of call-site specialization will light a fire here? In general, setindex! shouldn't need to allocate.
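
A rough sketch of what such a @generated dispatcher might look like (hypothetical helper name, only to illustrate choosing fill! from the index types at compile time rather than after splatting at run time):

# Hypothetical: batch_setindex! inspects the *types* of v and I when the
# method body is generated, so the fill! fast path is chosen with no
# runtime branching or splatting overhead.
@generated function batch_setindex!(A::AbstractArray, v, I...)
    if length(I) == 1 && I[1] === Colon && v <: Number
        return :(fill!(A, v))
    else
        return :(setindex!(A, v, I...))
    end
end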
