Store to GC frame preventing vectorization #15717
Simple code and the corresponding IR on the first bad commit.

julia> function f(a)
           @inbounds @simd for i in eachindex(a)
               a[i] += 1
           end
           nothing
       end
f (generic function with 1 method)

if13: ; preds = %if13.preheader, %if13
%"##i#7433.01" = phi i64 [ %41, %if13 ], [ 0, %if13.preheader ]
store %jl_value_t* %0, %jl_value_t** %3, align 8, !dbg !49, !llvm.mem.parallel_loop_access !57
%35 = load float*, float** %24, align 8, !dbg !49, !tbaa !59, !llvm.mem.parallel_loop_access !57
%36 = getelementptr float, float* %35, i64 %"##i#7433.01", !dbg !49
%37 = load float, float* %36, align 4, !dbg !49, !tbaa !60, !llvm.mem.parallel_loop_access !57
%38 = fadd float %37, 1.000000e+00, !dbg !49
store %jl_value_t* %0, %jl_value_t** %4, align 8, !dbg !49, !llvm.mem.parallel_loop_access !57
%39 = load float*, float** %24, align 8, !dbg !49, !tbaa !59, !llvm.mem.parallel_loop_access !57
%40 = getelementptr float, float* %39, i64 %"##i#7433.01", !dbg !49
store float %38, float* %40, align 4, !dbg !49, !tbaa !60, !llvm.mem.parallel_loop_access !57
%41 = add nuw nsw i64 %"##i#7433.01", 1, !dbg !61, !simd_loop !13
call void @llvm.dbg.value(metadata i64 %41, i64 0, metadata !22, metadata !31), !dbg !32
%exitcond = icmp eq i64 %41, %22, !dbg !50
br i1 %exitcond, label %L.backedge.loopexit, label %if13, !dbg !50, !llvm.loop !58

Ref #13777

Actually, it looks like the issue is similar to #15402, but it gets even worse after that.

With #15735 and #13463, the example above vectorizes at the normal optimization level. However, if the array is allocated in the same function, or the variable is otherwise assigned to (similar to #13301), the redundant stores in the loop still prevent the optimization from happening at the normal optimization level.

function f_simd(n::Integer)
    a = zeros(Float32, n)
    @inbounds @simd for i in eachindex(a)
        a[i] += 1
    end
    nothing
end

IR of the inner loop:

if15: ; preds = %if15, %if15.lr.ph
%"i#256.017" = phi i64 [ 0, %if15.lr.ph ], [ %45, %if15 ]
store %jl_value_t* %24, %jl_value_t** %9, align 8, !dbg !53, !tbaa !32, !llvm.mem.parallel_loop_access !59
%42 = getelementptr float, float* %41, i64 %"i#256.017", !dbg !53
%43 = load float, float* %42, align 4, !dbg !53, !tbaa !61, !llvm.mem.parallel_loop_access !59
%44 = fadd float %43, 1.000000e+00, !dbg !53
store %jl_value_t* %24, %jl_value_t** %10, align 8, !dbg !53, !tbaa !32, !llvm.mem.parallel_loop_access !59
store float %44, float* %42, align 4, !dbg !53, !tbaa !61, !llvm.mem.parallel_loop_access !59
%45 = add nuw nsw i64 %"i#256.017", 1, !dbg !62, !simd_loop !4
call void @llvm.dbg.value(metadata i64 %45, i64 0, metadata !26, metadata !30), !dbg !31
%exitcond = icmp eq i64 %45, %35, !dbg !58
br i1 %exitcond, label %L11.loopexit, label %if15, !dbg !58, !llvm.loop !60

Vectorization works now (on LLVM 3.8 at least). Moving to #15369.

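For anyone who wants to check this locally, here is a rough sketch of how one might test whether the two loops from this thread get vectorized. It assumes a recent Julia (1.x), where `code_llvm` lives in `InteractiveUtils`. The helper `looks_vectorized` is a hypothetical name introduced here, not something from Base, and grepping the IR for vector types is only a crude heuristic. The result depends on the Julia and LLVM versions and on the CPU target.

```julia
using InteractiveUtils  # provides code_llvm on Julia 0.7 and later

# The two functions from this issue, unchanged.
function f(a)
    @inbounds @simd for i in eachindex(a)
        a[i] += 1
    end
    nothing
end

function f_simd(n::Integer)
    a = zeros(Float32, n)
    @inbounds @simd for i in eachindex(a)
        a[i] += 1
    end
    nothing
end

# Hypothetical helper: capture the optimized IR as a string and look for
# LLVM vector types such as `<4 x float>` or `<8 x float>`.
function looks_vectorized(fn, argtypes)
    ir = sprint(io -> code_llvm(io, fn, argtypes))
    return occursin(r"<\d+ x float>", ir)
end

println("f:      ", looks_vectorized(f, (Vector{Float32},)))
println("f_simd: ", looks_vectorized(f_simd, (Int,)))
```

If either line prints `false`, dumping the full IR with `code_llvm(f_simd, (Int,))` and looking for `store` instructions in the loop body that do not depend on the loop index would be the next thing to check. Note that a newer compiler may remove the unused allocation in `f_simd` altogether, in which case there is no loop left to vectorize.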
Using patched LLVM 3.7.1 and OrcJIT.
Bisect log
Possibly similar to #13301, but that was "fixed" after codegen_rewrite2, and SIMD doesn't even work for the cases that used to work before...
@vtjnash