
2x regression in indexing benchmarks due to 'Remove references to non opaque pointers in codegen and LLVM passes (#54853)' #55090

Open
Zentrik opened this issue Jul 9, 2024 · 3 comments · Fixed by #55107
Labels: performance (Must go faster), regression (Regression in behavior compared to a previous version)

Zentrik (Member) commented Jul 9, 2024

Through bisection I identified 5e1bcdf as the cause of the regressions below. The change also led to some improvements, but it caused many more regressions. I believe the two commits 4eef1be and 756e72f were my more minimal version of the identified commit, so it would be good to test whether they also cause the regression.

A subset of the results is below; for the full results see https://tealquaternion.camdvr.org/compare.html?start=a14cc38512b6daab6b8417ebb8a64fc794ff89cc&end=323e725c1e4848414b5642b8f54c24916b9ddd9e&stat=min-wall-time or https://github.com/JuliaCI/NanosoldierReports/blob/master/benchmark/by_date/2024-07/05/report.md.

Summary

             Range                Mean     Count
Regressions    0.52%, 207.23%    27.72%    129
Improvements -47.24%, -0.17%    -14.31%     22
All          -47.24%, 207.23%    21.60%    151

Benchmarks

Benchmark % Change Significance Factor
array.index.(sumrange_view, SubArray{Int32, 2, Base.ReshapedArray{Int32, 2, SubArray{Int32, 3, Array{Int32, 3}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}}, true}, Tuple{}}, Tuple{Base.Slice{Base.OneTo{Int64}}, UnitRange{Int64}}, true}) 207.23% 54.86x
array.index.(sumcolon_view, SubArray{Int32, 2, Base.ReshapedArray{Int32, 2, SubArray{Int32, 3, Array{Int32, 3}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}}, true}, Tuple{}}, Tuple{Base.Slice{Base.OneTo{Int64}}, UnitRange{Int64}}, true}) 205.96% 54.63x
array.index.(sumeach_view, SubArray{Int32, 2, Array{Int32, 3}, Tuple{Int64, Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}}, true}) 159.39% 203.84x
array.index.(sumlinear, SubArray{Int32, 2, Array{Int32, 3}, Tuple{Int64, Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}}, true}) 159.14% 244.42x
array.index.(sumeach, SubArray{Int32, 2, Array{Int32, 3}, Tuple{Int64, Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}}, true}) 159.02% 451.23x
Zentrik added the performance and regression labels Jul 9, 2024
Zentrik (Member, Author) commented Jul 10, 2024

The regression is not present on master with 5e1bcdf reverted and 4eef1be and 756e72f cherry-picked (here is the branch: https://github.com/Zentrik/julia/tree/test-54853). Those two commits should be sufficient for LLVM 18.

Looking at BaseBenchmarks.SUITE[["array", "index", ("sumelt", "SubArray{Int32, 2, Array{Int32, 3}, Tuple{Int64, Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}}, true}")]], the benchmark is

function perf_sumelt(A)
    s = zero(eltype(A))
    for a in A
        s += a
    end
    return s
end

using BenchmarkTools

C = rand(Int32, 4, 500, 500)
A = view(C, 1, :, :)
@benchmark perf_sumelt($A)
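For reference, here is a sketch of how one could pull this entry straight out of BaseBenchmarks.jl and run it locally; the load!/run calls below follow the BaseBenchmarks and BenchmarkTools READMEs, and the suite key is the one quoted above.

# Sketch: reproduce the benchmark entry directly from BaseBenchmarks.jl.
using BaseBenchmarks, BenchmarkTools

BaseBenchmarks.load!("array")  # load only the "array" benchmark group
key = ["array", "index", ("sumelt", "SubArray{Int32, 2, Array{Int32, 3}, Tuple{Int64, Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}}, true}")]
run(BaseBenchmarks.SUITE[key])  # returns a Trial with the timings for this entry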

The unoptimized LLVM IR is identical apart from 4 instructions of the form %90 = getelementptr inbounds i32, ptr %..., i64 %... in the fast version versus %90 = getelementptr i8, ptr %..., i64 %... in the regression. Here's the optimized IR for the fast version https://gist.github.com/Zentrik/7a2834c7672963bb8bf2b00201f3c9fe and the slow version https://gist.github.com/Zentrik/1ff09cce2f2a098b6622286fdaa678f4. The slow version has a lot of extra instructions, which I think is just more aggressive vectorization, but I'm not sure.
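To make the comparison reproducible, here is a minimal sketch of how both IR dumps can be generated locally, assuming perf_sumelt and A as defined above; code_llvm's optimize keyword selects pre- vs post-optimization output.

# Sketch: dump the LLVM IR for perf_sumelt before and after LLVM's passes.
# Assumes perf_sumelt and A from the snippet above.
using InteractiveUtils

code_llvm(stdout, perf_sumelt, (typeof(A),); optimize=false)  # unoptimized IR
code_llvm(stdout, perf_sumelt, (typeof(A),); optimize=true)   # optimized IR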

vtjnash pushed a commit that referenced this issue Aug 8, 2024
The Julia memory model is always inbounds for GEP.

This makes the code in #55090
look almost the same as it did before the change. Locally I wasn't able
to reproduce the regression, but given it's vectorized code I suspect it
is backend sensitive.

Fixes #55090

Co-authored-by: Zentrik <[email protected]>
topolarity (Member) commented
Sounds like this is still an issue despite #55107:

The regression still exists on nanosoldier and locally, and this doesn't seem to have made much difference to performance. The optimized IR is essentially unchanged compared to master, apart from some inbounds sprinkled on some GEPs. The only difference in the unoptimized IR relative to the fast version is that i8s are used instead of i32s.

topolarity reopened this Aug 8, 2024
Zentrik (Member, Author) commented Aug 8, 2024

While the regression hasn't been fixed, there's probably not much to be done: #55412 reverted the relevant part of the commit that caused the regression, but with the LLVM 18 upgrade that revert is now a net negative.

lazarusA pushed a commit to lazarusA/julia that referenced this issue Aug 17, 2024
KristofferC pushed a commit that referenced this issue Aug 19, 2024
(cherry picked from commit 7e1f0be)