Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix performance bug for * with AbstractQ #44615

Merged
merged 2 commits into from
Mar 22, 2022
Merged

Fix performance bug for * with AbstractQ #44615

merged 2 commits into from
Mar 22, 2022

Conversation

dkarrasch
Copy link
Member

This fixes a major regression discovered in JuliaLang/LinearAlgebra.jl#800 in multiplication of Q times Diagonal (and other structurally sparse matrices). The reason was that these "sparse" matrices took precedence, however assuming that the other factor was easily indexable. In older versions, Q*D = rmul!(copyto!(Matrix{}(undef, size(Q)), Q), D), so we sped up copyto! for Qs. Recently (in the v1.8 cycle) we defined diagonal multiplication by 3-arg mul!, thus avoiding the copy. But now multiplication fell back to scalar indexing, in this case to performance hell. So, in this PR we teach Julia that in the case of two special matrices, Qshould take precedence, and the other factor should by densified, since the result will be dense anyway.

Closes JuliaLang/LinearAlgebra.jl#800.

@dkarrasch dkarrasch added performance Must go faster linear algebra Linear algebra backport 1.8 Change should be backported to release-1.8 labels Mar 14, 2022
@dkarrasch
Copy link
Member Author

dkarrasch commented Mar 15, 2022

Performance results following the issue to be closed:

using LinearAlgebra, BenchmarkTools
n = 300
M = rand(n,n);
dl = ones(n-1);
d = ones(n);
D = Diagonal(d);
Bi = Bidiagonal(d, dl, :L);
Tri = Tridiagonal(dl, d, dl);
Sym = SymTridiagonal(d, dl);
F = qr(ones(n, 1));
A = F.Q';
for A in (F.Q, F.Q'), B in (M, D, Bi, Tri, Sym)
  @btime $B*$A
  @btime $A*$B
end

yields

  132.073 μs (3 allocations: 705.67 KiB)
  141.561 μs (3 allocations: 705.67 KiB)
  112.906 μs (3 allocations: 705.67 KiB)
  122.037 μs (3 allocations: 705.67 KiB)
  114.454 μs (3 allocations: 705.67 KiB)
  123.033 μs (3 allocations: 705.67 KiB)
  113.941 μs (3 allocations: 705.67 KiB)
  124.189 μs (3 allocations: 705.67 KiB)
  110.328 μs (3 allocations: 705.67 KiB)
  122.690 μs (3 allocations: 705.67 KiB)
  130.392 μs (3 allocations: 705.67 KiB)
  143.679 μs (3 allocations: 705.67 KiB)
  113.639 μs (3 allocations: 705.67 KiB)
  122.109 μs (3 allocations: 705.67 KiB)
  113.455 μs (3 allocations: 705.67 KiB)
  122.141 μs (3 allocations: 705.67 KiB)
  113.554 μs (3 allocations: 705.67 KiB)
  123.823 μs (3 allocations: 705.67 KiB)
  114.012 μs (3 allocations: 705.67 KiB)
  122.848 μs (3 allocations: 705.67 KiB)

including the latest Matrix constructor improvements, which avoid reading out the many structural zeros.

@dkarrasch dkarrasch force-pushed the dk/fixq branch 2 times, most recently from 8ec2be0 to 8702ee7 Compare March 15, 2022 17:16
@KristofferC KristofferC added this to the 1.8 milestone Mar 16, 2022
@ararslan
Copy link
Member

CI failures:

  • Linux x86: Out of memory, which I think has been common for this builder
  • macOS x64: LibGit2 failures related to master vs. main branch naming
  • Windows x86: Something weird with profiling

All unrelated to this PR but I'm always wary of backporting things that fail CI.

@dkarrasch
Copy link
Member Author

Thanks for checking. I understand your concern, but the regression that is fixed here is horrible. I don't think we want to release v1.8 with that performance trap.

@ViralBShah
Copy link
Member

ViralBShah commented Mar 17, 2022

Maybe we can wait for a couple of days for this to get better here to make sure CI is more green, and then merge. At that point, it can also be backported with peace of mind.

@ararslan
Copy link
Member

The macOS failure has been fixed on master, so that at least would likely be fixed here with a rebase.

@ViralBShah
Copy link
Member

I think this is good to merge - the 32-bit CI seems to be having trouble.

@dkarrasch dkarrasch merged commit fc9c280 into master Mar 22, 2022
@dkarrasch dkarrasch deleted the dk/fixq branch March 22, 2022 20:44
@odow
Copy link
Contributor

odow commented Mar 23, 2022

This broke JuMP's tests because it's not the case that Diagonal{T,Vector{T}} can be converted to a Matrix{T}.

Here's what used to happen:

julia> using JuMP, LinearAlgebra

julia> model = Model();

julia> @variable(model, x[1:2])
2-element Vector{VariableRef}:
 x[1]
 x[2]

julia> D = Diagonal(x)
2×2 Diagonal{VariableRef, Vector{VariableRef}}:
 x[1]  
      x[2]

julia> Matrix(D)
2×2 Matrix{AffExpr}:
 x[1]  0
 0     x[2]

Note the promotion in element type from VariableRef to AffExpr. This happens when zero(T) isa T is false.

Now we get
image

cc @blegat

@dkarrasch
Copy link
Member Author

Thanks for the quick report!!! I'm very sorry about that. I overlooked the special type promotion in diagm. I think I have a fix and maybe even a test to guard against this issue.... alright, JuMP tests pass locally with my fix!

KristofferC pushed a commit that referenced this pull request Mar 23, 2022
@KristofferC KristofferC mentioned this pull request Mar 23, 2022
22 tasks
@odow
Copy link
Contributor

odow commented Mar 23, 2022

Thanks for the fix!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
linear algebra Linear algebra performance Must go faster
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Two performance problems with the Q in QR factorizations
5 participants