Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible faster matrix sqrt for upper triangular matrices #31007

Closed
mateuszbaran opened this issue Feb 8, 2019 · 4 comments
Closed

Possible faster matrix sqrt for upper triangular matrices #31007

mateuszbaran opened this issue Feb 8, 2019 · 4 comments
Assignees

Comments

@mateuszbaran
Copy link
Contributor

I've noticed that a small change in sqrt(A::UpperTriangular)

function sqrt(A::UpperTriangular)
can significantly improve its performance for smallish matrices. I've replaced sqrt(A,Val(realmatrix)) with

if realmatrix
    sqrt(A,Val(true))
else
    sqrt(A,Val(false))
end

and I see a significant speedup. Full test code:

using LinearAlgebra
using BenchmarkTools

function sqrt2(A::UpperTriangular)
    realmatrix = false
    if isreal(A)
        realmatrix = true
        for i = 1:LinearAlgebra.checksquare(A)
            x = real(A[i,i])
            if x < zero(x)
                realmatrix = false
                break
            end
        end
    end
    if realmatrix
        sqrt(A,Val(true))
    else
        sqrt(A,Val(false))
    end
end

for i in 1:10
    Xr = UpperTriangular(rand(2^i,2^i))
    println("real, size(Xr) = ", size(Xr))
    @btime sqrt($Xr)
    @btime sqrt2($Xr)

    Xc = UpperTriangular(rand(ComplexF64, 2^i,2^i))
    println("complex, size(Xc) = ", size(Xc))
    @btime sqrt($Xc)
    @btime sqrt2($Xc)
end

And my results (Julia 1.1, Core i7 4800MQ), julia -O3:

real, size(Xr) = (2, 2)
  3.616 μs (2 allocations: 128 bytes)
  78.606 ns (2 allocations: 128 bytes)
complex, size(Xc) = (2, 2)
  3.777 μs (2 allocations: 160 bytes)
  158.348 ns (2 allocations: 160 bytes)
real, size(Xr) = (4, 4)
  3.724 μs (2 allocations: 224 bytes)
  149.041 ns (2 allocations: 224 bytes)
complex, size(Xc) = (4, 4)
  4.001 μs (2 allocations: 352 bytes)
  352.613 ns (2 allocations: 352 bytes)
real, size(Xr) = (8, 8)
  4.059 μs (2 allocations: 640 bytes)
  462.076 ns (2 allocations: 640 bytes)
complex, size(Xc) = (8, 8)
  4.724 μs (2 allocations: 1.16 KiB)
  1.048 μs (2 allocations: 1.16 KiB)
real, size(Xr) = (16, 16)
  6.241 μs (2 allocations: 2.14 KiB)
  2.619 μs (2 allocations: 2.14 KiB)
complex, size(Xc) = (16, 16)
  7.986 μs (2 allocations: 4.14 KiB)
  4.286 μs (2 allocations: 4.14 KiB)
real, size(Xr) = (32, 32)
  18.643 μs (2 allocations: 8.14 KiB)
  14.934 μs (2 allocations: 8.14 KiB)
complex, size(Xc) = (32, 32)
  23.200 μs (2 allocations: 16.14 KiB)
  19.396 μs (2 allocations: 16.14 KiB)
real, size(Xr) = (64, 64)
  103.611 μs (3 allocations: 32.09 KiB)
  99.858 μs (3 allocations: 32.09 KiB)
complex, size(Xc) = (64, 64)
  114.902 μs (3 allocations: 64.09 KiB)
  109.975 μs (3 allocations: 64.09 KiB)
real, size(Xr) = (128, 128)
  730.537 μs (3 allocations: 128.09 KiB)
  724.945 μs (3 allocations: 128.09 KiB)
complex, size(Xc) = (128, 128)
  717.583 μs (3 allocations: 256.09 KiB)
  711.489 μs (3 allocations: 256.09 KiB)
real, size(Xr) = (256, 256)
  5.521 ms (3 allocations: 512.09 KiB)
  5.512 ms (3 allocations: 512.09 KiB)
complex, size(Xc) = (256, 256)
  7.157 ms (3 allocations: 1.00 MiB)
  7.148 ms (3 allocations: 1.00 MiB)
real, size(Xr) = (512, 512)
  44.524 ms (3 allocations: 2.00 MiB)
  44.279 ms (3 allocations: 2.00 MiB)
complex, size(Xc) = (512, 512)
  56.995 ms (3 allocations: 4.00 MiB)
  56.932 ms (3 allocations: 4.00 MiB)
real, size(Xr) = (1024, 1024)
  922.565 ms (3 allocations: 8.00 MiB)
  900.415 ms (3 allocations: 8.00 MiB)
complex, size(Xc) = (1024, 1024)
  1.539 s (3 allocations: 16.00 MiB)
  1.538 s (3 allocations: 16.00 MiB)

What do you think about it? This is not really about type stability of matrix sqrt (discussed in #4006, #24296).

@jekbradbury
Copy link
Contributor

I wonder what the difference is from inference’s perspective, and whether it could be fixed there?

@andreasnoack
Copy link
Member

Since the Val depends on the values of the matrix, it's not type stable so we probably end up with a dynamic dispatch which can be costly for small arrays. Maybe the compiler will be able to split such a case automatically one day, since realmatrix is inferred as Bool but meanwhile we should just make this change. @mateuszbaran It would be great if you could open a PR.

@mateuszbaran
Copy link
Contributor Author

OK, I'll make a PR with this change.

mateuszbaran added a commit to mateuszbaran/julia that referenced this issue Feb 18, 2019
Replacing Val(::Bool) in an argument list with an explicit if makes the 
call type stable. The issue was discussed in JuliaLang#31007.
@andreasnoack
Copy link
Member

Superseded by #31100

mateuszbaran added a commit to mateuszbaran/julia that referenced this issue May 1, 2020
Replacing Val(::Bool) in an argument list with an explicit if makes the 
call type stable. The issue was discussed in JuliaLang#31007.
dkarrasch pushed a commit that referenced this issue May 2, 2020
Replacing Val(::Bool) in an argument list with an explicit if makes the 
call type stable. The issue was discussed in #31007.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants