Compute real matrix logarithm and matrix square root using real arithmetic #39973
Conversation
I added tests for all branches and substantially reduced the allocations. Notably, the matrix square root can now be computed in-place, and the logarithm now does so internally. Note that even where there are more allocations, the actual memory used is lower. I also worked out where all of the excess allocations come from:
Updated Benchmarks

Benchmark code:

using LinearAlgebra, BenchmarkTools, Random
n = 20;
# matrix square root
Random.seed!(1);
Artor = randn(n, n)^2;
Actor = complex(Artor);
Artoc = randn(n, n);
Actoc = randn(ComplexF64, n, n);
Urtor = UpperTriangular(randn(n, n))^2;
Uctor = complex(Urtor);
Urtoc = UpperTriangular(randn(n, n));
Uctoc = UpperTriangular(randn(ComplexF64, n, n));
println("Dense matrices:");
for (name, mat) in (("ℝ → ℝ", Artor), ("ℂ → ℝ", Actor), ("ℝ → ℂ", Artoc), ("ℂ → ℂ", Actoc))
@assert sqrt(mat)^2 ≈ mat
println(name)
@btime sqrt($mat);
end
println("\nUpper triangular matrices:");
for (name, mat) in (("ℝ → ℝ", Urtor), ("ℂ → ℝ", Uctor), ("ℝ → ℂ", Urtoc), ("ℂ → ℂ", Uctoc))
@assert sqrt(mat)^2 ≈ mat
println(name)
@btime sqrt($mat);
end
# matrix logarithm
Random.seed!(1);
Artor = exp(randn(n, n));
Actor = complex(Artor);
Artoc = randn(n, n);
Actoc = randn(ComplexF64, n, n);
Urtor = UpperTriangular(exp(triu(randn(n, n))));
Uctor = complex(Urtor);
Urtoc = UpperTriangular(randn(n, n));
Uctoc = UpperTriangular(randn(ComplexF64, n, n));
println("Dense matrices:");
for (name, mat) in (("ℝ → ℝ", Artor), ("ℂ → ℝ", Actor), ("ℝ → ℂ", Artoc), ("ℂ → ℂ", Actoc))
@assert exp(log(mat)) ≈ mat
println(name)
@btime log($mat);
end
println("\nUpper triangular matrices:");
for (name, mat) in (("ℝ → ℝ", Urtor), ("ℂ → ℝ", Uctor), ("ℝ → ℂ", Urtoc), ("ℂ → ℂ", Uctoc))
println(name)
try
@assert exp(Matrix(log(mat))) ≈ mat
@btime log($mat);
catch
println(" Error")
end
end

Matrix square root

Before:

Dense matrices:
ℝ → ℝ
285.076 μs (10 allocations: 105.59 KiB)
ℂ → ℝ
283.468 μs (9 allocations: 99.22 KiB)
ℝ → ℂ
322.385 μs (10 allocations: 105.59 KiB)
ℂ → ℂ
318.138 μs (9 allocations: 99.22 KiB)
Upper triangular matrices:
ℝ → ℝ
3.127 μs (2 allocations: 3.27 KiB)
ℂ → ℝ
7.844 μs (1 allocation: 6.38 KiB)
ℝ → ℂ
7.076 μs (2 allocations: 6.39 KiB)
ℂ → ℂ
6.704 μs (1 allocation: 6.38 KiB)

This PR:

Dense matrices:
ℝ → ℝ
125.525 μs (120 allocations: 58.70 KiB)
ℂ → ℝ
129.777 μs (122 allocations: 68.33 KiB)
ℝ → ℂ
146.215 μs (21 allocations: 153.06 KiB)
ℂ → ℂ
317.354 μs (10 allocations: 99.23 KiB)
Upper triangular matrices:
ℝ → ℝ
3.100 μs (2 allocations: 3.27 KiB)
ℂ → ℝ
5.295 μs (4 allocations: 12.89 KiB)
ℝ → ℂ
5.694 μs (2 allocations: 6.39 KiB)
ℂ → ℂ
5.535 μs (2 allocations: 6.39 KiB)

Matrix logarithm

Before:

Dense matrices:
ℝ → ℝ
386.569 μs (103 allocations: 541.89 KiB)
ℂ → ℝ
391.631 μs (113 allocations: 545.36 KiB)
ℝ → ℂ
384.807 μs (94 allocations: 489.84 KiB)
ℂ → ℂ
554.130 μs (85 allocations: 423.34 KiB)
Upper triangular matrices:
ℝ → ℝ
103.988 μs (123 allocations: 169.84 KiB)
ℂ → ℝ
191.811 μs (70 allocations: 325.05 KiB)
ℝ → ℂ
Error
ℂ → ℂ
412.665 μs (113 allocations: 521.83 KiB)

This PR:

Dense matrices:
ℝ → ℝ
266.868 μs (587 allocations: 155.17 KiB)
ℂ → ℝ
267.419 μs (589 allocations: 164.80 KiB)
ℝ → ℂ
356.544 μs (64 allocations: 292.23 KiB)
ℂ → ℂ
528.548 μs (53 allocations: 238.41 KiB)
Upper triangular matrices:
ℝ → ℝ
84.371 μs (43 allocations: 68.55 KiB)
ℂ → ℝ
86.358 μs (43 allocations: 75.06 KiB)
ℝ → ℂ
320.204 μs (71 allocations: 254.30 KiB)
ℂ → ℂ
382.941 μs (79 allocations: 298.72 KiB)
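As a rough sketch of the "matrix square root computed in-place within the logarithm" idea mentioned above (a simplified illustration, not the PR's implementation, which works on the quasi-triangular Schur factor):

using LinearAlgebra

# Simplified illustration: the log algorithm takes repeated square roots of the
# triangular factor, writing each root back into the same storage rather than
# allocating a fresh matrix per root. (sqrt(R) below still allocates its own
# temporary; this only demonstrates reuse of R's buffer.)
function repeated_sqrt!(R::UpperTriangular, s::Int)
    for _ in 1:s
        copyto!(parent(R), sqrt(R))   # write each root back into R's storage
    end
    return R
end

R = UpperTriangular([4.0 1.0; 0.0 9.0])
repeated_sqrt!(R, 2)   # R now holds the 4th root of the original matrix, in place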
end
end
return UpperTriangular(R)
return rdiv!(AG, UpperTriangular(B))
A consequence of removing all of these allocations is that the function is less general. E.g. log(UpperTriangular(exp(@MMatrix(randn(2,2))))) worked before this PR. Now it no longer does, because rdiv! is not defined for MMatrix inputs.
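For concreteness, a minimal reproduction of the regression (assuming StaticArrays is loaded; this reuses the call from the comment above):

using LinearAlgebra, StaticArrays

A = UpperTriangular(exp(@MMatrix(randn(2, 2))))  # triangular view of a mutable static matrix
log(A)  # worked before this PR; now fails because rdiv! lacks an MMatrix method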
@andreasnoack is there anything else this PR needs?
The current version is definitely already an improvement, so if you want to stop here then I think it's okay. However, I think it's a shame that there are so many allocations. Ideally, the LAPACK wrappers would allow reuse of memory as proposed in #16263, though it should be fairly straightforward to modify this:

julia/stdlib/LinearAlgebra/src/lapack.jl, lines 6460 to 6461 in efad4e3:

scale = Ref{$relty}()
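As a rough sketch of the memory-reuse pattern being proposed (the function and argument names here are hypothetical, not the actual wrapper API):

# Hypothetical sketch: the wrapper accepts a preallocated Ref instead of
# constructing scale = Ref{Float64}() internally on every call, so a caller in
# a hot loop pays for that allocation once.
function solve_step!(X::Matrix{Float64}, scale::Base.RefValue{Float64})
    scale[] = 1.0   # the LAPACK routine would write its scaling factor here
    # ... the ccall into LAPACK would pass `scale` at this point ...
    return X
end

scale = Ref{Float64}()              # allocated once by the caller
for _ in 1:100
    solve_step!(rand(4, 4), scale)  # the same Ref is reused on every call
end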
It looks like this change alone eliminated ~half of the allocations. The remaining ones come from copies or the Schur decomposition. I also eliminated excess allocations when computing the degree of the Padé approximant; now the same 3 allocations are reused for all steps. This also cut ~1/3 of the allocated memory.

Updated benchmarks

Matrix square root

All remaining allocations come from copies or the Schur decomposition.

Dense matrices:
ℝ → ℝ
115.852 μs (12 allocations: 50.83 KiB)
ℂ → ℝ
117.443 μs (14 allocations: 60.45 KiB)
ℝ → ℂ
145.638 μs (21 allocations: 153.06 KiB)
ℂ → ℂ
316.831 μs (10 allocations: 99.23 KiB)
Upper triangular matrices:
ℝ → ℝ
3.050 μs (2 allocations: 3.27 KiB)
ℂ → ℝ
5.179 μs (4 allocations: 12.89 KiB)
ℝ → ℂ
5.641 μs (2 allocations: 6.39 KiB)
ℂ → ℂ
5.529 μs (2 allocations: 6.39 KiB)

Matrix logarithm

Dense matrices:
ℝ → ℝ
220.330 μs (42 allocations: 96.14 KiB)
ℂ → ℝ
221.420 μs (44 allocations: 105.77 KiB)
ℝ → ℂ
330.091 μs (48 allocations: 228.39 KiB)
ℂ → ℂ
501.972 μs (37 allocations: 174.56 KiB)
Upper triangular matrices:
ℝ → ℝ
76.920 μs (32 allocations: 48.75 KiB)
ℂ → ℝ
78.498 μs (32 allocations: 55.27 KiB)
ℝ → ℂ
249.985 μs (31 allocations: 81.97 KiB)
ℂ → ℂ
295.925 μs (29 allocations: 81.72 KiB)
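As a self-contained illustration of the "reuse the same temporaries for all steps" change described above (the function and loop body are illustrative, not the stdlib internals):

using LinearAlgebra

# Three buffers are allocated once up front and overwritten on every iteration,
# instead of allocating fresh matrices inside the loop.
function reuse_temporaries_demo(A::Matrix{Float64}, steps::Int)
    tmp1, tmp2, tmp3 = similar(A), similar(A), similar(A)
    for _ in 1:steps
        mul!(tmp1, A, A)      # tmp1 = A*A, no allocation
        mul!(tmp2, tmp1, A)   # tmp2 = A^3, no allocation
        tmp3 .= tmp2 .- A     # fused broadcast writes into tmp3
        # ... a degree-selection test on opnorm(tmp3, 1) would go here ...
    end
    return tmp3
end

reuse_temporaries_demo(randn(8, 8), 5)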
p = 0
m = 0

# Compute repeated roots
d = complex(diag(A))
# Find s0, the smallest s such that ρ(triu(A)^(1/2^s) - I) ≤ theta[tmax], where ρ(X)
This maybe should be ρ(A^(1/2^s) - I), but the algorithm in the paper says to use the approach in Algorithm 4.1, which is for upper triangular A and uses only the diagonal.
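A minimal sketch of the diagonal-only estimate under discussion (Algorithm 4.1 style; theta_tmax is a hypothetical stand-in for theta[tmax]): for triangular A the eigenvalues are the diagonal entries, so ρ(A^(1/2^s) - I) can be computed from the diagonal alone.

using LinearAlgebra

# Find the smallest s with max_i |d_i^(1/2^s) - 1| ≤ theta_tmax, where
# d = diag(A) holds the eigenvalues of the triangular A; repeated scalar
# square roots of d track the diagonal of A^(1/2^s).
function find_s0(A::AbstractMatrix, theta_tmax::Real)
    d = complex(diag(A))
    s = 0
    while maximum(abs, d .- 1) > theta_tmax
        d .= sqrt.(d)
        s += 1
    end
    return s
end

find_s0(UpperTriangular([4.0 1.0; 0.0 9.0]), 0.1)  # returns 5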
Great. This is of course a much cleaner solution. I need to update my mental model of the compiler to something post Julia 0.5.
Yes, that's unfortunate. See #28897
Compute real matrix logarithm and matrix square root using real arithmetic (JuliaLang#39973)

* Add failing test
* Add sylvester methods for small matrices
* Add 2x2 real matrix square root
* Add real square root of quasitriangular matrix
* Simplify 2x2 real square root
* Rename functions to use quasitriu
* Avoid NaNs when eigenvalues are all zero
* Reuse ranges
* Add clarifying comments
* Unify real and complex matrix square root
* Add reference for real sqrt
* Move quasitriu auxiliary functions to triangular.jl
* Ensure loops are type-stable and use simd
* Remove duplicate computation
* Correctly promote for dimensionful A
* Use simd directive
* Test that UpperTriangular is returned by sqrt
* Test sqrt for UnitUpperTriangular
* Test that return type is complex when input type is
* Test that output is complex when input is
* Add failing test
* Separate type-stable from type-unstable part
* Use generic sqrt_quasitriu for sqrt triu
* Avoid redundant matmul
* Clarify comment
* Return complex output for complex input
* Call log_quasitriu
* Add failing test for log type-inferrability
* Realify or complexify as necessary
* Call sqrt_quasitriu directly
* Refactor sqrt_diag!
* Simplify utility function
* Add comment
* Compute accurate block-diagonal
* Compute superdiagonal for quasi triu A0
* Compute accurate block superdiagonal
* Avoid full LU decomposition in inner loop
* Avoid promotion to improve type-stability
* Modify return type if necessary
* Clarify comment
* Add comments
* Call log_quasitriu on quasitriu matrices
* Document quasi-triangular algorithm
* Remove test: this matrix has eigenvalues so close to zero that its eltype is not stable
* Rearrange definition
* Add compatibility for unit triangular matrices
* Release constraints on tests
* Separate copying of A from log computation
* Revert "Separate copying of A from log computation" (this reverts commit 23becc5)
* Use Givens rotations
* Compute Schur in-place when possible
* Always allocate a copy
* Fix block indexing
* Compute sqrt in-place
* Overwrite AmI
* Reduce allocations in Pade approximation
* Use T
* Don't unnecessarily unwrap
* Test remaining log branches
* Add additional matrix square root tests
* Separate type-unstable from type-stable part (this substantially reduces allocations for some reason)
* Use Ref instead of a Vector
* Eliminate allocation in checksquare
* Refactor param choosing code to own function
* Comment section
* Use more descriptive variable name
* Reuse temporaries
* Add reference
* More accurately describe condition
As discussed on Slack (cc @andreasnoack and @baggepinnen), this PR implements the real matrix square root and real matrix logarithm in the following papers, respectively:
These algorithms are both recommended in:
This was proposed in #5840 and discussed for sqrt in #4006. The result is that whenever a real logarithm or square root exists, it is returned. This both speeds up the algorithms (see benchmarks below) and prevents error from accumulating in the imaginary parts of the outputs. To avoid heavy code duplication, the PR refactors the log and sqrt of upper triangular matrices to more generally compute the log and sqrt of upper quasi-triangular matrices.

The changes:
* Previously, a complex result was returned by sqrt(A::Matrix{<:Real}) and log(A::Matrix{<:Real}) unless A was symmetric with positive eigenvalues, in which case a real result was returned. Now a real result is returned whenever the eigenvalues are positive.
* For A (with Real or Complex eltype), if a real matrix square root or real matrix logarithm exists, then it is computed using entirely real arithmetic. If the eltype of A is Complex, then a complex result is returned, where the imaginary part is exactly zero. Previously the function would be computed using complex arithmetic, resulting in imaginary parts that were not exactly zero.
* log(::UnitUpperTriangular) and log(::UnitLowerTriangular) are now defined.

Issues fixed along the way:
* For Float32 and ComplexF32 inputs, log always promoted to the Float64 analogs. It no longer does.
* The inferred return type of log(::Matrix{ComplexF64}) was Any. It is now Matrix{ComplexF64}.
* For A::UpperTriangular{<:Real} with negative eigenvalues, sqrt(A) returned a complex result, while log(A) would error. log(A) now returns a complex result.

Example
Before:
This PR:
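As an illustrative sketch of the described behavior (the matrices below are assumed examples, not the PR's original snippet):

using LinearAlgebra

A = [3.0 1.0; 0.5 2.0]    # nonsymmetric, but both eigenvalues are real and positive
sqrt(A)                    # with this PR: real Matrix{Float64} (previously complex)
log(A)                     # likewise real

U = UpperTriangular([-1.0 2.0; 0.0 -3.0])  # negative eigenvalues: no real log/sqrt
sqrt(U)                    # complex result
log(U)                     # with this PR: complex result (previously an error)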
Basic Benchmarks
Here are some quick and dirty benchmarks on my (very old) machine.
Matrix square root
Before:
This PR:
Matrix logarithm
Before:
This PR: