
Implement more matrix functions #5840

Closed
5 of 17 tasks
jiahao opened this issue Feb 17, 2014 · 54 comments
Labels
help wanted · linear algebra · stdlib (Julia's standard library)

Comments

@jiahao
Member

jiahao commented Feb 17, 2014

Higham and Deadman have published a list of software implementing algorithms for matrix functions, showing that Julia's library for these functions could be improved greatly relative to the state of the art.

The purpose of this issue is to discuss and track the implementation of matrix functions.

From Higham and Deadman's list:

General function evaluations

  • General matrix function with derivatives of the underlying scalar function available: Schur–Parlett algorithm (Davies and Higham, 2003)
  • Function of a symmetric or Hermitian matrix by diagonalization
  • Condition number estimate for general functions: cond(f,A)
  • Matrix function–vector product evaluation: f(A) * b

Specific functions

  • expm: scaling and squaring algorithm (Al-Mohy and Higham, 2009)
  • logm: inverse scaling and squaring algorithm (Al-Mohy, Higham, and Relton, 2012, 2013)
  • sqrtm:
    • Schur algorithm (Björck and Hammarling, 1983)
    • real version for real matrices (Higham, 1987)
    • blocking algorithm for performance improvements (Deadman, Higham, and Ralha, 2013)
  • A^t for real matrix powers: Schur–Padé algorithm (Higham and Lin, 2013) (see #21184)
  • Matrix unwinding function

Fréchet derivatives

  • Exponential: scaling and squaring algorithm (Al-Mohy and Higham, 2009)
  • Logarithm: inverse scaling and squaring algorithm (Al-Mohy, Higham, and Relton, 2013)
  • General matrix function: complex step algorithm (Al-Mohy and Higham, 2010) or using the block 2 × 2 matrix formula.
  • A^t for real matrix powers: Schur–Padé algorithm (Higham and Lin, 2013)

Miscellaneous functions

@StefanKarpinski
Member

It would be amazing to include all of these algorithms. Thanks for putting this issue together, @jiahao.

@jiahao
Member Author

jiahao commented Feb 18, 2014

It's not clear to me what interface should be implemented for the very generic things like "function of a symmetric or Hermitian matrix".

We could define methods for elementary functions until the cows come home, but that is both tedious and namespace-polluting (we'd need sinm, cosm, atanhm, etc. to disambiguate from the ordinary forms, which broadcast over the elements), and furthermore it wouldn't scale well to linear combinations of functions.

Maybe we want something like applym, so that a call like applym(A->exp(A)+sin(2A)-cosh(4A), A) would evaluate the matrix expm(A)+sinm(2A)-coshm(4A)?
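For the easy Hermitian case, a minimal sketch of what such an applym could look like via diagonalization (illustrative only; the general non-normal case would need the Schur–Parlett machinery above):

using LinearAlgebra

# Diagonalize once and apply the scalar function to the eigenvalues:
# A = V * Diagonal(λ) * V', so f(A) = V * Diagonal(f.(λ)) * V'.
function applym(f, A::Hermitian)
    λ, V = eigen(A)
    return V * Diagonal(f.(λ)) * V'
end

# The combined expression above, evaluated with a single decomposition:
applym(x -> exp(x) + sin(2x) - cosh(4x), Hermitian([2.0 1.0; 1.0 3.0]))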

@StefanKarpinski
Member

This is one of the reasons why having all these functions be vectorized is kind of shitty. I suspect that @JeffBezanson will vehemently agree. There could be a MatrixFunctions module that you have to import from to get the matrix variants with all the same names.

@jiahao
Member Author

jiahao commented Feb 18, 2014

Sure, but that won't help with the fact that a complicated matrix function like A->expm(A)+sinm(2A)-coshm(4A) can be computed more quickly all at once than computing (expm(A))+(sinm(2A))-(coshm(4A)) by working out the individual elementary functions separately, since the underlying Schur-Parlett algorithm is essentially the same.

@StefanKarpinski
Member

Yes, that's a very good point.

@stevengj
Member

Trefethen's contour-integration method is also neat for mostly analytic functions (with known singularities), though I don't know how it compares to the state of the art (in performance and generality).

@jiahao
Member Author

jiahao commented Feb 18, 2014

@stevengj it looks like that paper is Ref. 32 of the Higham and Deadman report, and isn't currently available in any major package (although it has Matlab code).

@simonbyrne
Contributor

This seems like a good candidate for a GSOC project.

As far as an interface goes, I would suggest a macro

B = @matfun exp(A)+sin(2A)-cosh(4A)

A naive approach could simply replace the functions (e.g. exp -> expm); a more advanced approach could try to identify the common decompositions.
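A minimal sketch of the naive replacement approach (the name map and the macro body here are purely illustrative):

const MATFUN_MAP = Dict(:exp => :expm, :log => :logm, :sqrt => :sqrtm)

# Walk the expression tree, swapping scalar names for their matrix variants.
rewritecalls(ex) = ex
function rewritecalls(ex::Expr)
    args = map(rewritecalls, ex.args)
    if ex.head == :call && haskey(MATFUN_MAP, args[1])
        args[1] = MATFUN_MAP[args[1]]
    end
    return Expr(ex.head, args...)
end

macro matfun(ex)
    return esc(rewritecalls(ex))
end

# B = @matfun exp(A) + sqrt(A)   # expands to expm(A) + sqrtm(A)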

@GunnarFarneback
Contributor

The item "[sqrtm] real version for real matrices (Higham, 1987)" is checked. I can't see that algorithm in base. I made an attempt on it about a year ago but concluded that the complexity just wasn't worth it. I've put up the code as a gist if someone wants to have a look.

@jiahao
Member Author

jiahao commented Feb 18, 2014

Thanks for catching that. I must have ticked it by accident when writing up this issue.

@dlfivefifty
Contributor

Just tried to call logm, but it wasn't implemented. Had to go to Mathematica.

@fabianlischka
Contributor

Whoever tackles this issue might find this paper helpful for testing:
Higham, Deadman: Testing Matrix Function Algorithms Using Identities, March 2014

@StefanKarpinski
Member

Great reference – thanks, @fabianlischka.

@mfasi
Contributor

mfasi commented Jul 26, 2015

The logm function has been added by pull request #12217.

@ViralBShah
Member

Also cc: @higham @alanedelman

@ararslan
Member

ararslan commented May 5, 2016

Would sinm and cosm (matrix sine and cosine, respectively, see here) be useful? They're easily derived from expm. I can put together a PR if that sounds desirable.

@dlfivefifty
Contributor

I think dropping the m is more consistent with the Taylor series definition of exp. For example,

exp(dual(1.,2.)) 

returns the dual corresponding to the Taylor series definition (which in this case is equivalent to expm([1. 2.; 0. 1.])).

That's ignoring the Matlab and Mathematica precedents.
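For the curious, that correspondence is easy to check numerically; a sketch assuming DualNumbers.jl (and note that in current Julia, exp(A::Matrix) is already the matrix exponential):

using DualNumbers

d = exp(Dual(1.0, 2.0))       # e + 2e*ε by the dual-number chain rule
M = exp([1.0 2.0; 0.0 1.0])   # matrix exponential of the corresponding block
# value(d) ≈ M[1,1] and epsilon(d) ≈ M[1,2] both hold.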

@dlfivefifty
Contributor

Whether or not exp(A) replaces expm(A), I think down the line exp.(A) should replace exp(A), as the current exp(A) is confusing for non-MATLAB users

@felixrehren
Contributor

felixrehren commented Jan 11, 2017

Version 2 of the Deadman–Higham list has been out for a year, and Julia's coverage is unchanged from the first version. Let's get better before version 3 comes out. I'm going to look at Schur–Padé -- that together with the top four "General function evaluations" tasks in the OP would cover most of this. Anyone else want to join?

P.S. we already beat R in the DH list, with built-in Matlab and SciPy looking vulnerable as well!

@simonbyrne
Contributor

simonbyrne commented Jan 11, 2017

Is the "Deadman-Higham ranking" our Weissman Score?

@dlfivefifty
Contributor

Could there be a function matrixbroadcast(f,A) that calculates the matrix function f(A) for general f? This would be easy to implement for diagonalisable A, not sure about otherwise

@stevengj
Member

stevengj commented Jan 12, 2017

Even for diagonalizable A, the possibility of nearly defective A would be an issue. Of course, it could be restricted to Hermitian A, but that case is so easy that we would hardly add much value by implementing a function for it in Base.

@alanedelman
Contributor

There is a pretty good chance that matrix derivatives would work with ForwardDiff.jl, at least if there is a corresponding generic linear algebra function. I've been pretty impressed that one can get derivatives of powers and SVDs and inverses, etc.

@ChrisRackauckas
Member

Should this all be moved out to a MatrixFunctions.jl?

@felixrehren
Contributor

Seems like a good idea -- I think everything on the list other than expm, logm, and sqrtm is a clear candidate for moving out (e.g. I have a rootm function that takes arbitrary matrix pth roots -- it probably doesn't fit in base, but could be nice somewhere). Whether expm, logm, and sqrtm are worth keeping in base or can move out as well, I don't know.

@ararslan You did the great work moving SpecialFunctions.jl. What do you think about MatrixFunctions.jl? Other people that were involved there IIRC but are not in this thread: @musm, @andreasnoack

@ViralBShah
Member

Should we close this, and perhaps there ought to be a better place for this in a new repo?

@ChrisRackauckas
Member

If it's extending sqrt and log like exp in #23233, then it should be in Base since otherwise it's type piracy.

@dlfivefifty
Contributor

we could do something like this:

for f in (:sqrt,)  # ..., plus whichever other functions get matrix methods
    @eval $f(A::AbstractArray) = matrixfunction($f, A)
end
matrixfunction(f, A) = error("Install MatrixFunctions.jl")

Then MatrixFunctions.jl can safely override Base.matrixfunction(::typeof(sqrt), A)
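The package side might then look something like this (hypothetical sketch; schur_sqrt stands in for whatever algorithm MatrixFunctions.jl implements):

# In MatrixFunctions.jl -- not piracy, since the matrixfunction hook
# itself would be owned by Base under this proposal:
Base.matrixfunction(::typeof(sqrt), A::AbstractArray) = schur_sqrt(A)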

@robzan8

robzan8 commented May 1, 2018

I have implemented a couple of algorithms for computing matrix functions.
For dense matrix functions: Schur-Parlett with automatic differentiation done with TaylorSeries.jl.
For sparse matrix functions: the rational Krylov/Arnoldi method, with poles found automatically using the AAA algorithm for rational approximation of functions.

You can check them out here https://github.com/robzan8/MatFun.jl

@ararslan added the stdlib label May 1, 2018
@sethaxen
Contributor

I was planning to add the Fréchet derivatives for matrix exponential and logarithm to ChainRules to support pushforwards and pullbacks of exp and log. However, they duplicate a lot of code from LinearAlgebra's implementations. Given this issue, would it be better to contribute them directly to LinearAlgebra?

@andreasnoack
Member

Possibly, but isn't it possible to define these in terms of existing matrix functions?

@antoine-levitt
Contributor

Unless I'm missing something, not really: differentials of matrix functions are tricky due to non-commutativity. @sethaxen, it would be strange to have this live in LinearAlgebra; can you maybe refactor LinearAlgebra to expose the functions that you need and use them in ChainRules?

@dlfivefifty
Contributor

I think @antoine-levitt is right. Differentiating f(A(t)) using Taylor series looks like a mess. Cauchy's integral formula is a bit easier: if we differentiate

f(A(t)) = (1/2πi) ∮ f(ζ) inv(ζ*I - A(t)) dζ

then we get

d/dt f(A(t)) = (1/2πi) ∮ f(ζ) inv(ζ*I - A(t)) * dA/dt * inv(ζ*I - A(t)) dζ

which I don't believe can be reduced to matrix factorisations.

Really interested to know if anyone has a reference for differentiating matrix functions. Don't have a copy of Higham handy but nothing pops out of the table of contents.

@sethaxen
Contributor

sethaxen commented Jan 4, 2021

Possibly but isn't it possible to define these in terms of existing matrix functions?

Sort of. For any matrix function f we can compute its Frechet derivative by calling f on a block matrix with a particular structure:

using LinearAlgebra: checksquare

function frechet(f, A, ΔA)
    n = checksquare(A)
    Ablock = [A ΔA; zero(A) A]   # f(Ablock) = [f(A) Lf(A,ΔA); 0 f(A)]
    Yblock = f(Ablock)
    Y = Yblock[1:n, 1:n]
    ΔY = Yblock[1:n, n+1:2n]
    return Y, ΔY
end

But this is ~8x the cost of f, whereas the algorithms by Higham in the OP are more efficient (I think ~3x or less).

@sethaxen it would be strange to have this live in LinearAlgebra

I agree, I only ask because given this issue it seemed there was (old) interest in them being in base, and the Frechet derivative of exp is included in scipy.

can you maybe refactor LinearAlgebra to expose the functions that you need and use them in ChainRules?

Perhaps. While the most efficient way to implement will be completely in parallel, we can get close by reusing all matrix products. So a nice refactor might be to move the work of exp! into a function that also returns all intermediate matrix products, which I could then reuse. But it would also be nice to reduce the number of allocations in exp!, in which case its Frechet derivative would need to be computed in parallel and duplicate a lot of code.

@antoine-levitt
Contributor

Really interested to know if anyone has a reference for differentiating matrix functions. Don't have a copy of Higham handy but nothing pops out of the table of contents.

Basically continue what you were doing with the integral formula: expand in an eigenbasis and integrate the poles explicitly, and you get divided differences (f(lambda_n) - f(lambda_m)) / (lambda_n - lambda_m), with the convention (f(x)-f(y))/(x-y) = f'(x) when x == y. It's probably discussed in Higham somewhere; I think it's called Daleckii-Krein there (I know it from quantum mechanics, where it goes under the generic name of "perturbation theory"). It's a bit tricky to implement for generic functions f when eigenvalues are close, because of the divided differences (what I've been doing in https://github.com/JuliaMolSim/DFTK.jl/blob/master/src/Smearing.jl#L44 is to use ForwardDiff to get f' and f'' for x and y close together, although that still doesn't get you to O(eps)). For exp/log you can use expm1 and log1p. All this is for normal matrices; @sethaxen, I imagine the method you're looking at is more robust for non-normal ones?
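A rough sketch of that divided-difference (Daleckii-Krein) formula for the Hermitian case; the names here are hypothetical, and the close-eigenvalue branch just falls back to ForwardDiff at the midpoint, which, as noted, still isn't O(eps):

using LinearAlgebra
using ForwardDiff

# Divided difference with a derivative fallback for nearly equal arguments.
divdiff(f, x, y) = abs(x - y) < sqrt(eps(one(x))) ?
    ForwardDiff.derivative(f, (x + y) / 2) :
    (f(x) - f(y)) / (x - y)

# Daleckii-Krein for Hermitian A: df(A)[ΔA] = V * (K .* (V' * ΔA * V)) * V',
# where K[i,j] is the divided difference of f at eigenvalues λ[i], λ[j].
function frechet_hermitian(f, A::Hermitian, ΔA::AbstractMatrix)
    λ, V = eigen(A)
    K = [divdiff(f, λ[i], λ[j]) for i in eachindex(λ), j in eachindex(λ)]
    return V * (K .* (V' * ΔA * V)) * V'
end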

It was suggested above to split matrix functions (more than exp/log/sqrt) to a separate MatrixFunctions.jl package; I think that would be a very natural place to put derivatives.

@dlfivefifty
Contributor

@sethaxen's method is probably a bad idea for normal matrices as Ablock will not be normal.

@sethaxen
Contributor

sethaxen commented Jan 4, 2021

Really interested to know if anyone has a reference for differentiating matrix functions. Don't have a copy of Higham handy but nothing pops out of the table of contents.

Section 3.2 of Higham's Functions of Matrices gives several approaches for computing Frechet derivative:

  • the block method given above (Eq 3.16)
  • Daleckii-Krein using the eigendecomposition for normal matrices (Corollary 3.12)
  • Using the Schur decomposition, if you know the Frechet derivative of f(T) for triangular T (Problem 3.2)
  • Computing the Frechet derivatives of the primitive functions of the implementation of f and composing with the chain rule, though this needs to be checked for accuracy.

@sethaxen I imagine the method you're looking at is more robust for non-normal?

For automatic differentiation, we indeed only want to define a rule if it works for all matrices, and the latter approach should be fine. In fact, Higham's Frechet derivative of the matrix exponential is precisely this approach, just checked for accuracy. I don't think we can autodiff exp with any AD (Zygote can't handle in-place modification, and ForwardDiff and ReverseDiff require types other than BlasFloat), so a rule is needed. The reverse-mode derivative (pullback) is the Frechet derivative applied to the adjoint of the input.

It was suggested above to split matrix functions (more than exp/log/sqrt) to a separate MatrixFunctions.jl package; I think that would be a very natural place to put derivatives.

The issue with that, raised above, is type piracy. I guess that could be worked around by re-appending the m suffix or using an interface like matfun(sin, A), but sin(A) is cleaner. For Hermitian, almost all of the matrix functions share a common form (using Daleckii-Krein), so it would be odd to split a subset of them into a separate package. The advantage though is such a package could provide multiple algorithms for computing each f and also bundle the ChainRules rules side-by-side, which is always preferable to putting them in the ChainRules package. There are also cases like ExponentialUtilities.exp! where the code for LinearAlgebra.exp! was almost exactly duplicated just to use a cache.

@antoine-levitt
Contributor

Computing the Frechet derivatives of the primitive functions of the implementation of f and composing with the chain rule, though this needs to be checked for accuracy.

I see. In the applications I'm interested in the function f is often the Heaviside function, so that's not an option.

Regarding a separate package, clearly exp/sin/log/sqrt are special because they're in base, but there are other functions one might want to use, so a general matfun(f, A) is needed, and it would be natural to put the rules there. But indeed the question of where to put the rules for exp/sin/log/sqrt remains... I would say they belong to ChainRules, and code duplication is necessary in any case (as code reuse for the function itself and its derivatives looks hard from what you say).

@ViralBShah
Member

This is best done in external packages. Moving to a discussion so that we have the list.

@JuliaLang locked and limited conversation to collaborators Jan 30, 2022
@ViralBShah converted this issue into discussion #43982 Jan 30, 2022

This issue was moved to a discussion.

You can continue the conversation there.
