Taking matrix transposes seriously #408
Comments
The logical successor to a previous issue... 👍 |
This is not true at all, and not at all analogous to |
Unless I'm mistaken, the correct mathematical conjugation operation in that context is
julia> v = rand(3) + rand(3)*im
3-element Array{Complex{Float64},1}:
0.0647959+0.289528im
0.420534+0.338313im
0.690841+0.150667im
julia> v'v
0.879291582684847 + 0.0im
julia> conj(v)*v
ERROR: DimensionMismatch("Cannot multiply two vectors")
Stacktrace:
[1] *(::Array{Complex{Float64},1}, ::Array{Complex{Float64},1}) at ./linalg/rowvector.jl:180 |
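For readers following along, here is a minimal sketch of the distinction being argued over: elementwise conjugation returns another column vector, while the adjoint returns a row-like object that can be applied to a vector. It assumes Julia 1.x with the `LinearAlgebra` standard library (the transcript above is from an older version), and the vector values are made up for illustration.

```julia
using LinearAlgebra

# Illustrative values, not taken from the thread.
v = [1.0 + 2.0im, 3.0 - 1.0im]

c = conj(v)   # elementwise conjugate: still a column vector in ℂ²
a = v'        # adjoint: a 1×2 "row vector" wrapper around v

size(a)       # (1, 2)
s = a * v     # a scalar, equal to sum(abs2, v)
d = dot(v, v) # the same scalar via the inner product
```

The point of the sketch is only that `conj(v)` and `v'` are different kinds of object: the first lives in the same vector space as `v`, the second acts on it.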
The problem with using a keyword for |
We need to fix the keyword performance problem in 1.0 anyway. |
@StefanKarpinski, you're mistaken. You can have complex conjugation in a vector space without having adjoints — adjoints are a concept that requires a Hilbert space etc., not just a complexified vector space. Furthermore, even when you do have a Hilbert space, the complex conjugate is distinct from the adjoint. e.g. the conjugate of a complex column vector in ℂⁿ is another complex vector, but the adjoint is a linear operator (a "row vector"). (Complex conjugation does not imply that you can multiply |
Deprecating vectorized |
https://en.wikipedia.org/wiki/Complexification#Complex_conjugation (This treatment is rather formal; but if you google "complex conjugate matrix" or "complex conjugate vector" you will find zillions of usages.) |
If mapping a complex vector to the conjugate vector (what In that case, should
|
For example, a commonplace usage of conjugation of vectors is in the analysis of eigenvalues of real matrices: the eigenvalues and eigenvectors come in complex-conjugate pairs. Now, suppose you have a block matrix represented by a 2d array |
(Note that the adjoint, even for a matrix, might be different from a conjugate-transpose, because the adjoint of a linear operator defined in the most general way depends also on the choice of inner product. There are lots of real applications where some kind of weighted inner product is appropriate, in which case the appropriate way to take the adjoint of a matrix changes! But I agree that, given a |
Correspondingly, there is also some difficulty in defining an algebraically reasonable Update: Oh, good: we don't define |
(OT: I'm already looking forward to "Taking 7-tensors seriously," the next installment in the highly-successful 6-part miniseries...) |
I'm not following this – the 2x2 matrix representation of complex numbers should behave exactly like having complex scalars as elements. I would think, e.g. that if we define m(z::Complex) = [z.re -z.im; z.im z.re] and we have an arbitrary complex vector conj(m.(v)) == m.(conj(v)) I would spell the example out more and make a comparison with |
Your Another way of putting it is that |
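A small sketch may help pin down the 2x2 example under discussion. It reuses the `m` defined in the comment above (written with `real`/`imag` instead of field access) and assumes Julia 1.x; the sample values `z` and `w` are illustrative. It checks which operation on the matrix representation corresponds to complex conjugation of the scalar.

```julia
using LinearAlgebra

# 2x2 real-matrix representation of a complex number.
m(z::Complex) = [real(z) -imag(z); imag(z) real(z)]

z, w = 1.0 + 2.0im, 3.0 - 1.0im   # illustrative values

products_match = m(z) * m(w) == m(z * w)        # multiplication is preserved
adjoint_is_conj = adjoint(m(z)) == m(conj(z))   # matrix adjoint mirrors conj(z)
elementwise_noop = conj(m(z)) == m(z)           # entries are real, so conj changes nothing

v = [z, w]
vector_case = adjoint.(m.(v)) == m.(conj(v))    # same statement, element by element
```

Under these assumptions it is the adjoint of each 2x2 block, not the elementwise `conj`, that tracks conjugation of the underlying complex number.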
This proposal looks like it will produce a real improvement 👍 I'm thinking about the mathematical side:
|
@felixrehren, I don't think you would use a keyword argument to specify the inner product that induces the adjoint. I think you would just use a different type instead, the same as if you wanted to change the meaning of |
My preference would be a bit simpler:
I really don't see a use case for a non-recursive |
I personally feel that keeping this simple will be the most straightforward for users (and for implementation), and still be quite defensible on the linear algebra front. So far (most) of our linear algebra routines are pretty strongly rooted to standard array structures and the standard inner product. In each slot of your matrix or vector, you put an element of your "field". I will argue that for elements of fields, we care about Anyway, the 2x2 complex example feels a little bit of a red herring to me. The real cause of the recursive matrix behavior was as a shortcut to doing linear algebra on block matrices. Why don't we treat this special case with proper care, and simplify the underlying system? So my "simplified" suggestion would be:
I think these rules would be simple enough for users to embrace and build upon. PS - @StefanKarpinski For practical reasons, a boolean keyword argument for recursion won't work with views for transposition. The type of the view may depend on the boolean value. |
Also, I mentioned elsewhere, but I'll add it here for completeness: recursive transpose views have the annoying property that the element type might change compared to the array it is wrapping. E.g. NB: there's also nothing stopping users from defining |
@stevengj – your point about "complex" 2x2 matrices being a formally real vector space rather than a complex vector space makes sense to me, but then that point calls into question for me the original motivation for recursive adjoint, which leads me to wonder if @andyferris's proposal wouldn't be better (non-recursive transpose and adjoint). I guess the fact that both the complex 2x2 example and the block matrix representation "want" adjoint to be recursive is suggestive, but given your comments about that first example, I have to wonder if there aren't other cases where non-recursive adjoint is more correct/convenient. |
If the adjoint isn't recursive, it isn't an adjoint. It's just wrong. |
Can you give a little bit more of a justification for that when you've got a moment? |
The adjoint of a vector must be a linear operator mapping it to a scalar. That is, If |
OK, I think I follow this. We also have recursive inner products:
julia> norm([[3,4]])
5.0
julia> dot([[3,4]], [[3,4]])
25
Clearly, the "adjoint" or "dual" or whatever should be similarly recursive. I think the central question then is, do we require that The alternative is to say that |
There's one and only one reason that we have a special name and special symbols for adjoints in linear algebra, and that is the relationship to inner products. That's the reason why "conjugate transpose" is an important operation and "conjugate rotation of matrices by 90 degrees" is not. There's no point in having something that is "just an array operation that swaps rows and columns and conjugates" if it is not connected to dot products. |
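The link between the adjoint and the inner product can be checked on the nested vector from the earlier comment. This is a sketch assuming Julia 1.x, where `adjoint` is recursive and `permutedims` is not; it is not a statement about the version under discussion in the thread.

```julia
using LinearAlgebra

x = [[3, 4]]                 # a vector whose single element is itself a vector

n = norm(x)                  # recursive norm
d = dot(x, x)                # recursive inner product
s = x' * x                   # recursive adjoint applied to x: a scalar again

inner_adj = (x')[1]          # the element has been adjointed too
inner_perm = permutedims(x)[1]   # non-recursive: the element is left untouched
```

With the recursive adjoint, `x' * x` agrees with `dot(x, x)`; the non-recursive `permutedims` only rearranges the container and leaves each inner vector as a column.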
One can define a similar relationship between |
Triage thinks @Sacha0 should move ahead with proposal 2 so we can try it out. |
I strongly agree with @ttparker that recursive adjoint is a choice, and not the only mathematically consistent option. For example, we could simply state: 1 - to (and similarly for This would probably be the assumption many people would make coming from other libraries, and having such a simple definition of basis vectors, rank, etc, helps keep implementation simple. (Many would probably say that numerical linear algebra is feasible to implement on a computer precisely because we have a nice basis set to work in.) Our current approach is more like: 2 - to and similarly for (block) matrices. This is wildly more generalized than the linear algebra implementations in MATLAB, numpy, eigen, etc, and it's a reflection of Julia's powerful type/dispatch system that this is even feasible. The overarching reason I see option 2 as being desirable is again that Julia's type/dispatch system allows us to have a much broader goal, which vaguely goes like: 3 - In Which is a really cool goal (surely much beyond any other programming language/library that I'm aware of), completely motivates recursive |
Cheers, let's do! I look forward to chatting further offline :). Best! |
Nice summary Andy! :) |
Fully agreed Andy, at least for However, one final plea for a non-recursive
So what are the disadvantages? I see none. I still stand by my point that really nobody is using |
I can say that at least Mathematica (which one would expect to have devoted considerable thought to this) does not do recursive transpose:
EDIT: Ooops, this was also commented above, sorry |
So I'm confused. There seemed to be pretty solid consensus that So now the consensus seems to be that the only real changes to So the community consensus seems to have changed almost 180 degrees in a very short time (around the time of @Sacha0's post #408). Was Sacha's post so eloquent that it just changed everyone's mind? (That's fine if so, I just want to understand why we seem to be moving forward on a path that just a few days ago we'd all seemed to agree was the wrong one.) |
I forget if anyone's suggested this, but could we just make |
If only I were so eloquent 😄. What you are seeing is that consensus had not actually formed. Rather, (1) participants that favor the status quo, but had withdrawn from discussion due to attrition, returned to express an opinion; and (2) other parties that had not considered what moving away from the status quo would entail in practice (and how that might play with release considerations) formed a stronger opinion in favor of the status quo and expressed that opinion. Please consider that this discussion has been ongoing in one form or another on github since 2014, and likely earlier offline. For long-term participants, such discussions become exhausting and cyclic. There being meaningful work to do other than engage in this discussion --- like writing code, which is more enjoyable --- the result is attrition among those long-term participants. Consequently, the conversation appears lopsided during one period or another. Personally, I am about at that attrition threshold, so I am going to focus on writing code now rather than continue engaging in this discussion. Thanks all and best! :) |
I'll cast a small vote in favor of non-recursive transpose and ctranspose for AbstractArrays, with both being recursive on AbstractArray{T} where T<:AbstractArray. I agree that recursive behavior is 'correct' in some cases, and I see the question as how do we achieve the correct behavior with the least amount of surprise for those using and developing packages. As an example of something that would be hard to support under the new proposals: normal, transpose, ctranspose, and conj arrays all should be able to have views (or lazy evaluation) which interop with ReshapedArray and SubArray views. (I'm agnostic about whether these produce views by default or only when using Finally, for consistency with higher dimensional Arrays and reshapes, I believe the appropriate generalization of transpose and ctranspose is to reverse all of the dimensions, i.e. Cheers! |
I very much appreciate the people actually doing the work. What has been discussed at way too much length is vector adjoints/transpose (but never the recursive aspect of it), until @andyferris stepped up and implemented this, and it works wonderfully well. Similarly, I also greatly appreciate the ongoing redesign of array constructors. Thumbs up for all of that. That being said, matrix transpose and adjoint/ctranspose never got much discussion, especially not the recursive aspect of it, which was almost silently introduced in JuliaLang/julia#7244 with block matrices as the single motivation. Various reasons and motivations for recursive adjoint have been given (after the fact), and most people can agree on that being a good (but not the only) choice. Transpose, however, lacks even a single motivation or actual use case. |
There's a few separate things going on in these discussions, and it happens right now that we need a plan that can be implemented quickly.
If we think about v1.0 as somewhat stabilizing the language, then in some senses the biggest priority to make a change in behavior is the third one. I'd say: the language (including parser) should be most stable, followed by |
Spot on Andy! :) |
I think the only thing left to do here is make |
Can this be closed now? |
Next up: "Taking scalar transposes seriously" |
But seriously though, can we have a good interface for specifying the different 3D transposes and tensor multiplications which are used in PDE solvers? Kind of serious, but I'm not sure if I could handle being the OP to the next iteration of this madness. |
no :) |
Definitely seems like a good subject for a package. |
Does TensorOperations.jl not do what you need here? (Note that at this level "a good interface" means something like a tensor network diagram, which is slightly challenging to write in code any more succinctly than the syntax of TensorOperations). |
Yeah, TensorOperations.jl looks good. I was slightly kidding, but I got what I needed out of it 👍 . |
Currently, `transpose` is recursive. This is pretty unintuitive and leads to this unfortunateness:
For some time now, we've been telling people to do `permutedims(A, (2,1))` instead. But I think we all know, deep down, that's terrible. How did we get here? Well, it's pretty well-understood that one wants the `ctranspose` or "adjoint" of a matrix of matrices to be recursive. A motivating example is that you can represent complex numbers using 2x2 matrices, in which case the "conjugate" of each "element" (actually a matrix) is its adjoint as a matrix – in other words, if `ctranspose` is recursive, then everything works. This is just an example but it generalizes.
The reasoning seems to have been the following:
1. `ctranspose` should be recursive
2. `ctranspose == conj ∘ transpose == transpose ∘ conj`
3. `transpose` therefore should also be recursive
I think there are a few problems here:
1. There is no reason that `ctranspose == conj ∘ transpose == transpose ∘ conj` has to hold, although the name makes this seem almost unavoidable.
2. `conj` operating elementwise on arrays is kind of an unfortunate holdover from Matlab and isn't really a mathematically justifiable operation, much as `exp` operating elementwise isn't really mathematically sound and what `expm` does would be a better definition.
3. It is the conjugation of each element that implies that `ctranspose` should be recursive; in the absence of conjugation, there's no good reason for `transpose` to be recursive.
Accordingly, I would propose the following changes to remedy the situation:
1. Rename `ctranspose` (aka `'`) to `adjoint` – that is really what this operation does, and it frees us from the implication that it must be equivalent to `conj ∘ transpose`.
2. Deprecate vectorized `conj(A)` on arrays in favor of `conj.(A)`.
3. Add a `recur::Bool=true` keyword argument to `adjoint` (née `ctranspose`) indicating whether it should call itself recursively. By default, it does.
4. Add a `recur::Bool=false` keyword argument to `transpose` indicating whether it should call itself recursively. By default, it does not.
At the very minimum, this would let us write the following:
Whether or not we could shorten that further to `A'` depends on what we want to do with `conj` and `adjoint` of non-numbers (or more specifically, non-real, non-complex values).
[This issue is the second of an ω₁-part series.]
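To make the recursive versus non-recursive distinction in this issue concrete, here is a minimal sketch on a matrix of matrices. It assumes Julia 1.x, in which `transpose` is recursive and `permutedims` is not; the block `B` and the 1x2 layout are invented for illustration and are not the example elided from the text above.

```julia
using LinearAlgebra

B = [1 2; 3 4]
A = [B for i in 1:1, j in 1:2]   # 1x2 matrix whose elements are 2x2 matrices

T = transpose(A)                 # recursive: also transposes each element
P = permutedims(A, (2, 1))       # non-recursive: only moves the elements around

same_shape = size(T) == size(P) == (2, 1)
t_elem = T[2, 1]                 # equals transpose(B)
p_elem = P[2, 1]                 # equals B itself
```

Both results have the outer dimensions swapped; they differ only in whether the inner blocks are transposed as well, which is exactly the behavior the proposed `recur` keyword was meant to control.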