
Taking matrix transposes seriously #408

Closed
StefanKarpinski opened this issue Mar 10, 2017 · 141 comments
Labels: breaking (This change will break code), needs decision (A decision on this change is needed)

@StefanKarpinski
Member

StefanKarpinski commented Mar 10, 2017

Currently, transpose is recursive. This is pretty unintuitive and leads to this unfortunateness:

julia> A = [randstring(3) for i=1:3, j=1:4]
3×4 Array{String,2}:
 "J00"  "oaT"  "JGS"  "Gjs"
 "Ad9"  "vkM"  "QAF"  "UBF"
 "RSa"  "znD"  "WxF"  "0kV"

julia> A.'
ERROR: MethodError: no method matching transpose(::String)
Closest candidates are:
  transpose(::BitArray{2}) at linalg/bitarray.jl:265
  transpose(::Number) at number.jl:100
  transpose(::RowVector{T,CV} where CV<:(ConjArray{T,1,V} where V<:(AbstractArray{T,1} where T) where T) where T) at linalg/rowvector.jl:80
  ...
Stacktrace:
 [1] transpose_f!(::Base.#transpose, ::Array{String,2}, ::Array{String,2}) at ./linalg/transpose.jl:54
 [2] transpose(::Array{String,2}) at ./linalg/transpose.jl:121

For some time now, we've been telling people to do permutedims(A, (2,1)) instead. But I think we all know, deep down, that's terrible. How did we get here? Well, it's pretty well-understood that one wants the ctranspose or "adjoint" of a matrix of matrices to be recursive. A motivating example is that you can represent complex numbers using 2x2 matrices, in which case the "conjugate" of each "element" (actually a matrix) is its adjoint as a matrix – in other words, if ctranspose is recursive, then everything works. This is just an example, but it generalizes.
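To make that concrete, here is a minimal sketch (the helper m is ad hoc; a similar construction reappears later in this thread):

julia> m(z::Complex) = [real(z) -imag(z); imag(z) real(z)];   # 2x2 real-matrix representation

julia> m(1 + 2im)' == m(conj(1 + 2im))   # the matrix adjoint plays the role of conj
true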

The reasoning seems to have been the following:

  1. ctranspose should be recursive
  2. ctranspose == conj ∘ transpose == transpose ∘ conj
  3. transpose therefore should also be recursive

I think there are a few problems here:

  • There's no reason that ctranspose == conj ∘ transpose == transpose ∘ conj has to hold, although the name makes this seem almost unavoidable.
  • The behavior of conj operating elementwise on arrays is kind of an unfortunate holdover from Matlab and isn't really a mathematically justifiable operation, much as exp operating elementwise isn't really mathematically sound and what expm does would be a better definition.
  • It is actually taking the conjugate (i.e. adjoint) of each element that implies that ctranspose should be recursive; in the absence of conjugation, there's no good reason for transpose to be recursive.

Accordingly, I would propose the following changes to remedy the situation:

  1. Rename ctranspose (aka ') to adjoint – that is really what this operation does, and it frees us from the implication that it must be equivalent to conj ∘ transpose.
  2. Deprecate vectorized conj(A) on arrays in favor of conj.(A).
  3. Add a recur::Bool=true keyword argument to adjoint (née ctranspose) indicating whether it should call itself recursively. By default, it does.
  4. Add a recur::Bool=false keyword argument to transpose indicating whether it should call itself recursively. By default, it does not.
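Under items 3 and 4, call sites would look something like this (hypothetical signatures, sketched for illustration only):

transpose(A)               # non-recursive by default (item 4)
transpose(A, recur=true)   # opt in to recursion
adjoint(A)                 # recursive by default (item 3)
adjoint(A, recur=false)    # opt out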

At the very minimum, this would let us write the following:

julia> A.'
4×3 Array{String,2}:
 "J00"  "Ad9"  "RSa"
 "oaT"  "vkM"  "znD"
 "JGS"  "QAF"  "WxF"
 "Gjs"  "UBF"  "0kV"

Whether or not we could shorten that further to A' depends on what we want to do with conj and adjoint of non-numbers (or more specifically, non-real, non-complex values).

[This issue is the second of an ω₁-part series.]

@jiahao
Member

jiahao commented Mar 10, 2017

The logical successor to a previous issue... 👍

@stevengj
Member

stevengj commented Mar 10, 2017

The behavior of conj operating elementwise on arrays is kind of an unfortunate holdover from Matlab and isn't really a mathematically justifiable operation

This is not true at all, and not at all analogous to exp. Complex vector spaces, and conjugations thereof, are a perfectly well-established mathematical concept. See also JuliaLang/julia#19996 (comment)

@StefanKarpinski
Member Author

StefanKarpinski commented Mar 10, 2017

Complex vector spaces, and conjugation thereof, are a perfectly well-established mathematical concept.

Unless I'm mistaken, the correct mathematical conjugation operation in that context is ctranspose rather than conj (which was precisely my point):

julia> v = rand(3) + rand(3)*im
3-element Array{Complex{Float64},1}:
 0.0647959+0.289528im
  0.420534+0.338313im
  0.690841+0.150667im

julia> v'v
0.879291582684847 + 0.0im

julia> conj(v)*v
ERROR: DimensionMismatch("Cannot multiply two vectors")
Stacktrace:
 [1] *(::Array{Complex{Float64},1}, ::Array{Complex{Float64},1}) at ./linalg/rowvector.jl:180

@stevengj
Member

The problem with using a keyword for recur is that, last I checked, there was a big performance penalty for keyword arguments, which matters if you end up recursively calling these functions down to a scalar base case.

@StefanKarpinski
Member Author

We need to fix the keyword performance problem in 1.0 anyway.

@stevengj
Member

stevengj commented Mar 10, 2017

@StefanKarpinski, you're mistaken. You can have complex conjugation in a vector space without having adjoints — adjoints are a concept that requires a Hilbert space etc., not just a complexified vector space.

Furthermore, even when you do have a Hilbert space, the complex conjugate is distinct from the adjoint. e.g. the conjugate of a complex column vector in ℂⁿ is another complex vector, but the adjoint is a linear operator (a "row vector").

(Complex conjugation does not imply that you can multiply conj(v)*v!)

@StefanKarpinski
Member Author

Deprecating vectorized conj is independent of the rest of the proposal. Can you provide some references for the definition of complex conjugation in a vector space?

@stevengj
Member

stevengj commented Mar 10, 2017

https://en.wikipedia.org/wiki/Complexification#Complex_conjugation

(This treatment is rather formal; but if you google "complex conjugate matrix" or "complex conjugate vector" you will find zillions of usages.)

@StefanKarpinski
Member Author

If mapping a complex vector to the conjugate vector (what conj does now) is an important operation distinct from mapping it to its conjugate covector (what ' does), then we can certainly keep conj to express that operation on vectors (and matrices, and higher arrays, I guess). That provides a distinction between conj and adjoint since they would agree on scalars but behave differently on arrays.

In that case, should conj(A) call conj on each element or call adjoint? The representation of complex numbers as 2x2 matrices example would suggest that conj(A) should actually call adjoint on each element, rather than calling conj. That would make adjoint, conj and conj. all different operations:

  1. adjoint: swap i and j indices and recursively map adjoint over elements.
  2. conj: map adjoint over elements.
  3. conj.: map conj over elements.
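A sketch of those three operations (hypothetical definitions; the trailing underscores mark the names as illustrative, not Base's):

adjoint_(x::Number) = conj(x)                                    # base case for the recursion
adjoint_(A::AbstractMatrix) = permutedims(adjoint_.(A), (2, 1))  # 1. swap indices, recurse
conj_(A::AbstractMatrix) = adjoint_.(A)                          # 2. map adjoint over elements
# 3. conj.(A) is just broadcast: map conj over each element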

@stevengj
Member

stevengj commented Mar 10, 2017

conj(A) should call conj on each element. If you are representing complex numbers by 2x2 matrices, you have a different complexified vector space.

For example, a commonplace usage of conjugation of vectors is in the analysis of eigenvalues of real matrices: the eigenvalues and eigenvectors come in complex-conjugate pairs. Now, suppose you have a block matrix represented by a 2d array A of real 2x2 matrices, acting on blocked vectors v represented by 1d arrays of 2-component vectors. For any eigenvalue λ of A with eigenvector v, we expect a second eigenvalue conj(λ) with eigenvector conj(v). This won't work if conj calls adjoint recursively.
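A minimal check of that conjugate-pair property, in post-1.0 syntax (the thread itself predates the LinearAlgebra stdlib split):

using LinearAlgebra
λ, V = eigen([0 -1; 1 0])   # real matrix with eigenvalues ±im
v = V[:, 1]
[0 -1; 1 0] * conj(v) ≈ conj(λ[1]) * conj(v)   # true: conj(v) is the eigenvector for conj(λ)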

@stevengj
Member

stevengj commented Mar 10, 2017

(Note that the adjoint, even for a matrix, might be different from a conjugate-transpose, because the adjoint of a linear operator defined in the most general way depends also on the choice of inner product. There are lots of real applications where some kind of weighted inner product is appropriate, in which case the appropriate way to take the adjoint of a matrix changes! But I agree that, given a Matrix, we should take the adjoint corresponding to the standard inner product given by dot(::Vector,::Vector). However, it's entirely possible that some AbstractMatrix (or other linear-operator) types would want to override adjoint to do something different.)
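A small sketch of that weighted-adjoint point (W, wdot, and wadjoint are illustrative names under an assumed positive-definite weight, not a Base API):

using LinearAlgebra
W = Diagonal([1.0, 2.0, 3.0])   # assumed SPD weight matrix
wdot(u, v) = dot(u, W * v)      # weighted inner product
wadjoint(A) = W \ (A' * W)      # satisfies wdot(u, A * v) == wdot(wadjoint(A) * u, v)
A, u, v = rand(3, 3), rand(3), rand(3)
wdot(u, A * v) ≈ wdot(wadjoint(A) * u, v)   # true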

@stevengj
Member

stevengj commented Mar 10, 2017

Correspondingly, there is also some difficulty in defining an algebraically reasonable adjoint(A) for e.g. 3d arrays, because we haven't defined 3d arrays as linear operators (e.g. there is no built-in array3d * array2d operation). Maybe adjoint should only be defined by default for scalars, 1d, and 2d arrays?

Update: Oh, good: we don't define ctranspose now for 3d arrays either. Carry on.

@timholy
Member

timholy commented Mar 10, 2017

(OT: I'm already looking forward to "Taking 7-tensors seriously," the next installment in the highly-successful 6-part miniseries...)

@StefanKarpinski
Member Author

StefanKarpinski commented Mar 10, 2017

If you are representing complex numbers by 2x2 matrices, you have a different complexified vector space.

I'm not following this – the 2x2 matrix representation of complex numbers should behave exactly like having complex scalars as elements. I would think, e.g. that if we define

m(z::Complex) = [z.re -z.im; z.im z.re]

and we have an arbitrary complex vector v then we'd want this identity to hold:

conj(m.(v)) == m.(conj(v))

I would spell the example out more and make a comparison with ' which is supposed to already commute with m but I can't because of #409, which accidentally breaks that commutation. Once that bug is fixed then m.(v)' == m.(v') will hold, but if I'm understanding you correctly, @stevengj, then conj(m.(v)) == m.(conj(v)) should not?

@stevengj
Member

stevengj commented Mar 11, 2017

conj(m(z)) should == m(z).

Your m(z) is an isomorphism to complex numbers under addition and multiplication, but it is not the same object in other ways for linear algebra, and we shouldn't pretend it is for conj. For example, if z is a complex scalar, then eigvals(z) == [z], but eigvals(m(z)) == [z, conj(z)]. When the m(z) trick is extended to complex matrices, this doubling of the spectrum has tricky consequences for e.g. iterative methods (see e.g. this paper).

Another way of putting it is that z is already a complex vector space (a vector space over the complex numbers), but the 2x2 real matrix m(z) is a vector space over the real numbers (you can multiply m(z) by a real number) … if you "complexify" m(z) by multiplying it by a complex number, e.g. m(z) * (2+3im) (not m(z) * m(2+3im)), then you get a different complex vector space, no longer isomorphic to the original complex vector space z.
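The spectrum-doubling claim is easy to check (m as defined in the earlier comment; eigvals needs using LinearAlgebra on 0.7+):

m(z::Complex) = [real(z) -imag(z); imag(z) real(z)]
eigvals(m(1 + 2im))   # ≈ [1 - 2im, 1 + 2im] (up to ordering): the conjugate pair appears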

@felixrehren

felixrehren commented Mar 11, 2017

This proposal looks like it will produce a real improvement 👍 I'm thinking about the mathematical side:

  • as I understand, conjugation is an action on the ring (read: numbers) which provides the coefficients for our vector space/vectors. This action induces another action on the vector space (and on its mappings, read: matrices), also called conjugation, by the linearity of the construction of the vector space over the ring. Hence conjugation necessarily works by elementwise application to the coefficients. For Julia, that means that at heart, for a vector v, conj(v) = conj.(v), and it's a design choice whether conj has methods also for arrays or only for scalars, both of which seem ok? (Steven's point about the complex-numbers-as-2x2-matrices example is that this is a vector space whose coefficients are formally/actually real, so conjugation has no effect here.)
  • adjoint has an algebraic meaning that's fundamental in lots of important domains intimately involving linear algebra (see 1 and 2 and 3, all with big applications in physics too). If adjoint is added, it should allow keyword arguments for the action under consideration -- in vector spaces/functional analysis, this action is typically induced by the inner product, hence the keyword argument could be the form instead. Whatever is designed for Julia's transposes should avoid conflicting with these applications, so I think adjoint is a bit highly-charged?

@stevengj
Member

@felixrehren, I don't think you would use a keyword argument to specify the inner product that induces the adjoint. I think you would just use a different type instead, the same as if you wanted to change the meaning of dot for a vector.

@stevengj
Member

stevengj commented Mar 11, 2017

My preference would be a bit simpler:

  • Keep conj as-is (elementwise).
  • Make a' correspond to adjoint(a), and make it recursive always (and hence fail for arrays of strings etc where adjoint is not defined for the elements).
  • Make a.' correspond to transpose(a), and make it never recursive (just a permutedims), and hence ignore the type of the elements.

I really don't see a use case for a non-recursive adjoint. As a stretch, I can sort of imagine use-cases for a recursive transpose, but in any application where you have to change a.' to transpose(a, recur=true) it seems just as easy to call another function transposerecursive(a). We could define transposerecursive in Base, but I think the need for this will be so rare that we should wait to see if it actually comes up.

@JeffBezanson added the label breaking (This change will break code) Mar 12, 2017
@andyferris
Member

andyferris commented Mar 12, 2017

I personally feel that keeping this simple will be the most straightforward for users (and for implementation), and still be quite defensible on the linear algebra front.

So far (most) of our linear algebra routines are pretty strongly rooted to standard array structures and the standard inner product. In each slot of your matrix or vector, you put an element of your "field". I will argue that for elements of fields, we care about +, *, etc, and conj, but not transpose. If your field was a special complex representation which looks a bit like a 2x2 real matrix but changes under conj, then that's OK - it's a property of conj. If it's just a 2x2 real AbstractMatrix that doesn't change under conj, then arguably such elements of a matrix shouldn't change under the adjoint (@stevengj said it better than I can describe - there might be an isomorphism to complex numbers but that doesn't mean it behaves the same in all ways).

Anyway, the 2x2 complex example feels a little bit of a red herring to me. The real cause of the recursive matrix behavior was as a shortcut to doing linear algebra on block matrices. Why don't we treat this special case with proper care, and simplify the underlying system?

So my "simplified" suggestion would be:

  • Keep conj as-is (or make it a view for AbstractArrays)
  • Make a' a non-recursive view such that (a')[i,j] == conj(a[j,i])
  • Make a.' a non-recursive view such that (a.')[i,j] == a[j,i]
  • Introduce a BlockArray type for handling block-matrices and so-on (in Base, or possibly in a well-supported package). Arguably, this would be much more powerful and flexible than a Matrix{Matrix} for this purpose, but equally efficient.

I think these rules would be simple enough for users to embrace, and build upon.

PS - @StefanKarpinski For practical reasons, a boolean keyword argument for recursion won't work with views for transposition. The type of the view may depend on the boolean value.
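For concreteness, a minimal sketch of the kind of non-recursive view those bullets describe (type name and details are illustrative only; Base's eventual lazy wrappers differ):

struct AdjointView{T,A<:AbstractMatrix{T}} <: AbstractMatrix{T}
    parent::A
end
Base.size(v::AdjointView) = reverse(size(v.parent))
Base.getindex(v::AdjointView, i::Int, j::Int) = conj(v.parent[j, i])   # (a')[i,j] == conj(a[j,i]), no recursion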

@andyferris
Member

andyferris commented Mar 12, 2017

Also, I mentioned elsewhere, but I'll add it here for completeness: recursive transpose views have the annoying property that the element type might change compared to the array it is wrapping. E.g. transpose(Vector{Vector}) -> RowVector{RowVector}. I haven't thought of a way of getting this element type for RowVector without a run-time penalty or by invoking inference to compute the output type. I'm guessing the current behavior (invoking inference) is undesirable from a language standpoint.

NB: there's also nothing stopping users from defining conj to return a different type - so the ConjArray view also suffers from this problem, whether transposition is recursive or not.

@StefanKarpinski
Member Author

@stevengj – your point about "complex" 2x2 matrices being a formally real vector space rather than a complex vector space makes sense to me, but then that point calls into question for me the original motivation for recursive adjoint, which leads me to wonder if @andyferris's proposal wouldn't be better (non-recursive transpose and adjoint). I guess the fact that both the complex 2x2 example and the block matrix representation "want" adjoint to be recursive is suggestive, but given your comments about that first example, I have to wonder if there aren't other cases where non-recursive adjoint is more correct/convenient.

@stevengj
Member

stevengj commented Mar 12, 2017

If the adjoint isn't recursive, it isn't an adjoint. It's just wrong.

@StefanKarpinski
Member Author

Can you give a little bit more of a justification for that when you've got a moment?

@stevengj
Member

The adjoint of a vector must be a linear operator mapping it to a scalar. That is, a'*a must be a scalar if a is a vector. And this induces a corresponding adjoint on matrices, since the defining property is a'*A*a == (A'*a)'*a.

If a is a vector of vectors, this implies that a' == adjoint(a) must be recursive.
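Concretely, a sketch of how that plays out with a recursive adjoint (current behavior):

using LinearAlgebra
a = [[1.0, 2.0], [3.0, 4.0]]   # a vector of vectors
a' * a                         # 30.0, a scalar: the adjoint recursed into the elements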

@andyferris
Member

OK, I think I follow this.

We also have recursive inner products:

julia> norm([[3,4]])
5.0

julia> dot([[3,4]], [[3,4]])
25

Clearly, the "adjoint" or "dual" or whatever should be similarly recursive.

I think the central question then is, do we require that a' * b == dot(a,b) for all vectors a, b?

The alternative is to say that ' doesn't necessarily return the adjoint - it is just an array operation that transposes the elements and passes them through conj. This just happens to be the adjoint for the real or complex elements.

@stevengj
Member

There's one and only one reason that we have a special name and special symbols for adjoints in linear algebra, and that is the relationship to inner products. That's the reason why "conjugate transpose" is an important operation and "conjugate rotation of matrices by 90 degrees" is not. There's no point in having something that is "just an array operation that swaps rows and columns and conjugates" if it is not connected to dot products.

@stevengj
Member

stevengj commented Mar 13, 2017

One can define a similar relationship between transpose and an unconjugated "dot product", which would argue for a recursive transpose. However, unconjugated "dot products" for complex data, which aren't really inner products at all, don't show up nearly as often as true inner products — they arise from bi-orthogonality relations when an operator is written in complex-symmetric (not Hermitian) form — and don't even have a built-in Julia function or a common terminology in linear algebra, whereas wanting to swap rows and columns of arbitrary non-numeric arrays seems far more common, especially with broadcasting. This is why I can support making transpose non-recursive while leaving adjoint recursive.
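That unconjugated "dot product" has no Base name; an illustrative helper (udot is made up here):

udot(x, y) = sum(xi * yi for (xi, yi) in zip(x, y))   # like x.' * y: no conjugation
x = [1 + im, 2]; y = [3, 4 - im]
udot(x, y)   # == 11 + 1im, whereas dot(x, y) conjugates x first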

@JeffBezanson
Member

Triage thinks @Sacha0 should move ahead with proposal 2 so we can try it out.

@andyferris
Member

andyferris commented Nov 30, 2017

I strongly agree with @ttparker that recursive adjoint is a choice, and not the only mathematically consistent option. For example, we could simply state:

1 - to LinAlg, an AbstractVector v is a length(v)-dimensional vector with (scalar) basis weights v[1], v[2], ..., v[length(v)].

(and similarly for AbstractMatrix).

This would probably be the assumption many people would make coming from other libraries, and having such a simple definition of basis vectors, rank, etc, helps keep the implementation simple. (Many would probably say that numerical linear algebra is feasible to implement on a computer precisely because we have a nice basis set to work in.)

Our current approach is more like:

2 - to LinAlg, an AbstractVector v is a direct sum of length(v) abstract vectors. We also include enough definitions on scalar types like Number so that to LinAlg they are valid one-dimensional linear operators / vectors.

and similarly for (block) matrices. This is wildly more generalized than the linear algebra implementations in MATLAB, numpy, eigen, etc, and it's a reflection of Julia's powerful type/dispatch system that this is even feasible.

The overarching reason I see option 2 as being desirable is again that Julia's type/dispatch system allows us to have a much broader goal, which vaguely goes like:

3 - In LinAlg, we are attempting to create a generic linear algebra interface that works for objects which satisfy linearity (etc) under +, *, conj (etc), treating such objects as linear operators / members of a Hilbert space / whatever as appropriate.

Which is a really cool goal (surely much beyond any other programming language/library that I'm aware of), completely motivates recursive adjoint and 2 (because +, * and conj are themselves recursive), and is why @Sacha0's proposal 2 and the triage decision are a good choice :)

@Sacha0
Member

Sacha0 commented Dec 1, 2017

I'd really be happy to discuss this further, but I guess we had better do so elsewhere. But really, contact me, as I am actually very interested to learn more about this.

Cheers, let's do! I look forward to chatting further offline :). Best!

@Sacha0
Member

Sacha0 commented Dec 1, 2017

Nice summary Andy! :)

@Jutho
Contributor

Jutho commented Dec 1, 2017

Fully agreed Andy, at least for adjoint (which was the topic of your comment).

However, one final plea for a non-recursive transpose, before I forever hold my peace (hopefully).
I see a non-recursive transpose having the following advantages:

  • It can be used by all people working with matrices, even those containing non-numerical data. That's also how they will likely know this operation and look for it, from other languages and from basic math extrapolated to their non-numerical use case.

  • No need to write extra code to have the lazy flip type or PermutedDimsArray interact with LinAlg. What if I have a numerical matrix which I flipped instead of transposed; will I still be able to multiply it with other matrices (preferably using BLAS)?

  • With a non-recursive transpose and a recursive adjoint, we can easily have a non-recursive adjoint as conj(transpose(a)) and a recursive transpose conj(adjoint(a)). And still everything will interact nicely with LinAlg.

So what are the disadvantages? I see none. I still stand by my point that really nobody is using transpose in its mathematical sense. But instead of trying to argue any further, can anybody give me a concrete example where a recursive transpose is needed or useful, and where it really is a mathematical transpose? This excludes any example where you actually intended to use adjoint but are just having real numbers. So: an application where there is a mathematical reason to transpose a vector or matrix filled with more matrices which are themselves complex valued.

@pablosanjose
Contributor

pablosanjose commented Dec 1, 2017

I can say that at least Mathematica (which one would expect to have devoted considerable thought to this) does not do recursive transpose:

A = Array[1 &, {2, 3, 4, 5}];
Dimensions[A]             (* returns {2, 3, 4, 5} *)
Dimensions[Transpose[A]]  (* returns {3, 2, 4, 5} *)

EDIT: Ooops, this was also commented above, sorry

@ttparker

ttparker commented Dec 1, 2017

So I'm confused. There seemed to be pretty solid consensus that transpose should be made non-recursive - e.g. #408, #408, #408, #408, and JuliaLang/julia#23424. Then @Sacha0 gave two proposals, one of which would leave transpose recursive but introduce a non-recursive flip function, which got strong support despite (as far as I can tell) not really having been raised as a possibility before. Then @JeffBezanson suggested that we don't need flip after all if we give permutedims a default second argument, which also got strong support.

So now the consensus seems to be that the only real changes to transpose should be "behind the scenes": regarding special lowering and lazy vs. eager evaluation, which the typical end user probably won't know or care about. The only really visible changes are essentially just spelling changes (deprecating .' and giving permutedims a default second argument).

So the community consensus seems to have changed almost 180 degrees in a very short time (around the time of @Sacha0's post #408). Was Sacha's post so eloquent that it just changed everyone's mind? (That's fine if so, I just want to understand why we seem to be moving forward on a path that just a few days ago we'd all seemed to agree was the wrong one.)

@ttparker

ttparker commented Dec 1, 2017

I forget if anyone's suggested this, but could we just make transpose(::AbstractMatrix{AbstractMatrix}) (and possibly transpose(::AbstractMatrix{AbstractVector}) as well) recursive and transpose non-recursive otherwise? That seems like it would cover all the bases and I can't think of any other use case where you'd want transpose to be recursive.

@Sacha0
Member

Sacha0 commented Dec 1, 2017

So the community consensus seems to have changed almost 180 degrees in a very short time (around the time of @Sacha0's post #408 (comment)). Was Sacha's post so eloquent that it just changed everyone's mind? (That's fine if so, I just want to understand why we seem to be moving forward on a path that just a few days ago we'd all seemed to agree was the wrong one.)

If only I were so eloquent 😄. What you are seeing is that consensus had not actually formed. Rather, (1) participants that favor the status quo, but had withdrawn from discussion due to attrition, returned to express an opinion; and (2) other parties that had not considered what moving away from the status quo would entail in practice (and how that might play with release considerations) formed a stronger opinion in favor of the status quo and expressed that opinion.

Please consider that this discussion has been ongoing in one form or another on github since 2014, and likely earlier offline. For long-term participants, such discussions become exhausting and cyclic. There being meaningful work to do other than engage in this discussion --- like writing code, which is more enjoyable --- the result is attrition among those long-term participants. Consequently, the conversation appears lopsided during one period or another. Personally, I am about at that attrition threshold, so I am going to focus on writing code now rather than continue engaging in this discussion. Thanks all and best! :)

@braydenware

braydenware commented Dec 1, 2017

I'll cast a small vote in favor of non-recursive transpose and ctranspose for AbstractArrays, with both being recursive on AbstractArray{T} where T<:AbstractArray.

I agree that recursive behavior is 'correct' in some cases, and I see the question as how do we achieve the correct behavior with the least amount of surprise for those using and developing packages.
In this proposal, recursive transpose behavior for custom types is opt-in: you opt in by making your type an AbstractArray or by defining the appropriate method, Base.transpose(::AbstractArray{MyType}) or Base.transpose(::AbstractArray{T}) where T <: MyAbstractType.
I think the duck typing strategy for recursive transposes (just recursing without asking) produces a few surprises, as documented above. If you introduce distinct ctranspose and adjoint, or more complicated proposals like conjadjoint and flip, users will encounter these and try to use them, and package maintainers will try to support them all.

As an example of something that would be hard to support under the new proposals: normal, transpose, ctranspose, and conj arrays all should be able to have views (or lazy evaluation) which interop with ReshapedArray and SubArray views. (I'm agnostic about whether these produce views by default or only when using @view.) This connects to the work on the A*_mul_B* lowering and on lower level BLAS calls with flags 'N', 'T', and 'C' for dense arrays, as has been noted elsewhere. These would be easier to reason about if they treated normal, transpose, ctranspose, and conj on equal footing. Note that BLAS itself only supports 'N' for normal, 'T' for transpose, and 'C' for ctranspose, and has no flag for conj, which I think is a mistake.

Finally, for consistency with higher dimensional Arrays and reshapes, I believe the appropriate generalization of transpose and ctranspose is to reverse all of the dimensions, i.e.
transpose(A::Array{T, 3}) = permutedims(A, (3, 2, 1)).

Cheers!

@Jutho
Contributor

Jutho commented Dec 1, 2017

I very much appreciate the people actually doing the work. What has been discussed at way too much length is vector adjoints/transpose (but never the recursive aspect of it), until @andyferris stepped up and implemented this, and it works wonderfully well. Similarly, I also greatly appreciate the ongoing redesign of array constructors. Thumbs up for all of that.

That being said, matrix transpose and adjoint/ctranspose never got much discussion, especially not the recursive aspect, which was almost silently introduced in JuliaLang/julia#7244 with block matrices as its single motivation. Various reasons and motivations for recursive adjoint have been given (after the fact), and most people can agree on that being a good (but not the only) choice. Transpose, however, is lacking even a single motivation or actual use case.

@andyferris
Member

There are a few separate things going on in these discussions, and it happens right now that we need a plan that can be implemented quickly.

  • We've discussed whether supporting block matrices (and more exotic structures) is worthwhile in LinAlg. Implementation choices are: no recursive stuff at all (except +, * and conj because that's the nature of generic functions in Julia), recursive everything (the status quo), or trying for some kind of type-check or trait for whether an element should do recursive linear algebra or be treated as scalar.
  • We want a nice way for users to permute the dimensions of a 2D array of data. We've got non-recursive transpose, flip, shortened permutedims syntax (that PR was submitted first purely because it is the fewest characters to implement, and probably makes sense even if we do something else as well), some kind of type-check or trait for whether an element should do recursive transpose (perhaps even reintroducing transpose(x::Any) = x...).
  • The Julia parser has strange behavior like x' * y -> Ac_mul_B(x, y) which is a bit of a wart, which ideally won't exist in v1.0. This hasn't been seen as feasible until we can support fast BLAS (no extra copies) without it, thus lazy matrix transpose & adjoint.
  • The code in LinAlg is quite large and built up over a number of years. Many things like matrix multiplication could be refactored to be more trait-friendly, perhaps with a dispatch system more like the new broadcast. I think this is where we can make it easier to send the right arrays (I'm thinking PermutedDimsArray of conjugated reshaped adjointed views of strided matrices) to BLAS. However, this won't make v1.0 and we are also trying to avoid performance regressions without making the code much worse. As Sacha pointed out (and I'm discovering), having transpose views with a wide range of behaviors on the elements (recursive adjoint, recursive transpose, conjugation, nothing) makes for additional complexity and a bunch of new methods to keep things working as they are.

If we think about v1.0 as somewhat stabilizing the language, then in some senses the biggest priority to make a change in behavior is the third one. I'd say: the language (including parser) should be most stable, followed by Base, followed by the stdlib (which may or may not include LinAlg, but I think almost definitely will include BLAS, Sparse, etc one day). It's a change that doesn't really affect users (mostly library developers), so I wouldn't be surprised if people's opinions differ here.
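For reference, the direction the third bullet points at is roughly what Julia 1.0 ended up doing: a' builds a lazy wrapper and multiplication dispatches on its type, replacing the old Ac_mul_B lowering. A post-1.0 sketch:

using LinearAlgebra
x = rand(ComplexF64, 3); y = rand(ComplexF64, 3)
x'        # an Adjoint wrapper (a view, no copy)
x' * y    # dispatches on the wrapper type; equal to dot(x, y)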

@Sacha0
Member

Sacha0 commented Dec 1, 2017

Spot on Andy! :)

@JeffBezanson
Member

I think the only thing left to do here is make adjoint and transpose lazy by default?

@JeffBezanson
Member

Can this be closed now?

@StefanKarpinski
Member Author

Next up: "Taking scalar transposes seriously"

@ChrisRackauckas
Member

But seriously though, can we have a good interface for specifying the different 3D transposes and tensor multiplications which are used in PDE solvers? Kind of serious, but I'm not sure if I could handle being the OP to the next iteration of this madness.

@KristofferC
Member

KristofferC commented Jan 15, 2018

no

:)

@StefanKarpinski
Member Author

can we have a good interface for specifying the different 3D transposes and tensor multiplications which are used in PDE solvers

Definitely seems like a good subject for a package.

@andyferris
Member

andyferris commented Jan 16, 2018

a good interface for specifying the different 3D transposes and tensor multiplications

Does TensorOperations.jl not do what you need here? (Note that at this level "a good interface" means something like a tensor network diagram, which is slightly challenging to write in code any more succinctly than the syntax of TensorOperations).

@ChrisRackauckas
Member

Yeah, TensorOperations.jl looks good. I was slightly kidding, but I got what I needed out of it 👍 .
