Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support and as elementwise- and outer-product operators #35150

Merged
merged 8 commits into from
May 2, 2020

Conversation

timholy
Copy link
Member

@timholy timholy commented Mar 18, 2020

While we have broadcasting and a*b', sometimes you need to pass an operator as an argument to a function. Since we already have as a synonym for dot, these seemed to make sense.

@simeonschaub
Copy link
Member

For the elementwise product, couldn't we allow passing .* as a generic function instead? The problem I see with having a separate operator for this is that beginners use this instead of dotted function calls and aren't necessarily aware that this prevents loop fusion.

@JeffBezanson
Copy link
Member

See #34156

@timholy
Copy link
Member Author

timholy commented Mar 18, 2020

couldn't we allow passing .* as a generic function instead

That would be great, but I don't see how we'd support older Julia versions with Compat.jl. (Yes, I have a Compat.jl PR already queued up....)

@simeonschaub
Copy link
Member

simeonschaub commented Mar 18, 2020

It is actually possible to get this to work already:

julia> const var".*" = (x...,) -> .*(x...)
#3 (generic function with 1 method)

julia> map(.*, 1:6, 2:7)
6-element Array{Int64,1}:
  2
  6
 12
 20
 30
 42

Although this is quite hacky and the proper way would probably be to do this during lowering, as proposed in #34156.

@ericphanson
Copy link
Contributor

ericphanson commented Mar 18, 2020

Several packages use for tensor products; e.g. QuantumInformation.jl uses it to mean kron, while QuantumOptics.jl uses it to mean kron with the arguments reversed. Neither definition agrees with the use here: for vectors a and b, one has the identity a * b' == kron(a, b') == kron(b', a), so they both agree with each other in this scenario, and disagree with this PR by an adjoint. In other words, with this PR we have Base.⊗(a,b) == QuantumInformation.⊗(a,b') == QuantumOptics.⊗(a,b').

@mcabbott
Copy link
Contributor

Not sure we should use up to mean .*, it's a one-liner if you want to define it. FWIW my packages use this for reshaped matrix multiplication, size(ones(1,2,3) ⊙ ones(3,4,5)) == (1,2,4,5) (like Mathematica's .).

is surely something like kron, but as noted there are many variants.

@timholy
Copy link
Member Author

timholy commented Mar 18, 2020

WRT .*, I prefer that to , I just didn't realize it was available without parser support. I'd be happy to switch to that if we decide to go ahead with this.

For , it's a little unfortunate other packages define disagreeing methods given that this is the definition of outer product. Of course there's the issue of notation, but seems almost universal. I'd argue that it's really their problem if LinearAlgebra imposes the correct meaning. AFAIK, we didn't worry that packages might already define an only method meaning something else when we added only during the 1.4 series.

@mcabbott
Copy link
Contributor

Re , it's a pity that kron is backwards here, vec(a * b') == kron(b,a). But is including conjugation in an outer product quite so standard? And is length(ones(2,2) ⊗ ones(2,2)) == 4 intended?

@ericphanson
Copy link
Contributor

ericphanson commented Mar 18, 2020

For , it's a little unfortunate other packages define disagreeing methods given that this is the definition of outer product. Of course there's the issue of notation, but seems almost universal. I'd argue that it's really their problem if LinearAlgebra imposes the correct meaning. AFAIK, we didn't worry that packages might already define an only method meaning something else when we added only during the 1.4 series.

I would definitely contest that this is the correct definition of , though I admit it depends on the field. The tricky bit is that these definitions are all equal up to isomorphisms and hence things like Wikipedia do not always distinguish between them. See @Jutho's TensorKit.jl docs for a description of the various spaces.

In quantum mechanics, at least, if a and b are vectors (i.e. kets), then a ⊗ b is a vector too, which lives in the tensor product Hilbert space (i.e., a vector space). From that point of view, the definition here is incorrect.

edit: the definition of given here isn't even linear in each argument (it's anti-linear in the second argument), and the tensor product is almost always defined to be linear (e.g. the first sentence of the wiki article).

@ericphanson
Copy link
Contributor

ericphanson commented Mar 18, 2020

Re , it's a pity that kron is backwards here, vec(a * b') == kron(b,a).

The kron issue is this one, by the way: #28127. And AFAIK that's why the QuantumOptics.jl choice reverses the arguments; I've also found it makes things much simpler.

@mcabbott
Copy link
Contributor

Re kron, now that I look, #14895 seems like the closest issue? I think that everyone would agree that, for real vectors, (a ⊗ b)_i,j == a_i b_j, which is a 2-array (like this PR). But kron wants to return a 1-array, and instead of reshaping this, it reshapes the transpose. Which I always presumed was because it copied a row-major language (and, perhaps, because it didn't want to return a 4-array for kron(mat, mat)).

Perhaps giving a meaning to is a chance to fix these, and define (a ⊗ b)_i,j,k,l,m == a_i,j b_k,l,m? Which matches this PR for the case of real vectors.

@ericphanson
Copy link
Contributor

ericphanson commented Mar 18, 2020

That seems like a reasonable definition for a tensor product (edit: and seems more convenient than kron for the reasons you mention, and because it's easy to call vec to get the vectorized version). I think maybe one contention here is whether the symbol should mean the tensor product or the outer product (the difference being the complex-conjugation). As you can probably tell, I’d prefer the former :).

@ericphanson
Copy link
Contributor

I had a good discussion with my housemates about this, who each had different preferences, and the understanding we came to seemed to be that including the adjoint makes sense if you are only ever going to tensor two things together, and in that context you are usually wanting to put the second argument in the dual space anyway, so one might as well use for this purpose. However, if one might want to tensor together more than 2 things (as is the case in physics), then it makes sense to avoid the adjoint; after all, which ones would you adjoint?

As another way to see that, if you include the adjoint, the binary operator is no longer associative.

@timholy
Copy link
Member Author

timholy commented Mar 18, 2020

I could happily change this to not take the conjugate.

@ericphanson
Copy link
Contributor

What do you think about also adding AbstractVector type annotations? I.e.

(a::AbstractVector, b::AbstractVector) = a * transpose(b)

That way we leave open the possibility of implementing @mcabbott's suggestion of (a ⊗ b)_i,j,k,l,m == a_i,j b_k,l,m for higher-dimensional objects in a separate PR.

@mcabbott
Copy link
Contributor

mcabbott commented Mar 18, 2020

Perhaps it should have an ascii form, like tensor(A, B) = A .* reshape(B, ntuple(_->1, ndims(A))..., size(B)...), plus some smarter methods for B::Adjoint etc. Or perhaps this is indeed another PR.

Defining ⊗(a,b) = a * transpose(b) to apply also to matrices would seem badly wrong to me.

@timholy
Copy link
Member Author

timholy commented Mar 18, 2020

What do you think about also adding AbstractVector type annotations?

Yeah, that's really what I was thinking here without actually having imposed it. You can tell from the shortage of tests I wasn't thinking beyond a fairly narrow range of uses 😄. This really came out of an extensive discussion regarding color arithmetic in image processing (JuliaGraphics/ColorVectorSpace.jl#126).

I guess I should update this even if we end up not doing it, in case others come over and see the same issues.

@timholy
Copy link
Member Author

timholy commented Mar 18, 2020

OK, let's see what breaks:
@nanosoldier runtests(ALL, vs = ":master")

@timholy
Copy link
Member Author

timholy commented Mar 18, 2020

@nanosoldier runtests(ALL, vs = ":master")

@Jutho
Copy link
Contributor

Jutho commented Mar 18, 2020

While there are indeed different conventions of what ⊗ should do (a tensor or monoidal product can be defined even for objects which are not vectors or tensors, e.g. cobordisms or tangles or ..., any objects and morphisms in a tensor category), one aspect thar all these choices agree on is that in a linear category, of which the category Vect of vector spaces is the prime example, the tensor product is a bilinear operation, i.e. it is linear (and thus not antilinear) in both of its arguments. In simple words, there should be no complex conjugation/adjoint involved. I too like the definition of @mcabbott.

@Jutho
Copy link
Contributor

Jutho commented Mar 18, 2020

And thus, I strongly disagree with the statements about complex vectors on that wikipedia page of the outer product. v ⊗ w should be an element of V ⊗ W, whereas linear maps W->V are isomorphic to V ⊗ W^*. Only for real vector spaces with a euclidean inner product (cartesian space) are W and the dual space W^* isomorphic, for complex Euclidean spaces the dual space W^* is not isomorphic to W (but to the space conj(W)). Also for real vector spaces with a more general inner product (or none at all), the tensor product of two vectors, v^i w^j, is a different kind of object than a linear map A^i_j, even though you would represent both as a matrix.

Note: edited because originally typed on an old tablet/browser with poor Github support.

@timholy
Copy link
Member Author

timholy commented Mar 18, 2020

Good. There seems to be a growing consensus, but just to check, is the current behavior of this (modified) PR reasonable?

@mcabbott
Copy link
Contributor

mcabbott commented Mar 18, 2020

I think the vector method should avoid recursive transpose:

⊗(a::AbstractVector, b::AbstractVector{<:Number}) = a * transpose(b)

I was concerned that the reshape method would be slow for Adjoint{T,Matrix} etc, but it seems to work fine.

I wonder how others feel about the function not having an ascii equivalent.

And docs... news item for perhaps need replacing with .*, and maybe kron's docstring should mention its new cousin.

@Jutho
Copy link
Contributor

Jutho commented Mar 19, 2020

Indeed, the docstring still needs updating.

I think it is good that was not used, as I think also the plain circle is used or the elementwise product (Hadamard product), and is even slightly more common (see Wikipedia), while also has other meanings, like the symmetric counterpart of the wedge product, i.e. for vectors:
v ∧ w = v * transpose(w) - w * transpose(v)
v ⊙ w = v * tranpose(w) + w * transpose(v)

I am not sure about the absence of an ascii alternative. I am not sure about the generic name tensor for the tensor product, although it seems consistent with the absence of the explicit word product in dot (product) and cross (product) in LinearAlgebra function naming.

@timholy
Copy link
Member Author

timholy commented Mar 19, 2020

There is one problem with .*:

julia> using StaticArrays

julia> a = [SVector{2}(1, 2), SVector{2}(-3, 5)]
2-element Array{SArray{Tuple{2},Int64,1,2},1}:
 [1, 2]
 [-3, 5]

julia> a .* a     # this fails, as is right and proper
ERROR: MethodError: no method matching *(::SArray{Tuple{2},Int64,1,2}, ::SArray{Tuple{2},Int64,1,2})
Closest candidates are:
  *(::Any, ::Any, ::Any, ::Any...) at operators.jl:529
  *(::LinearAlgebra.Adjoint{var"#s666",var"#s665"} where var"#s665"<:LinearAlgebra.AbstractTriangular where var"#s666", ::AbstractArray{T,1} where T) at /home/tim/src/julia-master/usr/share/julia/stdlib/v1.5/LinearAlgebra/src/triangular.jl:1995
  *(::LinearAlgebra.Adjoint{var"#s666",var"#s665"} where var"#s665"<:(Union{LinearAlgebra.Hermitian{T,S}, LinearAlgebra.Hermitian{Complex{T},S}, LinearAlgebra.Symmetric{T,S}} where S where T<:Real) where var"#s666", ::AbstractArray{T,1} where T) at /home/tim/src/julia-master/usr/share/julia/stdlib/v1.5/LinearAlgebra/src/symmetric.jl:588
  ...
Stacktrace:
 [1] _broadcast_getindex_evalf at ./broadcast.jl:631 [inlined]
 [2] _broadcast_getindex at ./broadcast.jl:604 [inlined]
 [3] getindex at ./broadcast.jl:564 [inlined]
 [4] copy at ./broadcast.jl:854 [inlined]
 [5] materialize(::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1},Nothing,typeof(*),Tuple{Array{SArray{Tuple{2},Int64,1,2},1},Array{SArray{Tuple{2},Int64,1,2},1}}}) at ./broadcast.jl:820
 [6] top-level scope at REPL[3]:1

julia> a ..* a      # but it would be nice to have this work
ERROR: syntax: invalid operator "..*" near column 4
Stacktrace:
 [1] top-level scope at none:0

@simeonschaub
Copy link
Member

simeonschaub commented Mar 19, 2020

From a physics point of view, I'm not really sure if I agree that transpose is the most natural definition of an outer product. At least to me, the defining property of the outer product is that (u ⊗ v)(x) ≡ <v,x> u, which is different to the definition here. A more common notation for the outer product in quantum physics literature is |u><v| and usually denotes the Kronecker product instead, as @ericphanson already pointed out, but because we call it an outer product, I find it inconsistent that the inner product uses the conjugate transpose, whereas the outer product doesn't take the conjugate.

There is one problem with .*

That's definitely not the only problem with that hack. The more I think about it, the more I doubt that this should be in Base until we have a proper solution for all infix operators. Otherwise, I find it hard to justify why we should export the dotted version just for the * operator and not any other ones.

@timholy
Copy link
Member Author

timholy commented Mar 19, 2020

I agreed with the conjugate at the beginning, but generalizing to higher-dimensional tensors seems to be the stronger argument.

@timholy
Copy link
Member Author

timholy commented May 3, 2020

To the latecomers: I'm trying to do this as a public service to avoid stealing good operators. My actual goal is to avoid having to define * for RGB colors and instead force people to specify which sense of multiplication they want: , , or . Then they can broadcast them: img1 .⊙ img2 where means the component-wise product of RGB colors and each img is an array of RGB colors. If we were to do with with just dots it would have to be img1 ..* img2 indicating that we want to broadcast .*. (And so far RGB colors, like CartesianIndex, are not iterable so .* is not defined, though we have considered changing that.)

While I haven't used the Tensor* packages much there's obvious room to combine them with image processing, hence I don't want conflicts. Thus, @Jutho and the Colors world have to get on the same page.

To be honest I would prefer to see this in a TensorCore.jl package as long as other compatible users can be persuaded to depend upon it. I don't understand the reluctance to having a small package that defines these fallbacks.

@stevengj
Copy link
Member

stevengj commented May 3, 2020

To be honest I would prefer to see this in a TensorCore.jl package as long as other compatible users can be persuaded to depend upon it.

That makes sense to me. Nothing else in the LinearAlgebra package supports multilinear algebra, after all. (So in that context const ⊗ = kron would make more sense to me.)

@dkarrasch
Copy link
Member

Nothing else in the LinearAlgebra package supports multilinear algebra

But what about three-arg dot with symmetric or Hermitian A, don't we interpret A there as a bilinear form instead of a linear operator? @EricForgy

@ericphanson
Copy link
Contributor

So in that context const ⊗ = kron would make more sense to me.

⊗(A,B) = kron(B,A), sure! As I keep saying, the order is backwards for interop with column-major operations like vec and reshape which is why we need a new function!

@EricForgy
Copy link

EricForgy commented May 3, 2020

But what about three-arg dot with symmetric or Hermitian A, don't we interpret A there as a bilinear form instead of a linear operator? @EricForgy

I don't think I am supposed to be tagged there, but in an ideal world (v2.0 or v3.0), I'd love to see a MultilinearAlgebra standard library that handles dual spaces properly with a proper definition of and then LinearAlgebra is a special extension of that. In that case, yeah, dot should probably just be a twice covariant tensor, but then you'd end up seeing things like dot ⊗ A 😊

In my opinion, * should have a definition consistent with multilinear algebra. It currently isn't because in u*v', * is being used for , which is unfortunate and fixing it is definitely a breaking change 🤔

If u and v are vectors, then it should be u ⊗ v' to get a matrix. If you want to write it with *, you'd need to first convert u and v to matrices and then u*v' would be fine. Wikipedia and probably many other references that define outer product, likely assume u and v are represented as N x 1 matrices so that * is fine, but we've taken the stance that u and v are not matrices, so our definitions will deviate a little in notation, which I think is a good thing, if we want internal consistency.

I like what is done in TensorKit, but would change a few minor things. In a MultilinearAlgebra package, similar to TensorKit, I would also define tensors in terms of the vector spaces U, V, W etc with their duals U*, V* and W* etc and extend Array. For example, a vector u in U would be

Array{T,U}: U* -> T

Similarly, a covector would be

Array{T,U*}: U -> T

A matrix representing a map from V to U would be

Array{T,U,V*}

An arbitrary tensor could be expressed as:

Array{T,U,W*,V*,U,W,V*} # any ordered list of vector spaces and their duals

Then a proper definition of becomes pretty clear, e.g. give u1,u2 in U, then

u1 ⊗ u2 in Array{T,U,U}

Similarly, u in U and v in V, then

u ⊗ v in Array{T,U,V}

and

u ⊗ v' in Array{T,U,V'} # which is a matrix

where V' denotes V* as a vector space, but with the data possibly arranged differently.

More generally, if a in Array{T,U,W*,V*} and b in Array{T,U,W,V*}, then

a ⊗ b in Array{T,U,W*,V*,U,W,V*}

i.e. you just concatenate the list of vector spaces in the parameter.

Tensors cannot be denoted by something like Array{T,M,N} for integers M and N, because the vector spaces matter and their order matters.

With this, a simple and consistent * also becomes pretty clear. If u in Array{T,U} and v in Array{T,V} then

u*v in T

is only defined if U is equivalent to V* and the last vector space of u is contracted with the first vector space of v resulting in an element u*v in T.

Matrix vector multiplication is then:

*: Array{T,U,V*} x Array{T,V} -> Array{T,U}

More generally,

*: Array{T,U1,...,Um,U*} x Array{T,U,U{m+1},...,Un} -> Array{T,U1,...Un}

This provides the general machinery for MultilinearAlgebra and then LinearAlgebra becomes a specialized extension of this, e.g.

dot(u,v,M) = u'*M*v,
outproduct(u,v) = u ⊗ v' # A matrix

and given matrices a in Array{T,U,V*} and b in Array{T,W,X*}, then

kron(a,b) = a ⊗ b in Array{T,U,V*,W,X*}

Getting back to this PR, again, I would hesitate to put in LinearAlgebra until we sort all this stuff out properly.

@Jutho
Copy link
Contributor

Jutho commented May 3, 2020

@EricForgy, I just wanted to point out that in TensorKit.jl, M and N do not actually denote the upper and lower / contravariant and covariant indices. That information is encoded in the vector space. For what they are, I am happy to discuss elsewhere.

@timholy , I am not reluctant to depend on a package (big or small) if I see the use of it. However, with TensorKit.jl, I am defining a completely new type hierarchy distinct from AbstractArray, and which in many cases really only makes sense for numerical elements. Hence, I don't see any use case where this is combined with e.g. Colors.

This is my last thought about that I want to put here, I will probably remain silent after this.

To me, Julia is so successful at inter-package functionality and generic programming, because Base or standard libraries assigns a meaning to a sufficient number of basic functions and operators, and than a user/package can just define his own types to specify how he wants to implement this meaning for his specific use case, by relying on multiple dispatch. In this approach, the meaning of this operation does not always need to be sharply defined. String concatenation is not multiplication, but it is indeed an associative yet not commutative binary operation, and so by using * for it, one can also take integer powers of strings.

If a package defines a new number type which has two multiplications (for whatever reason, e.g. exact and approximate, fast and slow, two different definitions, ...), and that package decides to use * and for those two operations, that's of very limited use, because e.g. a Matrix filled with that number type will never use the multiplication in combination with the standard LinearAlgebra implementations of e.g. matrix multiplication or LU decomposition or ... Hence, the better approach is to associate two different types with the two different multiplication modes, and define * for both of them.

That is just to say, I think it is good for LinearAlgebra to associate one meaning to , even if it is not very specifically defined. I think everybody will agree that it represents some kind of tensor product. LinearAlgebra can then also provide one specific way to implement this for AbstractArray types, but that does not need to be part of the contract and does not imply that no-one else cannot use a somewhat different convention for his own types. If I were to have, say, a package about LinearMaps which are distinct from AbstractArray/AbstractMatrix, I might not be interested in general tensors, but I might be interested in e.g. a lazy Kronecker product. That is still a tensor product, it satisfies the basic concepts of it (bilinear operation, ...). So I could still overload LinearAlgebra.⊗ for that. And I might still benefit from other generic code that just assumes this basic rules for .

But I wouldn't know why such a hypothetical LinearMaps package would now have to depend on a TensorCore.jl package when it has nothing to do with tensors. Hence, in that case, LinearMaps would just define its own , and gone is generic programming...

@mcabbott
Copy link
Contributor

mcabbott commented May 4, 2020

I think I now agree with the complaint that, as it stands, this doesn't fit LinearAlgebra very well. Everything else is a matrix (modulo questions of when the trivial dimension of v' gets dropped, etc) with row indices upstairs, column indices downstairs, and nothing beyond. If A ⊗ B returns a 4-tensor, that's not an object which anything else can talk to.

But I also agree that there's value to giving symbols canonical meanings, even if there's a bit of slack in the definition. And costs to moving that to some 20-line package which will tend to get forgotten.

So perhaps should be in Base?

Or roughly ⊗(A,B) = kron(B,A) could be in LinearAlgebra. Not exactly this, as it shouldn't assume their elements commute, but just a reshape of this PR. But this does seem like a bit of a kludge, it's what you do if your language only has matrices, whereas Array{T,N} allows any N.

With this kron-like definition, u⊗v would be a vector, and thus wouldn't solve the original motivating issue here, of wanting something like an outer product of real vectors. Perhaps (within how LinearAlgebra thinks) this should be more like *′(a,b) = a * b'? The ascii version *' doesn't parse:

:(a *' b)        # ERROR: syntax: incomplete: invalid character literal
:(map(*', A, B)) # ERROR: syntax: "*" is not a unary operator (but could be made so)

Perhaps map(*', A, B) and map(.*, A, B) are what's needed?

@EricForgy -- you may like Jiahao Chen's talk https://www.youtube.com/watch?v=C2RO34b_oPM which I only found today, but seems like a good introduction to the world in which u*v' and u'*v make sense. It is interesting to wonder whether there's a path to a world which doesn't privilege two dimensions.

@StefanKarpinski
Copy link
Member

StefanKarpinski commented May 4, 2020

I haven't read all of this but the existence of this many words to be spilled by different people on the subject tells me what I need to know—this is too contentious to go in a stdlib. That taken together with the fact that it's possible to have this in a TensorCore.jl package instead indicates fairly strongly to me that this should be reverted and that a TensorCore.jl package be introduced instead which anyone who agrees with these meanings for these operators is welcome to use.

@Jutho
Copy link
Contributor

Jutho commented May 4, 2020

This makes me feel quite disappointed, not for the specific outcome, but for the way this went down. There was a constructive and long discussion about this for some time, a consensus was reached, a nice implementation was provided, everything was left open for a long time. This gets merged, one person shouts out that this is not good without all that much of motivation, or arguments which apply much more strongly to existing functionality, and frankly in a rather condescending tone. Other people try to reconstruct several of the previous arguments for how this consensus was reached in a detailed (and thus quite long) manner. Which then leads to the conclusion that this is all too many words, and therefore does not belong here.

Luckily this is not at all how I typically experience the Julia community.

Hence, revert away. With this PR, I have to probably spend 30 seconds fixing some conflicts in some of my libraries to change a freshly defined into an overloaded LinearAlgebra.⊗. With this PR reverted, I have no work at all.

@mcabbott
Copy link
Contributor

mcabbott commented May 4, 2020

Should and kron simply swop places?

kron obeys LinearAlgeabra's restriction to matrices, and matches matlab's conventions. If you only have matrices & vectors, then the order problem doesn't matter as long as you are consistent. Reshaping a matrix to a vector (or v-v) violates LinearAlgeabra's assumptions about row indices, in exactly the manner objected to above.

Whereas works for Base's Array{T,N}, which treats all N, and has no notion of that the second index of an Array{T,2} is co/contravariant. While there were many words spilled in getting this straight, there have been no examples offered of mathematical objects you might hope to represent by Array{T,N} which commonly define to mean something else.

This addresses the desire for generic concepts, nicely expressed by @Jutho. And addresses the objection of not quite fitting into LinearAlgebra. (The implementation is careful not to take a view of LinearAlgebra's slightly odd Adjoint types.) Perhaps moving kron can help balance the scales as regards what's in Base.

@MasonProtter
Copy link
Contributor

MasonProtter commented May 4, 2020

As someone who was lurking here, I just want to chime in that I think this implementation is great and I would be sad to see it reverted. The complaints from Eric seem to generally be complaints about how Array{T, N} works, not really anything that could be reasonably addressed in Base or LinearAlgebra whether or not this PR happened.

Should we just stop developing LinearAlgebra because a few people are unhappy that we didn't take dual spaces seriously enough? No! A conversation about how Base + stdlibs handles co(ntra)variance will need to wait for 2.0.

Regarding names, I agree with @mcabbott that if I was at a black-board with someone and they said "V tensor U", I'd know they meant "the tensor product between V and U", and this is an expression I've heard many times before. That said, if people really want them to be renamed tensorproduct and hadamardproduct, okay whatever. I wasn't planning on using the ascii names anyways.

@EricForgy
Copy link

EricForgy commented May 5, 2020

Jutho said:

rather condescending tone

I have nothing but adoration for everyone commenting on this PR and if my tone said anything else, then that says more about weakness in my ability to communicate in ascii (and maybe because I have other soul-crushing responsibilites at the moment and not a lot of time to comment properly) than any intention on my part. I do apologize for any miscommunication.

I tried to explain my initial allergic reaction. Initially, it was simply about introducing unicode. Once unicode is more universally workable in Windows REPL (I think it will be soon), I'll have less alergic reactions to unicode, but that is an insignificant matter not worth discussing further I think.

Rereading the thread again, I think even hadamard isn't really necessary, but I certainly think that introducing tensor to LinearAlgebra at this time is premature. I have a vested interest in seeing Julia get tensors correct so I don't apologize for objecting to this PR into a standard library.

Tim said:

For ⊗, it's a little unfortunate other packages define disagreeing methods given that this is the definition of outer product.

I (respectfully) disagree. In LinearAlgebra, an MxN array of numbers is meant to represent a linear map between finite-dimensional vector spaces. On the linked Wikipedia page, I see:

image

Although this is represented as a two-dimensional array of numbers they call "matrix", it is not a linear map and is out of scope (imo) for LinearAlgebra. That is a twice-contravariant tensor. As I noted above, there are four distinct geometric elements that can be achieved from tensor product of vectors and dual vectors:

  • u ⊗ v - This is a twice-contravariant tensor (Covector -> Vector) outside the scope of LinearAlgebra.
  • u ⊗ v' - This is a matrix, i.e. linear map (Vector -> Vector), and is in scope of LinearAlgebra.
  • u' ⊗ v - This is a dual matrix, i.e. a linear map (Covector -> Covector) outside the scope of LinearAlgebra.
  • u' ⊗ v' - This is a twice-covariant tensor (Vector -> Covector) outside the scope of LinearAlgebra.

All of the above have representations as 2D arrays of numbers but only the second one is a linear map appropriate for LinearAlgebra.

If anyone tries to use u ⊗ v as a linear map, I think there is a high chance they have formulated the problem incorrectly. Having said that, if your vectors are real and your bases are orthonormal, then the M x N array representation of u ⊗ v and the M x N array representation of u ⊗ v' are the same, but only the latter is a linear map appropriate for LinearAlgebra.

A little further down the WIkipedia page, I see this:

image

This notation only makes sense if u and v are column matrices (which we decided not to do) and the fact Julia implemented this as dispatch was a mistake in my opinion that should be fixed in vX.0 (hopefully X = 2) because you are effectively saying * = ⊗. This seems to be the cause for some confusion in various packages (from what I gather).

Instead of dispatching u*v', we should have made that u ⊗ v' since it is really tensor product.

If this PR is reverted, then I don't take it as a waste of anyone's time and I hope the discussion here can lead to a truly great implementation of MultilinearAlgebra.

@Jutho
Copy link
Contributor

Jutho commented May 5, 2020

Dear @EricForgy , that comment was certainly not addressed to you (and I should probably not have made it anyway, this was a spur of disappointment).

I like what your attempts at a more rigorous treatment of covariant and contravariant tensors, but I think that goes beyond the scope of LinearAlgebra, because it is too advanced for most users. Furthermore, as you point out, for real euclidean spaces all these things become equivalent, which is the use case of probably over 90% of the users. Finally, I don't think LinearAlgebra only considers matrices which are linear maps. dot(v,A,w) could be used to represent a bilinear form, i.e. a twice covariant tensor. But again, this finegrained distinction is irrelevant for cartesian tensors and I don't think you want to enforce this on users that do not need this.

@EricForgy
Copy link

Thanks Jutho.

I certainly agree that getting a robust formulation of contravariant and covariant tensors is outside the scope of LinearAlgebra. I do think (aside from the possible exception of dot), LinearAlgebra should be constrained to vector spaces, dual vector spaces and linear maps between them. Tensor product takes you out of scope for everything except u ⊗ v' for which it is currently an error.

What I see for Julia 2.0 is a new standard library MultilinearAlgebra which gets all this stuff right in the general multilinear case (and I actually think TensorKit is a good contender as a starting point) and then LinearAlgebra is revised to be consistent with this more general framework (and maybe LinearAlgebra depends on and extends MultilinearAlgebra).

And to Mason's point,

The complaints from Eric seem to generally be complaints about how Array{T, N} works, not really anything that could be reasonably addressed in Base or LinearAlgebra whether or not this PR happened.

I don't see it that way. I am not concerned about Array{T,N} as a general data structure. It is fine. I am concerned about this specific PR. I don't think belongs here as implemented in this PR.

timholy added a commit that referenced this pull request May 5, 2020
…ors (#35150)"

This reverts commit f36036c.

This was viewed by some as premature, and there seems to be willingness to develop
it in a package first with an eye towards future inclsion.
timholy added a commit that referenced this pull request May 5, 2020
…ors (#35150)"

This reverts commit f36036c.

This was viewed by some as premature, and there seems to be willingness to develop
it in a package first with an eye towards future inclusion.
@timholy
Copy link
Member Author

timholy commented May 5, 2020

I've decided to revert this, see #35744. The discussion was extremely fruitful and it has resulted in a new package, TensorCore, that can be used on any version of Julia except nightly (and will be usable there too once the reversion merges).

To those who are disappointed, do not think this is the end of the road. There will be a Julia 1.6 release, after all, and the principle of "do no harm" means it's better to be patient and get it right than to rush it in now at a time where it is still controversial.

If you want to see this eventually in LinearAlgebra or a new standard library, there is a very clear path forward: use TensorCore, depend on TensorCore, and develop TensorCore. Show that it is such a good foundation that everyone should want it. There is a glorious history of good packages becoming features of Julia itself, and there is no reason to believe it couldn't be true here one day as well. So, to those who are disappointed, keep in mind that you are the ones who can change it!

@StefanKarpinski
Copy link
Member

StefanKarpinski commented May 5, 2020

This makes me feel quite disappointed, not for the specific outcome, but for the way this went down. There was a constructive and long discussion about this for some time, a consensus was reached, a nice implementation was provided, everything was left open for a long time. This gets merged, one person shouts out that this is not good without all that much of motivation, or arguments which apply much more strongly to existing functionality, and frankly in a rather condescending tone. Other people try to reconstruct several of the previous arguments for how this consensus was reached in a detailed (and thus quite long) manner. Which then leads to the conclusion that this is all too many words, and therefore does not belong here.

I certainly didn't intend to be condescending. From my perspective, there was a very long technical discussion that I didn't have the time or expertise to follow, which I was hoping would result in an "executive summary" of some kind before the resolution of this issue. But that didn't happen, instead there was a fairly abrupt merging of the PR without any summary.

I think that the output of these long conversations needs to be more than just a PR that could potentially be merged, it should also include a (short) summary of the reasons why, not just for those who were participating in the conversation, but also for those of us who are on the hook for maintaining, explaining, and justifying any APIs that ship with Julia. It's a bit different when the people having the conversation are the same people who are on the hook for maintaining and explaining it, but that's mostly not the case here: of the many people who participated here, only @timholy spends a meaningful amount of time maintaining Base or stdlib/LinearAlgebra.jl — and he seems to prefer (as is done now) to have this in a separate package.

I would note that I wasn't entirely alone here: @stevengj also expressed disapproval of this PR—and I've learned over the years to pay attention even if he is the only person expressing disapproval of something. He was, for example, the only person who complained about the soft scope change before 1.0, which turned out to be kind of a big kerfluffle. This issue was also discussed on the most recent triage call (which is when I wrote some of my comments), and the common sentiment was: what have they signed us up for with this PR and and why?

In any case, I strongly believe that having this in a small package outside of stdlibs as has now been done is a much better way for this to be developed, adopted and maintained.

@Jutho
Copy link
Contributor

Jutho commented May 5, 2020

Dear @StefanKarpinski, this comment was in fact also not addressed to you, but let's just forget about it. I certainly understand your point of view as maintainer, and it makes more sense to me than any of the other counter arguments presented, such as the ambiguity of the symbol and the name, which applies much stronger to some other symbols exported by LinearAlgebra.

That's also what I meant with inconsistency, but it is I assume unavoidable that such inconsistencies sometimes arise in a project which is simultaneously developed by so many enthusiastic developers and contributors. And maybe some of those (such as \times for the cross product) can also be revisited in a 2.0 release.

@JeffBezanson
Copy link
Member

It would be great to have an issue (is there one already?) listing all the dodgy stuff in LinearAlgebra that we want to deprecate or revisit. I certainly wouldn't spend much time defending kron, personally :)

@StefanKarpinski
Copy link
Member

The argument "we already do X, which makes even less sense, so why not also do Y?" is definitely not the kind of argument we should accept. ["There's already a bit of mud in the house, so why not also smear dog poop everywhere?"] Instead, we should not do Y and open an issue about changing X in 2.0 when we can.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
linear algebra Linear algebra
Projects
None yet
Development

Successfully merging this pull request may close these issues.