Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Arraypocalypse Now and Then #255

Closed
21 of 27 tasks
mbauman opened this issue Sep 15, 2015 · 276 comments
Closed
21 of 27 tasks

Arraypocalypse Now and Then #255

mbauman opened this issue Sep 15, 2015 · 276 comments
Labels
arrays [a, r, r, a, y, s]

Comments

@mbauman
Copy link
Member

mbauman commented Sep 15, 2015

This issue supersedes the 0.4 work towards array nirvana (#7941), and will tracks the issues we aim to complete during 0.5 and beyond — now updated through work on 0.7. This is an umbrella issue and will track specific tasks in other issues. Please feel free to add things that I've missed.

Required underlying technologies

Major 0.5 breaking behavior changes

Major 0.6 breaking behavior changes

Possible future breaking changes

New functionality

Other speculative possibilities

  • A mutable fixed-size buffer type, which would allow for a Julia-native Array definition (#12447); this type could also be used for I/O buffers and string storage.
  • Base IndexSet IntSet on BitArray or perhaps any AbstractArray{Bool}. (#20456)
  • Rework nonscalar indexing to prevent calling find on logical arrays and simply wrap it with an IndexSet LogicalIndex instead? (#19730)
  • Negated indexing with complement IndexSet (NegatedIndex type julia#1032) or special Not type? (Perhaps in a package: https://github.com/mbauman/InvertedIndices.jl)
  • Deprecate the linearization of trailing dimensions when more than one index is provided (partial linear indexing). (#20079)
  • Only allow indexing into N-dimensional arrays with 1 or N indices, deprecating "partial" indexing and trailing singleton dimensions (Omitted and additionally indexed dimensions in getindex julia#5396 and deprecate (then remove) generalized linear indexing julia#14770). Initial attempt at WIP/RFH: Deprecate generalized linear indexing julia#20040. Only allow linear indexing when there is exactly one index provided. Only allow omitting indices (using less than N indices in an N-dimensional array) when all omitted dimensions are of length 1. Only allow trailing indices (more than N indices) when all indices are 1 (singletons). (#21750)
  • Find alternate syntax for typed arrays – indexing into a type (T[...]) is kind of a bad pun. This syntax is especially bad since some variations are parsed as indexing into a type while others are parsed as special forms (typed hvcat, typed comprehensions)
  • Change hashing to run-length encoding of the diff of arrays, which would allow integer ranges and arrays to hash as equal again. (Hashing integer ranges  julia#12226 (comment), Make arrays and ranges hash and compare equal julia#16401)
  • Move sparse arrays out of base into a standard package. (#25249)
  • Allow nontraditional indices (#16260)
  • @sub @view macro (#16564)
@tbreloff
Copy link

This looks like a great list Matt, thanks. I'm a little scared of the
fallout but it'll be a huge leap forward for the language.

On Tuesday, September 15, 2015, Matt Bauman [email protected]
wrote:

This issue supersedes the 0.4 work towards array nirvana (#7941
JuliaLang/julia#7941), and will track the
issues we aim to complete during 0.5. This is an umbrella issue and will
track specific tasks in other issues. Please feel free to add things that
I've missed (at least during the first half of 0.5).

Required underlying technologies

Breaking behavior changes

New functionality

Other speculative possibilities

  • A mutable fixed-size buffer type, which would allow for a
    Julia-native Array definition (#12447
    Introduce Buffer type and make Array an abstraction on top of it julia#12447)
  • Base IndexSet on BitArray or perhaps any AbstractArray{Bool}.
  • Rework nonscalar indexing to prevent calling find on logical arrays
    and simply wrap it with an IndexSet instead?
  • Negated indexing with complement IndexSet? (Perhaps in a package)


Reply to this email directly or view it on GitHub
#255.

@JeffBezanson
Copy link
Member

Yes, great list! Let's roll up our sleeves!

@tbreloff
Copy link

Are there any pieces of this that are unclaimed? I'd like to help, but I
don't want to step on anyone's toes.

On Tuesday, September 15, 2015, Jeff Bezanson [email protected]
wrote:

Yes, great list! Let's roll up our sleeves!


Reply to this email directly or view it on GitHub
#255.

@StefanKarpinski
Copy link
Member

It's a phenomenal list. There's actually only a few changes that are very breaking, which is nice, but those are significant enough.

@mbauman
Copy link
Member Author

mbauman commented Sep 16, 2015

There is definitely more than enough work to go around! I don't think any of these tasks are claimed.

Things like dropping scalar dimensions are rather simple changes, but will take a lot of work finding and fixing bugs... and that part is easy to collaborate on. Same goes for views (if you ignore the perf issues with ReshapedArrays and inbounds). Anyone is welcome to dig in!

@jakebolewski
Copy link
Member

Views is hard, you have to make it through bootstrap without 🔪 yourself in the 👀.

@StefanKarpinski
Copy link
Member

Having just done a bunch of work to get string changes through bootstrap, I'm inclined to believe this.

@milktrader
Copy link

Thanks for doing this @mbauman, so much to digest

@jiahao
Copy link
Member

jiahao commented Sep 16, 2015

I've added "Remove default no-op behavior for (c)transpose" as an item. I expect much complaining, but as we've discussed before, it's simply wrong to assume that <:Any is a scalar and the logic error rears its head every time one tries to wrap and/or implement custom array/matrix types. cc @jakebolewski @andreasnoack

@StefanKarpinski
Copy link
Member

I think we need to think through the options for that carefully. It's pretty idiomatic to write A' to transpose a non-complex matrix.

@mbauman
Copy link
Member Author

mbauman commented Sep 16, 2015

Isn't it possible that the (c)transpose wrapper will solve this issue? There's a lot of design work that will need to go into it, but:

transpose(A::AbstractVectorOrMatrix) = TransposeType(A) # Will call `transpose` upon indexing, too
transpose(::AbstractArray) = error("cannot take transpose of 3+ dim array") # yeah, error methods aren't the best…
transpose(x) = x

@carnaval
Copy link

related: I think some of the typing problems in linalg (and transpose behavior) comes from the fact that we represent a blocked linear operator as an array of arrays. We may want to switch to a new type that knows of the various sizes inside it for that, I remember discussing that with @andreasnoack. 0.5 might be a time to at least think about it.

@jiahao
Copy link
Member

jiahao commented Sep 16, 2015

Isn't it possible that the (c)transpose wrapper will solve this issue?

Maybe; we'd have to think about it.

The blocking issue last time is that transpose has to be recursive to handle cases like Matrix{Matrix} (Example: A = [rand(1:5, 2, 2) for i=1:2, j=1:2]; A') correctly, but people want to write A' on 2D arrays on non-numeric types (e.g. images, as Matrix{<:Colorant}) and expect the transpose to not apply to the scalar elements. The no-op transpose(x::Any) method exists to handle these cases. However, this definition conflicts with matrix-like objects, which have the algebraic semantics of matrices but are not stored internally in any array-like form, and hence by JuliaLang/julia#987 should not be an AbstractArray (QRCompactWYQ is the poster child, but we have many such examples). If you introduce a new matrix-like type, you have to explicitly define (c)transpose otherwise you get the no-op fallback which is a source of many bugs.

To be clear, the behavior we would break explicitly is the claim (which you can find in the help for permutedims) that

Transpose is equivalent to permutedims(A, [2,1]).

This equivalence makes no sense for types that are not AbstractArrays and that are AbstractArrays of non-scalars, and we actually do have matrix-like types that need this more abstract sense of transpose.

@tbreloff
Copy link

I think assuming that a Matrix{Matrix} will automatically recursively
transpose the elements is bad and dangerous. I'd rather see a special type
BlockMatrix{Matrix} that does what you're looking for.

On Wed, Sep 16, 2015 at 10:30 AM, Jiahao Chen [email protected]
wrote:

Isn't it possible that the (c)transpose wrapper will solve this issue?

Maybe; we'd have to think about it.

The blocking issue last time is that transpose has to be recursive to
handle cases like Matrix{Matrix} (Example: A = [rand(1:5, 2, 2) for
i=1:2, j=1:2]; A') correctly. However, people want to write A' on 2D
arrays on non-numeric types (e.g. images, as Matrix{<:Colorant}) and
expect the transpose to not apply to the scalar elements. The no-op
transpose(x::Any) method exists to handle these cases. However, this
definition conflicts with matrix-like objects, which have the algebraic
semantics of matrices but are not stored internally in any array-like form,
and hence by JuliaLang/julia#987 JuliaLang/julia#987 should
not be an AbstractArray (QRCompactWYQ is the poster child, but we have
many such examples). If you introduce a new matrix-like type, you have to
explicitly define (c)transpose otherwise you get the no-op fallback which
is a source of many bugs.

To be clear, the behavior we would break explicitly is that

Transpose is equivalent to permutedims(A, [2,1]).

would now be false. This equivalence makes no sense for types that are not
AbstractArrays, and we actually do have matrix-like types that need this
more abstract sense of transpose.


Reply to this email directly or view it on GitHub
#255.

@jiahao
Copy link
Member

jiahao commented Sep 16, 2015

@tbreloff that's exactly my point. Most people think it's odd that transpose should be recursive, but it must be to be a mathematically correct transpose, and that disquiet exposes corner cases where transposition is not simply permutedims(A, [2,1]). (Although it’s true that Matrix{Matrix}} is not really a blocked matrix type, because there are absolutely no guarantees that the inner matrices have dimensions that are consistent with any partitioning of a larger matrix.)

@mbauman
Copy link
Member Author

mbauman commented Sep 16, 2015

Ah, yes, I forgot about all the Tensor-like objects that aren't AbstractArrays. No matter what happens, either the authors of Tensor-like objects will need to communicate to Julia somehow that they're not scalars (sometimes being an AbstractArray works, but not always), or the authors of Scalar-like objects will need to do the reverse (sometimes being a Number works, but not always), or both. This same sort of scalar-or-not question rears its head all over the place… e.g., indexing: JuliaLang/julia#12567.

Right now we require a mishmash of method specializations for the tensors with some scalar-like fallbacks. This means that some get forgotten and we end up with scalar methods getting called, returning the wrong result.

Since we can't communicate this through supertypes (JuliaLang/julia#987 (comment)), I think it's gotta either be through better documentation of the required methods or having those methods explicitly encoded into a traits-based system. And if we remove all fallbacks (which ensures correct behavior at the cost of missing methods for everyone), I think we need to make this as simple as possible with traits.

@johnmyleswhite
Copy link
Member

+1

I really think we should start enumerating traits for AbstractArray's. Linear indexing seems to have been an amazing example of the power of a few careful design decisions involving traits.

@mbauman
Copy link
Member Author

mbauman commented Sep 16, 2015

One possible solution would be to keep AbstractArray <: Any, and introduce AbstractNonArrayTensor <: Any alongside it. Everything else would be considered "scalar" as far as indexing and linear algebra are concerned.

Note that this is distinct from and much more well-defined than an Atom vs. Collection distinction (JuliaLang/julia#7244 (comment)); A[:] = (1,2,3) and A[:] = "123" behave very differently from A[:] = 1:3 for A = Array(Any, 3), as they should.

@ScottPJones
Copy link

I really think we should start enumerating traits for AbstractArray's.

@johnmyleswhite By any chance did you mean language supported "traits"? That's one thing I've really wanted to see in the language since JuliaCon.

@johnmyleswhite
Copy link
Member

Yes, I meant language supported traits.

@ScottPJones
Copy link

Do you have any ideas/suggestions about how traits could be added to Julia, syntactically? They would be very useful for arrays, strings, encodings, at the very least.

@johnmyleswhite
Copy link
Member

I prefer leaving those decisions to Jeff rather than speculating.

@ViralBShah
Copy link
Member

Given that this is an umbrella issue, it would be nice to discuss specific items in their own issues.

@ScottPJones
Copy link

I do think though that having traits in the language might substantially change the design for Arrays, which is why discussion of traits, at least in the area of how they could be used for better Array abstractions, would be useful.

@mbauman
Copy link
Member Author

mbauman commented Sep 16, 2015

Please move traits discussion to JuliaLang/julia#5, and the (c)transpose issue to JuliaLang/julia#13171.

@StefanKarpinski
Copy link
Member

I don't think the syntax for traits needs to be figured out before figuring out what traits are important for arrays. In fact, having more examples of traits that we actually need is excellent for helping design the language feature.

@ScottPJones
Copy link

Ok, good point, as long as traits are being thought of as part of the design.

@ScottPJones
Copy link

Upper and lower for triangular matrices? Those seem like good candidates for being done as traits to me.

@davidanthoff
Copy link

We've had that for ages; see sub and slice, which got fast in time for julia-0.4. We now have a few additional view types, too (at least ReshapedArray and the unexported PermutedDimsArray).

Has there been any consideration of renaming sub to view? sub really doesn't indicate very well that it will return a view...

@JaredCrean2
Copy link

view is used by the ArrayViewspackage which currently has significant performance advantages in certain cases (small contiguous views).

@andreasnoack
Copy link
Member

Following up on @davidanthoff, I think we should deprecate either sub or slice and it should probably be sub now that getindex behavior matches slice.

@mbauman
Copy link
Member Author

mbauman commented May 24, 2016

Is the problem with allowing floats to be used for indexing technical or philosophical?

It's a mix of both, but if we detangle scalar/nonscalar to_index like I suggest above then that removes (the last of?) the technical reasons. Linear indexing requires doing math on the indices, which requires integers. A lot has changed since we merged that deprecation (JuliaLang/julia#10458).

@davidanthoff
Copy link

Ah, I had thought the base capability is now a complete superset of the ArrayViews stuff and always preferred... It is a bit of a shame that the more intuitive name is used in the package and base is left with something less clear...

@StefanKarpinski
Copy link
Member

It seems unfortunate and somewhat backwards for a name used in a package to block choosing a much clearer name for a function in Base. Perhaps the way forward is to eliminate the performance advantage of ArrayViews, deprecate that package, and change the function name to Base.view.

@timholy
Copy link
Member

timholy commented May 24, 2016

My assumption/hope is that the performance gap will go away when we are able to heapstack-allocate containers with references...and I am skeptical that there's anything we can do about it without that. The ArrayView wrapper is just smaller, so being able to inline & elide the wrapper creation should do the trick.

That's why the difference only shows up with creation of small arrays---on most other benchmarks, SubArray equals or outperforms ArrayViews.

Oh, and I agree about deprecating sub.

@JeffBezanson
Copy link
Member

Base and ArrayViews could both use view. After all, ImageView also uses view :)

@JaredCrean2
Copy link

JaredCrean2 commented May 24, 2016

I suppose that could work because ArrayViews defines methods only for Array but Base defines methods for AbstractArray (I think?). How would this work when different scopes exists:

module A
  function myfunc(a::AbstractMatrix)
     av = view(a, :, 1)
     # do something with av
   end
end

module B 
  using ArrayViews
end

Does typeof(av) change depending on whether module B has been loaded (assuming a is an Array)?

@timholy
Copy link
Member

timholy commented May 24, 2016

ArrayViews would have to stop exporting view, and anytime you wanted to use ArrayViews' version you'd say ArrayViews.view(A, :, 1).

ImageView would also have to stop exporting view, but I'm fine with that idea.

@StefanKarpinski
Copy link
Member

Of the remaining issues, only #57 and JuliaLang/julia#16846 remain for 0.5; moving this issue to 0.6.

@timholy
Copy link
Member

timholy commented Jun 16, 2016

I'm also planning to do JuliaLang/julia#16260 (comment):

  • transitionally (julia-0.5 only) make size and length throw an error for arrays with unconventional indexing
  • introduce @arraysafe to rewrite calls to size and length to something that doesn't throw an error
  • merge allocate_for into similar

Probably won't be done until after JuliaCon, unfortunately. In that same discussion, @eschnett has made a case for introducing a separate type for linear indexing, and pointed out that the introduction of linearindices is an opportune moment to do so; I don't disagree, but I don't think I'll have time to tackle that myself, so that one is up-for-grabs.

@StefanKarpinski
Copy link
Member

As long as you're on it, @timholy – needs to be done by next week so we can tag an RC. If you'd like to open an issue and put it in the 0.5.0 milestone so we can track it, you can.

@ufechner7
Copy link

Shouldn't the title be adapted, if the milestone is moved?

@StefanKarpinski StefanKarpinski changed the title Arraypocalypse Now (0.5 release) Arraypocalypse Now (0.5 release and onward) Jun 17, 2016
@StefanKarpinski StefanKarpinski changed the title Arraypocalypse Now (0.5 release and onward) Arraypocalypse Now Jun 17, 2016
@StefanKarpinski StefanKarpinski changed the title Arraypocalypse Now Arraypocalypse Now and Then Jun 17, 2016
@JeffBezanson
Copy link
Member

I believe the 0.6 parts of this are reflected in more specific issues and PRs; moving to 1.0.

@stevengj
Copy link
Member

See also JuliaLang/julia#20164 to more easily opt-in to views for a large block of code.

@StefanKarpinski
Copy link
Member

Everything in this issue is either done, has its own issue, or isn't going to happen (I checked).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrays [a, r, r, a, y, s]
Projects
None yet
Development

No branches or pull requests