RFC: use first/last as endpoints for Range indexing #15750

timholy · 2016-04-02T22:12:25Z

This is basically a trial balloon to see what folks think of some thoughts I've had on handling syntax challenges for certain parts of #15648.

Demo:

julia> a = rand(5)
5-element Array{Float64,1}:
 0.963267 
 0.651218 
 0.149261 
 0.0209307
 0.323606 

julia> a[first:3]
3-element Array{Float64,1}:
 0.963267
 0.651218
 0.149261

julia> a[2:last]
4-element Array{Float64,1}:
 0.651218 
 0.149261 
 0.0209307
 0.323606

Compared to a[2:end], a[2:last] does not require special parser tricks, and 2:last has independent existence and can be passed as an argument to a function.
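
A minimal sketch of how such a standalone range object might look in current Julia. It does not reproduce the code in this PR: the marker constants `ifirst`/`ilast`, the `EndpointRange` fields, and the helpers `endpoint` and `concrete` are all hypothetical names chosen for illustration, and dedicated marker types are used instead of dispatching on `first`/`last` themselves.

```julia
# Hypothetical marker types; the PR itself reuses the functions `first` and `last`.
struct First end
struct Last end
const ifirst = First()
const ilast = Last()

# A range whose endpoints may be markers, resolved only once an array is known.
struct EndpointRange{A,B}
    start::A
    stop::B
end

# `ifirst:3` and `2:ilast` are ordinary calls of `(:)`, so plain methods suffice.
Base.:(:)(a::First, b::Integer) = EndpointRange(a, b)
Base.:(:)(a::Integer, b::Last) = EndpointRange(a, b)
Base.:(:)(a::First, b::Last) = EndpointRange(a, b)

# Resolve one endpoint against the index range `ax` of a vector.
endpoint(::First, ax) = first(ax)
endpoint(::Last, ax) = last(ax)
endpoint(i::Integer, ax) = i

# Turn the lazy range into a concrete one for a given vector.
concrete(r::EndpointRange, A::AbstractVector) =
    endpoint(r.start, eachindex(A)):endpoint(r.stop, eachindex(A))

Base.getindex(A::AbstractVector, r::EndpointRange) = A[concrete(r, A)]

a = [10, 20, 30, 40, 50]
tail = 2:ilast            # exists independently of any array
droplead(v, r) = v[r]     # and can be passed to a function
```

Here `a[ifirst:3]` and `droplead(a, tail)` index without any parser support, which is the property the comparison with `a[2:end]` is about.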

mbauman · 2016-04-02T22:18:03Z

base/multidimensional.jl

@@ -210,15 +210,15 @@ index_shape_dim(A, dim, ::Colon) = (trailingsize(A, dim),)
 # ambiguities for AbstractArray subtypes. See the note in abstractarray.jl

 # Note that it's most efficient to call checkbounds first, and then to_index
-@inline function _getindex(l::LinearIndexing, A::AbstractArray, I::Union{Real, AbstractArray, Colon}...)
+@inline function _getindex(l::LinearIndexing, A::AbstractArray, I::Union{Real, AbstractArray, Colon, EndpointRange}...)


Shouldn't be necessary if EndpointRange <: Range{Int}.

mbauman · 2016-04-02T22:26:27Z

Clever. Another thought I've had here is to parse A[CartesianIndex{2}((1,1,)), (end+1)÷2, end-1] as A[CartesianIndex{2}((1,1,)), (x)->(x+1)÷2, (x)->x-1]. The array would pass its last index in each dimension to the anonymous function. That doesn't solve the first index, though, and I have no idea what kind of performance impact it might have.

I do imagine that the majority of uses of end don't do any arithmetic on it, and are just ranges like you propose.
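
A rough sketch of the "pass the last index to an anonymous function" idea, written for current Julia. The helper names `resolve_last` and `lazyindex` are hypothetical, and `lazyindex` only stands in for what a lowered `A[...]` might do; no parser change is modeled here.

```julia
# Hypothetical helper: resolve one index for dimension d of A.
resolve_last(i, A, d) = i                              # ordinary index: unchanged
resolve_last(f::Function, A, d) = f(lastindex(A, d))   # function: apply to the last index

# Stand-in for what a lowered A[...] might do under this idea.
lazyindex(A, I...) = A[ntuple(d -> resolve_last(I[d], A, d), length(I))...]

A = reshape(1:12, 3, 4)
v = lazyindex(A, x -> (x + 1) ÷ 2, x -> x - 1)   # intended to match A[(end+1)÷2, end-1]
```

As noted, this covers only the last index; a first-index marker would need a second argument or a separate mechanism.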

timholy · 2016-04-02T22:47:19Z

Arithmetic is also doable (e.g., add fields first_offset and last_offset). A question is whether div is worth supporting, or just offsets.

eschnett · 2016-04-02T23:44:13Z

Offsets are obviously quite common.

If you're using div (or a multiplier), then my guess is that this is usually a sign of a manual array reshaping. If reshaping is easy, and there's a tutorial and examples, then the remaining cases are likely corner cases where a second line of code assigning the result of size to a local variable might not hurt too much.

mbauman · 2016-04-03T00:32:27Z

Other uses of end I've written or seen are min/clamp, mod1(i, end), setdiff(1:end, idxs), 1:end .!= i, and end inside array concatenations (e.g., A[vcat(1:i,end-i+1:end)]). But I probably (ab)use this more than I should.

Fast anonymous functions are amazing:

julia> Base.getindex(A::AbstractArray, f::Function) = A[f(length(A))]

julia> f(A) = A[(x)->x-1]
       g(A) = A[end-1];

julia> @benchmark f(1:10)
================ Benchmark Results ========================
     Time per evaluation: 5.69 ns [5.59 ns, 5.79 ns]
Proportion of time in GC: 0.00% [0.00%, 0.00%]
        Memory allocated: 0.00 bytes
   Number of allocations: 0 allocations
       Number of samples: 5401
   Number of evaluations: 344501
         R² of OLS model: 0.956
 Time spent benchmarking: 6.15 s


julia> @benchmark g(1:10)
================ Benchmark Results ========================
     Time per evaluation: 5.80 ns [5.72 ns, 5.89 ns]
Proportion of time in GC: 0.00% [0.00%, 0.00%]
        Memory allocated: 0.00 bytes
   Number of allocations: 0 allocations
       Number of samples: 7401
   Number of evaluations: 2314101
         R² of OLS model: 0.957
 Time spent benchmarking: 8.34 s

timholy · 2016-04-03T11:14:45Z

Indeed the more flexible implementation would be through building up anonymous functions. That should be able to handle any expression.

Then the key question is whether it's OK/attractive to (ab)use first and last this way, or whether we should introduce constants named something like ifirst and ilast or firstindex and lastindex.

JeffreySarnoff · 2016-04-04T08:47:23Z

the current docs say this about first, last:
first is the first element of an iterable collection ...
last is the last element of an ordered collection ...
wlog, first is the initial and last is the final element of a (bounded) ordered collection.
That seems appropriate to n-d array indexing and subindexing.

endof gives the index used with last when last is used applicatively.
If it is important to follow that use, we could use firstof, lastof to retrieve the indices that first, last use when applied as functions -- I'd rather use first, last both ways than deal with another set of things to remember (firstof, lastof). Nonetheless, if there is an important structural advantage to introducing firstof, lastof, it's not too much of an inconvenience.

timholy · 2016-04-04T10:30:17Z

Good point about the distinction between value and index. One of the other notions I'm contemplating is introducing an index function, which is just a wrapper around an array indicating that one wants the index(es) rather than the values. I.e., for x in X iterates over the values, while for I in index(X) iterates over the index. This is very much like our current eachindex, but eachindex is designed to pick a particular (fast) iterator, whereas I think of index as being lazy (allowing the choice of particular iterator to be delayed).

Of course, perhaps it would be better to call it keys, since we already have that function. But by this perspective (http://julialang.org/blog/2016/03/arrays-iteration#formalizing-abstractarray), keys is really index(stored(container)), meaning it only iterates over stored values (thinking about an associative container as a sparse array), so maybe we still want index.

If we introduce that, then I think endof could be replaced by last(index(a)). This might then give a justification to use first, last as index-markers rather than introducing firstof, lastof.

If instead we keep endof and use it as an indexing marker, maybe its converse should be called beginof.
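
One way the `index` wrapper might be sketched in current Julia. The type name `Index` and the choice to delegate to `eachindex` are assumptions made for illustration; the proposal only says the wrapper should be lazy about which iterator it picks.

```julia
# Hypothetical lazy wrapper meaning "the indices of A".
struct Index{T<:AbstractArray}
    parent::T
end
index(A::AbstractArray) = Index(A)

# Delegate to eachindex here; a real design could defer or vary this choice.
Base.iterate(ix::Index, state...) = iterate(eachindex(ix.parent), state...)
Base.length(ix::Index) = length(ix.parent)
Base.first(ix::Index) = first(eachindex(ix.parent))
Base.last(ix::Index) = last(eachindex(ix.parent))

a = [0.5, 1.5, 2.5]
total = sum(a[I] for I in index(a))   # iterates over indices, not values
```

With this in place, `last(index(a))` plays the role that `endof(a)` plays in the discussion above.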

kmsquire · 2016-04-04T14:03:13Z

I love the fact that you dispatch on function types for this, @timholy!

That said, I would really hope to keep this final choice of names simple
and intuitive. Since we already use end in a very similar way to last,
I would prefer either

start and end, or
first and last, as here, but deprecate end

(assuming, of course, that last can fully replace end functionally).

We do have a start function, already, at least, which indicates the start
of iteration, so maybe it's not too much of a pun?

Cheers!

mbauman · 2016-04-04T14:31:41Z

end can only be used within indexing syntax, since it's a keyword. The inverse there would be begin — we'd have to deprecate using begin blocks within index expressions, but that seems to be a very unlikely use. While that would work, it has the downside of increasing special syntaxes… and couldn't be used within normal function calls (e.g., sub(A, 2:end) doesn't work).

I agree there should only be one way to do it, so if we find a replacement here, end would be deprecated.

Dispatch on function types here is cute, but I'm afraid it's too cute. Why should first+1 work but not start+1? We'll need a dedicated type to keep track of operations in any case… so I think I'd just use something like End(): https://bitbucket.org/maurow/ragged.jl/src/ebbd1cdfd796ef626e9c5193dc75b6494fe8b21f/src/Ragged.jl?at=master&fileviewer=file-view-default#Ragged.jl-13:41. I'm still not a huge fan, since it feels like we're re-implementing anonymous functions.

timholy · 2016-04-04T15:20:17Z

Thanks, @kmsquire. And thanks, @mbauman, for writing almost word-for-word the post I would have written.

However, rather than the implementation in your link, I think that (now) the better approach is

immutable IndexFunction{F<:Function}
    index::F
end

IndexFunction(::First) = IndexFunction((A,d) -> first(inds(A,d)))
IndexFunction(::Last)  = IndexFunction((A,d) -> last(inds(A,d)))

(+)(m::Union{First,Last}, i::Number) = IndexFunction(m)+i
(+)(m::IndexFunction, i::Number) = IndexFunction((A,d) -> m.index(A,d)+i)
...

etc. We could let First either be the type of const ifirst = First() or an alias for typeof(first), depending on what we decide on the balance of "too cute" vs "reduce exports." (I'm leaning towards the constants idea myself, though not strongly.)

Either way, I think this is fully composable, and it uses anonymous functions rather than reimplementing them. It does require that you define all the operations in #15750 (comment).
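
For readers on current Julia, a runnable variant of the sketch above might look as follows. It assumes the "constants" option (`ifirst`/`ilast`), substitutes `axes(A, d)` for the `inds(A, d)` placeholder, and adds hypothetical helpers `resolve_index` and `lookup` in place of real `getindex` integration. Only `+` and `-` with integers are covered.

```julia
struct First end
struct Last end
const ifirst = First()
const ilast = Last()

struct IndexFunction{F<:Function}
    index::F
end
IndexFunction(::First) = IndexFunction((A, d) -> first(axes(A, d)))
IndexFunction(::Last)  = IndexFunction((A, d) -> last(axes(A, d)))

# Each supported operation wraps the previous closure in a new one.
Base.:+(m::Union{First,Last}, i::Integer) = IndexFunction(m) + i
Base.:+(m::IndexFunction, i::Integer) = IndexFunction((A, d) -> m.index(A, d) + i)
Base.:-(m::Union{First,Last}, i::Integer) = IndexFunction(m) - i
Base.:-(m::IndexFunction, i::Integer) = IndexFunction((A, d) -> m.index(A, d) - i)

# Hypothetical resolution step: turn markers into concrete indices for dimension d.
resolve_index(i, A, d) = i
resolve_index(m::Union{First,Last}, A, d) = IndexFunction(m).index(A, d)
resolve_index(m::IndexFunction, A, d) = m.index(A, d)

lookup(A, I...) = A[ntuple(d -> resolve_index(I[d], A, d), length(I))...]

A = reshape(1:12, 3, 4)
```

For example, `lookup(A, ifirst + 1, ilast - 1)` selects the same element as `A[2, 3]`. Every further operation (`div`, `*`, ranges) would need its own method, which is the cost discussed above.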

mbauman · 2016-04-04T16:59:51Z

Yes, that works. I guess my point is that it's not all that different from a terse lambda notation (#5571 (comment)). Instead of using syntax and precedence to solve the closure expansion issue, you must manually define the functions that it can expand through. It seems fiddly (and maybe has ambiguity issues), but the proposed rules in 5571 are fiddly, too.

timholy · 2016-04-04T19:10:43Z

Are you proposing to instead implement these operations via special parsing rules? Or pointing out that a parser-free solution shares similar, significant downsides to that example?

I'm not thrilled either, but I don't see a solution different from "lean on the parser even harder than we have so far" (get the parsing working outside of []) or "implement a specific set of operations" (parser-free).

mbauman · 2016-04-04T20:16:28Z

I point to it because it feels like the same problem to me. We want to be able to express this:

sub(A, (ifirst, ilast) -> ifirst+1:ilast, (ifirst, ilast) -> setdiff(ifirst:ilast, idxs), (ifirst, ilast) -> [ifirst,ilast])

in a terse, easy-to-use, and easy-to-understand manner.

I suppose the set of operations is limited enough that we'll be able to cover 99.9% of uses with your inside-out approach, but I don't see a path forward with vect/vcat. And I think that all the common operations can be defined on f(::IndexFunction, ::Union{Number, AbstractArray}) and vice versa, preventing the ambiguity madness that stems from f(::T, ::Any), f(::Any, ::T) pairs. It's probably the best way forward for now… I'm just a stick in the mud.
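
The verbose closure form can be made concrete in current Julia with a small helper. This is only a sketch: `lazyview` and `resolve_axis` are hypothetical names, and `view` is used where the thread writes `sub`.

```julia
# Hypothetical helper: call a closure with the first and last index of one axis.
resolve_axis(f::Function, ax) = f(first(ax), last(ax))
resolve_axis(i, ax) = i

# Stand-in for the sub(A, ...) call above, built on view.
lazyview(A, I...) = view(A, ntuple(d -> resolve_axis(I[d], axes(A, d)), length(I))...)

A = reshape(1:24, 2, 3, 4)
idxs = [2]
V = lazyview(A,
             (ifirst, ilast) -> ifirst+1:ilast,
             (ifirst, ilast) -> setdiff(ifirst:ilast, idxs),
             (ifirst, ilast) -> [ifirst, ilast])
```

This handles `setdiff` and array literals without any per-operation methods, at the price of the boilerplate that the terse-lambda discussion is trying to remove.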

timholy · 2016-04-04T21:23:55Z

One further option: indexes = index_extractor(A)[1:end-1, [2;4:end]] and define the getindex(::IndexExtractor, indexes...) method to just return the selected indexes as a tuple. So it becomes index(A, index_extractor(A)[1:end-1, [2;4:end]]...), which is not exactly gorgeous, but as something hidden inside an @iterate it could be livable.

That's back to "leaning on the parser"; we'd want to add A[begin:5] as well.
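
A sketch of the extractor idea in current Julia, assuming a version where `begin` is accepted inside indexing expressions. The type name `IndexExtractor` follows the comment above; forwarding `firstindex`/`lastindex` to the wrapped array is an assumption about how `begin` and `end` would be resolved.

```julia
struct IndexExtractor{T<:AbstractArray}
    parent::T
end
index_extractor(A::AbstractArray) = IndexExtractor(A)

# begin and end inside ex[...] become firstindex/lastindex calls on ex.
Base.firstindex(ex::IndexExtractor) = firstindex(ex.parent)
Base.firstindex(ex::IndexExtractor, d) = firstindex(ex.parent, d)
Base.lastindex(ex::IndexExtractor) = lastindex(ex.parent)
Base.lastindex(ex::IndexExtractor, d) = lastindex(ex.parent, d)

# Indexing returns the resolved indices themselves, as a tuple.
Base.getindex(::IndexExtractor, I...) = I

A = reshape(1:20, 4, 5)
indexes = index_extractor(A)[1:end-1, [2; 4:end]]
```

The resulting tuple can be splatted into any function, for example `view(A, indexes...)`.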

Jeffrey-Sarnoff · 2016-04-04T22:41:27Z

either first,last xor begin,end xor start,finish .. (imo):
first,last is better for first(seq), last(seq) and works for seq[first] seq[last]
begin,end is better for seq[begin], seq[end] and works for begin(iter), end(iter)
start, finish is better for start(iter), finish(iter) and works for iter[start], iter[finish]
so, (again imo)
if this is more aligned with sequences than iterables then first, last
if this is more aligned with iterables than sequences then start, finish
if there is no stronger alignment, then begin,end

oxinabox · 2016-04-05T12:31:36Z

re:

If you're using div (or a multiplier), then my guess is that this is usually a sign of a manual array reshaping.

While div(end, x) is indeed used for manual reshaping, there are other use cases.

For example, it is very common in machine learning to need to divide a dataset into test, validation and train subsets. This currently looks like:

train = raw[:, 1:8*end÷10]
validation = raw[:, 8*end÷10+1:9*end÷10]
test = raw[:, 9*end÷10+1:end]

(Remembering that ÷ is the infix for div)

Removing div (and multiplication) would break this, and it is one of the "really cool things that you can do in Julia" that sells me on the language.

timholy · 2016-07-26T13:22:39Z

Closing; experimentation underway at https://github.com/timholy/EndpointRanges.jl.

andyferris · 2017-05-01T23:45:35Z

My vote here would be to reuse the begin keyword in indexing as sugar similar to end.

While a type like the one in EndpointRanges is really quite cool, the great thing about end is that you can perform arbitrary operations on it. Also, the story seems less clear for multidimensional arrays (perhaps there is a trick here - yet another wrapper function for getindex).

timholy · 2017-05-02T01:12:09Z

That's aligned with my current thinking. The only negative is that you can't easily extract the indices independently of the array itself. (We can't do that for conventional arrays either, of course.)

StefanKarpinski · 2017-05-02T16:20:47Z

begin does seem like the best option. It will add some additional hairiness to the parser, but using begin inside of an indexing expression is so bizarre that I don't think it's a practical problem.

mbauman · 2017-05-02T16:37:22Z

I'm in support. We get to simplify some of the parser rules in this release since we no longer need trailingsize for partial linear indexing. We'll need some new functions, though. first(indices(A, 2)) and first(linearindices(A)) only work for arrays.

Maybe extend endof(A) to support taking a dimension? And add beginof to mirror it? Then:

A[begin:end] => A[beginof(A):endof(A)]
A[begin:end, begin:end] => A[beginof(A, 1):endof(A, 1), beginof(A, 2):endof(A, 2)]
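
A small illustration of this lowering in current Julia, where (as the issue linked at the end of this thread puts it) `a[begin]` stands for `a[firstindex(a)]`, with `lastindex` taking the role of `endof`. The `ZeroBased` container is a hypothetical toy type, one-dimensional only, and not a full AbstractArray.

```julia
# Toy container with indices 0:n-1, to show begin/end following the container.
struct ZeroBased{T}
    data::Vector{T}
end

Base.firstindex(z::ZeroBased) = 0
Base.lastindex(z::ZeroBased) = length(z.data) - 1
Base.getindex(z::ZeroBased, i::Integer) = z.data[i + 1]
Base.getindex(z::ZeroBased, r::AbstractUnitRange) = z.data[r .+ 1]

z = ZeroBased([10, 20, 30, 40])
inner = z[begin+1:end-1]   # lowered to z[firstindex(z)+1:lastindex(z)-1]
```

Because `begin` and `end` are resolved through these two functions, arithmetic on them keeps working for containers whose first index is not 1.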

timholy · 2017-05-02T16:52:02Z

Yes, I suspect that's the best way to proceed.

oxinabox · 2017-05-03T00:50:01Z

Maybe extend endof(A) to support taking a dimension?

For the reference of anyone who doesn't know (since apparently this hasn't been mentioned anywhere in this thread?):
Currently the parser/lowerer transforms

A[end] into A[endof(A)]
and
A[x, end, y] into A[x, size(A,2), y]

Currently, in the basic Array case, endof(x) = length(x), so size(A, d) makes sense as the last index along a particular dimension.
But if the first index were not always 1, this would no longer work.

andyferris · 2017-05-03T01:55:57Z

That's interesting.

For StaticArrays I feel something like A[x, size(A)[2], y] would have been slightly preferable (for constant propagation). For arbitrary indices I guess simply A[x, last(indices(A)[2]), y] should be fast enough? Would last(indices(A)[N]) be suitably "fast" to replace end?

Use first/last as endpoints for Range indexing

ccca50e

mbauman reviewed Apr 2, 2016
mbauman mentioned this pull request May 14, 2016

Support for 0-indexed and arbitrary-indexed arrays #16260

Merged

cstjean mentioned this pull request May 25, 2016

Create @view macro for creating SubArrays via indexing. #16564

Merged

timholy closed this Jul 26, 2016

timholy deleted the teh/endpoint_ranges branch July 26, 2016 13:22

yuyichao mentioned this pull request Aug 19, 2017

beginof(a) analogue to endof(a)? #23354

Closed

ararslan mentioned this pull request Feb 4, 2018

start keyword for array indexing #25873

Closed

stevengj mentioned this pull request Nov 26, 2019

support a[begin] for a[firstindex(a)] #33946

Merged
