Export oneto rather than implement range(stop) #39242

mkitti · 2021-01-14T02:11:45Z

Export `oneto` as an alternative to `range(stop)`

- oneto(n) is an unambiguous alternative to range(stop)
- oneto(n) may be more intuitive than 1:n for some people from distinct coding traditions (e.g. Python)
- oneto(n) uses Julia's type system to allow for fast code (via Base.OneTo)

Since a single argument range mapping to Base.OneTo is controversial (see #39223), let's export the recently created oneto (all lower case) which is unambiguous and accepts a single Integer argument.

oneto was recently merged into Base via #37741. oneto is five lowercase letters like range but it clearly describes what it does. Additionally, as demonstrated by #37741 it can be extended for uses not directly involving Base.OneTo.

By exporting oneto we will make a highly optimized code path easily available for a very common use case.
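As a minimal sketch of what distinguishes the two spellings, the snippet below compares `1:n` with `Base.OneTo(n)` directly. It uses `Base.OneTo` rather than `oneto` so that it does not assume which Julia version is in use; the variable names are illustrative only.

```julia
n = 5
r_colon = 1:n             # UnitRange{Int}: stores both start and stop
r_oneto = Base.OneTo(n)   # Base.OneTo{Int}: stores only stop; the start of 1 is implied by the type

# Both describe the same sequence of indices.
same_elements = collect(r_colon) == collect(r_oneto)

# The types differ, so a method can dispatch on "starts at 1" without inspecting values.
types = (typeof(r_colon), typeof(r_oneto))

# OneTo carries one integer instead of two.
sizes = (sizeof(r_colon), sizeof(r_oneto))

println(same_elements, " ", types, " ", sizes)
```

Whether the smaller representation translates into measurable speed depends on the call site.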

base/range.jl

timholy · 2021-01-14T09:22:11Z

Can you explain the need for this? I don't think #39223 is compelling. There's a downside to creating confusion about whether people should use 1:n or oneto(n); anyone who cares can also use Base.OneTo directly.

mkitti · 2021-01-14T17:39:27Z

Executive Summary

- oneto(n) is an unambiguous alternative to range(stop)
- oneto(n) may be more intuitive than 1:n for some people from distinct coding traditions (e.g. Python)
- oneto(n) uses Julia's type system to allow for fast code (via Base.OneTo)

Exposition

In Julia 1.7, Julia will likely have a range that can be specified fully by positional arguments, as was decided in #38750 (comment). The two- and three-argument versions of range, range(start, stop) and range(start, stop, length), are now pending in pull request #39228. I have my reservations about this, but I think the ship has sailed. The decision has been made.

Upon considering two and three positional argument range, it's a natural question if there should be a one argument range. If implemented, notes from triage indicated this should be range(stop). If there were to be a range(stop), it makes sense to map it to Base.OneTo. This is part of what #39223 was trying to implement under the theme of start = 1 being a reasonable default. While a prospective range([start], stop, [length]) in Julia somewhat parallels range([start], stop, [step]) in Python, it is rather confusing in either language.

The syntax 1:n is intuitive syntax for someone coming from a MATLAB background. I read that as "one to n" in my mind. From a Python background, that is not the case at all since this reads as "slice from the 2nd element to the nth element at index n - 1". That makes my mind hurt a bit, and I imagine someone coming from Python may have a similar experience trying to understand 1:n.

The Julia community has grown significantly and is now large enough to perhaps permit some differences in coding style. Given that Base.OneTo(n) and now oneto(n) are actually faster than 1:n in some circumstances, as I learned from reading your code, @timholy, would it really be terrible if some people used oneto(n) rather than 1:n if oneto(n) made more sense in their minds?

Upon coming across oneto, I thought this would be a good alternative to a single argument range(stop) or range(length). It is clear and unambiguous in terms of what it does. It could also be extended for distinct types if needed eventually. Being lowercase makes it easier to type and use. Exporting oneto would make this readable syntax even easier to use and could be an efficient and intuitive alternative to 1:n for some. At the end of the day, it may help broaden the appeal of Julia.

mbauman · 2021-01-14T19:34:10Z

The primary reason-for-being for Base.OneTo is not performance. It's for dispatch disambiguation from an offset array when implementing and manipulating axes. Now there may have been a very marginal performance difference in specific situations too, but I'd be really surprised if that still existed — especially when comparing against a literal 1:n.
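A minimal sketch of the dispatch use described above: because ordinary arrays report their axes as `Base.OneTo`, a method can tell "conventional 1-based axes" apart from arbitrary offset axes purely by type. The function name `is_one_based` is hypothetical and not part of Base.

```julia
# Hypothetical helper: selected by the type of the axes tuple, not by its values.
is_one_based(::Tuple{Vararg{Base.OneTo}}) = true
is_one_based(::Tuple{Vararg{AbstractUnitRange}}) = false

A = zeros(2, 3)
ordinary_axes = axes(A)    # a tuple of Base.OneTo for a plain Array
offset_axes = (0:1, 1:3)   # what an offset-array type might report (plain UnitRanges)

println(is_one_based(ordinary_axes))
println(is_one_based(offset_axes))
```

Note that `1:3` in the second tuple also starts at 1, but its type does not guarantee that, so the generic method is chosen.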

In my view, if the possibility of returning a OneTo simply and straight-forwardly "fell out" of our definition of range then we might as well return it. But I don't see a driving need for a function to return a OneTo.

That said, it really surprises/distresses me how many packages are reaching for Base.OneTo to define a loop's iteration space. 😕

chriselrod · 2021-01-14T20:41:54Z

I'd expect it to make a performance difference when the range gets passed around, e.g. that view(a, Base.OneTo(59)) would have an advantage over view(a, 1:59), because you're passing around a single integer instead of 2, and the 1 is known at compile time.

julia> a = rand(100);

julia> @btime sum(view($a, 1:59))
  27.865 ns (0 allocations: 0 bytes)
29.085241951206672

julia> @btime sum(view($a, Base.OneTo(59)))
  27.558 ns (0 allocations: 0 bytes)
29.085241951206672

julia> @btime sum(view($a, 1:59))
  27.873 ns (0 allocations: 0 bytes)
29.085241951206672

julia> @btime sum(view($a, Base.OneTo(59)))
  27.528 ns (0 allocations: 0 bytes)
29.085241951206672

That's easily within the margin of noise. So I wouldn't expect much difference in typical use.

But whenever you aren't passing it around, and instead writing something like for i in 1:N, then there shouldn't be any difference, the literal 1 is also known to the compiler.

> That said, it really surprises/distresses me

I didn't know it was that unidiomatic. I'll stick to 1:N if I have a fixed length N then (of course, when it corresponds to an array, axes or eachindex are preferred).
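For the case where the loop does correspond to an array, a short sketch of the preferred `eachindex` form follows. The function name `sum_eachindex` is illustrative only.

```julia
# Iterate over whatever indices the array actually has, instead of assuming 1:length(a).
function sum_eachindex(a)
    s = zero(eltype(a))
    for i in eachindex(a)
        s += a[i]
    end
    return s
end

a = [1.0, 2.0, 3.0]
println(sum_eachindex(a))              # whole vector
println(sum_eachindex(view(a, 2:3)))   # a view is indexed from 1 within itself
println(eachindex(a))                  # for a Vector this is a Base.OneTo
```

Here `Base.OneTo` appears as the result of `eachindex` rather than being constructed by hand.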

mbauman · 2021-01-14T21:41:37Z

That's an interesting example. On 1.5 I see a 2x (simd) speedup for OneTo but can't see the difference in the LLVM (or anywhere LLVM is making use of the hardcoded 1 beyond the stack-allocated struct size, for that matter. Compare code_llvm(Base.mapreduce_impl, Tuple{typeof(identity), typeof(Base.add_sum), SubArray{Float64,1,Array{Float64,1},Tuple{UnitRange{Int64}},true}, Int64, Int64, Int64}, debuginfo=:none) vs code_llvm(Base.mapreduce_impl, Tuple{typeof(identity), typeof(Base.add_sum), SubArray{Float64,1,Array{Float64,1},Tuple{Base.OneTo{Int64}},true}, Int64, Int64, Int64}, debuginfo=:none)). On 1.6/master I see identical performance.

"Distresses" is far too strong, but it sure feels funny to reach for an unexported and specialized type that's 10 more characters when there's no functional difference (looking at for loops specifically here). I know we're not zen of python's strict only-one-way-to-do it here, but this feels superfluous to me.

timholy · 2021-01-14T22:18:01Z

Yeah, I'm bothered by that too. Probably those of us who were around when we added OneTo know best that it was never intended to escape quite so thoroughly. I don't see it often, but in remarkable timing just reviewed a similar PR yesterday, JuliaImages/Images.jl#935.


mkitti · 2021-01-14T23:53:46Z

We have not formally declared what is public vs private API: #7561, #35715. Both issues remain open.

Regarding Base.OneTo specifically, it is documented in the Base docs which mentions nothing about it being internal.
https://docs.julialang.org/en/v1/base/math/#Base.OneTo

However, Base.OneTo is not exported which creates some suspicion that maybe it should not be used:
https://discourse.julialang.org/t/when-is-it-safe-to-use-base-oneto/12512

On the other hand, it is used in examples in the Base docs such as for mod:
https://docs.julialang.org/en/v1/base/math/#Base.mod

To clarify, I'm not suggesting we export Base.OneTo but merely oneto, which just so happens to be implemented with Base.OneTo. I specifically added the documentation that does not guarantee it will return Base.OneTo~~, but I did add a cross reference and mentioned it in NEWS.md.~~ Edit: I removed all references to Base.OneTo from the docs or NEWS.md.

I am suggesting we export oneto as an alternative to creating a single positional argument of range. This does not export the type Base.OneTo or its constructor directly.

mkitti · 2021-01-14T23:59:45Z

I should cross reference my related PR #39241 which essentially does the following:

range(; stop) = 1:stop
range(; stop::Integer) = Base.OneTo(stop)
range(; length::Integer) = Base.OneTo(length)

#39241 deals with the single keyword situation only as opposed to #39223 which tried to do a lot more.
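The three one-liners above are shorthand: keyword arguments do not take part in method dispatch, so definitions that differ only in their keywords would overwrite one another if entered literally. A runnable sketch of the same idea routes the keyword through a positional helper. The names `myrange` and `_stop_range` are hypothetical and are not the code in #39241.

```julia
# Hypothetical helper: dispatch happens on the positional argument.
_stop_range(stop::Integer) = Base.OneTo(stop)
_stop_range(stop) = oneunit(stop):stop

# Keyword front end (hypothetical name, chosen to avoid touching Base.range).
myrange(; stop) = _stop_range(stop)

r_int = myrange(stop = 4)      # integer stop maps to Base.OneTo
r_float = myrange(stop = 4.0)  # non-integer stop falls back to a colon range

println(r_int, " ", r_float)
```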

chriselrod · 2021-01-15T05:19:03Z

> That's an interesting example. On 1.5 I see a 2x (simd) speedup for OneTo but can't see the difference in the LLVM (or anywhere LLVM is making use of the hardcoded 1 beyond the stack-allocated struct size, for that matter. Compare code_llvm(Base.mapreduce_impl, Tuple{typeof(identity), typeof(Base.add_sum), SubArray{Float64,1,Array{Float64,1},Tuple{UnitRange{Int64}},true}, Int64, Int64, Int64}, debuginfo=:none) vs code_llvm(Base.mapreduce_impl, Tuple{typeof(identity), typeof(Base.add_sum), SubArray{Float64,1,Array{Float64,1},Tuple{Base.OneTo{Int64}},true}, Int64, Int64, Int64}, debuginfo=:none)). On 1.6/master I see identical performance.

I stared at that for a little while, but couldn't find the difference.
The diff between asm is

12c12
<         add     rsi, qword ptr [r14 + 24]
---
>         add     rsi, qword ptr [r14 + 16]
41c41
<         mov     r8, qword ptr [r14 + 24]
---
>         mov     r8, qword ptr [r14 + 16]

I'd have to look at the high level Julia code to see if maybe it's being taken care of outside an inline boundary.
If we take a simpler example:

function mysum(a)
    s = zero(eltype(a))
    @inbounds @simd for i ∈ eachindex(a)
        s += a[i]
    end
    s
end
code_llvm(mysum, Tuple{SubArray{Float64,1,Array{Float64,1},Tuple{UnitRange{Int64}},true}}, debuginfo=:none)
code_llvm(mysum, Tuple{SubArray{Float64,1,Array{Float64,1},Tuple{Base.OneTo{Int64}},true}}, debuginfo=:none)

code_native(mysum, Tuple{SubArray{Float64,1,Array{Float64,1},Tuple{UnitRange{Int64}},true}}, debuginfo=:none, syntax=:intel)
code_native(mysum, Tuple{SubArray{Float64,1,Array{Float64,1},Tuple{Base.OneTo{Int64}},true}}, debuginfo=:none, syntax=:intel)

From the UnitRange:

top:
  %1 = getelementptr inbounds { {}*, [1 x [2 x i64]], i64, i64 }, { {}*, [1 x [2 x i64]], i64, i64 }* %0, i64 0, i32 1, i64 0, i64 1
  %2 = getelementptr inbounds { {}*, [1 x [2 x i64]], i64, i64 }, { {}*, [1 x [2 x i64]], i64, i64 }* %0, i64 0, i32 1, i64 0, i64 0
  %3 = load i64, i64* %1, align 8
  %4 = load i64, i64* %2, align 8
  %5 = sub i64 %3, %4
  %6 = add i64 %5, 1

Base.OneTo:

  %1 = getelementptr inbounds { {}*, [1 x [1 x i64]], i64, i64 }, { {}*, [1 x [1 x i64]], i64, i64 }* %0, i64 0, i32 1, i64 0, i64 0
  %2 = load i64, i64* %1, align 8

A load, sub, and add were eliminated. The difference in assembly matches, UnitRange:

        mov     rax, qword ptr [rdi + 16]
        sub     rax, qword ptr [rdi + 8]
        inc     rax
        test    rax, rax

vs OneTo:

        mov     rax, qword ptr [rdi + 8]
        test    rax, rax

sub is also loading from memory, and then followed by an increment.
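The eliminated instructions correspond to the length computation. A simplified sketch of the two cases is below; the function names are hypothetical, and the real Base methods include additional checks (for example around overflow) that are omitted here.

```julia
# UnitRange stores two fields, so its length needs a load of each, a subtract, and an add.
len_unitrange(r::UnitRange{Int}) = r.stop - r.start + 1

# OneTo stores only `stop`; the start of 1 is part of the type, so the length is a single load.
len_oneto(r::Base.OneTo{Int}) = r.stop

println(fieldnames(UnitRange), " ", fieldnames(Base.OneTo))
println(len_unitrange(1:59), " ", len_oneto(Base.OneTo(59)))
```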

As for performance:

julia> x = rand(100);

julia> @btime mysum(view($x, 1:32))
  3.332 ns (0 allocations: 0 bytes)
17.408491415901285

julia> @btime mysum(view($x, Base.OneTo(32)))
  4.314 ns (0 allocations: 0 bytes)
17.408491415901285

julia> @btime mysum(view($x, 1:64))
  5.310 ns (0 allocations: 0 bytes)
33.568214211566435

julia> @btime mysum(view($x, Base.OneTo(64)))
  6.375 ns (0 allocations: 0 bytes)
33.568214211566435

julia> @btime mysum(view($x, 1:96))
  7.091 ns (0 allocations: 0 bytes)
47.45531942465689

julia> @btime mysum(view($x, Base.OneTo(96)))
  8.600 ns (0 allocations: 0 bytes)
47.45531942465689

Identical performance could be explained by inlining, but better?

julia> @noinline function mysum_noinline(a)
           s = zero(eltype(a))
           @inbounds @simd for i ∈ eachindex(a)
               s += a[i]
           end
           s
       end
mysum_noinline (generic function with 1 method)

julia> @btime mysum_noinline(view($x, 1:32))
  9.908 ns (0 allocations: 0 bytes)
17.408491415901285

julia> @btime mysum_noinline(view($x, Base.OneTo(32)))
  10.554 ns (0 allocations: 0 bytes)
17.408491415901285

julia> @btime mysum_noinline(view($x, 1:64))
  10.761 ns (0 allocations: 0 bytes)
33.568214211566435

julia> @btime mysum_noinline(view($x, Base.OneTo(64)))
  12.027 ns (0 allocations: 0 bytes)
33.568214211566435

julia> @btime mysum_noinline(view($x, 1:96))
  11.709 ns (0 allocations: 0 bytes)
47.45531942465689

julia> @btime mysum_noinline(view($x, Base.OneTo(96)))
  13.136 ns (0 allocations: 0 bytes)
47.45531942465689

FWIW, I restarted Julia and the difference reversed.

julia> @btime mysum_noinline(view($x, 1:32))
  11.762 ns (0 allocations: 0 bytes)
16.72168408851738

julia> @btime mysum_noinline(view($x, Base.OneTo(32)))
  8.926 ns (0 allocations: 0 bytes)
16.72168408851738

julia> @btime mysum_noinline(view($x, 1:64))
  11.040 ns (0 allocations: 0 bytes)
31.75307213919455

julia> @btime mysum_noinline(view($x, Base.OneTo(64)))
  10.593 ns (0 allocations: 0 bytes)
31.75307213919455

julia> @btime mysum_noinline(view($x, 1:96))
  12.046 ns (0 allocations: 0 bytes)
45.69956955548978

julia> @btime mysum_noinline(view($x, Base.OneTo(96)))
  11.742 ns (0 allocations: 0 bytes)
45.69956955548978

But these are just artifacts. 1 ns is much bigger than the theoretical difference we'd expect.

But it should be obvious that 2 instructions outside of a loop should make a tiny difference. A CPU running at 4GHz has 4 clock cycles per ns, and executes multiple instructions per clock cycle.

julia> using LinuxPerf#master

julia> foreachf(f::F, N, args::Vararg{<:Any,A}) where {F,A} = foreach(_ -> f(args...), Base.OneTo(N))

julia> @pstats "cpu-cycles,(instructions,branch-instructions,branch-misses),(task-clock,context-switches,cpu-migrations,page-faults),(L1-dcache-load-misses,L1-dcache-loads,L1-icache-load-misses),(dTLB-load-misses,dTLB-loads),(iTLB-load-misses,iTLB-loads)" begin
        foreachf(mysum_noinline, 10_000_000, view(x, Base.OneTo(96)))
       end
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
╶ cpu-cycles               2.48e+09   60.0%  #  4.3 cycles per ns
┌ instructions             7.02e+09   60.1%  #  2.8 insns per cycle
│ branch-instructions      1.27e+09   60.1%  # 18.1% of instructions
└ branch-misses            2.28e+06   60.1%  #  0.2% of branch instructions
┌ task-clock               5.81e+08  100.0%  # 580.8 ms
│ context-switches         0.00e+00  100.0%
│ cpu-migrations           0.00e+00  100.0%
└ page-faults              2.40e+01  100.0%
┌ L1-dcache-load-misses    1.65e+07   20.0%  #  0.7% of dcache loads
│ L1-dcache-loads          2.24e+09   20.0%
└ L1-icache-load-misses    5.32e+06   20.0%
┌ dTLB-load-misses         2.40e+05   20.0%  #  0.0% of dTLB loads
└ dTLB-loads               2.25e+09   20.0%
┌ iTLB-load-misses         2.46e+05   39.9%  # 87.6% of iTLB loads
└ iTLB-loads               2.81e+05   39.9%
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

With 2.8 instructions per cycle and 4.3 cycles per nanosecond, it would be extremely difficult to actually measure the (tiny) expected difference in performance from cutting 2 instructions out of the many instructions it takes to run the loop. And any other artifacts, at least some of which apply consistent (and far larger) biases throughout a Julia session (or at least many @benchmarks), make it (to quote Andrew Gelman) like trying to weigh a feather using a kitchen scale, while the feather is resting in the pouch of a kangaroo vigorously jumping up and down.
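A back-of-the-envelope version of that estimate, using only the counters printed above. It assumes, purely for illustration, that the two removed instructions retire at the average measured throughput; the variable names are mine.

```julia
# Figures copied from the @pstats output above.
instructions = 7.02e9
cycles = 2.48e9
cycles_per_ns = 4.3
calls = 10_000_000

ipc = instructions / cycles                # about 2.8 instructions per cycle
instructions_per_call = instructions / calls

# Assumption: the 2 removed instructions cost the same as an average instruction.
saved_ns = 2 / ipc / cycles_per_ns
saved_fraction = 2 / instructions_per_call

println(round(saved_ns, digits = 2), " ns saved per call")
println(round(100 * saved_fraction, digits = 2), "% of instructions per call")
```

Under that assumption the expected saving is a fraction of a nanosecond per call, well below the roughly 1 ns swings seen between the benchmark runs above.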

> "Distresses" is far too strong, but it sure feels funny to reach for an unexported and specialized type that's 10 more characters when there's no functional difference (looking at for loops specifically here). I know we're not zen of python's strict only-one-way-to-do it here, but this feels superfluous to me.

> Yeah, I'm bothered by that too. Probably those of us who were around when we added OneTo know best that it was never intended to escape quite so thoroughly. I don't see it often, but in remarkable timing just reviewed a similar PR yesterday, JuliaImages/Images.jl#935.

The reason I'd been using it for for loops on occasion was because it felt/looked more like axes or eachindex calls.

mkitti · 2021-01-17T07:20:49Z

> In my view, if the possibility of returning a OneTo simply and straight-forwardly "fell out" of our definition of range then we might as well return it

The least controversial road to OneTo from range is via range(; stop) or range(; length) as in #39241. At least there you have a clearly designated property and you can assume both start and step are 1, so there is no ambiguity.

One positional argument is more challenging and controversial since you have to route from range(start; stop, length, step) and switch start to stop if we wanted to allow Floats as well as Integers. The method table ends up looking a bit strange because you never quite see stop as the first argument when you examine the output of methods(range). It could be simplified if range(stop::Integer) was the only single positional argument we would allow, but that's awkward because the only range argument that has to be an Integer otherwise is length. Because of all these issues, exporting oneto makes more sense to me and may be less controversial.

range(start; stop=nothing, length::Union{Integer,Nothing}=nothing, step=nothing) =
    _range_positional(start, step, stop, length)

...

range(stop::Integer) = range_stop(stop)

_range_positional(stop::Any    , step::Nothing,      ::Nothing, len::Nothing) =
    _range(nothing, nothing, stop, nothing) # One arg interpreted as `stop`, could be nothing
_range_positional(start::Any    , step::Any    , stop::Any,     len::Any) =
    _range(start, step, stop, len)

...

range_stop(stop) = oneunit(stop):stop
range_stop(stop::Integer) = OneTo(stop)

julia> methods(range)
# 3 methods for generic function "range":
[1] range(stop::Integer) in Main at REPL[296]:1
[2] range(start; stop, length, step) in Main at REPL[295]:1
[3] range(start, stop; length, step) in Main at REPL[156]:1

julia> range(stop) = range_stop(stop) # Simpler implementation, but messier method table below
range (generic function with 3 methods)

julia> methods(range)
# 3 methods for generic function "range":
[1] range(stop::Integer) in Main at REPL[296]:1
[2] range(stop; stop, length, step) in Main at REPL[302]:1 # That looks messy
[3] range(start, stop; length, step) in Main at REPL[156]:1

https://github.com/JuliaLang/julia/blob/a03945e518c36837d99170a66342d00ab8de64ab/base/range.jl

I'll consider resubmitting a one positional argument range PR after the two and three positional argument range is merged, as @mbauman suggested.

Seeing implementation details like `Base.OneTo` in error messages may be confusing to some users (cf discussion in JuliaLang#39242, [discourse](https://discourse.julialang.org/t/promote-shape-dimension-mismatch/57529/)). This PR turns

```julia
julia> ones(2, 3) + ones(3, 2)
ERROR: DimensionMismatch("dimensions must match: a has dims (Base.OneTo(2), Base.OneTo(3)), b has dims (Base.OneTo(3), Base.OneTo(2)), mismatch at 1")
```

into

```julia
julia> ones(2, 3) + ones(3, 2)
ERROR: DimensionMismatch("dimensions must match: a has axes (1:2, 1:3), b has axes (1:3, 1:2), mismatch at 1")
```

Fixes JuliaLang#40118.

Acked-by: Tamas K. Papp <[email protected]>

Seeing implementation details like `Base.OneTo` in error messages may be confusing to some users (cf discussion in #39242, [discourse](https://discourse.julialang.org/t/promote-shape-dimension-mismatch/57529/)). This PR turns

```julia
julia> ones(2, 3) + ones(3, 2)
ERROR: DimensionMismatch("dimensions must match: a has dims (Base.OneTo(2), Base.OneTo(3)), b has dims (Base.OneTo(3), Base.OneTo(2)), mismatch at 1")
```

into

```julia
julia> ones(2, 3) + ones(3, 2)
ERROR: DimensionMismatch("dimensions must match: a has size (2, 3), b has size (3, 2), mismatch at 1")
```

Fixes #40118. (This is basically #40124, but redone because I made a mess rebasing).

Co-authored-by: Jameson Nash <[email protected]>
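To see which wording a given Julia build produces, the error can be captured and printed as in the sketch below. The exact message text depends on the Julia version, so nothing about its contents is assumed here.

```julia
# Capture the exception instead of letting it abort the script.
err = try
    ones(2, 3) + ones(3, 2)
    nothing
catch e
    e
end

println(typeof(err))
println(sprint(showerror, err))  # prints the message as the REPL would show it
```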

mkitti added 2 commits January 13, 2021 20:58

Export oneto

f69895c

oneto: Document and test

891b87c

mkitti mentioned this pull request Jan 14, 2021

range: Assume start=1 when not given. Use OneTo #39223

Closed

oneto: fix typo in doc

63d3b26

dkarrasch reviewed Jan 14, 2021

View reviewed changes

base/range.jl Outdated Show resolved Hide resolved

Change one to 1 for oneto documentation

b2fda74

Co-authored-by: Daniel Karrasch <[email protected]>

mkitti added 3 commits January 14, 2021 19:29

oneto: Simplify docs and NEWS, removes OneTo refs

aaa7b08

oneto: Add crossref to : and range

4d75866

Merge branch 'export_oneto' of github.com:mkitti/julia into export_oneto

caaddab

mkitti changed the title ~~Export oneto~~ Export oneto rather than implement range(stop) Jan 15, 2021

mkitti changed the title ~~Export oneto rather than implement range(stop)~~ Export oneto rather than implement range(stop) Jan 15, 2021

oneto: Fix doc test for Julia 1.7

55b21d4

mkitti closed this Jan 17, 2021

tpapp mentioned this pull request Mar 20, 2021

Promote_shape dimension mismatch print. #40118

Closed

tpapp mentioned this pull request Mar 21, 2021

Normalize indices in promote_shape error messages. #40124

Closed

mkitti mentioned this pull request Apr 25, 2021

range(; stop) and range(; length). Single keyword args only. Single pos arg not allowed. #39241

Merged

tpapp mentioned this pull request Jun 22, 2021

Normalize indices in promote_shape error messages (take 2) #41311

Merged
