
Export oneto rather than implement range(stop) #39242

Closed

Conversation

@mkitti (Contributor) commented Jan 14, 2021

Export oneto as an alternative to range(stop)

  • oneto(n) is an unambiguous alternative to range(stop)
  • oneto(n) may be more intuitive than 1:n for people coming from other coding traditions (e.g. Python)
  • oneto(n) uses Julia's type system to allow for fast code (via Base.OneTo)

Since a single-argument range mapping to Base.OneTo is controversial (see #39223), let's export the recently added oneto (all lowercase), which is unambiguous and accepts a single Integer argument.

oneto was recently merged into Base via #37741. Like range, oneto is five lowercase letters, but it clearly describes what it does. Additionally, as demonstrated by #37741, it can be extended for uses not directly involving Base.OneTo.

By exporting oneto we will make a highly optimized code path easily available for a very common use case.
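
For illustration, here is a minimal sketch of the proposed usage (the Base.OneTo shown in the output is the current implementation detail, not a documented guarantee):

julia> oneto(5)
Base.OneTo(5)

julia> collect(oneto(5))
5-element Vector{Int64}:
 1
 2
 3
 4
 5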

@timholy (Member) commented Jan 14, 2021

Can you explain the need for this? I don't think #39223 is compelling. There's a downside to creating confusion about whether people should use 1:n or oneto(n); anyone who cares can also use Base.OneTo directly.

@mkitti (Contributor, Author) commented Jan 14, 2021

Executive Summary

  • oneto(n) is an unambiguous alternative to range(stop)
  • oneto(n) may be more intuitive than 1:n for people coming from other coding traditions (e.g. Python)
  • oneto(n) uses Julia's type system to allow for fast code (via Base.OneTo)

Exposition

In Julia 1.7, range will likely be fully specifiable by positional arguments, as decided in #38750 (comment). The two- and three-argument forms, range(start, stop) and range(start, stop, length), are now pending in pull request #39228. I have my reservations about this, but I think the ship has sailed; the decision has been made.

Given a two- and three-positional-argument range, it is natural to ask whether there should also be a one-argument range. If implemented, notes from triage indicated this should be range(stop). If there were a range(stop), it would make sense to map it to Base.OneTo. This is part of what #39223 tried to implement, under the theme that start = 1 is a reasonable default. While a prospective range([start], stop, [length]) in Julia somewhat parallels range([start], stop, [step]) in Python, it is rather confusing in either language.

The syntax 1:n is intuitive for someone coming from a MATLAB background; I read it as "one to n" in my mind. From a Python background, that is not the case at all, since it reads as "slice from the 2nd element through the element at index n - 1". That makes my mind hurt a bit, and I imagine someone coming from Python may have a similar experience trying to understand 1:n.
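
For a concrete contrast (the Python behavior noted in the comment below is from my own recollection of its slice semantics):

julia> collect(1:3)    # Julia: inclusive on both ends, read "one to three"
3-element Vector{Int64}:
 1
 2
 3

# In Python, by contrast, range(1, 3) yields only 1 and 2, and a[1:3] selects
# the 2nd and 3rd elements, so the same-looking spelling means something else.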

The Julia community has grown significantly and is now large enough to perhaps permit some differences in coding style. Given that Base.OneTo(n), and now oneto(n), can actually be faster than 1:n in some circumstances, as I learned from reading your code, @timholy, would it really be terrible if some people used oneto(n) rather than 1:n when oneto(n) makes more sense in their minds?

Upon coming across oneto, I thought it would be a good alternative to a single-argument range(stop) or range(length). It is clear and unambiguous in terms of what it does. It could also be extended to other types eventually if needed. Being lowercase makes it easier to type and use. Exporting oneto would make this readable syntax even easier to use and could be an efficient and intuitive alternative to 1:n for some. At the end of the day, it may help broaden the appeal of Julia.

@mbauman (Member) commented Jan 14, 2021

The primary reason-for-being for Base.OneTo is not performance. It's for dispatch disambiguation from an offset array when implementing and manipulating axes. Now there may have been a very marginal performance difference in specific situations too, but I'd be really surprised if that still existed — especially when comparing against a literal 1:n.
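
A minimal sketch of what that dispatch role looks like (the helper name here is made up purely for illustration; Base has an unexported require_one_based_indexing check along these lines):

julia> is_one_based(a::AbstractArray) = all(r -> r isa Base.OneTo, axes(a))  # hypothetical helper
is_one_based (generic function with 1 method)

julia> is_one_based(rand(3, 4))   # Array axes are Base.OneTo, so 1-based code can rely on this
true

An offset array reports different axis types, so methods that only support 1-based indexing can be distinguished by dispatch or by a check like this.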

In my view, if the possibility of returning a OneTo simply and straightforwardly "fell out" of our definition of range then we might as well return it. But I don't see a driving need for a function to return a OneTo.

That said, it really surprises/distresses me how many packages are reaching for Base.OneTo to define a loop's iteration space. 😕

@chriselrod (Contributor) commented:

I'd expect it to make a performance difference when the range gets passed around, e.g. that view(a, Base.OneTo(59)) would have an advantage over view(a, 1:59), because you're passing around a single integer instead of 2, and the 1 is known at compile time.

julia> a = rand(100);

julia> @btime sum(view($a, 1:59))
  27.865 ns (0 allocations: 0 bytes)
29.085241951206672

julia> @btime sum(view($a, Base.OneTo(59)))
  27.558 ns (0 allocations: 0 bytes)
29.085241951206672

julia> @btime sum(view($a, 1:59))
  27.873 ns (0 allocations: 0 bytes)
29.085241951206672

julia> @btime sum(view($a, Base.OneTo(59)))
  27.528 ns (0 allocations: 0 bytes)
29.085241951206672

That's easily within the margin of noise. So I wouldn't expect much difference in typical use.

But whenever you aren't passing it around, and are instead writing something like for i in 1:N, there shouldn't be any difference; the literal 1 is also known to the compiler.

> That said, it really surprises/distresses me

I didn't know it was that unidiomatic. I'll stick to 1:N if I have a fixed length N then (of course, when it corresponds to an array, axes or eachindex are preferred).
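
For what it's worth, a small sketch of that preference (my own example, not from this thread):

julia> function colsums(a::AbstractMatrix)
           out = zeros(eltype(a), size(a, 2))
           for j in axes(a, 2)      # iterate the array's actual column axis
               for i in axes(a, 1)  # rather than hard-coding 1:size(a, 1)
                   out[j] += a[i, j]
               end
           end
           out
       end
colsums (generic function with 1 method)

julia> colsums(ones(2, 3))
3-element Vector{Float64}:
 2.0
 2.0
 2.0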

@mbauman (Member) commented Jan 14, 2021

That's an interesting example. On 1.5 I see a 2x (simd) speedup for OneTo but can't see the difference in the LLVM (or anywhere LLVM is making use of the hardcoded 1 beyond the stack-allocated struct size, for that matter). Compare code_llvm(Base.mapreduce_impl, Tuple{typeof(identity), typeof(Base.add_sum), SubArray{Float64,1,Array{Float64,1},Tuple{UnitRange{Int64}},true}, Int64, Int64, Int64}, debuginfo=:none) vs code_llvm(Base.mapreduce_impl, Tuple{typeof(identity), typeof(Base.add_sum), SubArray{Float64,1,Array{Float64,1},Tuple{Base.OneTo{Int64}},true}, Int64, Int64, Int64}, debuginfo=:none). On 1.6/master I see identical performance.

"Distresses" is far too strong, but it sure feels funny to reach for an unexported and specialized type that's 10 more characters when there's no functional difference (looking at for loops specifically here). I know we're not zen of python's strict only-one-way-to-do it here, but this feels superfluous to me.

@timholy (Member) commented Jan 14, 2021

Yeah, I'm bothered by that too. Probably those of us who were around when we added OneTo know best that it was never intended to escape quite so thoroughly. I don't see it often, but in a remarkable coincidence of timing I just reviewed a similar PR yesterday, JuliaImages/Images.jl#935.

@mkitti (Contributor, Author) commented Jan 14, 2021

We have not formally declared what is public vs. private API (#7561, #35715); both issues remain open.

Regarding Base.OneTo specifically, it is documented in the Base docs, which mention nothing about it being internal:
https://docs.julialang.org/en/v1/base/math/#Base.OneTo

However, Base.OneTo is not exported, which creates some suspicion that maybe it should not be used:
https://discourse.julialang.org/t/when-is-it-safe-to-use-base-oneto/12512

On the other hand, it is used in examples in the Base docs such as for mod:
https://docs.julialang.org/en/v1/base/math/#Base.mod

To clarify, I'm not suggesting we export Base.OneTo, but merely oneto, which just so happens to be implemented with Base.OneTo. I specifically wrote the documentation so that it does not guarantee a Base.OneTo return value, but I did add a cross-reference and mentioned it in NEWS.md. Edit: I have since removed all references to Base.OneTo from the docs and NEWS.md.

I am suggesting we export oneto as an alternative to creating a single-positional-argument method of range. This does not export the type Base.OneTo or its constructor directly.

@mkitti (Contributor, Author) commented Jan 14, 2021

I should cross reference my related PR #39241 which essentially does the following:

range(; stop) = 1:stop
range(; stop::Integer) = Base.OneTo(stop)
range(; length::Integer) = Base.OneTo(length) 

#39241 deals with the single-keyword situation only, as opposed to #39223, which tried to do a lot more.

@mkitti changed the title from "Export oneto" to "Export oneto rather than implement range(stop)" on Jan 15, 2021
@chriselrod (Contributor) commented:

> That's an interesting example. On 1.5 I see a 2x (simd) speedup for OneTo but can't see the difference in the LLVM (or anywhere LLVM is making use of the hardcoded 1 beyond the stack-allocated struct size, for that matter). Compare code_llvm(Base.mapreduce_impl, Tuple{typeof(identity), typeof(Base.add_sum), SubArray{Float64,1,Array{Float64,1},Tuple{UnitRange{Int64}},true}, Int64, Int64, Int64}, debuginfo=:none) vs code_llvm(Base.mapreduce_impl, Tuple{typeof(identity), typeof(Base.add_sum), SubArray{Float64,1,Array{Float64,1},Tuple{Base.OneTo{Int64}},true}, Int64, Int64, Int64}, debuginfo=:none). On 1.6/master I see identical performance.

I stared at that for a little while, but couldn't find the difference.
The diff between the asm is:

12c12
<         add     rsi, qword ptr [r14 + 24]
---
>         add     rsi, qword ptr [r14 + 16]
41c41
<         mov     r8, qword ptr [r14 + 24]
---
>         mov     r8, qword ptr [r14 + 16]

I'd have to look at the high level Julia code to see if maybe it's being taken care of outside an inline boundary.
If we take a simpler example:

function mysum(a)
    s = zero(eltype(a))
    @inbounds @simd for i ∈ eachindex(a)
        s += a[i]
    end
    s
end
code_llvm(mysum, Tuple{SubArray{Float64,1,Array{Float64,1},Tuple{UnitRange{Int64}},true}}, debuginfo=:none)
code_llvm(mysum, Tuple{SubArray{Float64,1,Array{Float64,1},Tuple{Base.OneTo{Int64}},true}}, debuginfo=:none)

code_native(mysum, Tuple{SubArray{Float64,1,Array{Float64,1},Tuple{UnitRange{Int64}},true}}, debuginfo=:none, syntax=:intel)
code_native(mysum, Tuple{SubArray{Float64,1,Array{Float64,1},Tuple{Base.OneTo{Int64}},true}}, debuginfo=:none, syntax=:intel)

From the UnitRange:

top:
  %1 = getelementptr inbounds { {}*, [1 x [2 x i64]], i64, i64 }, { {}*, [1 x [2 x i64]], i64, i64 }* %0, i64 0, i32 1, i64 0, i64 1
  %2 = getelementptr inbounds { {}*, [1 x [2 x i64]], i64, i64 }, { {}*, [1 x [2 x i64]], i64, i64 }* %0, i64 0, i32 1, i64 0, i64 0
  %3 = load i64, i64* %1, align 8
  %4 = load i64, i64* %2, align 8
  %5 = sub i64 %3, %4
  %6 = add i64 %5, 1

Base.OneTo:

  %1 = getelementptr inbounds { {}*, [1 x [1 x i64]], i64, i64 }, { {}*, [1 x [1 x i64]], i64, i64 }* %0, i64 0, i32 1, i64 0, i64 0
  %2 = load i64, i64* %1, align 8

A load, sub, and add were eliminated. The difference in assembly matches. UnitRange:

        mov     rax, qword ptr [rdi + 16]
        sub     rax, qword ptr [rdi + 8]
        inc     rax
        test    rax, rax

vs OneTo:

        mov     rax, qword ptr [rdi + 8]
        test    rax, rax

The sub is also loading from memory, and is then followed by an increment.

As for performance:

julia> x = rand(100);

julia> @btime mysum(view($x, 1:32))
  3.332 ns (0 allocations: 0 bytes)
17.408491415901285

julia> @btime mysum(view($x, Base.OneTo(32)))
  4.314 ns (0 allocations: 0 bytes)
17.408491415901285

julia> @btime mysum(view($x, 1:64))
  5.310 ns (0 allocations: 0 bytes)
33.568214211566435

julia> @btime mysum(view($x, Base.OneTo(64)))
  6.375 ns (0 allocations: 0 bytes)
33.568214211566435

julia> @btime mysum(view($x, 1:96))
  7.091 ns (0 allocations: 0 bytes)
47.45531942465689

julia> @btime mysum(view($x, Base.OneTo(96)))
  8.600 ns (0 allocations: 0 bytes)
47.45531942465689

Identical performance could be explained by inlining, but better performance for the UnitRange version?

julia> @noinline function mysum_noinline(a)
           s = zero(eltype(a))
           @inbounds @simd for i ∈ eachindex(a)
               s += a[i]
           end
           s
       end
mysum_noinline (generic function with 1 method)

julia> @btime mysum_noinline(view($x, 1:32))
  9.908 ns (0 allocations: 0 bytes)
17.408491415901285

julia> @btime mysum_noinline(view($x, Base.OneTo(32)))
  10.554 ns (0 allocations: 0 bytes)
17.408491415901285

julia> @btime mysum_noinline(view($x, 1:64))
  10.761 ns (0 allocations: 0 bytes)
33.568214211566435

julia> @btime mysum_noinline(view($x, Base.OneTo(64)))
  12.027 ns (0 allocations: 0 bytes)
33.568214211566435

julia> @btime mysum_noinline(view($x, 1:96))
  11.709 ns (0 allocations: 0 bytes)
47.45531942465689

julia> @btime mysum_noinline(view($x, Base.OneTo(96)))
  13.136 ns (0 allocations: 0 bytes)
47.45531942465689

FWIW, I restarted Julia and the difference reversed.

julia> @btime mysum_noinline(view($x, 1:32))
  11.762 ns (0 allocations: 0 bytes)
16.72168408851738

julia> @btime mysum_noinline(view($x, Base.OneTo(32)))
  8.926 ns (0 allocations: 0 bytes)
16.72168408851738

julia> @btime mysum_noinline(view($x, 1:64))
  11.040 ns (0 allocations: 0 bytes)
31.75307213919455

julia> @btime mysum_noinline(view($x, Base.OneTo(64)))
  10.593 ns (0 allocations: 0 bytes)
31.75307213919455

julia> @btime mysum_noinline(view($x, 1:96))
  12.046 ns (0 allocations: 0 bytes)
45.69956955548978

julia> @btime mysum_noinline(view($x, Base.OneTo(96)))
  11.742 ns (0 allocations: 0 bytes)
45.69956955548978

But these are just artifacts. 1 ns is much bigger than the theoretical difference we'd expect.

But it should be obvious that 2 instructions outside of a loop should make only a tiny difference. A CPU running at 4 GHz has 4 clock cycles per ns, and executes multiple instructions per clock cycle.

julia> using LinuxPerf#master

julia> foreachf(f::F, N, args::Vararg{<:Any,A}) where {F,A} = foreach(_ -> f(args...), Base.OneTo(N))

julia> @pstats "cpu-cycles,(instructions,branch-instructions,branch-misses),(task-clock,context-switches,cpu-migrations,page-faults),(L1-dcache-load-misses,L1-dcache-loads,L1-icache-load-misses),(dTLB-load-misses,dTLB-loads),(iTLB-load-misses,iTLB-loads)" begin
        foreachf(mysum_noinline, 10_000_000, view(x, Base.OneTo(96)))
       end
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
╶ cpu-cycles               2.48e+09   60.0%  #  4.3 cycles per ns
┌ instructions             7.02e+09   60.1%  #  2.8 insns per cycle
│ branch-instructions      1.27e+09   60.1%  # 18.1% of instructions
└ branch-misses            2.28e+06   60.1%  #  0.2% of branch instructions
┌ task-clock               5.81e+08  100.0%  # 580.8 ms
│ context-switches         0.00e+00  100.0%
│ cpu-migrations           0.00e+00  100.0%
└ page-faults              2.40e+01  100.0%
┌ L1-dcache-load-misses    1.65e+07   20.0%  #  0.7% of dcache loads
│ L1-dcache-loads          2.24e+09   20.0%
└ L1-icache-load-misses    5.32e+06   20.0%
┌ dTLB-load-misses         2.40e+05   20.0%  #  0.0% of dTLB loads
└ dTLB-loads               2.25e+09   20.0%
┌ iTLB-load-misses         2.46e+05   39.9%  # 87.6% of iTLB loads
└ iTLB-loads               2.81e+05   39.9%
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

With 2.8 instructions per cycle and 4.3 cycles per nanosecond, it would be extremely difficult to actually measure the (tiny) expected difference in performance from cutting 2 instructions out of the many it takes to run the loop. And any other artifacts, at least some of which apply consistent (and far larger) biases throughout a Julia session (or at least many @benchmarks), make it (to quote Andrew Gelman) like trying to weigh a feather using a kitchen scale, while the feather is resting in the pouch of a kangaroo vigorously jumping up and down.
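
To put a rough number on that (my own back-of-the-envelope arithmetic using the counters above): 2 instructions / (2.8 instructions per cycle × 4.3 cycles per ns) ≈ 0.17 ns per call, well below the ~1 ns run-to-run swings visible in the @btime results above.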

"Distresses" is far too strong, but it sure feels funny to reach for an unexported and specialized type that's 10 more characters when there's no functional difference (looking at for loops specifically here). I know we're not zen of python's strict only-one-way-to-do it here, but this feels superfluous to me.

> Yeah, I'm bothered by that too. Probably those of us who were around when we added OneTo know best that it was never intended to escape quite so thoroughly. I don't see it often, but in a remarkable coincidence of timing I just reviewed a similar PR yesterday, JuliaImages/Images.jl#935.

The reason I'd been using it for for-loops on occasion was that it felt/looked more like axes or eachindex calls.

@mkitti (Contributor, Author) commented Jan 17, 2021

> In my view, if the possibility of returning a OneTo simply and straightforwardly "fell out" of our definition of range then we might as well return it.

The least controversial road to OneTo from range is via range(; stop) or range(; length) as in #39241. At least there you have a clearly designated property and you can assume both start and step are 1, so there is no ambiguity.

One positional argument is more challenging and controversial, since you have to route from range(start; stop, length, step) and reinterpret start as stop if we want to allow Floats as well as Integers. The method table ends up looking a bit strange because you never quite see stop as the first argument when you examine the output of methods(range). It could be simplified if range(stop::Integer) were the only single positional argument we allowed, but that's awkward because the only other range argument that has to be an Integer is length. Because of all these issues, exporting oneto makes more sense to me and may be less controversial.

range(start; stop=nothing, length::Union{Integer,Nothing}=nothing, step=nothing) =
    _range_positional(start, step, stop, length)

...

range(stop::Integer) = range_stop(stop)

_range_positional(stop::Any    , step::Nothing,      ::Nothing, len::Nothing) =
    _range(nothing, nothing, stop, nothing) # One arg interpreted as `stop`, could be nothing
_range_positional(start::Any    , step::Any    , stop::Any,     len::Any) =
    _range(start, step, stop, len)

...

range_stop(stop) = oneunit(stop):stop
range_stop(stop::Integer) = OneTo(stop)
julia> methods(range)
# 3 methods for generic function "range":
[1] range(stop::Integer) in Main at REPL[296]:1
[2] range(start; stop, length, step) in Main at REPL[295]:1
[3] range(start, stop; length, step) in Main at REPL[156]:1

julia> range(stop) = range_stop(stop) # Simpler implementation, but messier method table below
range (generic function with 3 methods)

julia> methods(range)
# 3 methods for generic function "range":
[1] range(stop::Integer) in Main at REPL[296]:1
[2] range(stop; stop, length, step) in Main at REPL[302]:1 # That looks messy
[3] range(start, stop; length, step) in Main at REPL[156]:1

https://github.com/JuliaLang/julia/blob/a03945e518c36837d99170a66342d00ab8de64ab/base/range.jl

I'll consider resubmitting a one-positional-argument range PR after the two- and three-positional-argument range is merged, as @mbauman suggested.

@mkitti closed this Jan 17, 2021
tpapp added a commit to tpapp/julia that referenced this pull request Mar 21, 2021
Seeing implementation details like `Base.OneTo` in error messages may be
confusing to some users (cf discussion in JuliaLang#39242,
[discourse](https://discourse.julialang.org/t/promote-shape-dimension-mismatch/57529/)).

This PR turns
```julia
julia> ones(2, 3) + ones(3, 2)
ERROR: DimensionMismatch("dimensions must match: a has dims (Base.OneTo(2), Base.OneTo(3)), b has dims (Base.OneTo(3), Base.OneTo(2)), mismatch at 1")
```
into
```julia
julia> ones(2, 3) + ones(3, 2)
ERROR: DimensionMismatch("dimensions must match: a has axes (1:2, 1:3), b has axes (1:3, 1:2), mismatch at 1")
```

Fixes JuliaLang#40118.

Acked-by: Tamas K. Papp <[email protected]>
tpapp added a commit to tpapp/julia that referenced this pull request Jun 3, 2021
tpapp added a commit to tpapp/julia that referenced this pull request Jun 22, 2021
vtjnash pushed a commit to tpapp/julia that referenced this pull request Feb 3, 2024
vtjnash added a commit that referenced this pull request Feb 4, 2024
Seeing implementation details like `Base.OneTo` in error messages may
be confusing to some users (cf discussion in #39242,
[discourse](https://discourse.julialang.org/t/promote-shape-dimension-mismatch/57529/)).

This PR turns
```julia
julia> ones(2, 3) + ones(3, 2)
ERROR: DimensionMismatch("dimensions must match: a has dims (Base.OneTo(2), Base.OneTo(3)), b has dims (Base.OneTo(3), Base.OneTo(2)), mismatch at 1")
```
into
```julia
julia> ones(2, 3) + ones(3, 2)
ERROR: DimensionMismatch("dimensions must match: a has size (2, 3), b has size (3, 2), mismatch at 1")
```

Fixes #40118. 

(This is basically #40124, but redone because I made a mess rebasing).

---------

Co-authored-by: Jameson Nash <[email protected]>
5 participants