Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fast implementation of digits with base a power of 2 #31722

Merged
merged 3 commits into from
May 13, 2019

Conversation

giordano
Copy link
Contributor

This PR adds a fast specialised implementation of digits(::Integer, base = 2). @ararslan said that making base = 2 a special case wouldn't be a bad idea. Performance with other bases shouldn't be affected.

Before this PR:

julia> @benchmark digits(3, base = 2)
BenchmarkTools.Trial: 
  memory estimate:  96 bytes
  allocs estimate:  1
  --------------
  minimum time:     40.681 ns (0.00% GC)
  median time:      41.959 ns (0.00% GC)
  mean time:        52.063 ns (14.25% GC)
  maximum time:     43.487 μs (99.85% GC)
  --------------
  samples:          10000
  evals/sample:     991

julia> @benchmark digits(3, base = 10)
BenchmarkTools.Trial: 
  memory estimate:  96 bytes
  allocs estimate:  1
  --------------
  minimum time:     35.241 ns (0.00% GC)
  median time:      36.893 ns (0.00% GC)
  mean time:        47.171 ns (16.41% GC)
  maximum time:     43.818 μs (99.88% GC)
  --------------
  samples:          10000
  evals/sample:     992

julia> @benchmark digits(1234567890123456, base = 2)
BenchmarkTools.Trial: 
  memory estimate:  496 bytes
  allocs estimate:  1
  --------------
  minimum time:     658.308 ns (0.00% GC)
  median time:      667.686 ns (0.00% GC)
  mean time:        730.018 ns (5.45% GC)
  maximum time:     247.337 μs (99.70% GC)
  --------------
  samples:          10000
  evals/sample:     159

julia> @benchmark digits(1234567890123456, base = 10)
BenchmarkTools.Trial: 
  memory estimate:  208 bytes
  allocs estimate:  1
  --------------
  minimum time:     222.505 ns (0.00% GC)
  median time:      225.409 ns (0.00% GC)
  mean time:        247.892 ns (5.90% GC)
  maximum time:     87.951 μs (99.67% GC)
  --------------
  samples:          10000
  evals/sample:     487

After this PR:

julia> @benchmark digits(3, base = 2)
BenchmarkTools.Trial: 
  memory estimate:  96 bytes
  allocs estimate:  1
  --------------
  minimum time:     34.707 ns (0.00% GC)
  median time:      35.394 ns (0.00% GC)
  mean time:        43.532 ns (14.30% GC)
  maximum time:     37.210 μs (99.85% GC)
  --------------
  samples:          10000
  evals/sample:     993

julia> @benchmark digits(3, base = 10)
BenchmarkTools.Trial: 
  memory estimate:  96 bytes
  allocs estimate:  1
  --------------
  minimum time:     34.356 ns (0.00% GC)
  median time:      35.174 ns (0.00% GC)
  mean time:        43.805 ns (14.93% GC)
  maximum time:     38.129 μs (99.86% GC)
  --------------
  samples:          10000
  evals/sample:     993

julia> @benchmark digits(1234567890123456, base = 2)
BenchmarkTools.Trial: 
  memory estimate:  496 bytes
  allocs estimate:  1
  --------------
  minimum time:     75.659 ns (0.00% GC)
  median time:      77.740 ns (0.00% GC)
  mean time:        88.100 ns (8.38% GC)
  maximum time:     30.302 μs (99.61% GC)
  --------------
  samples:          10000
  evals/sample:     971

julia> @benchmark digits(1234567890123456, base = 10)
BenchmarkTools.Trial: 
  memory estimate:  208 bytes
  allocs estimate:  1
  --------------
  minimum time:     215.581 ns (0.00% GC)
  median time:      217.726 ns (0.00% GC)
  mean time:        236.653 ns (5.16% GC)
  maximum time:     73.486 μs (99.63% GC)
  --------------
  samples:          10000
  evals/sample:     525

@ararslan ararslan added the performance Must go faster label Apr 14, 2019
@antoine-levitt
Copy link
Contributor

Have you tried benchmarking inside a function? Can't the compiler do constant propagation by itself here?

@giordano
Copy link
Contributor Author

Like this?

julia> function test()
           n = 1234567890123456
           digits(n, base = 2)
       end
test (generic function with 1 method)

julia> @btime test();
  74.073 ns (1 allocation: 496 bytes)

julia> function test()
           n = 1234567890123456
           digits(n, base = 10)
       end
test (generic function with 1 method)

julia> @btime test();
  215.014 ns (1 allocation: 208 bytes)

@giordano
Copy link
Contributor Author

With this change

@@ -755,9 +755,17 @@ function digits!(a::AbstractVector{T}, n::Integer; base::Integer = 10) where T<:
     isempty(a) && return a
 
     if base > 0
-        for i in eachindex(a)
-            n, d = divrem(n, base)
-            a[i] = d
+        if ispow2(base)
+            k = trailing_zeros(base)
+            c = base - 1
+            for i in eachindex(a)
+                a[i] = (n >> (k * (i - 1))) & c
+            end
+        else
+            for i in eachindex(a)
+                n, d = divrem(n, base)
+                a[i] = d
+            end
         end
     else
         # manually peel one loop iteration for type stability

we can have specialised implementation for all bases that are powers of 2. Here are benchmarks with the proposed change to the PR:

# Before
julia> @btime digits(1, base = 2);
  33.756 ns (1 allocation: 96 bytes)

julia> @btime digits(1234567890123456, base = 2);
  660.050 ns (1 allocation: 496 bytes)

julia> @btime digits(81985529216486895, base = 2);
  759.219 ns (1 allocation: 544 bytes)

julia> @btime digits(1, base = 16);
  34.774 ns (1 allocation: 96 bytes)

julia> @btime digits(81985529216486895, base = 16);
  219.475 ns (1 allocation: 208 bytes)


# After
julia> @btime digits(1, base = 2);
  33.044 ns (1 allocation: 96 bytes)

julia> @btime digits(1234567890123456, base = 2);
  111.649 ns (1 allocation: 496 bytes)

julia> @btime digits(81985529216486895, base = 2);
  122.859 ns (1 allocation: 544 bytes)

julia> @btime digits(1, base = 16);
  32.785 ns (1 allocation: 96 bytes)

julia> @btime digits(81985529216486895, base = 16);
  58.451 ns (1 allocation: 208 bytes)

Note that performance of base = 2 gets a bit worse compared to the current version of the PR, but still way better than the generic implementation in master.

base/intfuncs.jl Outdated Show resolved Hide resolved
@giordano
Copy link
Contributor Author

Ok, I've fixed the first index as suggested by Rafael and implemented the fast method for all powers of 2, as well as added more tests.

@giordano giordano changed the title Fast implementation of digits(::Integer, base = 2) Fast implementation of digits with base a power of 2 Apr 15, 2019
@giordano
Copy link
Contributor Author

Bump 🙂

@StefanKarpinski
Copy link
Member

I guess the question is who should make a decision here? This seems like a straightforward speedup. And which version should we go with?

@giordano
Copy link
Contributor Author

who should make a decision here?

I have no idea 😂

And which version should we go with?

I'd suggest the last one with any power of 2, I think it's a common case to see the expansion in base 8 and 16, not only base 2. Anyway, I can easily revert to the previous version if we want to prioritise base = 2.

@stevengj
Copy link
Member

Should there be a faster ndigits in this case too, or does it spend a negligible amount of time in ndigits?

@giordano
Copy link
Contributor Author

giordano commented Apr 25, 2019

It doesn't seem that ndigits has a significant impact on performance of digits:

julia> Profile.clear(); @profile for n in 1:1_000_000
           digits(1234567890123456, base = 2)
       end

julia> Profile.print()
133 ./task.jl:268; (::getfield(REPL, Symbol("##26#27")){REPL.REPLBackend})()
 133 ...e/julia/stdlib/v1.3/REPL/src/REPL.jl:118; macro expansion
  133 ...e/julia/stdlib/v1.3/REPL/src/REPL.jl:86; eval_user_input(::Any, ::REPL.REPLBackend)
   133 ./boot.jl:330; eval(::Module, ::Any)
    1   ./intfuncs.jl:755; #digits!#326(::Int64, ::typeof(digits!), ::Array{Int64...
    1   ./intfuncs.jl:513; ndigits0z(::Int64, ::Int64)
    131 .../stdlib/v1.3/Profile/src/Profile.jl:25; top-level scope
     131 ./REPL[17]:2; macro expansion
      131 ./none:0; #digits
       131 ./intfuncs.jl:710; #digits#324
        131 ./none:0; #digits
         131 ./intfuncs.jl:714; #digits#325
          34 ./array.jl:441; zeros
           26 ./array.jl:445; zeros
            26 ./boot.jl:413; Type
             26 ./boot.jl:404; Type
           8  ./array.jl:446; zeros
            8 ./array.jl:295; fill!(::Array{Int64,1}, ::Int64)
             7 ./array.jl:766; setindex!
             1 ./range.jl:595; iterate
              1 ./promotion.jl:403; ==
          93 ./none:0; #digits!
           3  ./int.jl:0; #digits!#326(::Int64, ::typeof(digits!), ::Array{In...
           9  ./int.jl:444; #digits!#326(::Int64, ::typeof(digits!), ::Array{In...
           3  ./intfuncs.jl:0; #digits!#326(::Int64, ::typeof(digits!), ::Array{In...
           1  ./intfuncs.jl:752; #digits!#326(::Int64, ::typeof(digits!), ::Array{In...
           2  ./intfuncs.jl:755; #digits!#326(::Int64, ::typeof(digits!), ::Array{In...
           75 ./intfuncs.jl:762; #digits!#326(::Int64, ::typeof(digits!), ::Array{In...
            15 ./array.jl:766; setindex!
            10 ./int.jl:293; &
            10 ./int.jl:437; >>
            38 ./int.jl:444; >>
             4  ./array.jl:766; <<
             27 ./int.jl:439; <<
             7  ./int.jl:424; <=
          4  ./none:0; #ndigits
           4 ./intfuncs.jl:544; #ndigits#322
            3 ./intfuncs.jl:513; ndigits0z(::Int64, ::Int64)
             1 ./int.jl:49; <
            1 ./intfuncs.jl:516; ndigits0z(::Int64, ::Int64)

Actually, ndigits is already optimised for base = 2, 8, 16:

julia/base/intfuncs.jl

Lines 459 to 461 in 93c9ae4

b == 2 && return sizeof(x)<<3 - leading_zeros(x)
b == 8 && return (sizeof(x)<<3 - leading_zeros(x) + 2) ÷ 3
b == 16 && return sizeof(x)<<1 - leading_zeros(x)>>2

Also in a non-optimised case (e.g., base = 3), the ndigits takes less than 7% of the total time of digits:

julia> Profile.clear(); @profile for n in 1:1_000_000
           digits(1234567890123456, base = 3)
       end

julia> Profile.print()
484 ./task.jl:268; (::getfield(REPL, Symbol("##26#27")){REPL.REPLBackend})()
 484 ...e/julia/stdlib/v1.3/REPL/src/REPL.jl:118; macro expansion
  484 ...e/julia/stdlib/v1.3/REPL/src/REPL.jl:86; eval_user_input(::Any, ::REPL.REPLBackend)
   484 ./boot.jl:330; eval(::Module, ::Any)
    1   ./intfuncs.jl:752; #digits!#326(::Int64, ::typeof(digits!), ::Array{Int64...
    1   ./intfuncs.jl:513; ndigits0z(::Int64, ::Int64)
    1   .../stdlib/v1.3/Profile/src/Profile.jl:0; top-level scope
    477 .../stdlib/v1.3/Profile/src/Profile.jl:25; top-level scope
     477 ./REPL[22]:2; macro expansion
      477 ./none:0; #digits
       477 ./intfuncs.jl:710; #digits#324
        477 ./none:0; #digits
         477 ./intfuncs.jl:714; #digits#325
          60  ./array.jl:441; zeros
           53 ./array.jl:445; zeros
            53 ./boot.jl:413; Type
             53 ./boot.jl:404; Type
           7  ./array.jl:446; zeros
            2 ./array.jl:294; fill!(::Array{Int64,1}, ::Int64)
            4 ./array.jl:295; fill!(::Array{Int64,1}, ::Int64)
             4 ./array.jl:766; setindex!
          384 ./none:0; #digits!
           96  ./array.jl:0; #digits!#326(::Int64, ::typeof(digits!), ::Array{I...
           30  ./intfuncs.jl:0; #digits!#326(::Int64, ::typeof(digits!), ::Array{I...
           9   ./intfuncs.jl:752; #digits!#326(::Int64, ::typeof(digits!), ::Array{I...
            6 ./int.jl:136; abs
             6 ./int.jl:96; flipsign
           4   ./intfuncs.jl:755; #digits!#326(::Int64, ::typeof(digits!), ::Array{I...
            1 ./abstractarray.jl:917; isempty
             1 ./array.jl:200; length
           7   ./intfuncs.jl:766; #digits!#326(::Int64, ::typeof(digits!), ::Array{I...
            7 ./number.jl:105; divrem
             7 ./int.jl:228; div
           238 ./intfuncs.jl:767; #digits!#326(::Int64, ::typeof(digits!), ::Array{I...
            225 ./array.jl:0; setindex!
            13  ./array.jl:766; setindex!
          33  ./none:0; #ndigits
           33 ./intfuncs.jl:544; #ndigits#322
            2  ./intfuncs.jl:513; ndigits0z(::Int64, ::Int64)
            30 ./intfuncs.jl:516; ndigits0z(::Int64, ::Int64)
             5 ./intfuncs.jl:0; ndigits0zpb(::Int64, ::Int64)
             1 ./intfuncs.jl:454; ndigits0zpb(::Int64, ::Int64)
             1 ./intfuncs.jl:456; ndigits0zpb(::Int64, ::Int64)
              1 ./int.jl:136; abs
               1 ./int.jl:96; flipsign
             7 ./intfuncs.jl:470; ndigits0zpb(::Int64, ::Int64)
              6 ./int.jl:0; div
              1 ./int.jl:230; div
             8 ./intfuncs.jl:474; ndigits0zpb(::Int64, ::Int64)
             4 ./intfuncs.jl:475; ndigits0zpb(::Int64, ::Int64)
              4 ./int.jl:54; *
             3 ./intfuncs.jl:476; ndigits0zpb(::Int64, ::Int64)
              3 ./int.jl:53; +

@giordano
Copy link
Contributor Author

giordano commented Apr 25, 2019

I've changed the condition for the fast path to

ispow2(base) && n >= 0 && n isa Base.BitInteger

n must be non-negative, otherwise the result would be simply wrong (and I've added tests for that), I'm also testing that n is a BitInteger because I think that in general it's not guaranteed that we can do bit twiddling for non bit-types.

Regarding ndigits, the "problematic" case is when digits! is optimised but ndigits0zpb is not (any power of 2 different from 2, 8, and 16):

julia> Profile.clear(); @profile for n in 1:1_000_000
           digits(1234567890123456, base = 4)
       end

julia> Profile.print()
123 ./task.jl:268; (::getfield(REPL, Symbol("##26#27")){REPL.REPLBackend})()
 123 ...e/julia/stdlib/v1.3/REPL/src/REPL.jl:118; macro expansion
  123 ...e/julia/stdlib/v1.3/REPL/src/REPL.jl:86; eval_user_input(::Any, ::REPL.REPLBackend)
   123 ./boot.jl:330; eval(::Module, ::Any)
    2   ./intfuncs.jl:513; ndigits0z(::Int64, ::Int64)
    118 .../stdlib/v1.3/Profile/src/Profile.jl:25; top-level scope
     118 ./REPL[20]:2; macro expansion
      118 ./none:0; #digits
       118 ./intfuncs.jl:710; #digits#324
        118 ./none:0; #digits
         118 ./intfuncs.jl:714; #digits#325
          37 ./array.jl:441; zeros
           23 ./array.jl:445; zeros
            23 ./boot.jl:413; Type
             23 ./boot.jl:404; Type
           14 ./array.jl:446; zeros
            8 ./array.jl:294; fill!(::Array{Int64,1}, ::Int64)
             1 ./abstractarray.jl:212; eachindex
              1 ./abstractarray.jl:95; axes1
               1 ./abstractarray.jl:75; axes
                1 ./array.jl:155; size
            6 ./array.jl:295; fill!(::Array{Int64,1}, ::Int64)
             5 ./array.jl:766; setindex!
             1 ./range.jl:595; iterate
              1 ./promotion.jl:403; ==
          45 ./none:0; #digits!
           5  ./int.jl:444; #digits!#19(::Int64, ::typeof(digits!), ::Array{Int...
           2  /tmp/foo.jl:0; #digits!#19(::Int64, ::typeof(digits!), ::Array{Int...
           4  /tmp/foo.jl:2; #digits!#19(::Int64, ::typeof(digits!), ::Array{Int...
            2 ./int.jl:136; abs
             2 ./int.jl:96; flipsign
           2  /tmp/foo.jl:5; #digits!#19(::Int64, ::typeof(digits!), ::Array{Int...
           31 /tmp/foo.jl:12; #digits!#19(::Int64, ::typeof(digits!), ::Array{Int...
            10 ./array.jl:766; setindex!
            1  ./int.jl:293; &
            1  ./int.jl:437; >>
            17 ./int.jl:444; >>
             14 ./int.jl:439; <<
             1  ./int.jl:424; <=
             2  ./int.jl:437; >>
           1  /tmp/foo.jl:15; #digits!#19(::Int64, ::typeof(digits!), ::Array{Int...
            1 ./abstractarray.jl:212; eachindex
             1 ./abstractarray.jl:95; axes1
              1 ./abstractarray.jl:75; axes
               1 ./array.jl:155; size
          36 ./none:0; #ndigits
           36 ./intfuncs.jl:544; #ndigits#322
            3  ./intfuncs.jl:513; ndigits0z(::Int64, ::Int64)
             2 ./int.jl:49; <
            33 ./intfuncs.jl:516; ndigits0z(::Int64, ::Int64)
             3  ./intfuncs.jl:0; ndigits0zpb(::Int64, ::Int64)
             1  ./intfuncs.jl:454; ndigits0zpb(::Int64, ::Int64)
             14 ./intfuncs.jl:470; ndigits0zpb(::Int64, ::Int64)
              14 ./int.jl:0; div
             5  ./intfuncs.jl:474; ndigits0zpb(::Int64, ::Int64)
             8  ./intfuncs.jl:475; ndigits0zpb(::Int64, ::Int64)
              8 ./int.jl:54; *
             2  ./intfuncs.jl:476; ndigits0zpb(::Int64, ::Int64)
              2 ./int.jl:53; +

Here ndigits takes almost the same time as digits!, but I don't have an easy fix for this.

@giordano
Copy link
Contributor Author

How about

@@ -460,6 +460,10 @@ function ndigits0zpb(x::Integer, b::Integer)
         b == 8  && return (sizeof(x)<<3 - leading_zeros(x) + 2) ÷ 3
         b == 16 && return sizeof(x)<<1 - leading_zeros(x)>>2
         b == 10 && return bit_ndigits0z(x)
+        if ispow2(b)
+            dv, rm = divrem(sizeof(x)<<3 - leading_zeros(x), trailing_zeros(b))
+            return iszero(rm) ? dv : dv + 1
+        end
     end
 
     d = 0

in ndigits0zpb?

Before this change:

julia> @btime ndigits(81985529216486895, base = 4);
  29.934 ns (0 allocations: 0 bytes)

julia> @btime digits(81985529216486895, base = 4);
  106.863 ns (1 allocation: 336 bytes)

After it:

julia> @btime ndigits(81985529216486895, base = 4);
  8.963 ns (0 allocations: 0 bytes)

julia> @btime digits(81985529216486895, base = 4);
  85.248 ns (1 allocation: 336 bytes)

I think it'd be nicer to remove the special cases for 2, 8, 16, but this generic method I came up with is a bit slower than those super-optimised cases.

@giordano
Copy link
Contributor Author

giordano commented May 2, 2019

Any comment on the change to ndigits0zpb proposed above?

@StefanKarpinski
Copy link
Member

That seems good to me. In general, as long as it's a) correct, b) faster, and c) not too insane, I'm happy with any of these changes.

@giordano
Copy link
Contributor Author

giordano commented May 2, 2019

Ok, I installed the proposed change. Here are some benchmarks:

before the PR:

julia> for base in (2, 3, 4, 5, 8, 10, 16),
           n in (1, 3, 1234567890123456, 81985529216486895)
           println("n = $n, base = $base")
           @btime digits($n, base = $base)
       end
n = 1, base = 2
  33.569 ns (1 allocation: 96 bytes)
n = 3, base = 2
  38.626 ns (1 allocation: 96 bytes)
n = 1234567890123456, base = 2
  642.904 ns (1 allocation: 496 bytes)
n = 81985529216486895, base = 2
  759.684 ns (1 allocation: 544 bytes)
n = 1, base = 3
  37.212 ns (1 allocation: 96 bytes)
n = 3, base = 3
  44.564 ns (1 allocation: 96 bytes)
n = 1234567890123456, base = 3
  432.618 ns (1 allocation: 336 bytes)
n = 81985529216486895, base = 3
  509.099 ns (1 allocation: 368 bytes)
n = 1, base = 4
  36.202 ns (1 allocation: 96 bytes)
n = 3, base = 4
  37.109 ns (1 allocation: 96 bytes)
n = 1234567890123456, base = 4
  341.088 ns (1 allocation: 288 bytes)
n = 81985529216486895, base = 4
  399.398 ns (1 allocation: 336 bytes)
n = 1, base = 5
  37.231 ns (1 allocation: 96 bytes)
n = 3, base = 5
  37.755 ns (1 allocation: 96 bytes)
n = 1234567890123456, base = 5
  307.689 ns (1 allocation: 256 bytes)
n = 81985529216486895, base = 5
  362.034 ns (1 allocation: 288 bytes)
n = 1, base = 8
  33.838 ns (1 allocation: 96 bytes)
n = 3, base = 8
  33.675 ns (1 allocation: 96 bytes)
n = 1234567890123456, base = 8
  232.650 ns (1 allocation: 224 bytes)
n = 81985529216486895, base = 8
  273.321 ns (1 allocation: 240 bytes)
n = 1, base = 10
  34.165 ns (1 allocation: 96 bytes)
n = 3, base = 10
  34.032 ns (1 allocation: 96 bytes)
n = 1234567890123456, base = 10
  218.868 ns (1 allocation: 208 bytes)
n = 81985529216486895, base = 10
  246.200 ns (1 allocation: 224 bytes)
n = 1, base = 16
  33.623 ns (1 allocation: 96 bytes)
n = 3, base = 16
  33.525 ns (1 allocation: 96 bytes)
n = 1234567890123456, base = 16
  183.280 ns (1 allocation: 192 bytes)
n = 81985529216486895, base = 16
  221.509 ns (1 allocation: 208 bytes)

After the PR:

julia> for base in (2, 3, 4, 5, 8, 10, 16),
           n in (1, 3, 1234567890123456, 81985529216486895)
           println("n = $n, base = $base")
           @btime digits($n, base = $base)
       end
n = 1, base = 2
  33.610 ns (1 allocation: 96 bytes)
n = 3, base = 2
  34.253 ns (1 allocation: 96 bytes)
n = 1234567890123456, base = 2
  110.220 ns (1 allocation: 496 bytes)
n = 81985529216486895, base = 2
  123.984 ns (1 allocation: 544 bytes)
n = 1, base = 3
  37.236 ns (1 allocation: 96 bytes)
n = 3, base = 3
  44.465 ns (1 allocation: 96 bytes)
n = 1234567890123456, base = 3
  430.688 ns (1 allocation: 336 bytes)
n = 81985529216486895, base = 3
  509.145 ns (1 allocation: 368 bytes)
n = 1, base = 4
  34.196 ns (1 allocation: 96 bytes)
n = 3, base = 4
  34.354 ns (1 allocation: 96 bytes)
n = 1234567890123456, base = 4
  81.159 ns (1 allocation: 288 bytes)
n = 81985529216486895, base = 4
  86.555 ns (1 allocation: 336 bytes)
n = 1, base = 5
  37.245 ns (1 allocation: 96 bytes)
n = 3, base = 5
  37.056 ns (1 allocation: 96 bytes)
n = 1234567890123456, base = 5
  306.630 ns (1 allocation: 256 bytes)
n = 81985529216486895, base = 5
  359.809 ns (1 allocation: 288 bytes)
n = 1, base = 8
  33.450 ns (1 allocation: 96 bytes)
n = 3, base = 8
  33.501 ns (1 allocation: 96 bytes)
n = 1234567890123456, base = 8
  59.394 ns (1 allocation: 224 bytes)
n = 81985529216486895, base = 8
  61.872 ns (1 allocation: 240 bytes)
n = 1, base = 10
  34.125 ns (1 allocation: 96 bytes)
n = 3, base = 10
  35.073 ns (1 allocation: 96 bytes)
n = 1234567890123456, base = 10
  217.121 ns (1 allocation: 208 bytes)
n = 81985529216486895, base = 10
  244.356 ns (1 allocation: 224 bytes)
n = 1, base = 16
  33.395 ns (1 allocation: 96 bytes)
n = 3, base = 16
  33.392 ns (1 allocation: 96 bytes)
n = 1234567890123456, base = 16
  54.078 ns (1 allocation: 192 bytes)
n = 81985529216486895, base = 16
  57.518 ns (1 allocation: 208 bytes)

I also included some benchmarks for bases not powers of 2 (3, 5, 10), to make sure that performance is effectively unaffected in those cases.

a[i] = d
if ispow2(base) && n >= 0 && n isa Base.BitInteger
k = trailing_zeros(base)
c = base - 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You make the assumption that base behaves like Base.BitIntegers, so I suggest you write in the condition 2 lines above: && base <= typemax(Int) and then base = Int(base). (We can't write base = Int(base) unconditionally at the top of the function, because we support bases which don't fit in an Int.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If your concern is that c would always be an Int or larger, even if base is of a smaller type, what about instead

c = base - one(base)

?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My concern is that trailing_zeros(base) won't necessarily be implemented for all integer types, and that base - 1 also assumes 2-complement implementation, so seems safer to just use Int (or typeof(n) which is Base.BitIntgers).
Moreover, even if base :: BigInt which is implementing 2-complements logic, it's wasteful in the loop below to convert everything to BigInt with the & operation.

Copy link
Contributor Author

@giordano giordano May 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done! Note, however, that while a base larger than typemax is possible in digits!, it doesn't currently work in digits because ndigits internally truncates the base to Int:

julia> digits(9223372036854775810, base = 9223372036854775808)
ERROR: InexactError: trunc(Int64, 9223372036854775808)
Stacktrace:
 [1] throw_inexacterror(::Symbol, ::Any, ::Int128) at ./boot.jl:560
 [2] checked_trunc_sint at ./boot.jl:582 [inlined]
 [3] toInt64 at ./boot.jl:631 [inlined]
 [4] Type at ./boot.jl:710 [inlined]
 [5] ndigits0zpb(::Int128, ::Int128) at ./intfuncs.jl:455
 [6] ndigits0z(::Int128, ::Int128) at ./intfuncs.jl:520
 [7] #ndigits#322 at ./intfuncs.jl:548 [inlined]
 [8] #ndigits at ./none:0 [inlined]
 [9] #digits#325 at ./intfuncs.jl:718 [inlined]
 [10] #digits at ./none:0 [inlined]
 [11] #digits#324 at ./intfuncs.jl:714 [inlined]
 [12] (::getfield(Base, Symbol("#kw##digits")))(::NamedTuple{(:base,),Tuple{Int128}}, ::typeof(digits), ::Int128) at ./none:0
 [13] top-level scope at REPL[2]:1

julia> digits!(zeros(Int, 2), 9223372036854775810, base = 9223372036854775808)
2-element Array{Int64,1}:
 2
 1

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh right, on the other hand it works with BigInt. We may have to refine these "casts" in a later iteration. Maybe a future improvement here would be then to convert base to the smallest bit integer type ">= Int" which can represent it.

k = trailing_zeros(base)
c = base - 1
for i in eachindex(a)
a[i] = (n >> (k * (i - firstindex(a)))) & c
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the record: here it's assumed that firstindex(x) is an integer and that the indices of a are consecutive, which is not enforced by the AbstractArray interface apparently. The same assumption is made below in the base < 0 branch, so anyway this is IMHO not blocking merging this PR.

@rfourquet
Copy link
Member

The improvements are great! I didn't try myself to find corner cases where this could be slower, but it seems like mostly increasing performance, fo the price of only a small increase of complexity, so I say let's go with it. I will merge after you fix the base problem from my comment.

@rfourquet
Copy link
Member

The failing travis is "No output has been received in the last 30m0s" so assuming it's unrelated.
Thanks for this excellent PR @giordano !

@rfourquet rfourquet merged commit 3efdda6 into JuliaLang:master May 13, 2019
@giordano giordano deleted the digits-base-2 branch May 13, 2019 09:25
@c42f
Copy link
Member

c42f commented Aug 22, 2019

It seems this might have caused the performance regression reported at https://discourse.julialang.org/t/julia-v1-3-runs-this-script-at-least-10x-slower-than-julia-v1-1-on-my-computer/27815. Possibly just because it makes digits more complicated which in turn affects the inlining decisions made by the compiler?

@rapus95
Copy link
Contributor

rapus95 commented Aug 22, 2019

Just out of curiosity, right now the functions are set up like that:

digits(...)=digits!(..., fitting array, ...)
digits!(...)= complicated stuff

why wouldn't it make sense to flip that over and make digits an iterator? ->

digits(...)=iterator with equally complicated stuff
function digits!(...,dest,...)
  dest.=digits(...)
end

The general question here is, whether there are any cases where an iterator is slower than the corresponding array (in the non-mutating case digits(...))? An iterator is generally preferable due to the ability to omit any memory access here, isn't it?
If it is only that we want random access we could also implement the AbstractArray protocol since we can calculate the digit for an arbitrary index directly. Or alternatively if you intend to request several locations multiple times, you can simply collect the iterator first.

@c42f
Copy link
Member

c42f commented Aug 22, 2019

An iterator would be a reasonable API but would be a breaking change. A custom array type which calculates the digits on demand would be less of a change but I'm not sure that can be done efficiently (I guess the main cost is mostly the integer divides (the divrem calls) so for efficiency you really want no more than d divisions where d is the number of digits). It's possible that caching the value of Base.MultiplicativeInverses.multiplicativeinverse(base) might solve that problem.

One fun alternative might be to do a custom array type where the digits are calculated up front but bitpacked into the bits of the next larger integer type. Then you get extremely efficient non-allocating storage as well as the simplicity of calculating the digits in a single pass.

@rapus95
Copy link
Contributor

rapus95 commented Sep 4, 2019

An iterator would be a reasonable API but would be a breaking change. A custom array type which calculates the digits on demand would be less of a change but I'm not sure that can be done efficiently (I guess the main cost is mostly the integer divides (the divrem calls) so for efficiency you really want no more than O(d) divisions where d is the number of digits). It's possible that caching the value of Base.MultiplicativeInverses.multiplicativeinverse(base) might solve that problem.

How about keeping the custom array only as iterator and collecting the whole structure into an abstract array only when the first random access via getindex is made? That way we get superior performance for all iterations and have only an O(d) extra overhead when needed for random access.

EDIT: On the other hand, why didn't we incorporate the power of two optimization of this method into the div and remainder functions? That way we wouldn't specialize the caller to do a better job than the actual function made for this purpose.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Must go faster
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants