Fast implementation of digits with base a power of 2 #31722

giordano · 2019-04-14T15:16:32Z

This PR adds a fast specialised implementation of digits(::Integer, base = 2). @ararslan said that making base = 2 a special case wouldn't be a bad idea. Performance with other bases shouldn't be affected.

Before this PR:

julia> @benchmark digits(3, base = 2)
BenchmarkTools.Trial: 
  memory estimate:  96 bytes
  allocs estimate:  1
  --------------
  minimum time:     40.681 ns (0.00% GC)
  median time:      41.959 ns (0.00% GC)
  mean time:        52.063 ns (14.25% GC)
  maximum time:     43.487 μs (99.85% GC)
  --------------
  samples:          10000
  evals/sample:     991

julia> @benchmark digits(3, base = 10)
BenchmarkTools.Trial: 
  memory estimate:  96 bytes
  allocs estimate:  1
  --------------
  minimum time:     35.241 ns (0.00% GC)
  median time:      36.893 ns (0.00% GC)
  mean time:        47.171 ns (16.41% GC)
  maximum time:     43.818 μs (99.88% GC)
  --------------
  samples:          10000
  evals/sample:     992

julia> @benchmark digits(1234567890123456, base = 2)
BenchmarkTools.Trial: 
  memory estimate:  496 bytes
  allocs estimate:  1
  --------------
  minimum time:     658.308 ns (0.00% GC)
  median time:      667.686 ns (0.00% GC)
  mean time:        730.018 ns (5.45% GC)
  maximum time:     247.337 μs (99.70% GC)
  --------------
  samples:          10000
  evals/sample:     159

julia> @benchmark digits(1234567890123456, base = 10)
BenchmarkTools.Trial: 
  memory estimate:  208 bytes
  allocs estimate:  1
  --------------
  minimum time:     222.505 ns (0.00% GC)
  median time:      225.409 ns (0.00% GC)
  mean time:        247.892 ns (5.90% GC)
  maximum time:     87.951 μs (99.67% GC)
  --------------
  samples:          10000
  evals/sample:     487

After this PR:

julia> @benchmark digits(3, base = 2)
BenchmarkTools.Trial: 
  memory estimate:  96 bytes
  allocs estimate:  1
  --------------
  minimum time:     34.707 ns (0.00% GC)
  median time:      35.394 ns (0.00% GC)
  mean time:        43.532 ns (14.30% GC)
  maximum time:     37.210 μs (99.85% GC)
  --------------
  samples:          10000
  evals/sample:     993

julia> @benchmark digits(3, base = 10)
BenchmarkTools.Trial: 
  memory estimate:  96 bytes
  allocs estimate:  1
  --------------
  minimum time:     34.356 ns (0.00% GC)
  median time:      35.174 ns (0.00% GC)
  mean time:        43.805 ns (14.93% GC)
  maximum time:     38.129 μs (99.86% GC)
  --------------
  samples:          10000
  evals/sample:     993

julia> @benchmark digits(1234567890123456, base = 2)
BenchmarkTools.Trial: 
  memory estimate:  496 bytes
  allocs estimate:  1
  --------------
  minimum time:     75.659 ns (0.00% GC)
  median time:      77.740 ns (0.00% GC)
  mean time:        88.100 ns (8.38% GC)
  maximum time:     30.302 μs (99.61% GC)
  --------------
  samples:          10000
  evals/sample:     971

julia> @benchmark digits(1234567890123456, base = 10)
BenchmarkTools.Trial: 
  memory estimate:  208 bytes
  allocs estimate:  1
  --------------
  minimum time:     215.581 ns (0.00% GC)
  median time:      217.726 ns (0.00% GC)
  mean time:        236.653 ns (5.16% GC)
  maximum time:     73.486 μs (99.63% GC)
  --------------
  samples:          10000
  evals/sample:     525

antoine-levitt · 2019-04-15T05:49:24Z

Have you tried benchmarking inside a function? Can't the compiler do constant propagation by itself here?

giordano · 2019-04-15T07:30:04Z

Like this?

julia> function test()
           n = 1234567890123456
           digits(n, base = 2)
       end
test (generic function with 1 method)

julia> @btime test();
  74.073 ns (1 allocation: 496 bytes)

julia> function test()
           n = 1234567890123456
           digits(n, base = 10)
       end
test (generic function with 1 method)

julia> @btime test();
  215.014 ns (1 allocation: 208 bytes)

giordano · 2019-04-15T08:31:59Z

With this change

@@ -755,9 +755,17 @@ function digits!(a::AbstractVector{T}, n::Integer; base::Integer = 10) where T<:
     isempty(a) && return a
 
     if base > 0
-        for i in eachindex(a)
-            n, d = divrem(n, base)
-            a[i] = d
+        if ispow2(base)
+            k = trailing_zeros(base)
+            c = base - 1
+            for i in eachindex(a)
+                a[i] = (n >> (k * (i - 1))) & c
+            end
+        else
+            for i in eachindex(a)
+                n, d = divrem(n, base)
+                a[i] = d
+            end
         end
     else
         # manually peel one loop iteration for type stability

we can have specialised implementation for all bases that are powers of 2. Here are benchmarks with the proposed change to the PR:

# Before
julia> @btime digits(1, base = 2);
  33.756 ns (1 allocation: 96 bytes)

julia> @btime digits(1234567890123456, base = 2);
  660.050 ns (1 allocation: 496 bytes)

julia> @btime digits(81985529216486895, base = 2);
  759.219 ns (1 allocation: 544 bytes)

julia> @btime digits(1, base = 16);
  34.774 ns (1 allocation: 96 bytes)

julia> @btime digits(81985529216486895, base = 16);
  219.475 ns (1 allocation: 208 bytes)


# After
julia> @btime digits(1, base = 2);
  33.044 ns (1 allocation: 96 bytes)

julia> @btime digits(1234567890123456, base = 2);
  111.649 ns (1 allocation: 496 bytes)

julia> @btime digits(81985529216486895, base = 2);
  122.859 ns (1 allocation: 544 bytes)

julia> @btime digits(1, base = 16);
  32.785 ns (1 allocation: 96 bytes)

julia> @btime digits(81985529216486895, base = 16);
  58.451 ns (1 allocation: 208 bytes)

Note that performance of base = 2 gets a bit worse compared to the current version of the PR, but still way better than the generic implementation in master.

base/intfuncs.jl

giordano · 2019-04-15T20:34:00Z

Ok, I've fixed the first index as suggested by Rafael and implemented the fast method for all powers of 2, as well as added more tests.

giordano · 2019-04-24T22:10:03Z

Bump 🙂

StefanKarpinski · 2019-04-24T22:16:49Z

I guess the question is who should make a decision here? This seems like a straightforward speedup. And which version should we go with?

giordano · 2019-04-24T22:23:01Z

who should make a decision here?

I have no idea 😂

And which version should we go with?

I'd suggest the last one with any power of 2, I think it's a common case to see the expansion in base 8 and 16, not only base 2. Anyway, I can easily revert to the previous version if we want to prioritise base = 2.

stevengj · 2019-04-25T16:35:57Z

Should there be a faster ndigits in this case too, or does it spend a negligible amount of time in ndigits?

giordano · 2019-04-25T21:43:31Z

It doesn't seem that ndigits has a significant impact on performance of digits:

julia> Profile.clear(); @profile for n in 1:1_000_000
           digits(1234567890123456, base = 2)
       end

julia> Profile.print()
133 ./task.jl:268; (::getfield(REPL, Symbol("##26#27")){REPL.REPLBackend})()
 133 ...e/julia/stdlib/v1.3/REPL/src/REPL.jl:118; macro expansion
  133 ...e/julia/stdlib/v1.3/REPL/src/REPL.jl:86; eval_user_input(::Any, ::REPL.REPLBackend)
   133 ./boot.jl:330; eval(::Module, ::Any)
    1   ./intfuncs.jl:755; #digits!#326(::Int64, ::typeof(digits!), ::Array{Int64...
    1   ./intfuncs.jl:513; ndigits0z(::Int64, ::Int64)
    131 .../stdlib/v1.3/Profile/src/Profile.jl:25; top-level scope
     131 ./REPL[17]:2; macro expansion
      131 ./none:0; #digits
       131 ./intfuncs.jl:710; #digits#324
        131 ./none:0; #digits
         131 ./intfuncs.jl:714; #digits#325
          34 ./array.jl:441; zeros
           26 ./array.jl:445; zeros
            26 ./boot.jl:413; Type
             26 ./boot.jl:404; Type
           8  ./array.jl:446; zeros
            8 ./array.jl:295; fill!(::Array{Int64,1}, ::Int64)
             7 ./array.jl:766; setindex!
             1 ./range.jl:595; iterate
              1 ./promotion.jl:403; ==
          93 ./none:0; #digits!
           3  ./int.jl:0; #digits!#326(::Int64, ::typeof(digits!), ::Array{In...
           9  ./int.jl:444; #digits!#326(::Int64, ::typeof(digits!), ::Array{In...
           3  ./intfuncs.jl:0; #digits!#326(::Int64, ::typeof(digits!), ::Array{In...
           1  ./intfuncs.jl:752; #digits!#326(::Int64, ::typeof(digits!), ::Array{In...
           2  ./intfuncs.jl:755; #digits!#326(::Int64, ::typeof(digits!), ::Array{In...
           75 ./intfuncs.jl:762; #digits!#326(::Int64, ::typeof(digits!), ::Array{In...
            15 ./array.jl:766; setindex!
            10 ./int.jl:293; &
            10 ./int.jl:437; >>
            38 ./int.jl:444; >>
             4  ./array.jl:766; <<
             27 ./int.jl:439; <<
             7  ./int.jl:424; <=
          4  ./none:0; #ndigits
           4 ./intfuncs.jl:544; #ndigits#322
            3 ./intfuncs.jl:513; ndigits0z(::Int64, ::Int64)
             1 ./int.jl:49; <
            1 ./intfuncs.jl:516; ndigits0z(::Int64, ::Int64)

Actually, ndigits is already optimised for base = 2, 8, 16:

julia/base/intfuncs.jl

Lines 459 to 461 in 93c9ae4

    
           b == 2  && return sizeof(x)<<3 - leading_zeros(x) 
        
           b == 8  && return (sizeof(x)<<3 - leading_zeros(x) + 2) ÷ 3 
        
           b == 16 && return sizeof(x)<<1 - leading_zeros(x)>>2

Also in a non-optimised case (e.g., base = 3), the ndigits takes less than 7% of the total time of digits:

julia> Profile.clear(); @profile for n in 1:1_000_000
           digits(1234567890123456, base = 3)
       end

julia> Profile.print()
484 ./task.jl:268; (::getfield(REPL, Symbol("##26#27")){REPL.REPLBackend})()
 484 ...e/julia/stdlib/v1.3/REPL/src/REPL.jl:118; macro expansion
  484 ...e/julia/stdlib/v1.3/REPL/src/REPL.jl:86; eval_user_input(::Any, ::REPL.REPLBackend)
   484 ./boot.jl:330; eval(::Module, ::Any)
    1   ./intfuncs.jl:752; #digits!#326(::Int64, ::typeof(digits!), ::Array{Int64...
    1   ./intfuncs.jl:513; ndigits0z(::Int64, ::Int64)
    1   .../stdlib/v1.3/Profile/src/Profile.jl:0; top-level scope
    477 .../stdlib/v1.3/Profile/src/Profile.jl:25; top-level scope
     477 ./REPL[22]:2; macro expansion
      477 ./none:0; #digits
       477 ./intfuncs.jl:710; #digits#324
        477 ./none:0; #digits
         477 ./intfuncs.jl:714; #digits#325
          60  ./array.jl:441; zeros
           53 ./array.jl:445; zeros
            53 ./boot.jl:413; Type
             53 ./boot.jl:404; Type
           7  ./array.jl:446; zeros
            2 ./array.jl:294; fill!(::Array{Int64,1}, ::Int64)
            4 ./array.jl:295; fill!(::Array{Int64,1}, ::Int64)
             4 ./array.jl:766; setindex!
          384 ./none:0; #digits!
           96  ./array.jl:0; #digits!#326(::Int64, ::typeof(digits!), ::Array{I...
           30  ./intfuncs.jl:0; #digits!#326(::Int64, ::typeof(digits!), ::Array{I...
           9   ./intfuncs.jl:752; #digits!#326(::Int64, ::typeof(digits!), ::Array{I...
            6 ./int.jl:136; abs
             6 ./int.jl:96; flipsign
           4   ./intfuncs.jl:755; #digits!#326(::Int64, ::typeof(digits!), ::Array{I...
            1 ./abstractarray.jl:917; isempty
             1 ./array.jl:200; length
           7   ./intfuncs.jl:766; #digits!#326(::Int64, ::typeof(digits!), ::Array{I...
            7 ./number.jl:105; divrem
             7 ./int.jl:228; div
           238 ./intfuncs.jl:767; #digits!#326(::Int64, ::typeof(digits!), ::Array{I...
            225 ./array.jl:0; setindex!
            13  ./array.jl:766; setindex!
          33  ./none:0; #ndigits
           33 ./intfuncs.jl:544; #ndigits#322
            2  ./intfuncs.jl:513; ndigits0z(::Int64, ::Int64)
            30 ./intfuncs.jl:516; ndigits0z(::Int64, ::Int64)
             5 ./intfuncs.jl:0; ndigits0zpb(::Int64, ::Int64)
             1 ./intfuncs.jl:454; ndigits0zpb(::Int64, ::Int64)
             1 ./intfuncs.jl:456; ndigits0zpb(::Int64, ::Int64)
              1 ./int.jl:136; abs
               1 ./int.jl:96; flipsign
             7 ./intfuncs.jl:470; ndigits0zpb(::Int64, ::Int64)
              6 ./int.jl:0; div
              1 ./int.jl:230; div
             8 ./intfuncs.jl:474; ndigits0zpb(::Int64, ::Int64)
             4 ./intfuncs.jl:475; ndigits0zpb(::Int64, ::Int64)
              4 ./int.jl:54; *
             3 ./intfuncs.jl:476; ndigits0zpb(::Int64, ::Int64)
              3 ./int.jl:53; +

giordano · 2019-04-25T22:35:44Z

I've changed the condition for the fast path to

ispow2(base) && n >= 0 && n isa Base.BitInteger

n must be non-negative, otherwise the result would be simply wrong (and I've added tests for that), I'm also testing that n is a BitInteger because I think that in general it's not guaranteed that we can do bit twiddling for non bit-types.

Regarding ndigits, the "problematic" case is when digits! is optimised but ndigits0zpb is not (any power of 2 different from 2, 8, and 16):

julia> Profile.clear(); @profile for n in 1:1_000_000
           digits(1234567890123456, base = 4)
       end

julia> Profile.print()
123 ./task.jl:268; (::getfield(REPL, Symbol("##26#27")){REPL.REPLBackend})()
 123 ...e/julia/stdlib/v1.3/REPL/src/REPL.jl:118; macro expansion
  123 ...e/julia/stdlib/v1.3/REPL/src/REPL.jl:86; eval_user_input(::Any, ::REPL.REPLBackend)
   123 ./boot.jl:330; eval(::Module, ::Any)
    2   ./intfuncs.jl:513; ndigits0z(::Int64, ::Int64)
    118 .../stdlib/v1.3/Profile/src/Profile.jl:25; top-level scope
     118 ./REPL[20]:2; macro expansion
      118 ./none:0; #digits
       118 ./intfuncs.jl:710; #digits#324
        118 ./none:0; #digits
         118 ./intfuncs.jl:714; #digits#325
          37 ./array.jl:441; zeros
           23 ./array.jl:445; zeros
            23 ./boot.jl:413; Type
             23 ./boot.jl:404; Type
           14 ./array.jl:446; zeros
            8 ./array.jl:294; fill!(::Array{Int64,1}, ::Int64)
             1 ./abstractarray.jl:212; eachindex
              1 ./abstractarray.jl:95; axes1
               1 ./abstractarray.jl:75; axes
                1 ./array.jl:155; size
            6 ./array.jl:295; fill!(::Array{Int64,1}, ::Int64)
             5 ./array.jl:766; setindex!
             1 ./range.jl:595; iterate
              1 ./promotion.jl:403; ==
          45 ./none:0; #digits!
           5  ./int.jl:444; #digits!#19(::Int64, ::typeof(digits!), ::Array{Int...
           2  /tmp/foo.jl:0; #digits!#19(::Int64, ::typeof(digits!), ::Array{Int...
           4  /tmp/foo.jl:2; #digits!#19(::Int64, ::typeof(digits!), ::Array{Int...
            2 ./int.jl:136; abs
             2 ./int.jl:96; flipsign
           2  /tmp/foo.jl:5; #digits!#19(::Int64, ::typeof(digits!), ::Array{Int...
           31 /tmp/foo.jl:12; #digits!#19(::Int64, ::typeof(digits!), ::Array{Int...
            10 ./array.jl:766; setindex!
            1  ./int.jl:293; &
            1  ./int.jl:437; >>
            17 ./int.jl:444; >>
             14 ./int.jl:439; <<
             1  ./int.jl:424; <=
             2  ./int.jl:437; >>
           1  /tmp/foo.jl:15; #digits!#19(::Int64, ::typeof(digits!), ::Array{Int...
            1 ./abstractarray.jl:212; eachindex
             1 ./abstractarray.jl:95; axes1
              1 ./abstractarray.jl:75; axes
               1 ./array.jl:155; size
          36 ./none:0; #ndigits
           36 ./intfuncs.jl:544; #ndigits#322
            3  ./intfuncs.jl:513; ndigits0z(::Int64, ::Int64)
             2 ./int.jl:49; <
            33 ./intfuncs.jl:516; ndigits0z(::Int64, ::Int64)
             3  ./intfuncs.jl:0; ndigits0zpb(::Int64, ::Int64)
             1  ./intfuncs.jl:454; ndigits0zpb(::Int64, ::Int64)
             14 ./intfuncs.jl:470; ndigits0zpb(::Int64, ::Int64)
              14 ./int.jl:0; div
             5  ./intfuncs.jl:474; ndigits0zpb(::Int64, ::Int64)
             8  ./intfuncs.jl:475; ndigits0zpb(::Int64, ::Int64)
              8 ./int.jl:54; *
             2  ./intfuncs.jl:476; ndigits0zpb(::Int64, ::Int64)
              2 ./int.jl:53; +

Here ndigits takes almost the same time as digits!, but I don't have an easy fix for this.

giordano · 2019-04-25T23:58:43Z

How about

@@ -460,6 +460,10 @@ function ndigits0zpb(x::Integer, b::Integer)
         b == 8  && return (sizeof(x)<<3 - leading_zeros(x) + 2) ÷ 3
         b == 16 && return sizeof(x)<<1 - leading_zeros(x)>>2
         b == 10 && return bit_ndigits0z(x)
+        if ispow2(b)
+            dv, rm = divrem(sizeof(x)<<3 - leading_zeros(x), trailing_zeros(b))
+            return iszero(rm) ? dv : dv + 1
+        end
     end
 
     d = 0

in ndigits0zpb?

Before this change:

julia> @btime ndigits(81985529216486895, base = 4);
  29.934 ns (0 allocations: 0 bytes)

julia> @btime digits(81985529216486895, base = 4);
  106.863 ns (1 allocation: 336 bytes)

After it:

julia> @btime ndigits(81985529216486895, base = 4);
  8.963 ns (0 allocations: 0 bytes)

julia> @btime digits(81985529216486895, base = 4);
  85.248 ns (1 allocation: 336 bytes)

I think it'd be nicer to remove the special cases for 2, 8, 16, but this generic method I came up with is a bit slower than those super-optimised cases.

giordano · 2019-05-02T12:33:13Z

Any comment on the change to ndigits0zpb proposed above?

StefanKarpinski · 2019-05-02T12:47:05Z

That seems good to me. In general, as long as it's a) correct, b) faster, and c) not too insane, I'm happy with any of these changes.

giordano · 2019-05-02T13:47:25Z

Ok, I installed the proposed change. Here are some benchmarks:

before the PR:

julia> for base in (2, 3, 4, 5, 8, 10, 16),
           n in (1, 3, 1234567890123456, 81985529216486895)
           println("n = $n, base = $base")
           @btime digits($n, base = $base)
       end
n = 1, base = 2
  33.569 ns (1 allocation: 96 bytes)
n = 3, base = 2
  38.626 ns (1 allocation: 96 bytes)
n = 1234567890123456, base = 2
  642.904 ns (1 allocation: 496 bytes)
n = 81985529216486895, base = 2
  759.684 ns (1 allocation: 544 bytes)
n = 1, base = 3
  37.212 ns (1 allocation: 96 bytes)
n = 3, base = 3
  44.564 ns (1 allocation: 96 bytes)
n = 1234567890123456, base = 3
  432.618 ns (1 allocation: 336 bytes)
n = 81985529216486895, base = 3
  509.099 ns (1 allocation: 368 bytes)
n = 1, base = 4
  36.202 ns (1 allocation: 96 bytes)
n = 3, base = 4
  37.109 ns (1 allocation: 96 bytes)
n = 1234567890123456, base = 4
  341.088 ns (1 allocation: 288 bytes)
n = 81985529216486895, base = 4
  399.398 ns (1 allocation: 336 bytes)
n = 1, base = 5
  37.231 ns (1 allocation: 96 bytes)
n = 3, base = 5
  37.755 ns (1 allocation: 96 bytes)
n = 1234567890123456, base = 5
  307.689 ns (1 allocation: 256 bytes)
n = 81985529216486895, base = 5
  362.034 ns (1 allocation: 288 bytes)
n = 1, base = 8
  33.838 ns (1 allocation: 96 bytes)
n = 3, base = 8
  33.675 ns (1 allocation: 96 bytes)
n = 1234567890123456, base = 8
  232.650 ns (1 allocation: 224 bytes)
n = 81985529216486895, base = 8
  273.321 ns (1 allocation: 240 bytes)
n = 1, base = 10
  34.165 ns (1 allocation: 96 bytes)
n = 3, base = 10
  34.032 ns (1 allocation: 96 bytes)
n = 1234567890123456, base = 10
  218.868 ns (1 allocation: 208 bytes)
n = 81985529216486895, base = 10
  246.200 ns (1 allocation: 224 bytes)
n = 1, base = 16
  33.623 ns (1 allocation: 96 bytes)
n = 3, base = 16
  33.525 ns (1 allocation: 96 bytes)
n = 1234567890123456, base = 16
  183.280 ns (1 allocation: 192 bytes)
n = 81985529216486895, base = 16
  221.509 ns (1 allocation: 208 bytes)

After the PR:

julia> for base in (2, 3, 4, 5, 8, 10, 16),
           n in (1, 3, 1234567890123456, 81985529216486895)
           println("n = $n, base = $base")
           @btime digits($n, base = $base)
       end
n = 1, base = 2
  33.610 ns (1 allocation: 96 bytes)
n = 3, base = 2
  34.253 ns (1 allocation: 96 bytes)
n = 1234567890123456, base = 2
  110.220 ns (1 allocation: 496 bytes)
n = 81985529216486895, base = 2
  123.984 ns (1 allocation: 544 bytes)
n = 1, base = 3
  37.236 ns (1 allocation: 96 bytes)
n = 3, base = 3
  44.465 ns (1 allocation: 96 bytes)
n = 1234567890123456, base = 3
  430.688 ns (1 allocation: 336 bytes)
n = 81985529216486895, base = 3
  509.145 ns (1 allocation: 368 bytes)
n = 1, base = 4
  34.196 ns (1 allocation: 96 bytes)
n = 3, base = 4
  34.354 ns (1 allocation: 96 bytes)
n = 1234567890123456, base = 4
  81.159 ns (1 allocation: 288 bytes)
n = 81985529216486895, base = 4
  86.555 ns (1 allocation: 336 bytes)
n = 1, base = 5
  37.245 ns (1 allocation: 96 bytes)
n = 3, base = 5
  37.056 ns (1 allocation: 96 bytes)
n = 1234567890123456, base = 5
  306.630 ns (1 allocation: 256 bytes)
n = 81985529216486895, base = 5
  359.809 ns (1 allocation: 288 bytes)
n = 1, base = 8
  33.450 ns (1 allocation: 96 bytes)
n = 3, base = 8
  33.501 ns (1 allocation: 96 bytes)
n = 1234567890123456, base = 8
  59.394 ns (1 allocation: 224 bytes)
n = 81985529216486895, base = 8
  61.872 ns (1 allocation: 240 bytes)
n = 1, base = 10
  34.125 ns (1 allocation: 96 bytes)
n = 3, base = 10
  35.073 ns (1 allocation: 96 bytes)
n = 1234567890123456, base = 10
  217.121 ns (1 allocation: 208 bytes)
n = 81985529216486895, base = 10
  244.356 ns (1 allocation: 224 bytes)
n = 1, base = 16
  33.395 ns (1 allocation: 96 bytes)
n = 3, base = 16
  33.392 ns (1 allocation: 96 bytes)
n = 1234567890123456, base = 16
  54.078 ns (1 allocation: 192 bytes)
n = 81985529216486895, base = 16
  57.518 ns (1 allocation: 208 bytes)

I also included some benchmarks for bases not powers of 2 (3, 5, 10), to make sure that performance is effectively unaffected in those cases.

rfourquet · 2019-05-12T09:04:28Z

base/intfuncs.jl

-            a[i] = d
+        if ispow2(base) && n >= 0 && n isa Base.BitInteger
+            k = trailing_zeros(base)
+            c = base - 1


You make the assumption that base behaves like Base.BitIntegers, so I suggest you write in the condition 2 lines above: && base <= typemax(Int) and then base = Int(base). (We can't write base = Int(base) unconditionally at the top of the function, because we support bases which don't fit in an Int.

If your concern is that c would always be an Int or larger, even if base is of a smaller type, what about instead

c = base - one(base)

?

My concern is that trailing_zeros(base) won't necessarily be implemented for all integer types, and that base - 1 also assumes 2-complement implementation, so seems safer to just use Int (or typeof(n) which is Base.BitIntgers).
Moreover, even if base :: BigInt which is implementing 2-complements logic, it's wasteful in the loop below to convert everything to BigInt with the & operation.

Done! Note, however, that while a base larger than typemax is possible in digits!, it doesn't currently work in digits because ndigits internally truncates the base to Int:

julia> digits(9223372036854775810, base = 9223372036854775808) ERROR: InexactError: trunc(Int64, 9223372036854775808) Stacktrace: [1] throw_inexacterror(::Symbol, ::Any, ::Int128) at ./boot.jl:560 [2] checked_trunc_sint at ./boot.jl:582 [inlined] [3] toInt64 at ./boot.jl:631 [inlined] [4] Type at ./boot.jl:710 [inlined] [5] ndigits0zpb(::Int128, ::Int128) at ./intfuncs.jl:455 [6] ndigits0z(::Int128, ::Int128) at ./intfuncs.jl:520 [7] #ndigits#322 at ./intfuncs.jl:548 [inlined] [8] #ndigits at ./none:0 [inlined] [9] #digits#325 at ./intfuncs.jl:718 [inlined] [10] #digits at ./none:0 [inlined] [11] #digits#324 at ./intfuncs.jl:714 [inlined] [12] (::getfield(Base, Symbol("#kw##digits")))(::NamedTuple{(:base,),Tuple{Int128}}, ::typeof(digits), ::Int128) at ./none:0 [13] top-level scope at REPL[2]:1 julia> digits!(zeros(Int, 2), 9223372036854775810, base = 9223372036854775808) 2-element Array{Int64,1}: 2 1

Oh right, on the other hand it works with BigInt. We may have to refine these "casts" in a later iteration. Maybe a future improvement here would be then to convert base to the smallest bit integer type ">= Int" which can represent it.

rfourquet · 2019-05-12T09:12:42Z

base/intfuncs.jl

+            k = trailing_zeros(base)
+            c = base - 1
+            for i in eachindex(a)
+                a[i] = (n >> (k * (i - firstindex(a)))) & c


For the record: here it's assumed that firstindex(x) is an integer and that the indices of a are consecutive, which is not enforced by the AbstractArray interface apparently. The same assumption is made below in the base < 0 branch, so anyway this is IMHO not blocking merging this PR.

rfourquet · 2019-05-12T09:23:43Z

The improvements are great! I didn't try myself to find corner cases where this could be slower, but it seems like mostly increasing performance, fo the price of only a small increase of complexity, so I say let's go with it. I will merge after you fix the base problem from my comment.

rfourquet · 2019-05-13T09:07:02Z

The failing travis is "No output has been received in the last 30m0s" so assuming it's unrelated.
Thanks for this excellent PR @giordano !

c42f · 2019-08-22T01:08:35Z

It seems this might have caused the performance regression reported at https://discourse.julialang.org/t/julia-v1-3-runs-this-script-at-least-10x-slower-than-julia-v1-1-on-my-computer/27815. Possibly just because it makes digits more complicated which in turn affects the inlining decisions made by the compiler?

rapus95 · 2019-08-22T01:44:50Z

Just out of curiosity, right now the functions are set up like that:

digits(...)=digits!(..., fitting array, ...)
digits!(...)= complicated stuff

why wouldn't it make sense to flip that over and make digits an iterator? ->

digits(...)=iterator with equally complicated stuff
function digits!(...,dest,...)
  dest.=digits(...)
end

The general question here is, whether there are any cases where an iterator is slower than the corresponding array (in the non-mutating case digits(...))? An iterator is generally preferable due to the ability to omit any memory access here, isn't it?
If it is only that we want random access we could also implement the AbstractArray protocol since we can calculate the digit for an arbitrary index directly. Or alternatively if you intend to request several locations multiple times, you can simply collect the iterator first.

c42f · 2019-08-22T07:43:29Z

An iterator would be a reasonable API but would be a breaking change. A custom array type which calculates the digits on demand would be less of a change but I'm not sure that can be done efficiently (I guess the main cost is mostly the integer divides (the divrem calls) so for efficiency you really want no more than d divisions where d is the number of digits). It's possible that caching the value of Base.MultiplicativeInverses.multiplicativeinverse(base) might solve that problem.

One fun alternative might be to do a custom array type where the digits are calculated up front but bitpacked into the bits of the next larger integer type. Then you get extremely efficient non-allocating storage as well as the simplicity of calculating the digits in a single pass.

rapus95 · 2019-09-04T12:09:17Z

An iterator would be a reasonable API but would be a breaking change. A custom array type which calculates the digits on demand would be less of a change but I'm not sure that can be done efficiently (I guess the main cost is mostly the integer divides (the divrem calls) so for efficiency you really want no more than O(d) divisions where d is the number of digits). It's possible that caching the value of Base.MultiplicativeInverses.multiplicativeinverse(base) might solve that problem.

How about keeping the custom array only as iterator and collecting the whole structure into an abstract array only when the first random access via getindex is made? That way we get superior performance for all iterations and have only an O(d) extra overhead when needed for random access.

EDIT: On the other hand, why didn't we incorporate the power of two optimization of this method into the div and remainder functions? That way we wouldn't specialize the caller to do a better job than the actual function made for this purpose.

ararslan added the performance Must go faster label Apr 14, 2019

rfourquet reviewed Apr 15, 2019

View reviewed changes

base/intfuncs.jl Outdated Show resolved Hide resolved

Fast implementation of digits(::Integer, base = 2)

1832f28

giordano force-pushed the digits-base-2 branch from a357d95 to 90d5f60 Compare April 15, 2019 20:30

giordano changed the title ~~Fast implementation of digits(::Integer, base = 2)~~ Fast implementation of digits with base a power of 2 Apr 15, 2019

StefanKarpinski closed this Apr 24, 2019

StefanKarpinski reopened this Apr 24, 2019

giordano force-pushed the digits-base-2 branch from 90d5f60 to f5e442f Compare April 25, 2019 22:30

Fast implementation of digits for all bases powers of 2

85a85a3

giordano force-pushed the digits-base-2 branch from f5e442f to 85a85a3 Compare May 2, 2019 13:43

rfourquet reviewed May 12, 2019

View reviewed changes

Convert base to an Int for the fast path of digits!

bf56a30

rfourquet merged commit 3efdda6 into JuliaLang:master May 13, 2019

giordano deleted the digits-base-2 branch May 13, 2019 09:25

goggle mentioned this pull request Aug 22, 2019

Julia 1.3 runs my script at least 10x slower than Julia 1.1 does on my computer #33014

Closed

giordano mentioned this pull request Nov 10, 2022

Add internal top_set_bit function #47523

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fast implementation of digits with base a power of 2 #31722

Fast implementation of digits with base a power of 2 #31722

giordano commented Apr 14, 2019

antoine-levitt commented Apr 15, 2019

giordano commented Apr 15, 2019

giordano commented Apr 15, 2019

giordano commented Apr 15, 2019

giordano commented Apr 24, 2019

StefanKarpinski commented Apr 24, 2019

giordano commented Apr 24, 2019

stevengj commented Apr 25, 2019

giordano commented Apr 25, 2019 •

edited

Loading

giordano commented Apr 25, 2019 •

edited

Loading

giordano commented Apr 25, 2019

giordano commented May 2, 2019

StefanKarpinski commented May 2, 2019

giordano commented May 2, 2019 •

edited

Loading

rfourquet May 12, 2019

giordano May 12, 2019

rfourquet May 12, 2019

giordano May 12, 2019 •

edited

Loading

rfourquet May 12, 2019

rfourquet May 12, 2019

rfourquet commented May 12, 2019

rfourquet commented May 13, 2019

c42f commented Aug 22, 2019

rapus95 commented Aug 22, 2019

c42f commented Aug 22, 2019 •

edited

Loading

rapus95 commented Sep 4, 2019 •

edited

Loading

Fast implementation of digits with base a power of 2 #31722

Fast implementation of digits with base a power of 2 #31722

Conversation

giordano commented Apr 14, 2019

antoine-levitt commented Apr 15, 2019

giordano commented Apr 15, 2019

giordano commented Apr 15, 2019

giordano commented Apr 15, 2019

giordano commented Apr 24, 2019

StefanKarpinski commented Apr 24, 2019

giordano commented Apr 24, 2019

stevengj commented Apr 25, 2019

giordano commented Apr 25, 2019 • edited Loading

giordano commented Apr 25, 2019 • edited Loading

giordano commented Apr 25, 2019

giordano commented May 2, 2019

StefanKarpinski commented May 2, 2019

giordano commented May 2, 2019 • edited Loading

rfourquet May 12, 2019

Choose a reason for hiding this comment

giordano May 12, 2019

Choose a reason for hiding this comment

rfourquet May 12, 2019

Choose a reason for hiding this comment

giordano May 12, 2019 • edited Loading

Choose a reason for hiding this comment

rfourquet May 12, 2019

Choose a reason for hiding this comment

rfourquet May 12, 2019

Choose a reason for hiding this comment

rfourquet commented May 12, 2019

rfourquet commented May 13, 2019

c42f commented Aug 22, 2019

rapus95 commented Aug 22, 2019

c42f commented Aug 22, 2019 • edited Loading

rapus95 commented Sep 4, 2019 • edited Loading

giordano commented Apr 25, 2019 •

edited

Loading

giordano commented Apr 25, 2019 •

edited

Loading

giordano commented May 2, 2019 •

edited

Loading

giordano May 12, 2019 •

edited

Loading

c42f commented Aug 22, 2019 •

edited

Loading

rapus95 commented Sep 4, 2019 •

edited

Loading