Performance benchmarks #32

Closed · wanted to merge 4 commits

Conversation

@mdavezac (Contributor) commented on Oct 9, 2016:

I've added a number of benchmarks in a single file to check whether operations between unitful objects are just as fast as their unitless counterparts.

Unfortunately, this is not always the case (assuming the code in the pull-request is correct :) ). Most notably, operations involving unitful arrays are quite expensive.

I'm not sure whether you want this in the package itself.
The benchmarks are arranged as tests and can be run with:

include("test/benchmarks.jl")

Individual benchmarks can be tried out in the REPL with:

judge_unit_benchmark(:((2u"m") * (2u"m^-1")))
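
For context, the kind of comparison this performs can be sketched directly with BenchmarkTools (an illustration only, not the PR's implementation; judge_unit_benchmark itself is defined in test/benchmarks.jl):

using Unitful, BenchmarkTools

# Benchmark a unitful expression and its unitless counterpart, then let
# BenchmarkTools classify the difference.
a, b = 2u"m", 2u"m^-1"
x, y = 2, 2
unitful  = @benchmark $a * $b
unitless = @benchmark $x * $y
judge(median(unitful), median(unitless))  # judged :improvement, :invariant, or :regression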

@ajkeller34 (Collaborator) commented:

Thank you so much for taking the time to implement benchmarking! For a package like this it is sorely needed.

I'll review this PR carefully when I get a moment to do so. It's disappointing that some operations seem slow, but keep in mind that, at least on 0.5.0, there are some bugs in Julia that could be impacting performance, notably JuliaLang/julia#18465. Rational numbers are used wherever exact unit conversions are possible, and it could be that the type instability described in that issue is causing wide-ranging performance problems. Of course, this is just a guess, and I need to look into what you found a little more.

@ajkeller34 (Collaborator) commented:

How about we do a little benchmarking case study? cc @timholy, since he uses this package and is interested in its performance.

Let's first consider addition when units are mixed, which was a problematic benchmark:

julia> using Unitful, BenchmarkTools

julia> a = 1u"km"
1 km

julia> b = 2u"m"
2 m

julia> @benchmark +($a,$b)
BenchmarkTools.Trial: 
  samples:          10000
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%
  memory estimate:  144.00 bytes
  allocs estimate:  5
  minimum time:     10.72 μs (0.00% GC)
  median time:      12.44 μs (0.00% GC)
  mean time:        13.40 μs (0.00% GC)
  maximum time:     79.81 μs (0.00% GC)

Yikes! Why is that taking so long? Well, since you specified integers, an exact conversion is possible, so Rational numbers are used. As I mentioned above, we know there is currently a type instability with Rational (note the Any return type below):

julia> @code_warntype +(1u"km",2u"m")
Variables:
  #self#::Base.#+

# I've omitted some output here

  end::Any

The return type of Any is a bad sign. There may be other performance penalties associated with exact conversion, but a lot of it is probably this type instability. One can avoid both the type instability and any legitimate penalties of using Rationals by using floating-point numbers instead. The return type is then concrete, and the performance is much better:

julia> a = 1.0u"km"
1.0 km

julia> b = 2.0u"m"
2.0 m

julia> @benchmark +($a,$b)
BenchmarkTools.Trial: 
  samples:          10000
  evals/sample:     1000
  time tolerance:   5.00%
  memory tolerance: 1.00%
  memory estimate:  0.00 bytes
  allocs estimate:  0
  minimum time:     6.00 ns (0.00% GC)
  median time:      6.00 ns (0.00% GC)
  mean time:        6.24 ns (0.00% GC)
  maximum time:     43.00 ns (0.00% GC)
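
(As an aside, this kind of instability can also be checked programmatically. A sketch, not part of the PR; on Julia 0.5 the test module was Base.Test rather than Test:)

using Unitful, Test

# @inferred throws if the inferred return type is not concrete.
@inferred 1.0u"km" + 2.0u"m"  # concrete return type: passes
@inferred 1u"km" + 2u"m"      # may throw on Julia versions affected by the Rational instability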

Now, let's compare with the performance of SIUnits:

julia> using SIUnits, SIUnits.ShortUnits

julia> a = 1.0km
1000.0 m

julia> b = 2.0m
2.0 m

julia> @benchmark +($a, $b)
BenchmarkTools.Trial: 
  samples:          10000
  evals/sample:     1000
  time tolerance:   5.00%
  memory tolerance: 1.00%
  memory estimate:  0.00 bytes
  allocs estimate:  0
  minimum time:     2.00 ns (0.00% GC)
  median time:      2.00 ns (0.00% GC)
  mean time:        2.06 ns (0.00% GC)
  maximum time:     47.00 ns (0.00% GC)

So, why does SIUnits win here (2 ns vs. 6 ns)? Well, the only length unit SIUnits can internalize is the meter. Other length units specified by the user are simply converted to meters, as you see above. SIUnits wins because the benchmark doesn't count the computation needed to convert km to m; the conversion has already happened by the time the benchmark runs. Beyond performance, this automatic conversion to a single unit (say, meters) per dimension (length) makes other problems hard to solve: Keno/SIUnits.jl#22, Keno/SIUnits.jl#92, Keno/SIUnits.jl#57, Keno/SIUnits.jl#8, to name a few.
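
(For a like-for-like scalar comparison, one could pre-convert the Unitful quantity as well, so that neither package pays the km-to-m conversion inside the benchmark loop. A sketch; uconvert is the conversion function in current Unitful releases, and very old versions may have spelled this differently:)

using Unitful, BenchmarkTools

a = uconvert(u"m", 1.0u"km")  # 1000.0 m: conversion done once, outside the benchmark
b = 2.0u"m"
@benchmark +($a, $b)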

In the case of arrays, let's again consider floating-point numbers only for simplicity, and do another comparison:

julia> using Unitful

julia> A = [1.0u"km", 2.0u"m"]
2-element Array{Quantity{Float64, Dimensions:{𝐋}, Units:{m}},1}:
 1000.0 m
    2.0 m

julia> @benchmark .+($A, $A)
BenchmarkTools.Trial: 
  samples:          10000
  evals/sample:     973
  time tolerance:   5.00%
  memory tolerance: 1.00%
  memory estimate:  144.00 bytes
  allocs estimate:  3
  minimum time:     73.00 ns (0.00% GC)
  median time:      79.00 ns (0.00% GC)
  mean time:        106.86 ns (20.18% GC)
  maximum time:     5.91 μs (97.90% GC)

julia> using SIUnits, SIUnits.ShortUnits

julia> B = [1.0km, 2.0m]
2-element Array{SIUnits.SIQuantity{Float64,1,0,0,0,0,0,0,0,0},1}:
 1000.0 m
    2.0 m

julia> @benchmark .+($B, $B)
BenchmarkTools.Trial: 
  samples:          10000
  evals/sample:     958
  time tolerance:   5.00%
  memory tolerance: 1.00%
  memory estimate:  144.00 bytes
  allocs estimate:  3
  minimum time:     86.00 ns (0.00% GC)
  median time:      92.00 ns (0.00% GC)
  mean time:        121.36 ns (18.40% GC)
  maximum time:     6.27 μs (96.46% GC)

julia> C = [1000.0, 2.0]
2-element Array{Float64,1}:
 1000.0
    2.0

julia> @benchmark .+($C, $C)
BenchmarkTools.Trial: 
  samples:          10000
  evals/sample:     979
  time tolerance:   5.00%
  memory tolerance: 1.00%
  memory estimate:  144.00 bytes
  allocs estimate:  3
  minimum time:     64.00 ns (0.00% GC)
  median time:      69.00 ns (0.00% GC)
  mean time:        96.29 ns (22.58% GC)
  maximum time:     5.82 μs (98.44% GC)

For both Unitful and SIUnits, getting a concretely-typed array requires unit conversion before the benchmark runs (no conversion is needed for Float64). In all cases the performance is pretty similar; I wouldn't read too much into the differences here, since I didn't do this very carefully.

Eventually (once I don't have to explain the type instability) I will write up some of this in the Unitful documentation, to emphasize differences between Unitful and SIUnits and enable users to choose the package that best suits their needs.

Review comment on test/benchmarks.jl:

using Unitful
using BenchmarkTools
using Base.Test
using DataFrames
@ajkeller34 (Collaborator) commented on Oct 10, 2016:

DataFrames no longer needed?

@mdavezac (Contributor, Author) replied:

Indeed. Sorry, the pull request is a bit dirty. In part, that's because I'm not sure how benchmarks are integrated into packages. As part of the testing framework? Outside of it? As a report? Not at all?

In any case, I'll correct that and the unused function.
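
(For what it's worth, one common convention is a benchmark/benchmarks.jl that defines a BenchmarkTools.BenchmarkGroup named SUITE, kept separate from the test suite and runnable with tools such as PkgBenchmark. A minimal sketch, not the layout used in this PR:)

using BenchmarkTools, Unitful

const SUITE = BenchmarkGroup()
SUITE["scalar mul"]  = @benchmarkable (2.0u"m") * (2.0u"m^-1")
SUITE["array bcast"] = @benchmarkable A .* A setup=(A = [1.0u"m", 2.0u"m"])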

Review comment on test/benchmarks.jl:

using Base.Test
using DataFrames

function benchmark()
@ajkeller34 (Collaborator) commented:

Is this function used anywhere? If I try it out I get the following:

julia> benchmark()
ERROR: UndefVarError: benchmark! not defined
 in benchmark() at /Users/ajkeller/.julia/v0.5/Unitful/test/benchmarks.jl:10

@mdavezac (Contributor, Author) commented:

What about the case of arrays with a single unit type, e.g. [1u"m", 1u"m"] .* [1u"m", 1u"m"], where the eltype is concrete? Is there any reason to think it should be slower than the corresponding operation between bare arrays? If it's worth it, I'd like to see whether I can figure out a fix.
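
(Such an array does have a concrete element type, so storage and dispatch match a bare numeric array. A quick check, not from the PR; isconcretetype is the modern name, Julia 0.5 called it isleaftype:)

using Unitful

A = [1u"m", 1u"m"]
isconcretetype(eltype(A))  # true: every element shares the same Quantity type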

@timholy (Contributor) commented on Oct 10, 2016:

You'd need to avoid using integers in that comparison, due to JuliaLang/julia#18465. Once that bug is fixed, it shouldn't matter.
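
(A float-only version of that comparison might look like the following sketch, in the spirit of the suggestion above rather than the PR's benchmark file:)

using Unitful, BenchmarkTools

A = [1.0u"m", 2.0u"m"]  # concrete eltype, no Rational conversions involved
C = [1.0, 2.0]          # bare Float64 baseline

@benchmark $A .* $A
@benchmark $C .* $C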

@mdavezac (Contributor, Author) commented:

I've tried this with arrays of floats and verified that Unitful does not add much overhead, if any.
However, the results are a bit flaky, so I'm not sure a unit-test format is the right way to go.
Closing this for now.

@mdavezac closed this on Oct 13, 2016