Add faster while loop implementations and new Time#+ benchmark #155

Open: wants to merge 2 commits into `main`
`README.md`: 39 changes (30 additions, 9 deletions)

@@ -492,18 +492,18 @@ Comparison:

```diff
 $ ruby -v code/enumerable/each_with_index-vs-while-loop.rb
-ruby 2.2.0p0 (2014-12-25 revision 49005) [x86_64-darwin14]
-
+ruby 2.5.1p57 (2018-03-29 revision 63029) [x86_64-linux]
 Calculating -------------------------------------
-While Loop 22.553k i/100ms
-each_with_index 11.963k i/100ms
--------------------------------------------------
-While Loop 240.752k (± 7.1%) i/s - 1.218M
-each_with_index 126.753k (± 5.9%) i/s - 634.039k
+While optimal 443.606 (± 2.5%) i/s - 2.250k in 5.075720s
+While cached size 441.961 (± 0.5%) i/s - 2.244k in 5.077426s
+While loop 363.202 (± 3.3%) i/s - 1.836k in 5.061400s
+each_with_index 277.373 (± 1.1%) i/s - 1.404k in 5.062208s
 
 Comparison:
-While Loop: 240752.1 i/s
-each_with_index: 126753.4 i/s - 1.90x slower
+While optimal: 443.6 i/s
+While cached size: 442.0 i/s - same-ish: difference falls within error
+While loop: 363.2 i/s - 1.22x slower
+each_with_index: 277.4 i/s - 1.60x slower
```

##### `Enumerable#map`...`Array#flatten` vs `Enumerable#flat_map` [code](code/enumerable/map-flatten-vs-flat_map.rb)
@@ -1303,6 +1303,27 @@ Comparison:

```
Time.parse: 43710.9 i/s - 2.62x slower
```

##### `Time#+` vs `Time.at(Time#to_f+)` [code](code/time/plus-vs-to_f-plus.rb)

This covers both high and low precision, since high-precision operations are more expensive.
`Time#+` is slow because it calls `Float#to_r` on the offset and does a series of Bignum operations to avoid overflowing if the offset is ever extremely large.

```
$ ruby -v code/time/plus-vs-to_f-plus.rb
ruby 2.5.1p57 (2018-03-29 revision 63029) [x86_64-linux]
Calculating -------------------------------------
Time#to_f+ (low) 1.485k (± 1.5%) i/s - 7.548k in 5.084706s
Time#+ (low) 854.005 (± 4.2%) i/s - 4.300k in 5.046595s
Time#to_f+ (high) 1.051k (± 1.4%) i/s - 5.355k in 5.094035s
Time#+ (high) 836.424 (± 7.7%) i/s - 4.165k in 5.021368s

Comparison:
Time#to_f+ (low): 1484.8 i/s
Time#to_f+ (high): 1051.4 i/s - 1.41x slower
Time#+ (low): 854.0 i/s - 1.74x slower (3.11x on i386-mingw32)
Time#+ (high): 836.4 i/s - 1.78x slower (3.02x on i386-mingw32)
```

### Range

#### `cover?` vs `include?` [code](code/range/cover-vs-include.rb)
`code/enumerable/each_with_index-vs-while-loop.rb`: 27 changes (25 additions, 2 deletions)

@@ -2,6 +2,27 @@

```diff
 ARRAY = [*1..100]
 
+def fastest
+  array = ARRAY
+  index = 0
+  size = array.size
+  while index < size
+    array[index] + index
+    index += 1
+  end
+  array
+end
+
+def faster
+  index = 0
+  size = ARRAY.size
```
Collaborator

This should be `fastest`, and the current `fastest` should be removed, as they will always be around the same. A constant lookup is not slower than a local variable in terms of code execution. It's not even a micro-optimization, it's some sort of pseudo-scientific nano-optimization :D

Collaborator

In fact, the whole benchmark seems a bit wrong to me. It compares oranges to potatoes. It should test `each*` vs `while`, not different memoization tricks:

```ruby
require "benchmark/ips"

ARRAY = [*1..100]

def fastest
  index = 0
  size  = ARRAY.size

  while index < size
    ARRAY[index] + index
    index += 1
  end

  ARRAY
end

def faster
  index = 0

  ARRAY.each do |number|
    number + index
    index += 1
  end

  ARRAY
end

def slow
  ARRAY.each_with_index { |number, index| number + index }
end

Benchmark.ips do |x|
  x.report("fastest", 'fastest;' * 1000)
  x.report("faster", 'faster;' * 1000)
  x.report("slow", 'slow;' * 1000)
  x.compare!
end
```

Results will be:

```
Warming up --------------------------------------
             fastest    37.000  i/100ms
              faster    23.000  i/100ms
                slow    24.000  i/100ms
Calculating -------------------------------------
             fastest    375.210  (± 1.1%) i/s -      1.887k in   5.029639s
              faster    242.769  (± 2.1%) i/s -      1.219k in   5.023745s
                slow    236.484  (± 1.3%) i/s -      1.200k in   5.074932s

Comparison:
             fastest:      375.2 i/s
              faster:      242.8 i/s - 1.55x  slower
                slow:      236.5 i/s - 1.59x  slower
```

Author · @bawNg · May 2, 2018

I thought about removing the micro-optimized `fastest` version, but I decided to include it because local variables should lend themselves to much better optimization than other lookups. The difference may be very small with the current Ruby implementation, but benchmarks do consistently show a performance difference in many cases, even if it's overkill for most use cases. The Linux benchmarks I ran were up to 12 i/s faster, which is 12k calls per second since each iteration makes 1000 calls, and an average of 19 i/s faster under Windows (constants are 1.04x slower). I'm running these benchmarks with high CPU priority on Windows and real-time scheduling on Linux to maximize consistency.

As for whether the benchmark should compare different memoization tricks: that may be a good reason to include only one while loop, but when comparing a while loop to a native abstraction, the loop should ideally be written as optimally as possible. `Array#each_with_index` gets to take advantage of the C stack and avoids the overhead of Ruby variables entirely when it references the array and compares and increments the index.

Collaborator

I agree that a constant lookup will be slower than a local variable on such synthetic tests. But in a real-world example, where the loop body does something less trivial, the difference will shrink to nothing. And I think the aim of this project is to provide somewhat "better practices". If real code suffers from the overhead of a constant lookup vs a local variable, I would consider it a Ruby issue.
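The thread above turns on whether reading `ARRAY` costs more than reading a local copy of it. A timing-free way to look at this is to compare the bytecode each form compiles to. The sketch below assumes CRuby/MRI (the `RubyVM` API is MRI-specific), and the method names `with_constant` and `with_local` are hypothetical. A constant read goes through an inline-cached constant instruction (`getinlinecache`/`getconstant` on older versions, `opt_getconstant_path` on 3.2+), while a local read is a single `getlocal` instruction. On a warm cache both are cheap, which fits the small differences reported here.

```ruby
# Requires CRuby/MRI: RubyVM::InstructionSequence does not exist on JRuby or TruffleRuby.
ARRAY = [*1..100]

# Hypothetical helper: reads the constant on every call.
def with_constant
  ARRAY.size
end

# Hypothetical helper: copies the constant into a local first, as the PR's `fastest` does.
def with_local
  array = ARRAY
  array.size
end

# disasm accepts a Method object and returns the human-readable YARV instructions.
constant_disasm = RubyVM::InstructionSequence.disasm(method(:with_constant))
local_disasm    = RubyVM::InstructionSequence.disasm(method(:with_local))

puts constant_disasm
puts local_disasm
```

The exact instruction names vary between Ruby versions, so the printed listings will differ from release to release.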

```diff
+  while index < size
+    ARRAY[index] + index
+    index += 1
+  end
+  ARRAY
+end
+
 def fast
   index = 0
   while index < ARRAY.size
```
@@ -18,7 +39,9 @@ def slow
```diff
 end
 
 Benchmark.ips do |x|
-  x.report("While Loop") { fast }
-  x.report("each_with_index") { slow }
+  x.report("While optimal", 'fastest;' * 1000)
+  x.report("While cached size", 'faster;' * 1000)
+  x.report("While simple", 'fast;' * 1000)
+  x.report("each_with_index", 'slow;' * 1000)
   x.compare!
 end
```
`code/time/plus-vs-to_f-plus.rb`: 33 changes (33 additions, 0 deletions)

@@ -0,0 +1,33 @@
```ruby
require 'benchmark/ips'

TimeFloatLowPrecision = 1525262447.0
OffsetFloatLowPrecision = 1.1
TimeFloatHighPrecision = 1525262447.1234567
OffsetFloatHighPrecision = 1.1234567

TimeLowPrecision = Time.at(TimeFloatLowPrecision)
TimeHighPrecision = Time.at(TimeFloatHighPrecision)

def fast_low_precision
  Time.at(TimeLowPrecision.to_f + OffsetFloatLowPrecision)
end

def slow_low_precision
  TimeLowPrecision + OffsetFloatLowPrecision
end

def fast_high_precision
  Time.at(TimeHighPrecision.to_f + OffsetFloatHighPrecision)
end

def slow_high_precision
  TimeHighPrecision + OffsetFloatHighPrecision
end

Benchmark.ips do |x|
  x.report('Time#to_f+ (low)', 'fast_low_precision;' * 1000)
  x.report('Time#+ (low)', 'slow_low_precision;' * 1000)
  x.report('Time#to_f+ (high)', 'fast_high_precision;' * 1000)
  x.report('Time#+ (high)', 'slow_high_precision;' * 1000)
  x.compare!
end
```
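This benchmark measures speed only, and the two forms are not fully interchangeable: `Time.at(time.to_f + offset)` rounds through a Float, while `Time#+` keeps the offset exact. The sketch below assumes CRuby. It uses a hypothetical timestamp with the same digits as `TimeFloatHighPrecision`, built from a `Rational` so it starts out exact. It shows that the two results can differ below the microsecond. Near 1.5e9 seconds, adjacent Floats are 2**-22 s (about 238 ns) apart.

```ruby
# Hypothetical timestamp: same digits as TimeFloatHighPrecision, but stored exactly.
base   = Time.at(Rational(15252624471234567, 10**7))
offset = 1.1234567

# Time#+ adds the Float offset exactly (the PR description says via Float#to_r).
exact = base + offset

# The benchmarked shortcut: convert the Time to a Float, add, and convert back.
# Each Float step rounds to the nearest representable value (~238 ns apart here).
via_float = Time.at(base.to_f + offset)

puts exact.nsec
puts via_float.nsec
puts exact - via_float # difference in seconds, as a Float
```

Whether sub-microsecond drift matters depends on how the resulting times are used.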