Breaking changes: increase performance #5

JayKickliter · 2020-10-14T16:38:31Z

This PR significantly increases training and winner-lookup speed. The speedup is primarily achieved by reducing heap allocations. However, the cost is that it introduces API-breaking changes.

The methodology was first to add benchmarking to the current code base and measuring performance deltas after every little tweak to the code. In some cases, removing intermediate heap allocations led to performance regressions. In those cases, I left comments explaining why they're necessary.

Because this PR introduces breaking changes, I added breaking change: the non-default feature serde-1. This helps with build times for people not interested in serialization. Building with serde-1 enables this crate's old [to, from]_json support. I believe that those functions are out of this crate's scope, but as long as they are disabled by default, I see no harm.

Benchmarks

I first ran cargo bench without any library modifications, and the output below is after rerunning it on the tip of this branch.

Training/Random/10      time:   [68.191 us 68.893 us 69.746 us]
                        thr8 Kelem/s 145.15 Kelem/s 146.65 Kelem/s]
                 change:
                        time:   [-4.4626% -3.2623% -2.0623%] (p = 0.00 < 0.05)
                        thrpt:  [+2.1057% +3.3724% +4.6710%]
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) high mild
  5 (5.00%) high severe
Training/Batch/10       time:   [59.171 us 59.415 us 59.682 us]
                        thrpt:  [167.55 Kelem/s 168.31 Kelem/s 169.00 Kelem/s]
                 change:
                        time:   [-19.837% -18.782% -17.789%] (p = 0.00 < 0.05)
                        thrpt:  [+21.639% +23.126% +24.745%]
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) high mild
  2 (2.00%) high severe
Training/Random/100     time:   [666.92 us 671.67 us 678.39 us]
                        thrpt:  [147.41 Kelem/s 148.88 Kelem/s 149.94 Kelem/s]
                 change:
                        time:   [-6.2380% -4.6824% -2.8067%] (p = 0.00 < 0.05)
                        thrpt:  [+2.8878% +4.9124% +6.6531%]
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild
  5 (5.00%) high severe
Training/Batch/100      time:   [605.23 us 607.84 us 611.17 us]
                        thrpt:  [163.62 Kelem/s 164.52 Kelem/s 165.23 Kelem/s]
                 change:
                        time:   [-17.402% -16.414% -15.498%] (p = 0.00 < 0.05)
                        thrpt:  [+18.340% +19.637% +21.068%]
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  1 (1.00%) low mild
  5 (5.00%) high mild
  2 (2.00%) high severe
Training/Random/1000    time:   [6.7615 ms 6.7930 ms 6.8285 ms]
                        thrpt:  [146.45 Kelem/s 147.21 Kelem/s 147.90 Kelem/s]
                 change:
                        time:   [-3.2269% -2.6307% -2.0150%] (p = 0.00 < 0.05)
                        thrpt:  [+2.0564% +2.7017% +3.3345%]
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) high mild
  3 (3.00%) high severe
Training/Batch/1000     time:   [6.0276 ms 6.0493 ms 6.0753 ms]
                        thrpt:  [164.60 Kelem/s 165.31 Kelem/s 165.90 Kelem/s]
                 change:
                        time:   [-14.930% -14.487% -14.008%] (p = 0.00 < 0.05)
                        thrpt:  [+16.290% +16.942% +17.550%]
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) high mild
  2 (2.00%) high severe

Winner/Plain/4          time:   [1.0489 us 1.0518 us 1.0548 us]
                        change: [-45.352% -44.837% -44.342%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe
Winner/Distance/4       time:   [1.0665 us 1.0722 us 1.0785 us]
                        change: [-48.275% -47.959% -47.654%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  3 (3.00%) high mild
  4 (4.00%) high severe

Notes

I'm fairly certain that the lackluster speedup of random training is explained in this comment.

JayKickliter · 2020-10-14T16:43:21Z

Note: please don't merge this until someone with domain expertise checks that train_random is returning sensible results. I was able to add a test with reference output for train_batch to make sure I didn't break anything, but I don't know how to test random training.

JayKickliter added 3 commits October 14, 2020 09:08

Add benchmarks

15e1b92

Add reference distance map

9f71685

Feature-gate serde support

6508532

Reduce heap allocations

232f479

JayKickliter force-pushed the jsk/breaking/reduce-heap-allocations branch from b7f59a1 to 232f479 Compare October 14, 2020 21:10

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Breaking changes: increase performance #5

Breaking changes: increase performance #5

JayKickliter commented Oct 14, 2020 •

edited

Loading

JayKickliter commented Oct 14, 2020

Breaking changes: increase performance #5

Are you sure you want to change the base?

Breaking changes: increase performance #5

Conversation

JayKickliter commented Oct 14, 2020 • edited Loading

Benchmarks

Notes

JayKickliter commented Oct 14, 2020

JayKickliter commented Oct 14, 2020 •

edited

Loading