Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Breaking changes: increase performance #5

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

JayKickliter
Copy link
Contributor

@JayKickliter JayKickliter commented Oct 14, 2020

This PR significantly increases training and winner-lookup speed. The speedup is primarily achieved by reducing heap allocations. However, the cost is that it introduces API-breaking changes.

The methodology was first to add benchmarking to the current code base and measuring performance deltas after every little tweak to the code. In some cases, removing intermediate heap allocations led to performance regressions. In those cases, I left comments explaining why they're necessary.

Because this PR introduces breaking changes, I added breaking change: the non-default feature serde-1. This helps with build times for people not interested in serialization. Building with serde-1 enables this crate's old [to, from]_json support. I believe that those functions are out of this crate's scope, but as long as they are disabled by default, I see no harm.

Benchmarks

I first ran cargo bench without any library modifications, and the output below is after rerunning it on the tip of this branch.

Training/Random/10      time:   [68.191 us 68.893 us 69.746 us]
                        thr8 Kelem/s 145.15 Kelem/s 146.65 Kelem/s]
                 change:
                        time:   [-4.4626% -3.2623% -2.0623%] (p = 0.00 < 0.05)
                        thrpt:  [+2.1057% +3.3724% +4.6710%]
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) high mild
  5 (5.00%) high severe
Training/Batch/10       time:   [59.171 us 59.415 us 59.682 us]
                        thrpt:  [167.55 Kelem/s 168.31 Kelem/s 169.00 Kelem/s]
                 change:
                        time:   [-19.837% -18.782% -17.789%] (p = 0.00 < 0.05)
                        thrpt:  [+21.639% +23.126% +24.745%]
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) high mild
  2 (2.00%) high severe
Training/Random/100     time:   [666.92 us 671.67 us 678.39 us]
                        thrpt:  [147.41 Kelem/s 148.88 Kelem/s 149.94 Kelem/s]
                 change:
                        time:   [-6.2380% -4.6824% -2.8067%] (p = 0.00 < 0.05)
                        thrpt:  [+2.8878% +4.9124% +6.6531%]
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild
  5 (5.00%) high severe
Training/Batch/100      time:   [605.23 us 607.84 us 611.17 us]
                        thrpt:  [163.62 Kelem/s 164.52 Kelem/s 165.23 Kelem/s]
                 change:
                        time:   [-17.402% -16.414% -15.498%] (p = 0.00 < 0.05)
                        thrpt:  [+18.340% +19.637% +21.068%]
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  1 (1.00%) low mild
  5 (5.00%) high mild
  2 (2.00%) high severe
Training/Random/1000    time:   [6.7615 ms 6.7930 ms 6.8285 ms]
                        thrpt:  [146.45 Kelem/s 147.21 Kelem/s 147.90 Kelem/s]
                 change:
                        time:   [-3.2269% -2.6307% -2.0150%] (p = 0.00 < 0.05)
                        thrpt:  [+2.0564% +2.7017% +3.3345%]
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) high mild
  3 (3.00%) high severe
Training/Batch/1000     time:   [6.0276 ms 6.0493 ms 6.0753 ms]
                        thrpt:  [164.60 Kelem/s 165.31 Kelem/s 165.90 Kelem/s]
                 change:
                        time:   [-14.930% -14.487% -14.008%] (p = 0.00 < 0.05)
                        thrpt:  [+16.290% +16.942% +17.550%]
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) high mild
  2 (2.00%) high severe

Winner/Plain/4          time:   [1.0489 us 1.0518 us 1.0548 us]
                        change: [-45.352% -44.837% -44.342%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe
Winner/Distance/4       time:   [1.0665 us 1.0722 us 1.0785 us]
                        change: [-48.275% -47.959% -47.654%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  3 (3.00%) high mild
  4 (4.00%) high severe

Notes

I'm fairly certain that the lackluster speedup of random training is explained in this comment.

@JayKickliter
Copy link
Contributor Author

Note: please don't merge this until someone with domain expertise checks that train_random is returning sensible results. I was able to add a test with reference output for train_batch to make sure I didn't break anything, but I don't know how to test random training.

@JayKickliter JayKickliter force-pushed the jsk/breaking/reduce-heap-allocations branch from b7f59a1 to 232f479 Compare October 14, 2020 21:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant