benchmark: make compare.R easier to understand #18373

AndreasMadsen · 2018-01-25T14:57:28Z

As talked about in #18112 (comment) this shows more clearly the variance of each comparison. This should also help us prevent over-running the benchmarks. If you see an accuracy of ±0.1% then properly you could spend fewer iterations running that ;)

Checklist

make -j4 test (UNIX), or vcbuild test (Windows) passes
commit message follows commit guidelines

Affected core subsystem(s)

benchmark

example output:

                                                                                  confidence improvement accuracy (*)    (**)   (***)
 fs/bench-readdir.js n=10000                                                                     -7.20 %       ±9.94% ±13.23% ±17.22%
 fs/bench-readdirSync.js n=10000                                                                 -0.46 %       ±6.97%  ±9.27% ±12.08%
 fs/bench-realpath.js pathType="relative" n=10000                                                 1.62 %       ±4.25%  ±5.65%  ±7.35%
 fs/bench-realpath.js pathType="resolved" n=10000                                                -1.68 %       ±3.89%  ±5.17%  ±6.74%
 fs/bench-realpathSync.js pathType="relative" n=10000                                             0.72 %       ±5.87%  ±7.81% ±10.18%
 fs/bench-realpathSync.js pathType="resolved" n=10000                                             0.16 %       ±1.61%  ±2.15%  ±2.80%
 fs/bench-stat.js statType="fstat" n=200000                                                      -2.39 %       ±4.50%  ±6.00%  ±7.83%
 fs/bench-stat.js statType="lstat" n=200000                                                      -2.94 %       ±5.34%  ±7.11%  ±9.25%
 fs/bench-stat.js statType="stat" n=200000                                                        2.30 %       ±4.29%  ±5.72%  ±7.45%
 fs/bench-statSync.js statSyncType="fstatSync" n=1000000                                         -1.15 %       ±5.69%  ±7.57%  ±9.85%
 fs/bench-statSync.js statSyncType="lstatSync" n=1000000                                          0.47 %       ±2.15%  ±2.90%  ±3.84%
 fs/bench-statSync.js statSyncType="statSync" n=1000000                                           0.62 %       ±2.99%  ±3.98%  ±5.19%
 fs/read-stream-throughput.js size=1024 filesize=1048576000 encodingType="asc"                    1.08 %       ±2.96%  ±3.94%  ±5.13%
 fs/read-stream-throughput.js size=1024 filesize=1048576000 encodingType="buf"                   -1.86 %       ±3.85%  ±5.13%  ±6.68%
 fs/read-stream-throughput.js size=1024 filesize=1048576000 encodingType="utf"                   -1.09 %       ±2.29%  ±3.04%  ±3.96%
 fs/read-stream-throughput.js size=1048576 filesize=1048576000 encodingType="asc"                -1.70 %       ±4.55%  ±6.07%  ±7.95%
 fs/read-stream-throughput.js size=1048576 filesize=1048576000 encodingType="buf"                -0.51 %       ±3.54%  ±4.71%  ±6.14%
 fs/read-stream-throughput.js size=1048576 filesize=1048576000 encodingType="utf"                 0.21 %      ±10.19% ±13.56% ±17.65%
 fs/read-stream-throughput.js size=4096 filesize=1048576000 encodingType="asc"                   -1.84 %       ±3.14%  ±4.17%  ±5.43%
 fs/read-stream-throughput.js size=4096 filesize=1048576000 encodingType="buf"                   -2.14 %       ±4.09%  ±5.44%  ±7.08%
 fs/read-stream-throughput.js size=4096 filesize=1048576000 encodingType="utf"                    1.17 %       ±3.53%  ±4.70%  ±6.11%
 fs/read-stream-throughput.js size=65535 filesize=1048576000 encodingType="asc"                   1.25 %       ±7.25%  ±9.65% ±12.56%
 fs/read-stream-throughput.js size=65535 filesize=1048576000 encodingType="buf"            *      6.19 %       ±5.13%  ±6.86%  ±9.01%
 fs/read-stream-throughput.js size=65535 filesize=1048576000 encodingType="utf"                   2.12 %       ±6.49%  ±8.64% ±11.25%
 fs/readfile.js concurrent=1 len=1024 dur=5                                                      -0.33 %       ±3.64%  ±4.84%  ±6.31%
 fs/readfile.js concurrent=1 len=16777216 dur=5                                                  -3.03 %       ±5.35%  ±7.12%  ±9.27%
 fs/readfile.js concurrent=10 len=1024 dur=5                                                     -0.98 %       ±2.13%  ±2.83%  ±3.68%
 fs/readfile.js concurrent=10 len=16777216 dur=5                                                  0.14 %       ±1.72%  ±2.29%  ±2.98%
 fs/readFileSync.js n=600000                                                                     -4.79 %       ±7.59% ±10.10% ±13.17%
 fs/write-stream-throughput.js size=1024 encodingType="asc" dur=5                                -1.60 %       ±8.77% ±11.66% ±15.18%
 fs/write-stream-throughput.js size=1024 encodingType="buf" dur=5                                 0.39 %      ±12.28% ±16.34% ±21.27%
 fs/write-stream-throughput.js size=1024 encodingType="utf" dur=5                                -0.77 %       ±5.77%  ±7.69% ±10.03%
 fs/write-stream-throughput.js size=1048576 encodingType="asc" dur=5                              0.18 %       ±0.99%  ±1.31%  ±1.71%
 fs/write-stream-throughput.js size=1048576 encodingType="buf" dur=5                       *     23.42 %      ±23.37% ±31.12% ±40.58%
 fs/write-stream-throughput.js size=1048576 encodingType="utf" dur=5                              2.73 %       ±7.07%  ±9.45% ±12.41%
 fs/write-stream-throughput.js size=2 encodingType="asc" dur=5                                    0.74 %      ±11.24% ±14.96% ±19.47%
 fs/write-stream-throughput.js size=2 encodingType="buf" dur=5                                   -2.84 %       ±5.64%  ±7.52%  ±9.81%
 fs/write-stream-throughput.js size=2 encodingType="utf" dur=5                             *    -15.35 %      ±14.18% ±18.87% ±24.58%
 fs/write-stream-throughput.js size=65535 encodingType="asc" dur=5                               -3.07 %      ±10.67% ±14.21% ±18.53%
 fs/write-stream-throughput.js size=65535 encodingType="buf" dur=5                         *    -12.87 %      ±11.94% ±15.88% ±20.67%
 fs/write-stream-throughput.js size=65535 encodingType="utf" dur=5                                3.28 %       ±9.81% ±13.08% ±17.06%

Be aware that when doing many comparisons the risk of a false-positive
result increases. In this case there are 41 comparisons, you can thus
expect the following amount of false-positive results:
  2.05 false positives, when considering a   5% risk acceptance (*, **, ***),
  0.41 false positives, when considering a   1% risk acceptance (**, ***),
  0.04 false positives, when considering a 0.1% risk acceptance (***)

AndreasMadsen · 2018-01-25T15:05:55Z

/cc @joyeecheung
/cc @jasnell @Matteo - who have been my "I don't understand statistic" test subjects.

joyeecheung

Not a R expert, but LGTM if benchmark CI is happy.

joyeecheung · 2018-01-25T15:16:16Z

Benchmark CI: https://ci.nodejs.org/job/benchmark-node-micro-benchmarks/103/

Looking at https://github.com/nodejs/benchmarking/blob/master/experimental/benchmarks/community-benchmark/run.sh I think this should be using the new R script for the results, cc @gareth-ellis

joyeecheung · 2018-01-25T17:46:56Z

                                               confidence improvement accuracy (*)    (**)   (***)
 arrays/var-int.js n=25 type="Array"                           4.97 %      ±13.05% ±17.96% ±24.65%
 arrays/var-int.js n=25 type="Buffer"                         -0.85 %      ±12.22% ±16.76% ±22.88%
 arrays/var-int.js n=25 type="Float32Array"                   -0.31 %       ±0.43%  ±0.58%  ±0.80%
 arrays/var-int.js n=25 type="Float64Array"                    0.34 %       ±0.50%  ±0.69%  ±0.95%
 arrays/var-int.js n=25 type="Int16Array"                      3.92 %       ±9.76% ±14.00% ±20.56%
 arrays/var-int.js n=25 type="Int32Array"                     -0.21 %       ±0.48%  ±0.66%  ±0.93%
 arrays/var-int.js n=25 type="Int8Array"                      -5.21 %       ±9.54% ±13.70% ±20.15%
 arrays/var-int.js n=25 type="Uint16Array"                    -0.14 %      ±12.71% ±17.42% ±23.73%
 arrays/var-int.js n=25 type="Uint32Array"                    -0.79 %       ±1.49%  ±2.13%  ±3.09%
 arrays/var-int.js n=25 type="Uint8Array"                      4.23 %      ±10.50% ±15.08% ±22.17%
 arrays/zero-float.js n=25 type="Array"                       -1.47 %       ±4.95%  ±7.10% ±10.41%
 arrays/zero-float.js n=25 type="Buffer"                       3.84 %       ±9.00% ±12.92% ±18.98%
 arrays/zero-float.js n=25 type="Float32Array"                 0.30 %       ±0.53%  ±0.73%  ±0.99%
 arrays/zero-float.js n=25 type="Float64Array"                -0.07 %       ±1.77%  ±2.50%  ±3.59%
 arrays/zero-float.js n=25 type="Int16Array"                   2.44 %       ±7.92% ±11.26% ±16.31%
 arrays/zero-float.js n=25 type="Int32Array"                  -0.04 %       ±0.63%  ±0.88%  ±1.24%
 arrays/zero-float.js n=25 type="Int8Array"                    0.10 %       ±2.76%  ±3.78%  ±5.16%
 arrays/zero-float.js n=25 type="Uint16Array"                 -0.72 %       ±1.92%  ±2.74%  ±3.98%
 arrays/zero-float.js n=25 type="Uint32Array"                  0.67 %       ±0.92%  ±1.27%  ±1.76%
 arrays/zero-float.js n=25 type="Uint8Array"                  -7.27 %      ±11.99% ±17.15% ±25.07%
 arrays/zero-int.js n=25 type="Array"                          2.16 %       ±8.68% ±11.95% ±16.41%
 arrays/zero-int.js n=25 type="Buffer"                         3.86 %       ±8.99% ±12.91% ±18.97%
 arrays/zero-int.js n=25 type="Float32Array"                  -0.26 %       ±0.57%  ±0.78%  ±1.07%
 arrays/zero-int.js n=25 type="Float64Array"                  -0.42 %       ±0.65%  ±0.90%  ±1.25%
 arrays/zero-int.js n=25 type="Int16Array"                     4.87 %       ±9.30% ±13.36% ±19.64%
 arrays/zero-int.js n=25 type="Int32Array"                     0.16 %       ±1.78%  ±2.51%  ±3.61%
 arrays/zero-int.js n=25 type="Int8Array"                      4.88 %       ±9.88% ±14.18% ±20.82%
 arrays/zero-int.js n=25 type="Uint16Array"                    3.03 %       ±9.31% ±13.27% ±19.29%
 arrays/zero-int.js n=25 type="Uint32Array"                    1.28 %       ±2.26%  ±3.24%  ±4.74%
 arrays/zero-int.js n=25 type="Uint8Array"                     4.17 %       ±9.60% ±13.79% ±20.27%

Be aware that when doing many comparisions the risk of a false-positive
result increases. In this case there are 30 comparisions, you can thus
expect the following amount of false-positive results:
  1.50 false positives, when considering a   5% risk acceptance (*, **, ***),
  0.30 false positives, when considering a   1% risk acceptance (**, ***),
  0.03 false positives, when considering a 0.1% risk acceptance (***)
Notifying upstream projects of job completion
Finished: SUCCESS

The accuracy there probably means those benchmarks are just not that reliable in nature...

gareth-ellis · 2018-01-27T11:00:36Z

@joyeecheung / @AndreasMadsen

LGTM, i seem to be having issues with my email notifications at the moment.
I've also noticed in the output that its not particularly easy to confirm what has been done - i should be able to update the script to make it very clear what change we make to the build under test, which should make this a lot clearer.

As there is the extra warning about false positives in the benchmark output, I think we can be sure that the change was in this build - but i think i can make it clearer in the future.!

Note, the job is actually running https://github.com/nodejs/benchmarking/blob/core-benchmark/experimental/benchmarks/community-benchmark/run.sh

I'll get this into master, as that's going to lead to even more confusion (I made some changes to try and reduce output from the build, but I need to also get it to take stderr away, as we have a lot of warnings that make the rest of the output trickier to understand.)

joyeecheung · 2018-01-29T05:26:55Z

Landed in 368517c, thanks!

PR-URL: #18373 Reviewed-By: Joyee Cheung <[email protected]> Reviewed-By: James M Snell <[email protected]> Reviewed-By: Ruben Bridgewater <[email protected]>

PR-URL: nodejs#18373 Reviewed-By: Joyee Cheung <[email protected]> Reviewed-By: James M Snell <[email protected]> Reviewed-By: Ruben Bridgewater <[email protected]>

benchmark: make compare.R easier to understand

5bb948c

nodejs-github-bot added the benchmark Issues and PRs related to the benchmark subsystem. label Jan 25, 2018

joyeecheung approved these changes Jan 25, 2018

View reviewed changes

jasnell approved these changes Jan 25, 2018

View reviewed changes

joyeecheung mentioned this pull request Jan 25, 2018

benchmark: cut down http benchmark run time #18379

Closed

3 tasks

maclover7 force-pushed the master branch from bb5575a to 993b716 Compare January 26, 2018 22:03

cjihrig force-pushed the master branch from 993b716 to 082f952 Compare January 26, 2018 22:36

gareth-ellis approved these changes Jan 27, 2018

View reviewed changes

BridgeAR approved these changes Jan 27, 2018

View reviewed changes

joyeecheung added the author ready PRs that have at least one approval, no pending requests for changes, and a CI started. label Jan 29, 2018

joyeecheung closed this Jan 29, 2018

joyeecheung removed the author ready PRs that have at least one approval, no pending requests for changes, and a CI started. label Jan 29, 2018

joyeecheung pushed a commit that referenced this pull request Jan 29, 2018

benchmark: make compare.R easier to understand

368517c

PR-URL: #18373 Reviewed-By: Joyee Cheung <[email protected]> Reviewed-By: James M Snell <[email protected]> Reviewed-By: Ruben Bridgewater <[email protected]>

evanlucas mentioned this pull request Jan 30, 2018

v9.5.0 release proposal #18464

Merged

evanlucas pushed a commit that referenced this pull request Jan 30, 2018

benchmark: make compare.R easier to understand

b5ec6ea

PR-URL: #18373 Reviewed-By: Joyee Cheung <[email protected]> Reviewed-By: James M Snell <[email protected]> Reviewed-By: Ruben Bridgewater <[email protected]>

MylesBorins added lts-watch-v6.x and removed lts-watch-v8.x labels Feb 27, 2018

MylesBorins pushed a commit that referenced this pull request Feb 27, 2018

benchmark: make compare.R easier to understand

f779a8b

PR-URL: #18373 Reviewed-By: Joyee Cheung <[email protected]> Reviewed-By: James M Snell <[email protected]> Reviewed-By: Ruben Bridgewater <[email protected]>

MylesBorins added the dont-land-on-v6.x label Feb 27, 2018

MylesBorins removed the lts-watch-v6.x label Feb 27, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

benchmark: make compare.R easier to understand #18373

benchmark: make compare.R easier to understand #18373

AndreasMadsen commented Jan 25, 2018 •

edited

Loading

AndreasMadsen commented Jan 25, 2018

joyeecheung left a comment

joyeecheung commented Jan 25, 2018

joyeecheung commented Jan 25, 2018

gareth-ellis commented Jan 27, 2018 •

edited

Loading

joyeecheung commented Jan 29, 2018

benchmark: make compare.R easier to understand #18373

benchmark: make compare.R easier to understand #18373

Conversation

AndreasMadsen commented Jan 25, 2018 • edited Loading

Checklist

Affected core subsystem(s)

AndreasMadsen commented Jan 25, 2018

joyeecheung left a comment

Choose a reason for hiding this comment

joyeecheung commented Jan 25, 2018

joyeecheung commented Jan 25, 2018

gareth-ellis commented Jan 27, 2018 • edited Loading

joyeecheung commented Jan 29, 2018

AndreasMadsen commented Jan 25, 2018 •

edited

Loading

gareth-ellis commented Jan 27, 2018 •

edited

Loading