Run comparative benchmark #11128
Conversation
Thanks for making a pull request to jupyterlab!
Comment test to trigger the benchmark test
Force-pushed from 81f87f7 to 5ce9ffc
Force-pushed from f04ceee to 60690a1
Force-pushed from 60690a1 to 6706bac
First benchmark report @ 6706bac with number of experiments = 20. The execution times (in milliseconds) are grouped by test file, test type and browser.
Results table
Changes are computed with expected as reference.
Force-pushed from 22610a4 to c2dd8ee
Second report for 50 experiments @ ba05cad. The execution times (in milliseconds) are grouped by test file, test type and browser. The mean relative comparison is computed with 95% confidence.
Results table
Changes are computed with expected as reference.
Third report - 100 experiments @ 55a1b26
The execution times (in milliseconds) are grouped by test file, test type and browser. The mean relative comparison is computed with 95% confidence.
Results table
Changes are computed with expected as reference.
Force-pushed from 612b245 to 5f2aa11
Force-pushed from 5f2aa11 to a55eff7
please run benchmark
Benchmark report
The execution times (in milliseconds) are grouped by test file, test type and browser. The mean relative comparison is computed with 95% confidence.
Results table
Changes are computed with expected as reference.
Thanks @fcollonval for working on this, looks great 👍
Nice work @fcollonval!
Looks like this could potentially be backported to … This change exposes some useful Galata methods such as …
It breaks the … If the API change is seen as acceptable, it could indeed be ported (probably by bumping the minor version of @jupyterlab/galata to 4.1.0).
Ah ok, then it's fine, we can skip the backport for now 👍
Thinking a bit more about it. Actually, without the correction brought here …
@meeseeksdev please backport to 3.2.x |
Owee, I'm MrMeeseeks, Look at me. There seems to be a conflict, please backport manually. Here are approximate instructions:
And apply the correct labels and milestones. Congratulations, you did some good work! Hopefully your backport PR will be tested by the continuous integration and merged soon! Remember to remove … If these instructions are inaccurate, feel free to suggest an improvement.
References
Follow-up of the discussion at the JupyterLab dev meeting: jupyterlab/frontends-team-compass#128 (comment)
From all the tests carried out in this PR, here are the conclusions:
- Using a single `page` object instead of one per test increases the execution time.
- Running 50 experiments is a reasonable number to reach statistical confidence; using 100 experiments increases the chance that some errors occur (mainly test fixture teardown errors - I could not figure out which fixture was producing them).
- For the notebook with code cells, a delay of 500 ms is not captured when opening the file (1000 ms is, though).
- Not using the Galata fixtures makes the tests less flaky (and they close much faster, so the helper introduces a lot of slowdown). But a fake delay of 500 ms is still not captured on the notebook open action with code cells, although it is if the notebook contains only markdown.

Anyway, this is an improvement.
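The "mean relative comparison with 95% confidence" used in the reports above can be sketched as follows. This is a minimal illustration using a normal approximation; the function name and the sample timings are hypothetical, not taken from the PR's actual implementation.

```python
import math
import statistics

def relative_change_ci(reference, challenger, z=1.96):
    """Mean relative change of the challenger vs the reference, with an
    approximate 95% confidence interval (normal approximation).
    Inputs are paired execution times in milliseconds."""
    changes = [(c - r) / r for r, c in zip(reference, challenger)]
    mean = statistics.mean(changes)
    stderr = statistics.stdev(changes) / math.sqrt(len(changes))
    return mean, (mean - z * stderr, mean + z * stderr)

# Hypothetical timings: the challenger is roughly 10% slower.
reference = [100.0, 100.0, 100.0, 100.0]
challenger = [110.0, 112.0, 108.0, 110.0]
mean, (low, high) = relative_change_ci(reference, challenger)
print(f"{mean:+.1%} (95% CI: {low:+.1%} to {high:+.1%})")
```

A change is then reported as significant only when the confidence interval does not contain zero, which is why 50 experiments per test were needed to draw conclusions.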
Code changes
- Add interval change computation from the jupyterlab/benchmarks repository.
- Homogeneous interface for the report and the graph creation functions in `BenchmarkReporter`.
- Change the logic for the benchmark test:
  - Compute the reference and the challenger in the same job, to be hardware independent.
  - For a PR, the reference is the common ancestor with the target branch and the challenger is the head of the PR branch.
- Set the job to run:
  - on the comment `please run benchmark`
  - every Sunday, to evaluate changes introduced by the last week's commits (this is commented out for now)
- Document how to trigger a benchmark.
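As a rough illustration, the two triggers described above could look like this in a GitHub Actions workflow. This is a sketch, not the actual workflow file from this PR; the cron expression is an assumption, and the scheduled trigger is shown commented out to match the note above.

```yaml
on:
  issue_comment:
    # Runs when a comment is created; filtering on the body
    # ("please run benchmark") would happen in the job itself.
    types: [created]
  # schedule:
  #   # Every Sunday at 00:00 UTC -- commented out for now.
  #   - cron: "0 0 * * 0"
```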
User-facing changes
N/A
Backwards-incompatible changes
N/A