Run comparative benchmark #11128
Conversation
Thanks for making a pull request to jupyterlab!
Comment test to trigger the benchmark test
Force-pushed from 81f87f7 to 5ce9ffc
Force-pushed from f04ceee to 60690a1
Force-pushed from 60690a1 to 6706bac
First benchmark report @ 6706bac with number of experiments = 20. The execution times (in milliseconds) are grouped by test file, test type and browser.
Results table
Changes are computed with expected as reference.
Force-pushed from 22610a4 to c2dd8ee
Second report for 50 experiments @ ba05cad. The execution times (in milliseconds) are grouped by test file, test type and browser. The mean relative comparison is computed with 95% confidence.
Results table
Changes are computed with expected as reference.
Third report - 100 experiments @ 55a1b26
The execution times (in milliseconds) are grouped by test file, test type and browser. The mean relative comparison is computed with 95% confidence.
Results table
Changes are computed with expected as reference.
Force-pushed from 612b245 to 5f2aa11
Force-pushed from 5f2aa11 to a55eff7
please run benchmark
Benchmark report
The execution times (in milliseconds) are grouped by test file, test type and browser. The mean relative comparison is computed with 95% confidence.
Results table
Changes are computed with expected as reference.
Thanks @fcollonval for working on this, looks great 👍
Nice work @fcollonval!
Looks like this could potentially be backported to … This change exposes some useful Galata methods such as …
It breaks the … If the API change is seen as acceptable, it could indeed be ported (probably by bumping the minor version of @jupyterlab/galata to 4.1.0).
Ah ok, then it's fine, we can skip the backport for now 👍
Thinking a bit more about it. Actually, without the correction brought here …
@meeseeksdev please backport to 3.2.x |
Owee, I'm MrMeeseeks, Look at me. There seems to be a conflict, please backport manually. Here are approximate instructions:
And apply the correct labels and milestones. Congratulations, you did some good work! Hopefully your backport PR will be tested by the continuous integration and merged soon! Remember to remove … If these instructions are inaccurate, feel free to suggest an improvement.
References
Follow-up of the discussion at the JupyterLab dev meeting: jupyterlab/frontends-team-compass#128 (comment)
From all the tests carried out in this PR, here are the conclusions:
- Using a single `page` object instead of one per test increases the execution time.
- Running 50 experiments is a reasonable number to reach statistical confidence; using 100 experiments increases the chance that some errors occur (mainly test fixture teardown errors - I could not figure out which fixture was producing them).
- For the notebook with code cells, a delay of 500 ms is not captured when opening the file (1000 ms is, though).
- Not using the Galata fixtures makes the tests less flaky (and they close much faster, so the helper introduces a lot of slowdown). But a fake delay of 500 ms is still not captured on the notebook open action with code cells, although it is if the notebook contains only markdown.

Anyway, this is an improvement.
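The "mean relative comparison with 95% confidence" used in the reports above can be sketched as follows. This is a minimal illustration using a normal approximation; the function name and the sample timings are hypothetical, not taken from the PR's actual implementation.

```python
import math
import statistics

def relative_change_ci(reference, challenger, z=1.96):
    """Mean relative change of the challenger vs the reference, with an
    approximate 95% confidence interval (normal approximation).
    Inputs are paired execution times in milliseconds."""
    changes = [(c - r) / r for r, c in zip(reference, challenger)]
    mean = statistics.mean(changes)
    stderr = statistics.stdev(changes) / math.sqrt(len(changes))
    return mean, (mean - z * stderr, mean + z * stderr)

# Hypothetical timings: the challenger is roughly 10% slower.
reference = [100.0, 100.0, 100.0, 100.0]
challenger = [110.0, 112.0, 108.0, 110.0]
mean, (low, high) = relative_change_ci(reference, challenger)
print(f"{mean:+.1%} (95% CI: {low:+.1%} to {high:+.1%})")
```

A change is then reported as significant only when the confidence interval does not contain zero, which is why 50 experiments per test were needed to draw conclusions.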
Code changes
- Add interval change computation from the jupyterlab/benchmarks repository.
- Homogeneous interface for the report and the graph creation functions in `BenchmarkReporter`.
- Change the logic for the benchmark test:
  - Compute the reference and the challenger in the same job, to be hardware independent.
  - For a PR, the reference is the common ancestor with the target branch and the challenger is the head of the PR branch.
- Set the job to run:
  - on the comment `please run benchmark`
  - every Sunday, to evaluate changes introduced by the last week's commits (this is commented out for now)
- Document how to trigger a benchmark.
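As a rough illustration, the two triggers described above could look like this in a GitHub Actions workflow. This is a sketch, not the actual workflow file from this PR; the cron expression is an assumption, and the scheduled trigger is shown commented out to match the note above.

```yaml
on:
  issue_comment:
    # Runs when a comment is created; filtering on the body
    # ("please run benchmark") would happen in the job itself.
    types: [created]
  # schedule:
  #   # Every Sunday at 00:00 UTC -- commented out for now.
  #   - cron: "0 0 * * 0"
```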
User-facing changes
N/A
Backwards-incompatible changes
N/A