Make the performance CI job more stable #33710
Conversation
I'm currently tracking the performance of the commits in trunk in https://codehealth-riad.vercel.app
Size Change: 0 B | Total Size: 1.08 MB
Might be worth a try; I'm unsure if it will make a difference. 🤔 Do we have any guarantees for the virtual machines we're allocated? It looks like the VMs should have
I have seen a pretty large variance between performance runs, so I'm not sure what's contributing to that. cc @sgomes @griffbrad in case you two ran into this before. This might be better for another PR, but if we're touching this command, would it be possible to add a slimmed-down run for the plugin publish? For the last few releases, I believe it's been timing out on the site editor tests.
The tool I'm using to track the numbers takes VM differences into account. Unless we control the hardware of the GitHub Actions runner, we can't expect the same performance between jobs. What I do to avoid that issue is to compute: newValue * oldBaseValue / newBaseValue, where baseValue is the measurement for a stable commit (using wp/5.8 for now).
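For illustration, a minimal sketch of that normalization (the function and parameter names here are assumptions for the example, not the tool's actual API):

```ts
// Scale a measurement against a stable baseline commit (e.g. wp/5.8)
// so that results from differently-provisioned VMs become comparable.
// oldBase and newBase are the baseline commit's timings as measured
// on the reference runner and the current runner, respectively.
function normalize( newValue: number, oldBase: number, newBase: number ): number {
	return ( newValue * oldBase ) / newBase;
}

// e.g. a 120 ms result on a runner where the baseline took 60 ms,
// rescaled to a reference runner where the baseline took 50 ms:
// normalize( 120, 50, 60 ) === 100
```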
I'm going to merge this. We'll see how it impacts the graphs over time.
In the performance tests CI workflow we have been running every test suite three times for each branch under test. The goal of this work, introduced in #33710, was to reduce variation in the reported data from the tests. Unfortunately, after measuring the data produced by our test runs, and after running experiments that run the test suites thirty times over, the overall variation is explained primarily by noise in the GitHub Actions container running our jobs. If running the test suites three times each reduces the variation in the results, the effect isn't detectable above the variation introduced beyond our control. Because these additional rounds extend the perf-test runtime by around twenty minutes on each PR, we're reducing the number of rounds to a single pass for PR commits. This will free up compute resources and remove the performance tests as a bottleneck in the PR workflow. Additional work can and should be done to further reduce variance in the test results, but this practical step removes an obstacle to developer iteration speed without reducing the quality of the numbers being reported.
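As a hedged illustration of the kind of check described above (this is not the project's actual analysis code, just a sketch of the idea), run-to-run spread for one metric can be summarized with a coefficient of variation; a high value across identical commits points to runner noise rather than a real performance change:

```ts
// Coefficient of variation (stddev / mean) across repeated runs of
// a single metric on the same commit. Assumes at least one sample.
function coefficientOfVariation( samples: number[] ): number {
	const mean = samples.reduce( ( a, b ) => a + b, 0 ) / samples.length;
	const variance =
		samples.reduce( ( sum, x ) => sum + ( x - mean ) ** 2, 0 ) /
		samples.length;
	return Math.sqrt( variance ) / mean;
}
```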
This is an attempt to make the performance CI job less random. I'm not certain whether it achieves that, but I think it's a good change.
Right now, we do this:
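Roughly (the exact steps are an assumption based on the discussion below):

- check out the first branch, start wp-env, run every test suite several times in a row
- check out the second branch, start wp-env, run every test suite several times in a row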
This updates the test to do this instead:
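Roughly (again an assumption):

- check out branch A, start wp-env, run the suite once, stop wp-env
- check out branch B, start wp-env, run the suite once, stop wp-env
- repeat that pair until each branch has the desired number of rounds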
So in theory, since we alternate between branches, it should be a bit more stable. But it's just a theory.
The other thing here is that in order to alternate branches we need to stop and start wp-env between each test run. I think that's fine, but it might add a minute or so to this already long job. A sketch of the idea follows below.
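As a minimal sketch of the alternating loop (the `checkout`, `startEnv`, `stopEnv`, and `runSuite` helpers are hypothetical, not the project's actual runner):

```ts
// Run each branch once per round, restarting wp-env at every checkout,
// so that slow or fast stretches of the shared VM affect all branches
// roughly equally instead of concentrating on one branch.
async function runAlternating(
	branches: string[],
	rounds: number,
	checkout: ( branch: string ) => Promise< void >,
	startEnv: () => Promise< void >,
	stopEnv: () => Promise< void >,
	runSuite: () => Promise< number >
): Promise< Record< string, number[] > > {
	const results: Record< string, number[] > = {};
	for ( let round = 0; round < rounds; round++ ) {
		for ( const branch of branches ) {
			await checkout( branch );
			// wp-env must be restarted after each checkout so the
			// environment matches the branch under test.
			await startEnv();
			const value = await runSuite();
			await stopEnv();
			if ( ! results[ branch ] ) {
				results[ branch ] = [];
			}
			results[ branch ].push( value );
		}
	}
	return results;
}
```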
Thoughts?