Refactor performance tests setup #49238
Conversation
Force-pushed from d1fef2f to 79ea0b0.
Size Change: 0 B. Total Size: 1.37 MB.

Flaky tests detected in 0affd72. 🔍 Workflow run URL: https://github.com/WordPress/gutenberg/actions/runs/4700855963
Force-pushed from 1ec1960 to 887e8e8.

Force-pushed from 887e8e8 to fd35da9.
This reverts commit 62f5d99.
```diff
@@ -39,7 +39,7 @@ jobs:
       - name: Compare performance with trunk
         if: github.event_name == 'pull_request'
-        run: ./bin/plugin/cli.js perf $GITHUB_SHA trunk --tests-branch $GITHUB_SHA
+        run: ./bin/plugin/cli.js perf trunk
```
How do we ensure that we run the performance tests that are on the PR, for instance when we make changes to the tests themselves?
Oh, if I'm reading this correctly, there's now an assumption that the perf cli.js is included within a Gutenberg plugin, that we compare with the current git hash automatically, and that we use it also as the tests branch.

One of the initial goals of the CLI is to actually be "independent": while it lives in the Gutenberg plugin, it doesn't have any requirements, in the sense that the current branch could be broken or have edits to Gutenberg code... it doesn't matter, the tool would always perform a clone. In fact, one goal was to make it independent of Gutenberg entirely and be a tool any plugin can use. It seems the current PR goes in the opposite direction.
For the general direction of this PR, see my answer here.
(…) In fact one goal was to make it independent of Gutenberg entirely and be a tool any plugin can use.
I'm struggling to visualize the full picture of how the perf tests CLI runner, as an independent tool, would be utilized by other plugins. Would it not be running from the local gutenberg repo then, but would still be comparing gutenberg branches? It would be really helpful if you could give an example of a plugin development scenario where this tool is used 🙏
Yeah, the initial goal was to be an independent tool, something like so:

```shell
# installing the tool
npm install -g @wordpress/plugin-cli

# writing a config file
echo '{ "repoUrl": "some url", "buildCommand": "npm run build", "perfCommand": "npm run perf", "wpPluginSlug": "someslug" }' > plugin-cli.json

# running the perf command
plugin-cli --config plugin-cli.json some-branch some-other-branch
```
The goal was also for the CLI to include more than just the perf command; maintaining the changelog and releasing to SVN/GitHub were also commands that were part of the same tool. I see that it was removed now in favor of all the git workflows. I'm kind of sad that it didn't remain as a command to be used by the workflows, instead of writing it within the workflow files like we do now.

Anyway, given that the previous direction already moved away from the goal of being an independent tool, one more step away won't be that harmful here, so I guess feel free to make that change if you think it's a good idea.
Yeah, the initial goal was to be an independent tool, something like so:

```shell
# installing the tool
npm install -g @wordpress/plugin-cli
# writing a config file
echo '{ "repoUrl": "some url", "buildCommand": "npm run build", "perfCommand": "npm run perf", "wpPluginSlug": "someslug" }' > plugin-cli.json
# running the perf command
plugin-cli --config plugin-cli.json some-branch some-other-branch
```
Oh, so by independent, you meant from Gutenberg entirely, which would run perf comparisons for any plugin?
If yes, I think we can still assume and leverage the fact that it will always be running from SOME plugin repo root and that the current branch will be a primary subject of the test. The CLI will still need to run the current plugin's dedicated perf tests, right? So it will be easier to develop/debug those tests with the CLI, and also utilize it by the target plugin's CI, where the current branch is always a primary reference.
Now that I think about it, we can replace `Gutenberg` with a generic plugin in this refactored script, and that would be all that's needed to make this PR not tie it any further to Gutenberg. There's no check that the current repo is actually Gutenberg.

Also, as I mentioned earlier, we can implement the `--from-origin` flag and build everything from GH if we don't want to use the local code.

How does that sound?
Sounds good to me 👍
```diff
 PREVIOUS_VERSION_BASE_10=$((PLUGIN_VERSION_ARRAY[0] * 10 + PLUGIN_VERSION_ARRAY[1] - 1))
 PREVIOUS_RELEASE_BRANCH="release/$((PREVIOUS_VERSION_BASE_10 / 10)).$((PREVIOUS_VERSION_BASE_10 % 10))"
 WP_VERSION=$(awk -F ': ' '/^Tested up to/{print $2}' readme.txt)
 IFS=. read -ra WP_VERSION_ARRAY <<< "$WP_VERSION"
 WP_MAJOR="${WP_VERSION_ARRAY[0]}.${WP_VERSION_ARRAY[1]}"
-./bin/plugin/cli.js perf "wp/$WP_MAJOR" "$PREVIOUS_RELEASE_BRANCH" "$CURRENT_RELEASE_BRANCH" --wp-version "$WP_MAJOR"
+./bin/plugin/cli.js perf "$PREVIOUS_RELEASE_BRANCH" "wp/$WP_MAJOR" --wp-version "$WP_MAJOR"
```
Why did we remove `$CURRENT_RELEASE_BRANCH` here? I know the release perf tests are broken now, but the idea is to have a table that compares the current release, the previous release and the previous WP release.
Are we certain here that `$CURRENT_RELEASE_BRANCH` is the same thing as `$GITHUB_SHA`?
`$CURRENT_RELEASE_BRANCH` is actually `release/xx.x`, which is also `$GITHUB_HEAD_REF`, not `$GITHUB_SHA`. The primary reference for the CI perf in the refactored CLI is `$GITHUB_HEAD_REF`, so this comparison will still be, e.g., `release/15.6` vs. `release/15.5` vs. `wp/6.1`. With the refactored CLI, we only need to list the refs that we want to compare the current one to.
```js
if ( ! runningInCI ) {
	await askForConfirmation( 'Ready to go? ' );
if ( ! localRef ) {
	throw new Error( 'Must be running from a Gutenberg repository root.' );
```
I guess this is my main issue with this PR. My hope for this tool was for it to be plugin-independent; I wanted to transform it into a package that any plugin can use to run perf tests, or any other command.
OK, making it independent should not be a problem, as the logic for fetching and building from the origin is there. After checking your other comment above, I'm curious whether we should still account for the fact that the script can run from the Gutenberg root. IMO it would make a lot of sense, as it would not break the original purpose of being an independent tool while being especially helpful in these situations:
- When developing or debugging locally, as it enables working on both the CLI perf tool and perf tests at the same time.
- When running from CI, where the branch we run the workflow from is always a primary reference, enabling us to speed things up by reusing it and its dependencies (restored from the cache ⚡).
```js
	'--abbrev-ref': null,
	HEAD: null,
} );
}
```
It is not clear to me why we're doing this within the performance command. We're creating a dependency between the performance command and the GitHub CI runner. What if we do this logic outside and pass the "localRef" as a regular branch to compare, like we used to do before?
Rather than creating a CI dependency, I'd argue that it adds CI support. When run in CI, the script identifies the local reference and uses it to build the plugin. This is quicker than our previous method, as it circumvents the need to clone the repository and install node modules.
During the refactor, I was fairly confident that this script would not be utilized outside an environment that is the subject of the test, whether it's Gutenberg or another plugin. TBH, I'm still struggling to see an alternative use case. If our intention is to build and measure or compare the performance of a specific feature, we can easily switch to the appropriate branch and use the local reference. From your perspective, it seems we want this script to act as a standalone application capable of fetching and performance-testing any references from the origin. While it wouldn't be difficult to include support for that (old) logic in the proposed implementation, I have yet to see a compelling use-case example.
Personally, I feel this PR is doing too much, to be honest, and I'd prefer if we made each of the changes mentioned in the description separately, so we can discuss the reasoning for the existing choices more clearly.
I'd like to see these split as separate issues too, but I don't have as strong a feeling about it. I've been half-way following along and it's been hard to grasp everything going on. Personally, I think it makes sense to get rid of `--tests-branch`.

The thing about reusing the branch for the test setup and also for a branch under test was there before, but I made a mistake in not having it fully build all the packages, and therefore the tests weren't building the Gutenberg plugin, injecting it into the WordPress env, and testing against the actual code. That should be able to be brought back with the appropriate fix if we want, and I think that will account for as much setup time as your changes. (In #45284 I stopped building parts of the app for the tests-branch branch because it didn't need them, but then in #45737 I started reusing the tests-branch if it was the same and already built - side note @youknowriad, I now understand how this thing happened, because it looks like I had both of those PRs merge on the same day - they were a non-conflict merge conflict I didn't notice, where each was fine on its own but not together.)

I've noted that in
Thank you for all the hard work! This is looking very promising to me! I agree it's a bit hard to review, but I also don't think it necessarily has to be split into smaller PRs. While reading the description of this PR, I've been thinking of potential ways to further improve the performance. This is just me thinking out loud and not relevant to this PR, but I just want to jot it down somewhere in case I forget 👇
@kevin940726, I was thinking the same thing. 🙌 The good part is that each trunk commit already has a plugin ZIP built and uploaded. All we'd need would be the URL to that artifact (which, btw, I was hoping to achieve within this PR 🙈), but the artifacts API doesn't seem to support searching by ref/sha/etc. (yet?). I'd be happy to explore that in a follow-up PR, as it should be a big jump in speed and should also streamline the setup a lot.
Oh, yeah! For some reason I totally forgot that it's a thing already 😅.

I think this is possible with a few more extra steps today!

```shell
curl "https://api.github.com/repos/WordPress/gutenberg/actions/runs?branch=trunk&head_sha=<COMMIT_SHA>" > runs.json
jq -r '.workflow_runs[] | select(.name=="Build Gutenberg Plugin Zip").artifacts_url' runs.json > artifact_url
curl -s "$(cat artifact_url)" | jq -r '.artifacts[] | select(.name=="gutenberg-plugin").archive_download_url' > download_url
```

The gotcha here is that the artifacts have retention limits and will be deleted after a certain amount of time (normally 30 days). To enable comparing older commits, we might have to upload them to persistent storage, like the release API?
I'm wondering if there's anything I should do at this point to wrap up this PR. I understand the suggestion for splitting this into smaller PRs, but I'm wondering if it's necessary. Maybe we'll get a better picture after summarizing the current feedback/discussion. First of all, there are 3 functionality changes:

...and 2 improvements:

As far as the functionality changes go, we seem to have a partial consensus on removing the `--test-branch` option. For the part where we use the local code, I've made my argument here and, to add to that, I think it doesn't make sense to fetch the primary reference from the origin until we decide to extract the script to a dedicated package. Currently, the primary use of the script is for the CI. Regarding the use of WP stable instead of bleeding edge, I think it makes the most sense as it increases the env consistency, which should work towards measurement stability. Is there any reason we should be testing against the bleeding edge?

The improvements part is what speeds up the setup the most, especially the parallel builds, as the fixed tmp folder is relevant only for local development. I believe it's the main thing that makes this PR a bit hard to follow. I'm unsure if the above makes it easier for you to vote for or against splitting this PR. Anyhow, please let me know what you think, and I'll follow the vote. 😄

Note: If we decide to split, I'd suggest doing it as follows:
Nice, @kevin940726! 🙌 I'll definitely try that out.

I guess for this case we could just keep the part where we build the plugin from the ref. I don't think we'll often be testing against commits that are >1 month old, will we? I can see another gotcha, where the ZIP has not been built yet - kinda the other way around. I think we'd either need a waiting mechanism, build from ref (as above), or just create the logic in a way that would fetch the last available ZIP for
For me, this PR comes down to "smart/implicit vs. explicit behavior". I personally prefer explicit behavior (pass the branches to compare, pass the branch to use for the tests...) over assumptions based on the current branch. I agree that it's slower, but I believe it to be more flexible, and it also opens the door for a more generic tool in the future. That said, I don't want to be the blocker here if you all think otherwise.
I believe that for every big release someone runs this script locally to compare against previous releases.
Having run into this many times, I think it'd be good to be explicit about the WP version we're using in the tests. Too many times the tests have failed for no explicable reason, only for us to discover later that something changed in the WP copy in the env, which is never called out in the test setup.
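As a sketch of what that pinning looks like, here is a minimal hypothetical .wp-env.json fragment. Per the PR description, "core": null resolves to the latest stable WordPress release, while a repo reference like "WordPress/WordPress" tracks bleeding edge; the "plugins" entry is an assumption of this sketch.

```json
{
	"core": null,
	"plugins": [ "." ]
}
```

Replacing null with an explicit core reference would make the tested WP version visible in the config itself.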
From me 👍
Also from me 👍 but I don't have strong opinions. I don't think this is practically more explicit than saying "use the first given branch" and I think it never solved the problem it was introduced to solve; that problem still exists.
This ties in with eliminating
In several experiments I used an

Otherwise, my thoughts on this haven't changed. I appreciate all the work being done here, but I do think it's heavy, to say the least.
If one of these fails, we have to revert all of them. It's hard to leave a good commit message for these too, let alone review them. I won't stand in the way if you really believe this is best despite that.
I'd vote for giving it a try! Worst case, we'd have to revert - no big deal! We can monitor it for a few commits after it gets merged so that we don't mess up the metrics too much.

IIRC, that's the case for a step on CI? I don't remember the exact details and I'm too lazy to check 😝. Either way, I think this is the easier part of the issue; we just need persistent storage, which is relatively cheap or even free.

Yeah, I think we can wait on CI, but for local compares we might have to fall back to building it from scratch. Or, if the exact commit doesn't matter, we can just pick the latest built output, like you suggested!
Now that I think about it, let's make sure that we don't break the "manual" performance CI job, where we have the ability to manually choose which branches to compare and which branch to use for the tests themselves. It's a job you can run from https://github.com/WordPress/gutenberg/actions/workflows/performance.yml by clicking "Run workflow".
edit: Custom comparison tests and runner are connected. See #49238 (comment).
The disparity between the performance.js source and the tests' source is actually what's currently causing the release perf tests to fail. The error is that the tests are saving result artifacts under a different name than the perf script expects, so it cannot find them. For example, we can look at the most recent release perf job. The command is:

```shell
./bin/plugin/cli.js perf "wp/$WP_MAJOR" "$PREVIOUS_RELEASE_BRANCH" "$CURRENT_RELEASE_BRANCH" --wp-version "$WP_MAJOR"
```

...whose values for that run are:

```shell
./bin/plugin/cli.js perf wp/6.2 release/15.7 release/15.8 --wp-version 6.2
```
Now that I think about it, the immediate fix would be adding the `--test-branch` option back:

```shell
./bin/plugin/cli.js perf wp/6.2 release/15.7 release/15.8 --test-branch release/15.8 --wp-version 6.2
```

vs.

```shell
./bin/plugin/cli.js perf release/15.8 release/15.7 wp/6.2 --wp-version 6.2
```

I'm not 100% sure, though, because I don't know if

I think this is a good example of why it would be safer to use the job-triggering ref as the source for both the perf runner script and the tests. I think that would actually be a good first step in arriving at what I'm proposing with this PR. I'll start drafting.
There could be some differences, but at the moment the release job runs, there are none (or almost none). The workflow dispatch job uses
Ah, apologies for the confusion - I was looking at the workflow from my own branch where I already removed the
That's fair, though I'm still failing to see why we wouldn't just use the branch that we're triggering from ( |
Imagine we add a new metric today and we want to check whether the React 18 upgrade had an impact on that metric. We can run the test manually by using "trunk" as the branch for the tests and comparing the commit that did the React 18 upgrade with the previous one.
I agreed on separating tests from comparison branches; it should remain as is. I asked about separating the test runner (performance.js) from the tests branch (
At the moment, at least, the metrics are present in both the tests and the runner. In other words, there's a strong relationship between the two, so I wouldn't mind if we use the same thing for both. But I think it's just bad design that they are tied; ideally, the perf tests define the metrics, and the runner just orchestrates things without any relation to the tests themselves. But again, it's not the case today, so 🤷
Yeah they're currently very much connected - the runner is responsible for computing and rendering the results that are produced by the tests. In an ideal scenario, which part of the process do you think should take responsibility for handling the results? How could we untie it from the runner? Maybe a separate script? We'd still be making that connection though, only with a different script... |
I think the runner doesn't need to know about the metrics, it can just use the "keys" of the JSON output by the performance tests. |
This has been addressed within #52022.
What?
Refactor the performance testing environment setup to improve developer experience both locally and in CI.
TL;DR:
Read on for more details.
⚡︎ Reduce disk space usage to a fixed 3GB
While debugging the script, after a couple of reruns, I started running out of disk space. It turned out that the script creates a new test environment (repo clones + wp-env Docker images) with each run without clearing it when finished. Currently, the initial perf script run takes ~8GB (6GB for 3 repo clones + 2GB for wp-env Docker images), and each subsequent run takes another ~6GB for new clones.
Furthermore, one of the comparison branches is always the branch that we're currently checked out at (true for each CI job), but for some reason, it's being cloned twice from the origin: one for the test environment and one for the test source. This takes not only extra space but also time.
The refactored approach includes two changes to address the above, starting with a fixed base folder for the test environment (`wp-gutenberg-performance-tests`).

This setup requires 3GB for a typical job where we compare two branches, e.g., the current one vs. `trunk`.

⚡︎ Use the current root branch code directly
Let's say your current branch is `try/some-stuff` and you'd like to compare its performance against `trunk`. To do that, you run the perf command from the Gutenberg root.

Currently, this will not use your local code. Instead, it will clone the branch to be used as the env source. It will also create another copy of that branch to be used as the source of the tests. In reality, we always use the current branch as a reference (as each workflow perf comparison job does), so using what's already available makes more sense. For the CI, it means that it will only need to build the current branch, allowing us to drop the clones and extra (uncached) `npm` installs. Locally, it means we can test against the currently developed code, which isn't currently possible.

⚡︎ Use the current root branch as a tests source
In addition to the above, the current branch should always be used as the tests source, because the tests should always be coupled with the workflow and the CLI runner script. Currently, the tests branch is either the first one passed for comparison or the one specified by the `--test-branch` option. This approach can cause issues, as exemplified by the currently failing release performance tests, where the branch used as the tests source (`wp/6.1`) is not aligned with the workflow and CLI source (`release/15.6`).

Because of the above, keeping the `--test-branch` option didn't make sense, so I've removed it.

⚡︎ Enable rerunning the tests using the existing environment (locally)
Using the same base folder for each run enables the reuse of the environment. Currently, each run creates the whole environment from scratch, which takes ~8 minutes on an M1 Mac. The refactored approach allows for the reuse of an existing env, which skips the setup if the builds are up to date and moves straight to the tests. Unfortunately, I don't think this is currently possible in the CI, but it helps a ton when working with the tests locally.
⚡︎ Run each branch setup in parallel
Fetching/building separate branches can be done in parallel, saving some extra time. CI efficiency is too erratic to give solid numbers, but testing locally, the refactored setup takes 50% less time than the current one when building from scratch. For my machine, which is an M1 Mac, it reduced from ~8 to ~4 minutes.
⚡︎ Use the latest stable WordPress version instead of bleeding edge by default
We're currently using the WP bleeding edge (`core: "WordPress/WordPress"` in the wp-env config) for the performance tests environment unless specified otherwise via the `--wp-version` option. I'm not sure if that's intentional, but it makes more sense to me to run performance tests against the latest production release instead (`core: null`) for more consistent measurements.

Testing Instructions
Check out this branch (`refactor/perf-tests-tmp-folder`) and run `npm install`.

The following scenarios should be working as expected:
- Compare the current branch with another branch or commit SHA; there should be no need to pass the current branch's name.
- Compare the current branch with two or more other git refs.
- Compare using a specific WP version.
- Rerun the above to ensure the setup is being reused.
To ensure that other specific comparisons work, e.g., the "Compare performance with current WordPress Core and previous Gutenberg versions" job from the performance.yml workflow, you'll need to do the following. Assuming we're testing `release/15.6` vs. `release/15.5` and `wp/6.1`:

1. `git checkout release/15.6`, as it's the one from which the workflow will be dispatched,
2. `git merge origin refactor/perf-test-results-path`,
3. `nvm use && npm install`,
4. `bin/plugin/cli.js perf release/15.5 wp/6.1 --wp-version 6.1`,
5. `git reset --hard origin/release/15.6`.

The script should recognize if the current root branch name is passed explicitly and still use the local code.
Finally, ensure that when comparing this branch with `trunk`, both show the same numbers. This should be proof that the new logic does not affect the perf measurements.