-
Notifications
You must be signed in to change notification settings - Fork 4.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Upgrade Playwright to 1.39.0 #54051
Upgrade Playwright to 1.39.0 #54051
Conversation
Size Change: 0 B Total Size: 1.7 MB ℹ️ View Unchanged
|
Flaky tests detected in 2c19ed1. 🔍 Workflow run URL: https://github.com/WordPress/gutenberg/actions/runs/6809474155
|
There is an odd issue with failing tests. Playwright now sees Update: I submitted the bug report microsoft/playwright#26809. |
I just saw that the Playwright issue was removed from the next release milestone. While the reported case is a regression, we should also update our tests to unblock the updates. Why?A spinner in the image block markup means the media is still uploading. The tests need to wait until the upload process is finished. |
ed3720d
to
869e9d7
Compare
I've got a solution for the image/svg problem but discovered a few more blockers from upgrading to the latest Playwright version. Blockers
Screenshot |
@Mamaduka Hmm, I can't reproduce the TS errors in the screenshots. I checked out to the branch, did |
869e9d7
to
7b302b9
Compare
Sorry, @kevin940726. The repro required rebase and bumping the version. I committed changes using the
|
Well, that was fast 😄 Thank you, @kevin940726! |
[ | ||
path.resolve( require.resolve( 'playwright-core' ), '..', 'cli.js' ), | ||
'install', | ||
], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@swissspidy, @kevin940726, I had to change the CLI path resolution to this (reason #22612).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it works it works :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Could we just use npx
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried that when I initially found the issue, but it failed locally for me. Pascal might have more details on why we're using node
vs. npx
.
Pushed the fix for the image block e2e tests (#54051 (comment)), unless there're any other failures this should be good to merge. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome! Thank you!
@kevin940726, the 299a2fa is probably causing The test was using gutenberg/test/e2e/specs/editor/blocks/paragraph.spec.js Lines 86 to 88 in 6340800
Update: The |
Oops! Yep, I accidentally ignored |
3aaf5ea
to
2c19ed1
Compare
The current API. The idea is that |
The 0864852 commit fixes failing cc @WunderBart |
0864852
to
8507143
Compare
Is this PR responsible of the performance metric jump in the "first block" metric in http://codevitals.run |
Do you mean "jump" in a good way? If I'm reading the site correctly, the "First block load" metrics have been improved.
In theory, improved performance by the Playwright can improve our metrics. But knowing sure reason would be better. Unfortunately, I don't have a clue at the moment; maybe others have some ideas 😅 |
It goes in the right direction yeah but a change in the test tool shouldn't impact the metric ideally.
Upgrading playwright can indeed make the tests faster but it shouldn't improve the metrics in our graphs because our graphs compare the current commit with a reference commit and normalize the numbers relatively to and old number for a reference commit. This means that upgrading playwright should in theory update both commits similarly and with the normalization applied, the number should stay stable in the graphs. What happens here, is that the upgrade has made the test faster for the latest commit but the upgrade didn't have the same impact on the reference commit which is strange unless we may be doing something wrong in our tests. |
* Bump to latest * Revert unneeded changes for dragFiles * Fix 'playwright-core' CLI path resolution * Fix Image e2e tests * Fix drop * Fix failing 'should select with shift + click' e2e test --------- Co-authored-by: Kai Hao <[email protected]>
It might be an issue with the base commit. Looking at the upgrade commit job, we see the following numbersfor the
Now, looking at the preceding commit, the numbers were:
In both cases, the difference between base and merge commits is ~2 seconds. This might be a static bump between the commits independent of the framework version (or even the framework itself). The UI execution speed could have improved proportionally, but a request might have improved between the base and newer commits that will always have the same ~2s timing difference. Does that make sense? |
@WunderBart it makes sense and it confirms for me that the metric computation is wrong. It feels like the metric includes a static value that comes from the test itself and not from the actual code loading. |
@youknowriad, I think the issue might be with how we chose the new base commit recently. The time between the merge commit (that updated the base) and the new base commit was almost 3 months, so there already was a big difference in If we update/reset the base commit, shouldn't we always choose the one that is as close as possible to the merge one so there's no performance difference between the two? The normalization factor would be 1, so we'd likely see some bump from the drift itself, but from that point on, any performance change should be tracked correctly, if I'm not mistaken. |
@WunderBart I'm not following, I'm not sure how this relates to the base commit at all. We're updating playwright and we're updating it for both the base commit and this commit, so the numbers should change relatively in the same way as if we didn't update playwright. |
What if they did update relatively, but there already was a 2s static bump between the merge and base commits? For example, when we re-set the base, the X metric values were:
There already was a 2s bump, which we don't know the source of. For the sake of this example, let's assume it came from an improved request, making it a static bump between the commits. Let's assume the next commit updates Playwright, and the browser execution speed is 30% faster. Assuming the 2000ms is static and not affected by the update, the new values would become:
If we had re-set the base with a commit with the same performance, and even if the static bump happened along the way, the Playwright update would not affect the metric. Consider the following measurement timeline:
I hope it makes sense! 😅 |
This is the assumption that I disagree with I think. If 2000ms is a "static" update, then we're measuring the wrong thing. We're not measuring the time the server loads the first block, we're measuring the time it takes playwright to load the first block. In other words, if we're measuring the correct thing, the difference should never be static across two job runs. Maybe our metric is affected by playwright timeout or things like that and it shouldn't be, we should measure real DOM timeouts, real server load times... and playwright's own code shouldn't affect the measure. |
The test itself is very simple as it just waits for the first block selector (inside the editor iframe) - not much to be changed there. I'd love to confirm whether Playwright has anything to do with it, though, but I wasn't able to repro locally so far. Regardless, I think the next time we reset the base, we should choose a commit closest to the merge one so it's at least easier to track down weird bumps like this one. |
playwright have built-in "waits" when you click buttons and perform any action basically. I think for performance tests, these should be disabled as much as we can. |
No UI actions are performed for the loading test (which includes As for, e.g., a button waits, Playwright ensures that the button is visible, stable (not moving), and clickable (not disabled, not overlayed) before clicking it. We were also doing these checks with Puppeteer (e.g. via |
IMO, any browser launcher will impact the metrics, which is unavoidable. If we really want to minimize the impact then we might want to use |
What?
Upgrade Playwright to 1.39.0
What's new?
Testing Instructions
CI checks are green.