-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update end to end test sharding on CI, make Playwright tests faster, Puppeteer tests slower #50362
Conversation
I wonder why Playwright - 4 is taking so much longer. It might be good to identify which particular tests are slow and try to improve that. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense to me! Thanks!
Are there any blockers for this PR? |
@Mamaduka I think it's ok. I'd just noticed that one of the shards was still taking a long time, so I wondered why they weren't more evenly balanced. I couldn't spot any obvious reasons. I need to revert 14b78c3, then it should be ok to merge, at which point the required checks will have to be updated in github. |
14b78c3
to
d8ef29d
Compare
In that respect it's probably good to ensure the test files don't get too big, and break them up across multiple files when they do. That would help the sharding work better, as one shard wouldn't spend too long on any individual file. I personally don't really like the idea of writing longer test cases as they're hard to read and reason about and difficult to refactor. |
It depends on the case, like everything, I guess 😄 But yes, I'm also in the camp "write small and atomic tests". |
Yeah, still seeing the same where Playwright 4 is taking much longer—10 minutes longer than Playwright 1. It's noticeable that one of the tests outputs a lot of warning messages (I think it's one of the tests that runs in firefox), so it could be that this makes it run more slowly. |
Flaky tests detected in d8ef29d. 🔍 Workflow run URL: https://github.com/WordPress/gutenberg/actions/runs/5263616230
|
I think sharding splits by test files but not test cases. So unless we split a big test file into multiple smaller ones, writing multiple shorter test cases would probably just make it slower 🤔 . If that ever becomes a bottleneck then perhaps we can figure out a quick convention so that others can follow? |
I think that's the case here. A possible follow-up would be to split firefox and webkit testing to a different shard. |
Yeah, I'm mainly advocating for breaking things down into shorter test files where it's logical (e.g. when a test has separate The comment about shorter test cases was more about maintainability. I realise that will make things slower, but I think there's a balance to be had. |
I think we're onto something here. When listing the tests run by the last shard, most of the tests running in two browsers end up there. It makes sense that tests running in multiple browsers will take longer than running in one. You can check this by following the command(s): # on trunk
npm run test:e2e:playwright -- --shard=2/2 --list
# on this branch
npm run test:e2e:playwright -- --shard=4/4 --list |
I completely agree for unit tests. I've found for e2e tests that they're a lot easier to work with when combined in a sensible way because they run so much faster and are more effective at catching bugs when they replicate a real user flow. This also helps document the expected behavior in a way that I think is easier to understand. The VS Code Playwright tooling around it is also quite good for stepping through the e2e tests for debugging. There's definitely a careful balance to which tests make sense to combine together to still keep it isolated enough to understand, debug, and not be flaky. I totally hear that point. Also, I don't mean to make this into a debate around e2e test length, but wanted to share from the perspective of someone who was wholeheartedly in the "keep everything isolated" camp and have shifted my philosophy around it since working with e2e tests. |
I'll merge this as tests are failing in |
…Puppeteer tests slower (WordPress#50362) * Increase playwright shards to 4, decrease puppeteer to 3 * Temporarily enable list reporter * Revert "Temporarily enable list reporter" This reverts commit 14b78c3.
What?
As more end to end tests are migrated to Playwright (and new tests are added), those jobs are naturally taking longer. (around 30-35 minutes)
Similarly as there are fewer tests running using Puppeteer those are becoming quicker. (around 10-15 minutes)
This adjusts the jobs so that there's more sharding for Playwright, now tests are split into 4 sets. Puppeteer tests are now reduced to 3 sets.
By my calculations, it should make both Puppeteer and Playwright tests complete in around 20 minutes, so they'll both be quicker than Performance tests again.