Performance: Add comment with perf changes in PRs #56899
base: trunk
Conversation
Size Change: +36 B (0%)
Total Size: 1.71 MB
Force-pushed from e051218 to d91ce29.
Force-pushed from d91ce29 to 664a17c.
Flaky tests detected in 2eb27e9. 🔍 Workflow run URL: https://github.com/WordPress/gutenberg/actions/runs/7143561472
Force-pushed from 664a17c to 3e498ba.
This reverts commit 3e498ba.
I think 30% is too much. Maybe we should initially try 20% or even 10%, check the PRs that get failures, and if there are too many, bump it a bit.
@ellatrix, I did test with a 10% threshold and got some failures.
It made me realize that we probably need some type of non-linear threshold, because the higher the ms value, the more significant that 10% becomes. For example, a 10% regression on a metric measured in hundreds of milliseconds is a much larger absolute slowdown than 10% on one measured in single digits. Some notes for now since I need to focus on something else for a bit:
Do you have any thoughts on the above? cc @youknowriad @dmsnell
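To make the non-linear idea above concrete, here is a minimal sketch of a threshold whose 10% relative allowance is clamped between an absolute floor and ceiling. All names and numbers are illustrative assumptions, not values from this PR:

```ts
// Hypothetical sketch: the allowed regression grows with the baseline
// (10% relative), but is clamped so tiny, noisy metrics don't trip the
// check and large metrics can't regress by tens of milliseconds.
function allowedRegressionMs( baselineMs: number ): number {
	const relative = baselineMs * 0.1; // 10% of the baseline
	const floorMs = 2; // ignore sub-noise absolute changes
	const ceilingMs = 30; // cap the allowance for large metrics
	return Math.min( Math.max( relative, floorMs ), ceilingMs );
}

function isSignificantRegression( baselineMs: number, currentMs: number ): boolean {
	return currentMs - baselineMs > allowedRegressionMs( baselineMs );
}

// e.g. a 4 ms → 5 ms change passes (the 2 ms floor absorbs it),
// while a 500 ms → 540 ms change fails (it exceeds the 30 ms ceiling).
```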
I'm not sure we should "fail" the tests, but I think we should show the changes in a PR comment (similar to the bundle size changes). Can we try that first before failing the tests?
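For reference, the "sticky comment" pattern that bundle-size bots typically use can be sketched as follows. The marker string and function name are hypothetical, though the Octokit endpoints (`listComments`, `createComment`, `updateComment`) are real:

```ts
import { Octokit } from '@octokit/rest';

// Hypothetical hidden marker used to find the bot's own previous comment.
const MARKER = '<!-- __GUTENBERG_PERF_REPORT__ -->';

// Update the existing perf comment if one exists, otherwise create it,
// so the PR accumulates a single report instead of one comment per push.
async function upsertPerfComment(
	octokit: Octokit,
	owner: string,
	repo: string,
	prNumber: number,
	report: string
): Promise< void > {
	const body = `${ MARKER }\n${ report }`;
	const { data: comments } = await octokit.rest.issues.listComments( {
		owner,
		repo,
		issue_number: prNumber,
	} );
	const existing = comments.find( ( c ) => c.body?.includes( MARKER ) );
	if ( existing ) {
		await octokit.rest.issues.updateComment( {
			owner,
			repo,
			comment_id: existing.id,
			body,
		} );
	} else {
		await octokit.rest.issues.createComment( {
			owner,
			repo,
			issue_number: prNumber,
			body,
		} );
	}
}
```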
I agree this would be a good way of evaluating the check and its parameters. At the same time, we could still fail the PR if the regression is critical, e.g. 50%.
Another idea is to add a column to the comparison results showing the relative difference for each metric.
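A hedged sketch of how such a column could be computed and rendered; the metric name and the shape of the results below are assumptions for illustration:

```ts
interface MetricResult {
	metric: string;
	trunk: number; // median ms on the base branch
	pr: number; // median ms on this branch
}

// Render one comparison row per metric, with a relative-difference column.
function toRows( results: MetricResult[] ): string[] {
	return results.map( ( { metric, trunk, pr } ) => {
		const diffPct = ( ( pr - trunk ) / trunk ) * 100;
		const sign = diffPct >= 0 ? '+' : '';
		return `| ${ metric } | ${ trunk } ms | ${ pr } ms | ${ sign }${ diffPct.toFixed( 1 ) }% |`;
	} );
}

// toRows( [ { metric: 'type', trunk: 50, pr: 56 } ] )
// → [ '| type | 50 ms | 56 ms | +12.0% |' ]
```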
Yeah, tbh I agree with logging the results in a comment. If it fails, people will just rerun the test until it succeeds, which would work whenever the result is somewhere on the border.
Thanks @WunderBart. I'm concerned about this change for a few reasons. One is the missing description and motivation for the new gate-keeper. Two is that I don't think we have a good feel yet for what should be rejected and what shouldn't, what is a regression and what isn't. As we've all discussed, our performance tests aren't exactly the most reliable or representative at the moment.

I'm all for adding a new comment to PRs, though, and updating that comment as the PR's merge-branch is updated. It would help expose our performance metrics to folks and raise awareness of the different measurements we're making.

Ultimately, some changes are going to slow things down as a direct result of added functionality: are we to reject that work in black-and-white terms because it pushed us over some threshold? What if we have some major change that reduces a metric and then it turns out that our speedup broke some part of the app - should we reject our revert PR because it undoes or partially undoes the changes we made?

Personally, the unforgiving and often-changing code-linting rejections have been a frustrating experience for me, and I'm worried about adding even more process to dishearten developers who are trying to add valuable contributions to Gutenberg. That is, I love reporting on how this impacts our recorded metrics; I'm skeptical that rejecting PRs is going to provide a net boost to the project. If nothing else, having the comments readily available in the PR might help a reviewer point out performance regressions and leave that call subjective (where I believe it belongs).

Note too that we can home in on more meaningful rows in the table. @jsnajdr did a good job with
Love it. Looks like we have a consensus on making this about adding a comment with perf changes instead of failing the PR. I'll start working on that and change this PR's title so it's no longer confusing. Thank you all for the feedback so far.
What?
Let's add a comment to PRs that reports the potential performance regression (or improvement!).
Why?
To avoid shipping code that unintentionally degrades the editor's performance.
How?
The data is all there; we just need to calculate the differences and post them in a comment. See the sketch below.
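As a rough sketch of that calculation, assuming the perf job writes its median timings to JSON files (the file names and result shape below are hypothetical, not the actual artifact format):

```ts
import { readFileSync } from 'node:fs';

type Results = Record< string, number >; // metric name → median ms

// Build a markdown table comparing the base branch against the PR branch.
function buildPerfReport( trunkFile: string, prFile: string ): string {
	const trunk: Results = JSON.parse( readFileSync( trunkFile, 'utf8' ) );
	const pr: Results = JSON.parse( readFileSync( prFile, 'utf8' ) );
	const rows = Object.keys( trunk )
		.filter( ( metric ) => metric in pr )
		.map( ( metric ) => {
			const diffPct =
				( ( pr[ metric ] - trunk[ metric ] ) / trunk[ metric ] ) * 100;
			return `| ${ metric } | ${ trunk[ metric ] } ms | ${ pr[ metric ] } ms | ${ diffPct.toFixed( 1 ) }% |`;
		} );
	return [
		'| Metric | trunk | PR | Δ |',
		'| --- | --- | --- | --- |',
		...rows,
	].join( '\n' );
}
```

The resulting string could then be posted (and kept up to date) with a comment-upserting step like the one sketched earlier in the thread.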
Testing Instructions
TBA