# Perf testing: Cover Site Editor loading time #23842
Changes from all commits:
```diff
@@ -2,6 +2,7 @@
  * External dependencies
  */
 const path = require( 'path' );
+const { pickBy, mapValues } = require( 'lodash' );
 
 /**
  * Internal dependencies
```
```diff
@@ -19,7 +20,8 @@ const config = require( '../config' );
 /**
  * @typedef WPPerformanceCommandOptions
  *
- * @property {boolean=} ci Run on CI.
+ * @property {boolean=} ci Run on CI.
+ * @property {string=} testsBranch The branch whose performance test files will be used for testing.
  */
 
 /**
```
```diff
@@ -46,14 +48,14 @@ const config = require( '../config' );
 /**
  * @typedef WPFormattedPerformanceResults
  *
- * @property {string} load Load Time.
- * @property {string} domcontentloaded DOM Contentloaded time.
- * @property {string} type Average type time.
- * @property {string} minType Minimum type time.
- * @property {string} maxType Maximum type time.
- * @property {string} focus Average block selection time.
- * @property {string} minFocus Min block selection time.
- * @property {string} maxFocus Max block selection time.
+ * @property {string=} load Load Time.
+ * @property {string=} domcontentloaded DOM Contentloaded time.
+ * @property {string=} type Average type time.
+ * @property {string=} minType Minimum type time.
+ * @property {string=} maxType Maximum type time.
+ * @property {string=} focus Average block selection time.
+ * @property {string=} minFocus Min block selection time.
+ * @property {string=} maxFocus Max block selection time.
  */
 
 /**
```
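For context, the trailing `=` in these Closure-style JSDoc type expressions marks a property as optional; a minimal, hypothetical illustration (names invented here, not taken from the PR):

```js
/**
 * @typedef HypotheticalResults
 *
 * @property {string}  load Always present.
 * @property {string=} type Optional; may be absent when a suite skips typing metrics.
 */

/** @type {HypotheticalResults} */
const example = { load: '110ms' }; // valid: the optional `type` key is omitted
```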
```diff
@@ -64,7 +66,7 @@ const config = require( '../config' );
  * @return {number} Average.
  */
 function average( array ) {
-	return array.reduce( ( a, b ) => a + b ) / array.length;
+	return array.reduce( ( a, b ) => a + b, 0 ) / array.length;
 }
 
 /**
```
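The added `0` seed is what keeps `average` from throwing once a suite reports no samples for a metric; a quick sketch of the difference, with illustrative values:

```js
// With at least one element, both forms agree:
[ 3, 4, 5 ].reduce( ( a, b ) => a + b ); // 12
[ 3, 4, 5 ].reduce( ( a, b ) => a + b, 0 ); // 12

// On an empty array, the unseeded form throws:
// [].reduce( ( a, b ) => a + b ); // TypeError: Reduce of empty array with no initial value
[].reduce( ( a, b ) => a + b, 0 ); // 0, so average() yields 0 / 0 = NaN instead of crashing
```

The resulting `NaN` is then dropped downstream by the `pickBy( medians, isFinite )` step introduced below.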
```diff
@@ -119,13 +121,15 @@ function curateResults( results ) {
  *
  * @param {string} performanceTestDirectory Path to the performance tests' clone.
  * @param {string} environmentDirectory Path to the plugin environment's clone.
+ * @param {string} testSuite Name of the tests set.
  * @param {string} branch Branch name.
  *
  * @return {Promise<WPFormattedPerformanceResults>} Performance results for the branch.
  */
 async function getPerformanceResultsForBranch(
 	performanceTestDirectory,
 	environmentDirectory,
+	testSuite,
 	branch
 ) {
 	// Restore clean working directory (e.g. if `package-lock.json` has local
```
```diff
@@ -147,30 +151,36 @@ async function getPerformanceResultsForBranch(
 	const results = [];
 	for ( let i = 0; i < 3; i++ ) {
 		await runShellScript(
-			'npm run test-performance',
+			`npm run test-performance -- packages/e2e-tests/specs/performance/${ testSuite }.test.js`,
 			performanceTestDirectory
 		);
 		const rawResults = await readJSONFile(
 			path.join(
 				performanceTestDirectory,
-				'packages/e2e-tests/specs/performance/results.json'
+				`packages/e2e-tests/specs/performance/${ testSuite }.test.results.json`
 			)
 		);
 		results.push( curateResults( rawResults ) );
 	}
 
-	return {
-		load: formatTime( median( results.map( ( r ) => r.load ) ) ),
-		domcontentloaded: formatTime(
-			median( results.map( ( r ) => r.domcontentloaded ) )
-		),
-		type: formatTime( median( results.map( ( r ) => r.type ) ) ),
-		minType: formatTime( median( results.map( ( r ) => r.minType ) ) ),
-		maxType: formatTime( median( results.map( ( r ) => r.maxType ) ) ),
-		focus: formatTime( median( results.map( ( r ) => r.focus ) ) ),
-		minFocus: formatTime( median( results.map( ( r ) => r.minFocus ) ) ),
-		maxFocus: formatTime( median( results.map( ( r ) => r.maxFocus ) ) ),
-	};
+	const medians = mapValues(
+		{
+			load: results.map( ( r ) => r.load ),
+			domcontentloaded: results.map( ( r ) => r.domcontentloaded ),
+			type: results.map( ( r ) => r.type ),
+			minType: results.map( ( r ) => r.minType ),
+			maxType: results.map( ( r ) => r.maxType ),
+			focus: results.map( ( r ) => r.focus ),
+			minFocus: results.map( ( r ) => r.minFocus ),
+			maxFocus: results.map( ( r ) => r.maxFocus ),
+		},
+		median
+	);
+
+	// Remove results for which we don't have data (and where the statistical functions thus returned NaN or Infinity etc).
+	const finiteMedians = pickBy( medians, isFinite );
```
**Inline comment thread:**

> Does this have an impact on the

> I thought I'm doing that here? https://github.com/WordPress/gutenberg/pull/23842/files#diff-f7011184c69b1a3bacc4458a3cc16446R51-R58 😅 🤔

> missed this somehow 😬 😄
```diff
+
+	// Format results as times.
+	return mapValues( finiteMedians, formatTime );
 }
 
 /**
```
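A small self-contained sketch of the new `mapValues`/`pickBy` pipeline, using made-up numbers and a stand-in for the script's own `median` helper (its exact implementation isn't shown in this diff):

```js
const { mapValues, pickBy } = require( 'lodash' );

// Stand-in for the script's median helper (assumed behavior).
const median = ( arr ) => {
	const sorted = [ ...arr ].sort( ( a, b ) => a - b );
	const mid = Math.floor( sorted.length / 2 );
	return sorted.length % 2
		? sorted[ mid ]
		: ( sorted[ mid - 1 ] + sorted[ mid ] ) / 2;
};

// The site editor suite only records loading metrics, so `type` stays empty here.
const medians = mapValues( { load: [ 100, 120, 110 ], type: [] }, median );
// => { load: 110, type: NaN }

const finiteMedians = pickBy( medians, isFinite );
// => { load: 110 } (the non-finite entry is dropped before formatting)
```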
```diff
@@ -198,6 +208,19 @@ async function runPerformanceTests( branches, options ) {
 	log( '>> Cloning the repository' );
 	const performanceTestDirectory = await git.clone( config.gitRepositoryURL );
 
+	if ( !! options.testsBranch ) {
+		log(
+			'>> Fetching the ' +
+				formats.success( options.testsBranch ) +
+				' branch'
+		);
+		await git.checkoutRemoteBranch(
+			performanceTestDirectory,
+			options.testsBranch
+		);
+	}
+
 	const environmentDirectory = getRandomTemporaryPath();
 	log(
 		'>> Perf Tests Directory : ' +
@@ -220,21 +243,32 @@ async function runPerformanceTests( branches, options ) {
 	log( '>> Starting the WordPress environment' );
 	await runShellScript( 'npm run wp-env start', environmentDirectory );
 
-	/** @type {Record<string, WPFormattedPerformanceResults>} */
+	const testSuites = [ 'post-editor', 'site-editor' ];
```
**Inline comment thread:**

> Should this be an argument instead of running all tests, with a default of post-editor? Or do you think we should be running all tests every time?

> This is in the function body of
>
> I'm leaning towards always running both perf suites. Their main use case is CI (the GH Action), as this is the most reliable guard against perf regressions, so it should IMO be easy to monitor that. Furthermore, the Site Editor test only covers loading time, which is currently pretty short. We can revisit if we add more coverage and testing times grow too much.
```diff
+
+	/** @type {Record<string,Record<string, WPFormattedPerformanceResults>>} */
 	const results = {};
-	for ( const branch of branches ) {
-		results[ branch ] = await getPerformanceResultsForBranch(
-			performanceTestDirectory,
-			environmentDirectory,
-			branch
-		);
+	for ( const testSuite of testSuites ) {
+		results[ testSuite ] = {};
+		for ( const branch of branches ) {
+			results[ testSuite ][
+				branch
+			] = await getPerformanceResultsForBranch(
+				performanceTestDirectory,
+				environmentDirectory,
+				testSuite,
+				branch
+			);
+		}
 	}
 
 	log( '>> Stopping the WordPress environment' );
 	await runShellScript( 'npm run wp-env stop', environmentDirectory );
 
 	log( '\n>> 🎉 Results.\n' );
-	console.table( results );
+	for ( const testSuite of testSuites ) {
+		log( `\n>> ${ testSuite }\n` );
+		console.table( results[ testSuite ] );
+	}
 }
 
 module.exports = {
```
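To make the new nested shape concrete, a hypothetical example (branch names and timings are invented) of what the per-suite `console.table` calls receive:

```js
const results = {
	'post-editor': {
		master: { load: '5.1s', domcontentloaded: '4.3s', type: '56.5ms' },
		'my-branch': { load: '5.0s', domcontentloaded: '4.2s', type: '57.1ms' },
	},
	'site-editor': {
		// Only loading metrics exist for this suite, so the other keys are absent.
		master: { load: '7.8s', domcontentloaded: '6.9s' },
		'my-branch': { load: '7.6s', domcontentloaded: '6.8s' },
	},
};

for ( const testSuite of Object.keys( results ) ) {
	console.log( `\n>> ${ testSuite }\n` );
	console.table( results[ testSuite ] );
}
```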
New test file (per the hunk header `@@ -0,0 +1,80 @@`, all lines are additions):

```js
/**
 * External dependencies
 */
import { basename, join } from 'path';
import { writeFileSync } from 'fs';

/**
 * Internal dependencies
 */
import { useExperimentalFeatures } from '../../experimental-features';

/**
 * WordPress dependencies
 */
import { trashAllPosts, visitAdminPage } from '@wordpress/e2e-test-utils';
import { addQueryArgs } from '@wordpress/url';

jest.setTimeout( 1000000 );

describe( 'Site Editor Performance', () => {
	useExperimentalFeatures( [
		'#gutenberg-full-site-editing',
		'#gutenberg-full-site-editing-demo',
	] );

	beforeAll( async () => {
		await trashAllPosts( 'wp_template' );
		await trashAllPosts( 'wp_template_part' );
	} );
	afterAll( async () => {
		await trashAllPosts( 'wp_template' );
		await trashAllPosts( 'wp_template_part' );
	} );

	it( 'Loading', async () => {
		const results = {
			load: [],
			domcontentloaded: [],
			type: [],
			focus: [],
		};

		await visitAdminPage(
			'admin.php',
			addQueryArgs( '', {
				page: 'gutenberg-edit-site',
			} ).slice( 1 )
		);

		let i = 3;

		// Measuring loading time.
		while ( i-- ) {
			await page.reload( { waitUntil: [ 'domcontentloaded', 'load' ] } );
			const timings = JSON.parse(
				await page.evaluate( () =>
					JSON.stringify( window.performance.timing )
				)
			);
			const {
				navigationStart,
				domContentLoadedEventEnd,
				loadEventEnd,
			} = timings;
			results.load.push( loadEventEnd - navigationStart );
			results.domcontentloaded.push(
				domContentLoadedEventEnd - navigationStart
			);
		}

		const resultsFilename = basename( __filename, '.js' ) + '.results.json';

		writeFileSync(
			join( __dirname, resultsFilename ),
			JSON.stringify( results, null, 2 )
		);

		expect( true ).toBe( true );
	} );
} );
```
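The loading numbers come from the browser's legacy Navigation Timing API (`window.performance.timing`), which the Chromium build Puppeteer drives still exposes; the same arithmetic can be tried standalone in any page's devtools console:

```js
// Same computation the test performs after each reload.
const {
	navigationStart,
	domContentLoadedEventEnd,
	loadEventEnd,
} = window.performance.timing;

console.log( 'load:', loadEventEnd - navigationStart, 'ms' );
console.log( 'domcontentloaded:', domContentLoadedEventEnd - navigationStart, 'ms' );
```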
**Comment:**

So this seems to be the change that's causing issues in the GitHub action: we're changing the API of `npm run test-performance`. Can't we just leave it as is, i.e. run all performance tests when you do `npm run test-performance`? This is documented already in several places, and I'm not sure we should be breaking this assumption.

Alternatively, we could have:

This will ensure we don't break the API.

**Comment:**

I think that's only one side of the coin though, since `master` only produces one `results.json` file, for the post editor only, whereas this branch renames that file to `post-editor.results.json` and writes the site editor results to `site-editor.results.json`.

In fact, an earlier version of this PR had `npm run test-performance` (with no added args), and that caused the GH action to fail when looking for `post-editor.results.json`, since `master` had only produced `results.json`.

I'm not sure we can retain API stability this way and still add test coverage for a different test suite. We'd be extremely limited in our options, and things might end up rather hacky (something like adding site-editor-specific fields to `results.json` while retaining the post-editor ones without even renaming them -- sounds like it could become quite messy and hard to understand/maintain/extend quickly).

Overall, I'm not convinced that that's the right approach; requiring this kind of API stability basically means we can never significantly change the perf tests, and that's a really tough constraint, since it's hard to anticipate whatever needs might arise in the future.

Wouldn't it make sense that a branch that changes perf tests uses those tests to run on both that branch and master?

**Comment:**

Mmm, you're right, I can see that. I guess an alternative here could be to make `./bin/plugin/cli.js` take an optional "branch" where, if specified, we run the perf tests from that branch instead of "master", and we could use that argument in the GitHub action? Any thoughts on that?

Regardless, though, if we change the signature of `npm run test-performance` we should make sure to update the docs; having a `--suite` option or something like that is a bit nicer than having to pass the file.

**Comment:**

Sorry, still a few more questions 😅

This is actually an interface where I am wondering if we need to add more options: Do we need the extra argument? Doesn't it make sense to always use the branch's tests? For most branches, those will be identical to `master`'s anyway, so it doesn't really make a difference there. But if a branch does modify them, we surely want to use them, no?

I don't mind adding that. Personally, I actually like the 'pass-through' approach (i.e. the `npm ... -- ...` syntax), mostly for its discoverability -- I didn't really look at any code but just tried it, and it did the right thing. I find that quite intuitive (and it's always the first thing I try when needing to pass a filename to an npm script). I prefer this uniform syntax over the 'nice' one, but I'm happy to go along with the 'nice' one.

Finally: To make the actual change, we'll have to insert the branch's test files into the wp-env container. As I said, this might be kinda painful (or even near-impossible currently) to test on my Linux box, where there's always some permissions issue when attempting to add or change files in those Docker images (and I haven't been able to fully track down that issue for a while -- wp-env works for most purposes for me, just not for stuff that requires write access, e.g. uploading media files). My alternative suggestion would be to merge without those changes, verify that the new perf test behavior works fine (and revert if it doesn't), and tackle that perf test file injection in a separate PR. Would you be okay with that? 😬
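For reference, the pass-through invocation discussed here is the same form the updated `getPerformanceResultsForBranch` runs internally; substituting one of the suite names gives, for example:

```
npm run test-performance -- packages/e2e-tests/specs/performance/site-editor.test.js
```

Everything after the `--` is handed through by npm to the underlying test-runner script unchanged, which is why this works without any extra option-parsing code.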
**Comment:**

We need to run the same tests across branches: `./bin/plugin/cli.js branchA branchB` runs the master tests in the environments built using `branchA` and `branchB` and compares both.

I was suggesting something like `./bin/plugin/cli.js branchA branchB --testsBranch branchC` to run the `branchC` tests in the environments built using `branchA` and `branchB` and compare both. In the GitHub action, `branchC` will be equal to `branchA`, but that's fine.

I don't feel strongly here; the important thing is that the docs stay correct whichever choice you make.

I think it's just a matter of calling `git checkout branchC` at the beginning of the `perf` command (when setting up the perf environment directory).

**Comment:**

The rare situations where we had to make something work differently in different branches came down, so far, to selector changes to open the inserter, but that doesn't have an impact on any metric.

The metrics are simple: type a character, click on a block, load the page... These are the things that need to be computed similarly across branches, and they don't differ depending on the branch the test is used for. So for me personally, I think the testsBranch argument is doing its work properly so far.

**Comment:**

I will take your word for it; I don't understand what that "doing its work properly" amounts to, other than there was a PR where filenames changed and something crashed because of the lack of a tests branch. Thanks for helping clarify this! It's not a big problem that I'm seeing; I was mostly just trying to remove things I don't think we need.

**Comment:**

@youknowriad I'm coming back to this and trying to better understand what you wrote here; I'm wondering if you are saying that you suspect that test changes will work in a PR but then fail when run on trunk.

One thing I discovered while working on our test suites is that we don't run tests against the PR's branch, but against the "PR's merge branch." The merge branch is a branch which tracks the PR, but every time we make a commit it creates a hypothetical merge of that branch tip into the target branch (usually `trunk`).

What this means is that if in a PR we make changes to a test suite, then when those changes run in CI and we're relying on `$GITHUB_SHA` (as we do), the tests already incorporate the issues of merging back into `trunk`. If the new test files would break in `trunk`, they would also break on the `$GITHUB_SHA`.

I know that at that point in that PR we're comparing two different test suites, but by providing `--tests-branch` we're changing the equation from "the results in the PR comparing `trunk` to the changes are invalid" into "the results in `trunk` comparing one commit to the previous commit are invalid."

Since we only report performance changes in mainline `trunk`, I'm still struggling to see how `--tests-branch` helps us achieve the goal we say it does.

cc: @ockham

This is another thing I think I've been uncovering as a hidden assumption: that our results are consistent within test runs. Even with running each branch 3x, I'm seeing routine variation between the branches on the perf tests, frequently hitting more than a 2x difference when no difference exists. I'm going to start gathering some statistics on the values from individual PR runs so we can discuss the facts of the measurements we're making, but I know already for a fact that the way we talk about these measurements in the PRs doesn't match the reality I've seen.

For example, I make a change to `README.md`, and in the performance tests (after 3x runs and taking the median) it shows that my branch loads the editor in half the time it does in `trunk`, and yet `trunk` is 2x faster when typing. The test code is the same, the code under test is the same, but our test runtime variance is so much wider than the time it takes to run the code we're testing that our results are probably mostly random variation.

**Comment:**

If this is happening for any of the metrics we track in https://codehealth.vercel.app then it's a problem, yes. But in my experience I only saw it in site editor metrics. These are more recent, and we didn't work on their stability yet, so I don't think we can trust them just yet; they're just a vague indication for now. But for the post editor metrics tracked on the site, I've not seen any major stability issues like that. I do see small variances (you can look at the graphs), but any important change has correlated well with an impactful commit.

**Comment:**

Just for some context, @youknowriad, I'm trying to start gathering data on this in #45747, and in at least the first batch of 30 test runs I found that 22 of them showed results at least as different on the `type` event (in the post editor) as the reported change from 14.4 to 14.5 was: 10 of them were more than that many ms faster, and 12 were more than that many ms slower.

How are you measuring variance and correlation? I have to be missing something, because it looks like the data is explained primarily by random variation. That is, if there is an impact, I'd like to know how you were able to pull it out of the noise caused by random variation, which seems to have a bigger impact than any reported change.

In the meantime I'll spin up a PR that specifically tests that release and see if I can reproduce the measurements you found.