-
Notifications
You must be signed in to change notification settings - Fork 4.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor performance tests artifacts handling #48684
Conversation
Size Change: 0 B Total Size: 1.34 MB ℹ️ View Unchanged
|
Flaky tests detected in 7c86fa4. 🔍 Workflow run URL: https://github.com/WordPress/gutenberg/actions/runs/4415097223
|
0c299b1
to
d27f085
Compare
89698f2
to
0c1446a
Compare
b4eb3c0
to
1d8219e
Compare
bin/log-performance-results.js
Outdated
metricsPrefix: 'block-theme-', | ||
}, | ||
{ | ||
file: 'front-end-classic-theme-performance-results.json', | ||
file: 'front-end-classic-theme.results.json', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there a need to remove this? it was my assumption that we might end up with more test results in the future and this could help make it obvious which result belongs to which test.
it's fine to change it, and we can handle it later of course, but it has been helpful to me at least seeing all the information in the filename
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea here was that if CI zips it into performance-results
folder, then it doesn't need a .performance-results.js
suffix. On second thought, though, there's no zipping when running locally, and everything lands in the artifacts
folder, so it would be better to keep that suffix AND add it to the raw result json
s as well. Will make an update.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bin/plugin/commands/performance.js
Outdated
`packages/e2e-tests/specs/performance/${ testSuite }.test.results.json` | ||
) | ||
await runShellScript( | ||
`npm run test:performance -- ${ testSuite } --wordpress-artifacts-path=${ artifactsPath } --results-filename=${ resultsFilename }`, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we're relying on the ENV
is it necessary to pass this here or could the performance test read the same ENV
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I thought it would, but it looks like only the top-level process (node cli.js perf
) receives the ENV
from CI. The await runShellScript( "npm run ...
spawns a separate process that doesn't receive the same ENV
so I needed to figure out a way to pass it down. Maybe there's a better way to share the env, but at my current level of understanding, this made the most sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can manually pass through any ENV
we want in runShellScript
gutenberg/bin/plugin/lib/utils.js
Lines 17 to 26 in 6434e0b
/** | |
* Utility to run a child script | |
* | |
* @typedef {NodeJS.ProcessEnv} Env | |
* | |
* @param {string} script Script to run. | |
* @param {string=} cwd Working directory. | |
* @param {Env=} env Additional environment variables to pass to the script. | |
*/ | |
function runShellScript( script, cwd, env = {} ) { |
@youknowriad would it make sense to pass env: { ...process.env, …, ...env }
as a default? instead of env: { …, ...env }
? would that be dangerous?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can manually pass through any ENV we want in runShellScript
Yeah I've just noticed that! 😄 Made already some changes in that regard in a53065b. I'm not passing the whole parent env though - just the relevant bits. I'm curious though if passing the whole thing would be safe.
bin/plugin/commands/performance.js
Outdated
@@ -225,6 +210,7 @@ async function runTestSuite( testSuite, performanceTestDirectory, runKey ) { | |||
async function runPerformanceTests( branches, options ) { | |||
const runningInCI = !! process.env.CI || !! options.ci; | |||
const TEST_ROUNDS = options.rounds || 1; | |||
const artifactsPath = process.env.WP_ARTIFACTS_PATH || ''; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to double-check, if this is empty, it will print to the current directory? or should we pass './'
as the default? or something like __dirname
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is empty, it will be resolved down the stream (per job) when the puppeteer env is set up (when running npm run test:performance
):
gutenberg/packages/scripts/config/jest-environment-puppeteer/index.js
Lines 45 to 49 in b3ab0ee
const root = process.env.GITHUB_WORKSPACE || process.cwd(); | |
const ARTIFACTS_PATH = path.resolve( | |
root, | |
process.env.WP_ARTIFACTS_PATH || 'artifacts' | |
); |
Note that the GITHUB_WORKSPACE
var is not available here when running in CI (as mentioned here) so it will always resolve to the cwd
. This was causing the artifacts to be saved along the env
folders that we create during the perf tests setup and the necessity to copy them later on to the workspace's ./__test-results
folder, so that they're available for the CI.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that the
GITHUB_WORKSPACE
var is not available here when running in CI (...)
'/.wp-env.json' | ||
// Create the config file for the current env. | ||
fs.writeFileSync( | ||
path.join( environmentDirectory, '.wp-env.json' ), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
similar question here about removing performance
from the name. when I download these files they lose some inherent linkage to the performance CI job that created them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was the same before - we were copying the .wp-env.performance.json
and renaming it to .wp-env.json
, so in this regard nothing has changed. I assumed the default .wp-env.json
name is used for the wp-env start
script to pick it up without any extra arguments.
bin/plugin/commands/performance.js
Outdated
rawResults[ i ] = {}; | ||
for ( const branch of branches ) { | ||
const runKey = `${ branch }_${ testSuite }_run-${ i }`; | ||
const runKey = `${ testSuite }_${ branch }_run-${ i }`; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there context behind this swap? not the most important thing, but it stands out in this PR where we're working on the path changes.
I guess there's a claim that grouping by test suite is more favorable than grouping by branch?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've swapped it because it's the actual order in which we iterate through those tests and how we compare the results (suite A: branch X vs. branch Y
, suite B: branch X vs. branch Y
, etc.). So both the log info and results filenames will now be compiled in this fashion:
runner info (current) | runner info (new) |
---|---|
results upload (current) | results upload (new) |
---|---|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a side note, I think we should also replace trunk
in the reference measurements with commit SHA, because trunk
alone doesn't say much. Also, the summary should point to what is the reference and what is subject of the comparison, and display a 3rd column (delta) to show whether we're slowing down or speeding up, for example:
>> Comparing trunk (13a14ca) with the current branch (32b4c12)
┌──────────────────────┬─────────────────────────┬────────────────────────┬─────────────────┐
│ (index) │ 13a14ca (reference) │ 32b4c12 (subject) │ Δ │
├──────────────────────┼─────────────────────────┼────────────────────────┼─────────────────┤
│ serverResponse │ '235.12 ms' │ '233.46 ms' │ -0.5% │
│ firstPaint │ '251.88 ms' │ '186.88 ms' │ +18.2% │
│ domContentLoaded │ '685.96 ms' │ '651.18 ms' │ +0.5% │
│ loaded │ '686.72 ms' │ '652.04 ms' │ x.xx% │
│ firstContentfulPaint │ '8436.28 ms' │ '8057 ms' │ x.xx% │
│ firstBlock │ '9095.38 ms' │ '8713.32 ms' │ x.xx% │
│ type │ '42.98 ms' │ '43.03 ms' │ x.xx% │
│ minType │ '41.9 ms' │ '41.92 ms' │ x.xx% │
│ maxType │ '45.71 ms' │ '46.52 ms' │ x.xx% │
│ typeContainer │ '14.81 ms' │ '14.99 ms' │ x.xx% │
│ minTypeContainer │ '13.35 ms' │ '13.69 ms' │ x.xx% │
│ maxTypeContainer │ '18.87 ms' │ '17.7 ms' │ x.xx% │
│ focus │ '40.99 ms' │ '44.78 ms' │ x.xx% │
│ minFocus │ '35.64 ms' │ '35.16 ms' │ x.xx% │
│ maxFocus │ '64.35 ms' │ '71.31 ms' │ x.xx% │
│ inserterOpen │ '33.11 ms' │ '32.39 ms' │ x.xx% │
│ minInserterOpen │ '27.96 ms' │ '27.73 ms' │ x.xx% │
│ maxInserterOpen │ '52.56 ms' │ '52.9 ms' │ x.xx% │
│ inserterSearch │ '13.53 ms' │ '12.29 ms' │ x.xx% │
│ minInserterSearch │ '5.76 ms' │ '6.11 ms' │ x.xx% │
│ maxInserterSearch │ '19.31 ms' │ '18.43 ms' │ x.xx% │
│ inserterHover │ '26.31 ms' │ '26.4 ms' │ x.xx% │
│ minInserterHover │ '22.51 ms' │ '22.38 ms' │ x.xx% │
│ maxInserterHover │ '47.75 ms' │ '46.44 ms' │ x.xx% │
│ listViewOpen │ '157.32 ms' │ '159.09 ms' │ x.xx% │
│ minListViewOpen │ '142.89 ms' │ '142.65 ms' │ x.xx% │
│ maxListViewOpen │ '235.52 ms' │ '253.54 ms' │ x.xx% │
└──────────────────────┴─────────────────────────┴────────────────────────┴─────────────────┘
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should also replace trunk in the reference measurements with commit SHA
if we can display both that also has value. I think different needs use them differently. for most purposes trunk
does a better job communicating than 13a14ca
because we recognize that when a CI job runs it's running against the latest trunk
at the time the CI job ran. we can always lookup what 13a14ca
is, but that involves additional steps when trunk
is pretty bare and simple.
what is the reference and what is subject of the comparison
for most runs this matters, but the script supports multiple "branches" beyond two, and in those cases I'm not sure one is the reference. if we include this we will be making a decision that one of the "branches" is the reference, likely the first one. (branches is in quotes because it's really just a git
ref).
and display a 3rd column (delta) to show whether we're slowing down or speeding up, for example:
no strong opinion here on what we should do, but I can share my personal hesitancy with this due to the fact that we're still hitting some wild variation across tests (even with the recent fixes), and percentages always make changes look simpler than they are. I like making the easy-to-misuse things harder to use, whereas simply reporting ms
is just that, a fact reported without any interpretation, because the interpretation is itself extremely difficult to automate.
…r e2e env variables are set. Remove the GITHUB_WORKSPACE path variable as it's not reachable for the child process in CI.
'test/emptytheme' | ||
), | ||
'https://downloads.wordpress.org/theme/twentytwentyone.1.7.zip', | ||
'https://downloads.wordpress.org/theme/twentytwentythree.1.0.zip', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are these things we want to hard-code here in this seemingly independent function? I could easily imagine someone wanting to update the themes and then overlooking these lines…
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is only carried over from the .wp-env.performance.json
boilerplate file. Hard-coding those here was a solution to an issue described in #49063. In my mind, it was an intermediate solution because those themes should be installed within the tests that use them so that when running tests in isolation (e.g. via npm run test:performance theme-tests
), same themes are used as in CI or when we're running locally via the CLI. I would consider this a blocker - I plan to address this in a follow-up PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would consider this a blocker - I plan to address this in a follow-up PR.
if you meant "I would not consider this a blocker" then that sounds good. if not, then I'm confused 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, good catch! Missed the important 'not' there 😅
Really love where you've taken this. What do you see as continued blockers to get this in, or do you think it's ready to go? |
@dmsnell, I don't see any blockers at this point. In a follow-up PR which I'm already working on, I plan to make the builds reusable for subsequent jobs (at least locally) and parallelize them (builds) to make things faster. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the work. I don't see any reason to block this.
Let's update our setup following WordPress/gutenberg#48684
What?
TL;DR:
./artifacts
folder regardless of the triggering environment:CI via CLI / locally via CLI / locally via npm, etc.
current-branch vs. trunk
comparison, as they provide useful information,.wp-env.performance.json
boilerplate.Save all the performance result files to the main artifacts folder
Instead of storing files around the source code, let's use the already available
WP_ARTIFACTS_PATH
directory.The
process.env.WP_ARTIFACTS_PATH
has been primarily used for storing the failed test artifacts (screenshots, HTML snapshots), but we can also use it for storing any other artifacts, like performance results or temporary trace files. Those files are currently stored alongside the source code and then copied to an intermediate./__test-results/
folder, which is then picked up by the results upload CI action. After this PR, those files will be saved directly to theWP_ARTIFACTS_PATH
folder, so we won't need to:Once created, they're ready to be picked up by the upload action because the CI will define the
WP_ARTIFACTS_PATH
variable.Additionally, the final (calculated) results file that we currently save to the enigmatic path (before uploading them to codehealth), will now be saved to the
WP_ARTIFACTS_PATH
as well and zipped with the other (raw) results.Finally, if the CI job fails, the results upload step will not occur. However, if there are any results that have already been created, they will still be uploaded along with the failure artifacts.
Archive performance results for each comparison step
Currently, we are archiving performance results only for the
Compare performance with trunk
step. Now, the results will be uploaded for every comparison step as they provide useful information:In case of a failure, the results upload action will not be dispatched. If any result files are written during a failed job, though, they will be uploaded within the failure artifacts upload, along with the screenshot and HTML snapshot of the failed test.
Sort results and test logs primarily by the suite name
This aligns with the test running order (test suite -> branch). See the comparison below of the current vs. new results upload content.
I've updated the test log to reflect the above and added the
test rounds
info that was missing:Write the env-specific configs directly
For the performance tests, we're creating "environment" folders for each branch we want to test against. Those folders need to contain env-specific
wp-env
config, which we're currently copying from a boilerplate config that we copy over and amend with the target env config values. There's no point keeping this boilerplate and doing the extra copy step, because we can write this file directly. Also, this way we can provide absolute paths which makes it more readable.Testing Instructions
CI:
Locally:
wp-env
is stopped,bin/plugin/cli.js perf refactor/perf-test-results-path trunk
,./artifacts
folder.npm run test:performance test-editor
To test if the failure artifacts are still uploaded as expected:
expect(false).toBe(true)
,bin/plugin/cli.js perf refactor/perf-test-results-path trunk
again,./artifacts
as well.