misc: lantern trace collection script #9662

Merged Dec 19, 2019 (109 commits).

Commits
a261919
init
connorjclark Sep 11, 2019
92ea971
first pass
connorjclark Sep 12, 2019
f4441aa
tracePath
connorjclark Sep 12, 2019
20c2709
remove task stuff
connorjclark Sep 12, 2019
419151a
progress stuff
connorjclark Sep 12, 2019
1423a95
ProgressLogger
connorjclark Sep 12, 2019
6565644
urls. 9
connorjclark Sep 12, 2019
80b0e58
move to folder, readme, urls.json
connorjclark Sep 12, 2019
a4c7b58
comments
connorjclark Sep 12, 2019
8badb3f
eslint
connorjclark Sep 12, 2019
4104f8b
archive
connorjclark Sep 12, 2019
c25babd
readme
connorjclark Sep 12, 2019
475c6eb
comments
connorjclark Sep 12, 2019
f937608
pr
connorjclark Sep 12, 2019
2738755
log skipped
connorjclark Sep 12, 2019
22bb27a
wpt/unthrottled
connorjclark Sep 13, 2019
ae5481e
use while
connorjclark Sep 13, 2019
1b8d529
Update lighthouse-core/scripts/lantern/collect/collect.js
connorjclark Sep 13, 2019
32e9c41
comment
connorjclark Sep 13, 2019
0e23c7b
remove path join
connorjclark Sep 13, 2019
7012e86
pr
connorjclark Sep 14, 2019
d62c64b
exec async
connorjclark Sep 14, 2019
c4a5bc1
pr
connorjclark Sep 14, 2019
a9f48c9
pr
connorjclark Sep 14, 2019
1f02506
archive
connorjclark Sep 14, 2019
5d82760
lhr
connorjclark Sep 14, 2019
ef1566e
devtools log
connorjclark Sep 14, 2019
71c100f
max buffer. filestub is insecure
connorjclark Sep 16, 2019
68b06d7
readme
connorjclark Sep 16, 2019
f9fe259
sorta organize
connorjclark Sep 16, 2019
1ee49a2
update list
connorjclark Sep 16, 2019
d134c52
urls
connorjclark Sep 17, 2019
a9d2b7e
10s
connorjclark Sep 17, 2019
23e4306
403
connorjclark Sep 17, 2019
0992893
urls
connorjclark Sep 18, 2019
f221897
delete bad urls
connorjclark Sep 20, 2019
dc0331b
birds arent real
connorjclark Sep 20, 2019
9e47e8e
sort
connorjclark Sep 20, 2019
b4982d2
cool sites
connorjclark Sep 20, 2019
4b49fe8
sites
connorjclark Sep 20, 2019
53a91cc
banjo
connorjclark Sep 20, 2019
3207ac1
chrome beta
connorjclark Sep 20, 2019
89dae69
remove stale comment
connorjclark Sep 23, 2019
c52b519
better waiting
connorjclark Sep 25, 2019
b50784f
download link
connorjclark Sep 26, 2019
68503ba
summary
connorjclark Sep 26, 2019
de61ec2
remove bad urls
connorjclark Sep 27, 2019
0acf633
golden
connorjclark Sep 27, 2019
e07e3df
common.js
connorjclark Sep 27, 2019
60a2576
golden
connorjclark Sep 27, 2019
a7c2d36
logger
connorjclark Sep 27, 2019
415199a
fix emptry trace event
connorjclark Sep 27, 2019
2e9a4c7
fix collect
connorjclark Sep 27, 2019
dce6140
tsc
connorjclark Sep 27, 2019
1f4f51a
add metrics dump to golden.json
connorjclark Sep 30, 2019
a664425
update zips
connorjclark Sep 30, 2019
9c84f91
changes
connorjclark Sep 30, 2019
2a6bcd3
use public drive links
connorjclark Oct 1, 2019
5a8fea2
fix wpt query
connorjclark Oct 1, 2019
a8b3bf7
rm custom connectivity params
connorjclark Oct 1, 2019
71a6a87
update location
connorjclark Oct 2, 2019
09319e2
fix
connorjclark Oct 4, 2019
ecb3c06
Merge remote-tracking branch 'origin/master' into lantern-collect
connorjclark Oct 4, 2019
ebe0cf9
rm url that always timesout wpt
connorjclark Oct 11, 2019
e5b2915
use golden format that lantern expects
connorjclark Oct 15, 2019
af618eb
fix metrics
connorjclark Oct 15, 2019
c604758
Merge remote-tracking branch 'origin/master' into lantern-collect
connorjclark Oct 15, 2019
336b1d2
links
connorjclark Oct 16, 2019
3cae321
moto4
connorjclark Oct 17, 2019
20206d9
Merge remote-tracking branch 'origin/master' into lantern-collect
connorjclark Oct 22, 2019
aab7f40
oopif flag
connorjclark Oct 22, 2019
06e14fc
mkdirp
connorjclark Oct 23, 2019
75bdecd
use stable
connorjclark Oct 24, 2019
3fe1f2b
Merge remote-tracking branch 'origin/master' into lantern-collect
connorjclark Oct 26, 2019
ae65e32
use beta
connorjclark Oct 28, 2019
7182a26
disable throttling
connorjclark Oct 29, 2019
36f3432
fix 302s
connorjclark Oct 30, 2019
01ddf2f
retry if no metrics. use fmp instead of tti for median
connorjclark Oct 30, 2019
97bea1a
tweak percentiles
connorjclark Oct 30, 2019
161ef23
wait for wpt before doing local
connorjclark Oct 30, 2019
12b0d43
mkdirp
connorjclark Oct 30, 2019
0337100
don't hit the api so hard
connorjclark Oct 30, 2019
30d7ef9
make all wpt requests at same time
connorjclark Oct 30, 2019
270676c
Merge remote-tracking branch 'origin/master' into lantern-collect
connorjclark Oct 30, 2019
90725b5
dont wait to poll more than 10 min
connorjclark Oct 31, 2019
4fa8939
Merge branch 'lantern-collect' of github.com:GoogleChrome/lighthouse …
connorjclark Oct 31, 2019
e8b04d0
remove url that never finishes
connorjclark Nov 4, 2019
1063c4d
no oopifs by default
connorjclark Nov 4, 2019
9b3675d
Merge branch 'master' into lantern-collect
patrickhulce Nov 18, 2019
43efd9f
Revert "make all wpt requests at same time"
patrickhulce Nov 18, 2019
4c7a63f
patrick's tweaks to lantern collection script (#9980)
patrickhulce Nov 18, 2019
e7afdec
check metrics, use Chrome stable, record screenshots
connorjclark Nov 18, 2019
46a3804
nit
connorjclark Nov 18, 2019
3809a24
tweaks
connorjclark Nov 20, 2019
472b977
tmp
connorjclark Nov 20, 2019
4345fe5
persist warnings in summary
connorjclark Nov 20, 2019
b16f6b2
docs on gcp
connorjclark Nov 20, 2019
7000578
Merge remote-tracking branch 'origin/master' into lantern-collect
connorjclark Dec 17, 2019
75ef38c
pipefail
connorjclark Dec 17, 2019
210b614
repeat until works
connorjclark Dec 17, 2019
0ca1fee
add --project
connorjclark Dec 18, 2019
da264dd
gsutil
connorjclark Dec 18, 2019
8f6b889
travis just lantern
connorjclark Dec 18, 2019
b7e6661
update
connorjclark Dec 18, 2019
c94beda
works now
connorjclark Dec 18, 2019
39b5f6c
fix logic
connorjclark Dec 18, 2019
17b6478
fix bash
connorjclark Dec 18, 2019
7c1518e
curl
connorjclark Dec 18, 2019
04e8b5a
delete prompt
connorjclark Dec 19, 2019
57 changes: 57 additions & 0 deletions lighthouse-core/scripts/lantern/collect/README.md
@@ -0,0 +1,57 @@
# Lantern Collect Traces

Collects many traces, using both a local machine and real mobile devices on WebPageTest (WPT).

There are 9 runs for each URL in the big zip. The golden zip contains just the median runs (by performance score), along with a dump of the `metrics` collected by Lighthouse.

[Download all](https://drive.google.com/open?id=17WsQ3CU0R1072sezXw5Np2knV_NvGAfO) traces (3.2GB zipped, 19GB unzipped).
[Download golden](https://drive.google.com/open?id=1aQp-oqX7jeFq9RFwNik6gkEZ0FLtjlHp) traces (363MB zipped, 2.1GB unzipped).
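Picking "the median run by performance score" could look roughly like the sketch below. `golden.js` is not shown in this PR view, so the helper name `pickMedianRun` and its handling of even-length or unscored inputs are assumptions. It relies only on the standard `lhr.categories.performance.score` field of a Lighthouse result.

```javascript
/**
 * Hypothetical helper (not taken from golden.js): returns the run whose LHR has
 * the median performance score, or null if no run has a numeric score.
 * @param {Array<{lhr: {categories: {performance?: {score: number|null}}}}>} runs
 */
function pickMedianRun(runs) {
  // Ignore runs where Lighthouse could not compute a performance score.
  const scored = runs.filter(run => {
    const perf = run.lhr.categories.performance;
    return perf && typeof perf.score === 'number';
  });
  if (scored.length === 0) return null;

  // Sort a copy ascending by score so the caller's array is left untouched.
  const sorted = [...scored].sort(
    (a, b) => a.lhr.categories.performance.score - b.lhr.categories.performance.score);

  // With an even count this takes the lower of the two middle runs, so the
  // result is always a real run rather than an interpolated value.
  return sorted[Math.floor((sorted.length - 1) / 2)];
}
```

With 9 runs per URL this returns the 5th-best run. Returning an actual run, rather than averaging two, keeps the trace, devtools log, and LHR of the golden entry consistent with each other.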

Note: Only 45/80 of the URLs in `./urls.js` have been processed.

## Get a WPT key

To get a regular key, go to http://www.webpagetest.org/getkey.php, choose "Login with Google", and fill out the form. The key will be emailed to you.

However, you'll need a privileged key to run the collection in a reasonable amount of time. Ask @connorjclark for one.

## Lighthouse Version

Check which version of Lighthouse WPT is using, and use the same version of Lighthouse for the local (desktop) collection.
Collaborator Author

Is this important? WPT uses LH 5.2.0 rn.

Collaborator

shouldn't be a big deal as long as we're not turning on different trace categories or something between the versions. observed metric definitions haven't changed in forever.

Collaborator

actually I take it back, it'll be a big deal because we want LCP and layout stability to be computed for these; that's the whole point of getting new ones. I guess we'll have to derive them after the fact from the trace if we want to avoid delaying this an entire release :/

Collaborator Author

so: i'll collect the LHR, devtoolslog, and trace for local unthrottled desktop runs. And the LHR and trace for WPT runs. put them all in a giant zip and upload somewhere.

then make the new golden lantern set by taking the median runs for the local runs. and just run trace-processor on the WPT traces to get the metrics.

Collaborator

right-o, but since we don't have LCP metric yet, I would triple check that the traces you're getting have the LCP and layout stability events in them at least and we don't need to enable some other category to get them.

Collaborator Author

yup they do
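The thread above hinges on whether the collected traces contain the events needed to derive LCP and layout stability later. A quick check over a saved trace might look like this sketch. `hasLcpAndLayoutShiftEvents` is a hypothetical helper. The event names `largestContentfulPaint::Candidate` and `LayoutShift` reflect Chrome's trace output around the time of this PR and should be treated as assumptions to re-check against the Chrome build in use.

```javascript
/**
 * Hypothetical helper: reports whether a trace contains the events needed to
 * compute LCP and layout stability after the fact.
 * ASSUMPTION: event names match what Chrome emitted circa late 2019.
 * @param {{traceEvents: Array<{name?: string}>}} trace
 */
function hasLcpAndLayoutShiftEvents(trace) {
  // Collect every event name once; events without a name (such as the empty
  // first event WPT traces contain) simply contribute `undefined`.
  const names = new Set(trace.traceEvents.map(event => event.name));
  return {
    lcp: names.has('largestContentfulPaint::Candidate'),
    layoutShift: names.has('LayoutShift'),
  };
}
```

For example, run it on any `*-trace.json` file that `collect.js` writes: `hasLcpAndLayoutShiftEvents(JSON.parse(fs.readFileSync(tracePath, 'utf-8')))`, where `tracePath` is a placeholder.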


## Verify URLs

```sh
node -e "console.log(require('./urls.js').join('\n'))" |\
xargs -P 10 -I{} curl -A 'Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3694.0 Mobile Safari/537.36 Chrome-Lighthouse' -o /dev/null -s --write-out '%{http_code} {} (if redirect: %{redirect_url})\n' {} |\
sort
```

Note: some valid URLs will return a 4xx status because the site blocks requests made with `curl`.
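If `curl` and `xargs` are not available, the same check can be sketched in Node. This is a hypothetical alternative, not part of the PR. It assumes Node 18+ with the built-in `fetch`, where `redirect: 'manual'` returns the 3xx response itself (comparable to `curl` without `-L`). `checkUrls`, its options, and the result shape are illustrative names.

```javascript
// Same user agent as the curl command above.
const MOBILE_LH_UA = 'Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5 Build/MRA58N) ' +
  'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3694.0 Mobile Safari/537.36 Chrome-Lighthouse';

/**
 * Hypothetical helper: requests each URL with at most `concurrency` requests in
 * flight (like `xargs -P 10`) and returns results sorted by status code.
 * `fetchFn` defaults to the global fetch; pass a stub to run offline.
 */
async function checkUrls(urls, {concurrency = 10, fetchFn = globalThis.fetch} = {}) {
  const results = new Array(urls.length);
  let next = 0;

  async function worker() {
    // Each worker claims the next unchecked index until none remain. This is
    // safe because `next++` runs synchronously on JavaScript's single thread.
    while (next < urls.length) {
      const i = next++;
      try {
        const response = await fetchFn(urls[i], {
          redirect: 'manual',
          headers: {'User-Agent': MOBILE_LH_UA},
        });
        results[i] = {url: urls[i], status: response.status,
          redirect: response.headers.get('location')};
      } catch (err) {
        // Network failures get status 0 so they sort to the top.
        results[i] = {url: urls[i], status: 0, error: String(err)};
      }
    }
  }

  const workerCount = Math.min(concurrency, urls.length);
  await Promise.all(Array.from({length: workerCount}, () => worker()));
  return results.sort((a, b) => a.status - b.status);
}
```

Usage from this folder: `checkUrls(require('./urls.js')).then(rs => rs.forEach(r => console.log(r.status, r.url, r.redirect || '')))`.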

## Run

```sh
DEBUG=1 WPT_KEY=... SAMPLES=3 node --max-old-space-size=4096 collect.js
```

Output will be in `dist/collect-lantern-traces`, and zipped at `dist/collect-lantern-traces.zip`.
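For reference, `collect.js` reads these environment variables: `WPT_KEY` (required), `SAMPLES` (default 9), `TEST_URLS` (a space-separated list that overrides `./urls.js`), `DEBUG`, and `OOPIFS`. The sketch below gathers them in one place. `readCollectConfig` and its validation of `SAMPLES` are illustrative additions, since `collect.js` itself just calls `Number(process.env.SAMPLES)`. `defaultUrls` stands in for `require('./urls.js')`.

```javascript
/**
 * Hypothetical helper mirroring the top of collect.js, plus validation that
 * collect.js does not perform.
 * @param {Record<string, string|undefined>} env Usually `process.env`.
 * @param {string[]} defaultUrls Usually `require('./urls.js')`.
 */
function readCollectConfig(env, defaultUrls) {
  if (!env.WPT_KEY) throw new Error('missing WPT_KEY');

  const samples = env.SAMPLES ? Number(env.SAMPLES) : 9;
  if (!Number.isInteger(samples) || samples < 1) {
    throw new Error(`SAMPLES must be a positive integer, got ${env.SAMPLES}`);
  }

  return {
    wptKey: env.WPT_KEY,
    samples,
    // Space-separated; empty entries from repeated spaces are dropped here.
    testUrls: env.TEST_URLS ? env.TEST_URLS.split(' ').filter(Boolean) : defaultUrls,
    debug: Boolean(env.DEBUG),
    // OOPIFS=1 keeps site isolation on; otherwise local runs disable site-per-process.
    oopifs: env.OOPIFS === '1',
  };
}
```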

```sh
node golden.js
```

Output will be in `dist/golden-lantern-traces`, and zipped at `dist/golden-lantern-traces.zip`.

Update the zips on Google Drive and `download-traces.sh`.


## Run in GCP

```sh
WPT_KEY=... lighthouse-core/scripts/lantern/collect/gcp-create-and-run.sh
```
331 changes: 331 additions & 0 deletions lighthouse-core/scripts/lantern/collect/collect.js
@@ -0,0 +1,331 @@
/**
 * @license Copyright 2019 Google Inc. All Rights Reserved.
 * Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0
 * Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
 */
'use strict';

/** @typedef {import('./common.js').Result} Result */
/** @typedef {import('./common.js').Summary} Summary */

const fs = require('fs');
const fetch = require('isomorphic-fetch');
const {execFile} = require('child_process');
const {promisify} = require('util');
const execFileAsync = promisify(execFile);
const common = require('./common.js');

const LH_ROOT = `${__dirname}/../../../..`;
const SAMPLES = process.env.SAMPLES ? Number(process.env.SAMPLES) : 9;
const TEST_URLS = process.env.TEST_URLS ? process.env.TEST_URLS.split(' ') : require('./urls.js');

if (!process.env.WPT_KEY) throw new Error('missing WPT_KEY');
const WPT_KEY = process.env.WPT_KEY;
const DEBUG = process.env.DEBUG;

/** @type {typeof common.ProgressLogger['prototype']} */
let log;

/** @type {Summary} */
let summary;

/**
 * @param {string} message
 */
function warn(message) {
  summary.warnings.push(message);
  log.log(message);
}

/**
 * @param {string} filename
 * @param {string} data
 */
function saveData(filename, data) {
  fs.mkdirSync(common.collectFolder, {recursive: true});
  fs.writeFileSync(`${common.collectFolder}/${filename}`, data);
  return filename;
}

/**
 * @param {string} url
 * @return {Promise<string>}
 */
async function fetchString(url) {
  const response = await fetch(url);
  if (response.ok) return response.text();
  throw new Error(`error fetching ${url}: ${response.status} ${response.statusText}`);
}

/**
 * @param {string} url
 */
async function startWptTest(url) {
  const apiUrl = new URL('https://www.webpagetest.org/runtest.php');
  apiUrl.search = new URLSearchParams({
    k: WPT_KEY,
    f: 'json',
    url,
    location: 'Dulles_MotoG4:Motorola G (gen 4) - Chrome.3G',
    runs: '1',
    lighthouse: '1',
    // Make the trace file available over /getgzip.php.
    lighthouseTrace: '1',
    lighthouseScreenshots: '1',
    // Disable some things that WPT does, such as a "repeat view" analysis.
    type: 'lighthouse',
  }).toString();
  const wptResponseJson = await fetchString(apiUrl.href);
  const wptResponse = JSON.parse(wptResponseJson);
  if (wptResponse.statusCode !== 200) {
    throw new Error(`unexpected status code ${wptResponse.statusCode} ${wptResponse.statusText}`);
  }

  return {
    testId: wptResponse.data.testId,
    jsonUrl: wptResponse.data.jsonUrl,
  };
}

/**
 * @param {string} url
 * @return {Promise<Result>}
 */
async function runUnthrottledLocally(url) {
  const artifactsFolder = `${LH_ROOT}/.tmp/collect-traces-artifacts`;
  const args = [
    `${LH_ROOT}/lighthouse-cli`,
    url,
    '--throttling-method=provided',
    '--output=json',
    `-AG=${artifactsFolder}`,
  ];
  // Disable out-of-process iframes unless OOPIFS=1. Adding the flag conditionally
  // avoids passing an empty-string argument to the CLI.
  if (process.env.OOPIFS !== '1') {
    args.push('--chrome-flags=--disable-features=site-per-process');
  }
  const {stdout} = await execFileAsync('node', args, {
    // Default (1024 * 1024) is too small.
    maxBuffer: 10 * 1024 * 1024,
  });
  const lhr = JSON.parse(stdout);
  assertLhr(lhr);
  const devtoolsLog = fs.readFileSync(`${artifactsFolder}/defaultPass.devtoolslog.json`, 'utf-8');
  const trace = fs.readFileSync(`${artifactsFolder}/defaultPass.trace.json`, 'utf-8');
  return {
    devtoolsLog,
    lhr: JSON.stringify(lhr),
    trace,
  };
}

/**
 * @param {string} url
 * @return {Promise<Result>}
 */
async function runForWpt(url) {
  const {testId, jsonUrl} = await startWptTest(url);
  if (DEBUG) log.log({testId, jsonUrl});

  // Poll for the results, waiting longer the further back this test is in the WPT queue.
  let lhr;
  // eslint-disable-next-line no-constant-condition
  while (true) {
    const responseJson = await fetchString(jsonUrl);
    const response = JSON.parse(responseJson);

    if (response.statusCode === 200) {
      lhr = response.data.lighthouse;
      assertLhr(lhr);
      break;
    }

    if (response.statusCode >= 100 && response.statusCode < 200) {
      // If behindCount doesn't exist, the test is currently running.
      // * Wait 30 seconds if the test is currently running.
      // * Wait an additional 10 seconds for every test ahead of this one.
      // * Don't wait for more than 10 minutes (600 seconds).
      const secondsToWait = Math.min(30 + 10 * (response.data.behindCount || 0), 10 * 60);
      if (DEBUG) log.log('poll wpt in', secondsToWait);
      await new Promise((resolve) => setTimeout(resolve, secondsToWait * 1000));
    } else {
      throw new Error(`unexpected response: ${response.statusCode} ${response.statusText}`);
    }
  }

  const traceUrl = new URL('https://www.webpagetest.org/getgzip.php');
  traceUrl.searchParams.set('test', testId);
  traceUrl.searchParams.set('file', 'lighthouse_trace.json');
  const traceJson = await fetchString(traceUrl.href);

  /** @type {LH.Trace} */
  const trace = JSON.parse(traceJson);
  // For some reason, the first trace event is an empty object.
  trace.traceEvents = trace.traceEvents.filter(e => Object.keys(e).length > 0);

  return {
    lhr: JSON.stringify(lhr),
    trace: JSON.stringify(trace),
  };
}

/**
 * Calls the async function up to maxAttempts times and returns the first successful result.
 * Each failure is recorded as a warning. Returns null if every attempt fails.
 * @param {() => Promise<Result>} asyncFn
 * @param {number} [maxAttempts]
 * @return {Promise<Result|null>}
 */
async function repeatUntilPassOrNull(asyncFn, maxAttempts = 3) {
  for (let i = 0; i < maxAttempts; i++) {
    try {
      return await asyncFn();
    } catch (err) {
      warn('Error: ' + err.toString());
    }
  }

  return null;
}

/**
 * @param {LH.Result=} lhr
 */
function assertLhr(lhr) {
  if (!lhr) throw new Error('missing lhr');
  // runtimeError is an object ({code, message}), so format its fields explicitly.
  if (lhr.runtimeError) {
    throw new Error(`runtime error: ${lhr.runtimeError.code} ${lhr.runtimeError.message}`);
  }
  const metrics = common.getMetrics(lhr);
  if (metrics &&
      metrics.estimatedInputLatency &&
      metrics.firstContentfulPaint &&
      metrics.firstCPUIdle &&
      metrics.firstMeaningfulPaint &&
      metrics.interactive &&
      // WPT won't have this, we'll just get from the trace.
      // metrics.largestContentfulPaint &&
      metrics.maxPotentialFID &&
      metrics.speedIndex
  ) return;
  throw new Error('run failed to get metrics');
}

async function main() {
  log = new common.ProgressLogger();

  // Resume state from previous invocation of script.
  summary = common.loadSummary();

  // Remove data if no longer in TEST_URLS.
  summary.results = summary.results
    .filter(urlSet => TEST_URLS.includes(urlSet.url));

  fs.mkdirSync(common.collectFolder, {recursive: true});

  // Traces are collected for one URL at a time, in series, so all traces are from a small time
  // frame, reducing the chance of a site change affecting results.
  for (const url of TEST_URLS) {
    // This URL has been done on a previous script invocation. Skip it.
    if (summary.results.find((urlResultSet) => urlResultSet.url === url)) {
      log.log(`already collected traces for ${url}`);
      continue;
    }
    log.log(`collecting traces for ${url}`);

    const sanitizedUrl = url.replace(/[^a-z0-9]/gi, '-');
    /** @type {Result[]} */
    const wptResults = [];
    /** @type {Result[]} */
    const unthrottledResults = [];

    let wptResultsDone = 0;
    let unthrottledResultsDone = 0;

    // The closure this makes is too convenient to decompose.
    // eslint-disable-next-line no-inner-declarations
    function updateProgress() {
      const index = TEST_URLS.indexOf(url);
      const wptDone = wptResultsDone === SAMPLES;
      const unthrottledDone = unthrottledResultsDone === SAMPLES;
      log.progress([
        `${url} (${index + 1} / ${TEST_URLS.length})`,
        'wpt',
        '(' + (wptDone ? 'DONE' : `${wptResultsDone + 1} / ${SAMPLES}`) + ')',
        'unthrottledResults',
        '(' + (unthrottledDone ? 'DONE' : `${unthrottledResultsDone + 1} / ${SAMPLES}`) + ')',
      ].join(' '));
    }

    updateProgress();

    // WPT runs can run in parallel.
    const wptResultsPromises = [];
    for (let i = 0; i < SAMPLES; i++) {
      const resultPromise = repeatUntilPassOrNull(() => runForWpt(url));
      // Push to results array as they finish, so the progress indicator can track progress.
      resultPromise.then((result) => result && wptResults.push(result)).finally(() => {
        wptResultsDone += 1;
        updateProgress();
      });
      wptResultsPromises.push(resultPromise);
    }

    // Wait for the first WPT result to finish, because we can sit in the queue for a while
    // before we start, and we want to avoid seeing totally different content locally.
    await Promise.race(wptResultsPromises);

    // Local runs must run in series (every run writes to the same artifacts folder).
    for (let i = 0; i < SAMPLES; i++) {
      const result = await repeatUntilPassOrNull(() => runUnthrottledLocally(url));
      if (result) {
        unthrottledResults.push(result);
      }
      unthrottledResultsDone += 1;
      updateProgress();
    }

    // Wait for *all* WPT runs to finish since we just waited on the first one earlier.
    await Promise.all(wptResultsPromises);

    const urlResultSet = {
      url,
      wpt: wptResults
        .map((result, i) => {
          const prefix = `${sanitizedUrl}-mobile-wpt-${i + 1}`;
          return {
            lhr: saveData(`${prefix}-lhr.json`, result.lhr),
            trace: saveData(`${prefix}-trace.json`, result.trace),
          };
        }),
      unthrottled: unthrottledResults
        .filter(result => result.lhr && result.trace && result.devtoolsLog)
        .map((result, i) => {
          // Unthrottled runs will have devtools logs, so this should never happen.
          if (!result.devtoolsLog) throw new Error('expected devtools log');

          const prefix = `${sanitizedUrl}-mobile-unthrottled-${i + 1}`;
          return {
            devtoolsLog: saveData(`${prefix}-devtoolsLog.json`, result.devtoolsLog),
            lhr: saveData(`${prefix}-lhr.json`, result.lhr),
            trace: saveData(`${prefix}-trace.json`, result.trace),
          };
        }),
    };

    // Fewer than half of the samples succeeded (each sample gets up to 3 attempts),
    // so don't bother saving results for this URL.
    if (urlResultSet.wpt.length < SAMPLES / 2 || urlResultSet.unthrottled.length < SAMPLES / 2) {
      warn(`too many results for ${url} failed, skipping.`);
      continue;
    }

    // We just collected up to SAMPLES * 2 traces, so let's save our progress.
    log.log(`collected results for ${url}, saving progress.`);
    summary.results.push(urlResultSet);
    common.saveSummary(summary);
  }

  log.progress('archiving ...');
  await common.archive(common.collectFolder);
  log.closeProgress();
}

main().catch(err => {
  if (log) log.closeProgress();
  process.stderr.write(`Fatal error in collect:\n\n ${err.stack}`);
  process.exit(1);
});
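The polling delay in `runForWpt` can be pulled out into a small pure function, which makes the intended bounds easy to test. Below is a sketch of that. The constant names and `secondsUntilNextPoll` are hypothetical. The 30-second base, the 10 seconds per queued test, and the 10-minute cap come from the comments in `runForWpt`. The cap has to be written in seconds (`10 * 60`), because the value is converted to milliseconds only when it is passed to `setTimeout`.

```javascript
// Hypothetical constants mirroring the comments in runForWpt (all in seconds).
const BASE_WAIT_SECONDS = 30;
const PER_QUEUED_TEST_SECONDS = 10;
const MAX_WAIT_SECONDS = 10 * 60;

/**
 * Seconds to wait before polling WPT again. `behindCount` is the number of
 * tests ahead of this one; WPT omits it once the test is running, so it
 * defaults to 0.
 * @param {number} [behindCount]
 */
function secondsUntilNextPoll(behindCount = 0) {
  return Math.min(BASE_WAIT_SECONDS + PER_QUEUED_TEST_SECONDS * behindCount, MAX_WAIT_SECONDS);
}
```

With this helper, the wait inside the loop becomes `await new Promise((resolve) => setTimeout(resolve, secondsUntilNextPoll(response.data.behindCount) * 1000))`.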