tests(lantern): add lantern regression test scripts #5435
Conversation
what does each of these groups represent? i'm not sure, between WPT data, 'golden' lantern data, and freshly computed lantern data, what we're dealing with here.
we have a few main things
perfect. this helps enormously. thanks! some name ideas below:
I'm also fine with switching the order of baseline/fresh/master and wpt/lantern.
LGTM
Is there going to be a failure condition?
# Testing lantern can be expensive, we'll only run the tests if we touched files that affect the simulations.
CHANGED_FILES=""
if [[ "$CI" ]]; then
  CHANGED_FILES=$(git --no-pager diff --name-only $TRAVIS_COMMIT_RANGE)
$TRAVIS_COMMIT_RANGE
cooool
Hm, yeah, seems useful to match the golden LHR pattern here for now, I'll exit with
@paulirish I oversimplified it. It contains the list of all the URLs we should compare, the WPT metric values, and the paths to the devtools log and trace of the unthrottled runs. Just WPT feels a bit weird in that scenario since it's an index of the unthrottled data around it. If I were going verbose I'd probably say something like
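For concreteness, the index file described in the comment above might look roughly like this. All field names here are illustrative assumptions; the comment only says it holds the URL list, the WPT metric values, and the paths to the unthrottled devtools log and trace.

```javascript
// Hypothetical shape of the site index / expectations file.
const exampleIndex = {
  sites: [
    {
      url: 'https://example.com',
      // Metric values measured by WebPageTest, in milliseconds (assumed keys).
      wptMetrics: {firstContentfulPaint: 1200, speedIndex: 2500},
      // Artifacts of the unthrottled run that lantern simulates from.
      unthrottled: {
        devtoolsLogPath: 'site-0/devtoolslog.json',
        tracePath: 'site-0/trace.json',
      },
    },
  ],
};

console.log(exampleIndex.sites[0].url); // 'https://example.com'
```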
sorry! lots of questions.
we probably have to work through this in voice. :)

evaluate-results.js - this is evaluating the correlation of wpt data with our HEAD lantern data, right? print-correlations.js?

diff-expectations.js - this has nothing to do with wpt data but is logging out any difference between master...HEAD of lantern metrics, right? do we need this script to pass? or is this a sanity check so we are cognizant of any changes we make? assert-master-lantern-values-unchanged.js?
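As a reader's aid, the "diff master vs. HEAD lantern values" idea being asked about could be sketched like this. The data structure and names are assumptions, not the PR's actual implementation.

```javascript
// Minimal sketch: compare two {url -> {metric -> value}} maps and report
// every metric whose computed value changed between master and HEAD.
function diffComputedValues(masterValues, headValues) {
  const diffs = [];
  for (const [url, metrics] of Object.entries(headValues)) {
    for (const [metric, headValue] of Object.entries(metrics)) {
      const masterValue = (masterValues[url] || {})[metric];
      if (masterValue !== headValue) {
        diffs.push({url, metric, masterValue, headValue});
      }
    }
  }
  return diffs;
}

// Example: one metric drifted, one stayed the same (illustrative values).
const master = {'https://example.com': {optimisticFCP: 1000, pessimisticFCP: 1500}};
const head = {'https://example.com': {optimisticFCP: 1100, pessimisticFCP: 1500}};
console.log(diffComputedValues(master, head).length); // 1
```

A script like this could exit non-zero when `diffs` is non-empty, which is one way to answer the "do we need this script to pass?" question above.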
- lantern-3g-master-computed-values.json
  this name lgtm, though i'd be fine with dropping 3g.
- lantern-3g-golden-expectations-index.json
  this one is not lgtm.
  first, i'm a little wary of using 'expectations', especially since we use it in smokehouse and it feels like the semantics here are different. i still feel like calling this wpt-baseline is fair. you said:

  "Just WPT feels a bit weird in that scenario since it's an index of the unthrottled data around it."

  but i don't understand. if the data has no throttling applied, then why would it be ...-3g-...? i don't get why it'd be called lantern either.

and perhaps a rather macro question: why are we correlating simulated 3g against unthrottled metrics? i totally was expecting that we're comparing all this to wpt-3g metrics.

- site-index-plus-golden-expectations
- site-index-plus-golden-expectations-plus-HEAD-computed
    exitCode = 1;
  }
} finally {
  fs.unlinkSync(TMP_COMPUTED);
how do these files compare to lantern-data/lantern-{computed|expected}.json?
can we move -data/lantern-computed to .tmp?
maybe these should have -for-diff in the filename? (if i'm understanding why they're separate)
tar -xzf lantern-traces.tar.gz
mv lantern-traces-subset lantern-data
mv lantern-traces-subset/ lantern-data/
can these live in something like lantern-data/unthrottledAssets/? so that lantern-data/lantern-expected.json is not a sibling to them?
const EXPECTATIONS_PATH = path.resolve(process.cwd(), INPUT_PATH);
const EXPECTATIONS_DIR = path.dirname(EXPECTATIONS_PATH);
const COMPUTED_PATH = path.join(EXPECTATIONS_DIR, 'lantern-computed.json');
const RUN_ALL_SCRIPT_PATH = path.join(__dirname, 'run-all-expectations.js');
seems like this isn't running expectations but "collecting computed values". wdyt?
DIRNAME="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
LH_ROOT_PATH="$DIRNAME/../../.."
cd $LH_ROOT_PATH

if [[ -f lantern-data/lantern-expectations.json ]] && ! [[ "$FORCE" ]]; then
  echo "Lantern data already detected, done."
const path = require('path');

const GOOD_ABSOLUTE_THRESHOLD = 0.2;
const OK_ABSOLUTE_THRESHOLD = 0.5;

const GOOD_RANK_THRESHOLD = 0.1;

if (!process.argv[2]) throw new Error('Usage $0 <computed summary file>');
const COMPUTATIONS_INPUT_ARG = process.argv[2] || './lantern-data/lantern-computed.json';
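One plausible way the absolute-error thresholds above could be applied is to bucket each site's lantern estimate by its relative error against the measured value. This is purely a sketch; the bucket names and the classification function are assumptions, not the PR's code.

```javascript
// Thresholds copied from the snippet above.
const GOOD_ABSOLUTE_THRESHOLD = 0.2;
const OK_ABSOLUTE_THRESHOLD = 0.5;

// Classify a lantern estimate by its relative error vs. the measured value.
function classify(lanternValue, measuredValue) {
  const err = Math.abs(lanternValue - measuredValue) / measuredValue;
  if (err < GOOD_ABSOLUTE_THRESHOLD) return 'good';
  if (err < OK_ABSOLUTE_THRESHOLD) return 'ok';
  return 'bad';
}

console.log(classify(1100, 1000)); // 'good' (10% error)
console.log(classify(1400, 1000)); // 'ok' (40% error)
console.log(classify(2000, 1000)); // 'bad' (100% error)
```

`GOOD_RANK_THRESHOLD` presumably plays the analogous role for a rank-correlation check rather than absolute error.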
one of these things doesn't look like the others. COMPUTATIONS?
@paulirish how you feelin' about this now :)
merge when you're ready
let's do it! 🎉
Adds lantern regression testing against ~100 URLs to travis (only when lantern files are affected).
A few things this PR highlighted:
- computed vs. expected, but these aren't great and not 100% consistent.