docs: add lantern accuracy data #3826
Conversation
Looks good, added a couple of comments/sections that would be helpful.
docs/lantern.md (outdated):

|  | FCP | FMP | TTI |
| -- | -- | -- | -- |
| Lantern predicting Default LH | .850 : 19.6% | .866 : 21.0% | .907 : 26.9% |
| Lantern predicting LH on WPT | .764 : 34.4% | .795 : 32.5% | .879 : 33.1% |
| Lantern w/adjusted settings predicting LH on WPT<sup>1</sup> | .769 : 32.9% | .808 : 31.1% | .879 : 32.6% |
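Each cell above pairs a rank correlation with an average error percentage (e.g. `.850 : 19.6%`). A minimal sketch of how such a pair could be computed is below; this is illustrative only — the function names and sample values are made up, not the PR's actual analysis code or data:

```python
def rank(values):
    # Map each value to its 1-based rank (no tie handling; fine for a sketch).
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    return ranks

def spearman(xs, ys):
    # Spearman rho = Pearson correlation computed on the ranks.
    rx, ry = rank(xs), rank(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

def mean_abs_pct_error(pred, obs):
    return sum(abs(p - o) / o for p, o in zip(pred, obs)) / len(pred) * 100

# Hypothetical (metric prediction, observation) values in ms — not real Lantern data.
lantern_ms  = [1200, 3400, 2100, 5600, 900]
observed_ms = [1100, 3900, 2300, 5100, 2500]
rho = spearman(lantern_ms, observed_ms)
err = mean_abs_pct_error(lantern_ms, observed_ms)
print(f"{rho:.3f} : {err:.1f}%")  # → 0.700 : 20.9%
```

A higher first number means Lantern orders sites the same way the real metric does; a lower second number means the predicted values are close in absolute terms.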
What settings were adjusted here?
RTT, throughput, and CPU multipliers. I'll move the footnote that explains this onto "adjusted settings" instead of the end of the line 👍
docs/lantern.md (outdated):

<sup>1</sup> 320 ms RTT, 1.3 Mbps, 5x CPU

<sup>2</sup> Default LH traces and WPT traces were captured several weeks apart, so some site changes may have occurred that skew these stats
Can we add a section explaining what conclusions we are drawing from this data, and potential reasons for why Lantern is correlating TTI on WPT but not FMP/FCP?
Done-ish :) With the additional reference stats, the TTI/FMP/FCP difference isn't necessarily an outlier that needs explaining anymore IMO, but let me know if you still think it needs some hypotheses.
docs/lantern.md (outdated):

## Accuracy

All of the following accuracy stats are reported excluding the 10% tail, as the initial research found that approximately 10% of sites will radically vary simply by visiting the page a second time, through no fault of the metrics or prediction logic. This means the accuracy is slightly overstated but should still hold for the controlled-environment/repeated-view use case.
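The 10%-tail exclusion described here could be implemented by dropping the worst-error decile before aggregating. A sketch under that assumption — the pairs and names below are illustrative, not the PR's actual dataset or code:

```python
def pct_error(pred, obs):
    return abs(pred - obs) / obs * 100

# Hypothetical (prediction, observation) pairs in ms; the last site
# "radically varied" on the second visit, as the doc describes.
pairs = [(1200, 1100), (3400, 3900), (2100, 2300), (5600, 5100),
         (900, 1000), (2500, 2400), (4100, 4400), (7000, 6500),
         (1500, 1600), (3000, 9000)]

errors = sorted(pct_error(p, o) for p, o in pairs)
keep = errors[: int(len(errors) * 0.9)]  # exclude the worst 10% tail
trimmed_mean = sum(keep) / len(keep)
full_mean = sum(errors) / len(errors)
print(f"full: {full_mean:.1f}%  trimmed: {trimmed_mean:.1f}%")  # → full: 14.2%  trimmed: 8.4%
```

This shows why the reported accuracy is "slightly overstated": one unstable site nearly doubles the untrimmed mean error.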
Can we make it explicit that this was calculated based on an analysis of 1500 URLs, run only once?
done
@vinamratasingal I believe I have addressed your concerns, mind taking another look?
Almost :) I have one more question that needs to be answered before giving an LGTM.
docs/lantern.md (outdated):

## Conclusions

### Lantern Accuracy Conclusions
Definitive conclusions on repeat view accuracy require much more data for the same URLs (i.e. more than one run per URL per environment), but for the single view use case, Lantern is roughly as accurate at predicting the rank of a website the next time you visit it as the metrics themselves, which is the highest goal we set out to achieve. As a sanity check, we also see that using the unthrottled metrics to predict the rank of throttled performance has a significantly lower rank correlation than Lantern.
Maybe I'm just being silly, but can you help me understand what "Lantern is roughly as accurate at predicting the rank of a website the next time you visit it as the metrics themselves which is the highest goal we set out to achieve." means?
This sentence is just pointing out that the rank correlation of Lantern with LH is roughly the same as LH with LH. In other words, running Lantern once on a URL gives you just as good a clue about the next load time as loading the site for real, i.e. the accuracy of Lantern is smaller than or equal to the natural deviation of load timing.
The jury is still out on how inaccurate the estimate would be if you could run it 100 times, which is identified in future work.
Let me know which snippets from here you think are worth including or if it's still clear as mud :)
Hmmm, yeah, the language feels a bit unclear to me. What would be helpful for me is to reframe this section into two bullets:

- For the single view use case, we conclude that the rank correlation of Lantern with LH is roughly the same as LH with LH. [add 1 sentence explaining what this means based on what you said above]
- For repeat view accuracy, we need to do more work.
How's this:

- For the single view use case, we conclude that the rank correlation of Lantern with standard LH is roughly the same as the rank correlation between any two arbitrary LH runs. That is to say, the average error we observe between a Lantern performance score and a LH on DevTools performance score is within the expected natural deviation. As a sanity check, we also see that using the unthrottled metrics to predict throttled performance has a significantly lower correlation than Lantern does.
- For the repeat view use case, we require more data to reach a conclusion, but the high correlation of the single view use case suggests the accuracy meets our correlation requirements even if some sites may diverge.
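The two bullets above can be illustrated numerically. With entirely made-up metric values (six hypothetical sites, not the PR's data), a Lantern-vs-LH rank correlation lands in the same range as LH-run-A-vs-run-B, while unthrottled-vs-throttled lands noticeably lower:

```python
def spearman(xs, ys):
    # Spearman rho via the rank-difference formula, 1 - 6*sum(d^2)/(n(n^2-1)).
    # Valid when there are no ties, which holds for the sample data below.
    n = len(xs)
    order = lambda v: sorted(range(n), key=lambda i: v[i])
    rx = [0] * n
    ry = [0] * n
    for r, i in enumerate(order(xs)):
        rx[i] = r
    for r, i in enumerate(order(ys)):
        ry[i] = r
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Hypothetical throttled TTI values (ms) for six sites.
lh_run_a    = [2000, 4500, 3200, 8000, 1500, 6000]
lh_run_b    = [2100, 6300, 3500, 7600, 1600, 6000]  # same sites, second run
lantern     = [1900, 4800, 3000, 7400, 2000, 5800]  # Lantern's predictions
unthrottled = [800, 900, 1500, 1200, 700, 1100]     # ordering differs more

print(f"Lantern vs LH run A: {spearman(lantern, lh_run_a):.3f}")      # 0.943
print(f"LH run A vs run B:   {spearman(lh_run_a, lh_run_b):.3f}")     # 0.943
print(f"unthrottled vs LH:   {spearman(unthrottled, lh_run_a):.3f}")  # 0.657
```

The first two correlations matching is the single-view conclusion; the third being lower is the sanity check.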
Sounds great! LGTM coming your way :)
closes #3691
Before rushing off to improve accuracy, the next AI (action item) is on @vinamratasingal to draw up a document on what we actually want to achieve accuracy with.