
docs: add lantern accuracy data #3826

Merged
merged 3 commits into from
Dec 5, 2017

Conversation

patrickhulce (Collaborator)

closes #3691

before rushing off to improve accuracy, the next action item is on @vinamratasingal to draw up a document on what we actually want to achieve accuracy on

vinamratasingal-zz left a comment

Looks good, added a couple of comments/sections that would be helpful.

docs/lantern.md Outdated
| Comparison | FCP | FMP | TTI |
| -- | -- | -- | -- |
| Lantern predicting Default LH | .850 : 19.6% | .866 : 21.0% | .907 : 26.9% |
| Lantern predicting LH on WPT | .764 : 34.4% | .795 : 32.5% | .879 : 33.1% |
| Lantern w/adjusted settings predicting LH on WPT<sup>1</sup> | .769 : 32.9% | .808 : 31.1% | .879 : 32.6% |
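Each cell above is a `rank correlation : error` pair (Spearman rank correlation and mean absolute percentage error). As an illustrative sketch of how such pairs are computed from paired per-site metric values — not the actual analysis script used for this PR:

```python
def ranks(values):
    """1-based ranks, averaging ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result

def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

def mape(predicted, observed):
    """Mean absolute percentage error of predictions vs. observations."""
    return sum(abs(p - o) / o for p, o in zip(predicted, observed)) / len(observed)
```

A rank correlation near 1 means Lantern orders sites from fastest to slowest almost the same way the real metric does, even when the absolute millisecond estimates differ.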


What settings were adjusted here?

patrickhulce (Collaborator, Author)

RTT, throughput, and CPU multipliers

I'll move the footnote that explains this to be on adjusted settings instead of the end of the line 👍

docs/lantern.md Outdated
<sup>1</sup> 320 ms RTT, 1.3 Mbps throughput, 5x CPU multiplier

<sup>2</sup> Default LH traces and WPT traces were captured several weeks apart, so some site changes may have occurred that skew these stats


Can we add a section explaining what conclusions we are drawing from this data, and potential reasons for why Lantern is correlating TTI on WPT but not FMP/FCP?

patrickhulce (Collaborator, Author)

done-ish :) With the additional reference stats, the TTI/FMP/FCP difference isn't necessarily an outlier that needs explaining anymore IMO, but let me know if you still think it needs some hypotheses.

docs/lantern.md Outdated

## Accuracy

All of the following accuracy stats are reported excluding the 10% tail, as the initial research found that roughly 10% of sites will vary radically simply from visiting the page a second time, through no fault of the metrics or prediction logic. This means the accuracy is slightly overstated but should still hold for the controlled-environment/repeated-view use case.
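The 10%-tail exclusion described above might look something like the following sketch (hypothetical helpers, not the actual analysis code): sort per-site errors by magnitude and drop the worst decile before averaging.

```python
def drop_tail(errors, tail=0.10):
    """Sort per-site errors by magnitude and drop the worst `tail` fraction."""
    ranked = sorted(errors, key=abs)
    keep = len(ranked) - int(len(ranked) * tail)
    return ranked[:keep]

def trimmed_mean_abs_error(errors, tail=0.10):
    """Mean absolute error over the sites that survive the tail cut."""
    kept = drop_tail(errors, tail)
    return sum(abs(e) for e in kept) / len(kept)
```

With nine sites at 10% error and one wildly divergent site at 500%, the trimmed mean stays at 10% — which is exactly why the reported accuracy is "slightly overstated" for the general case.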


Can we make it explicit that this was calculated based on an analysis of 1500 URLs run only once?

patrickhulce (Collaborator, Author)

done

patrickhulce (Collaborator, Author)

@vinamratasingal I believe I have addressed your concerns, mind taking another look?

vinamratasingal-zz left a comment

Almost :) I have one more question that needs to be answered before giving an LGTM.

docs/lantern.md Outdated
## Conclusions

### Lantern Accuracy Conclusions
Definitive conclusions on repeat view accuracy require much more data for the same URLs (i.e. more than 1 run for each URL per environment), but for the single view use case, Lantern is roughly as accurate at predicting the rank of a website the next time you visit it as the metrics themselves which is the highest goal we set out to achieve. As a sanity check, we also see that using the unthrottled metrics to predict the rank of throttled performance has a significantly lower rank correlation than Lantern.


Maybe I'm just being silly, but can you help me understand what "Lantern is roughly as accurate at predicting the rank of a website the next time you visit it as the metrics themselves which is the highest goal we set out to achieve." means?

patrickhulce (Collaborator, Author)

This sentence is just pointing out that the rank correlation of Lantern with LH is roughly the same as LH with LH, meaning that running Lantern once on a URL gives you just as good a clue about the next load time as loading the site for real, i.e. the error of Lantern is smaller than or equal to the natural deviation of load timing.

The jury is still out on how inaccurate the estimate would be if you could run it 100 times, which is identified in future work.

Let me know which snippets from here you think are worth including or if it's still clear as mud :)


Hmmm yeah the language feels a bit unclear to me. For me, what would be helpful is to reframe this section into two bullets:

  • For the single view use case, we conclude that the rank correlation of Lantern with LH is roughly the same as LH with LH. [add 1 sentence explaining what this means based on what you said above]

  • For repeat view accuracy, we need to do more work.

patrickhulce (Collaborator, Author)

How's this?

  • For the single view use case, we conclude that the rank correlation of Lantern with standard LH is roughly the same as the rank correlation between any two arbitrary LH runs. That is to say, the average error we observe between a Lantern performance score and an LH on DevTools performance score is within the expected natural deviation. As a sanity check, we also see that using the unthrottled metrics to predict throttled performance has a significantly lower correlation than Lantern does.
  • For the repeat view use case, we require more data to reach a conclusion, but the high correlation of the single view use case suggests the accuracy meets our correlation requirements even if some sites may diverge.


Sounds great! LGTM coming your way :)


Successfully merging this pull request may close these issues.

Compare Lantern results to WPT throttled output