CPU/Memory Power is different on each run #6907

Closed
iAmServerless opened this issue Jan 2, 2019 · 15 comments
Labels
P1 PSI/LR PageSpeed Insights and Lightrider variability


@iAmServerless

iAmServerless commented Jan 2, 2019

Lighthouse Audit gives a completely different result for performance on multiple runs.

The only thing which is different between each run is CPU/Memory Power.

Machine MacBook Pro (Retina, 15-inch, Mid 2014)
Chrome Version 71.0.3578.98
Lighthouse version 3.2.0.
CPU/Memory Power: 1372, 1320, 992

Machine MacBook Pro (Retina, 15-inch, Mid 2014)
Chrome Version 73.0.3658.0
Lighthouse 4.0.0-beta.
CPU/Memory Power: 1329, 1278

web.dev
Lighthouse version 4.0.0-alpha.1
CPU/Memory Power: 886, 736

All tests are run with the following settings

Device - Mobile,
Throttling - Simulated Fast 3G, 4x CPU slowdown
The same configuration is used by web.dev #6654 (comment)

What is the expected behavior?

It is very difficult to analyze the results because, after we make an improvement, the results are positive on one run and negative on another.

I believe throttling the CPU 4x means slowing down the CPU on which Lighthouse is running by 4x.
Isn't it possible to simulate the CPU used in the Nexus 5X on web.dev?
1.8 GHz hexacore (4x1.4 GHz Cortex-A53 + 2x1.8 GHz Cortex-A57) 64-bit ARMv8-A

Both images are from web.dev.

[Screenshots: two web.dev reports, captured 2019-01-02 at 5:04:55 pm and 5:05:15 pm]

Below scores are from MacBook Pro (Retina, 15-inch, Mid 2014) and Chrome Version 73.0.3658.0

[Screenshots: two local reports, captured 2019-01-02 at 5:17:50 pm and 5:18:00 pm]

We are actively working on performance, and this variation in scores from 37 to 62 is making it very difficult for us to set a benchmark.

@exterkamp
Member

Thanks for submitting an issue! I ran this site across PageSpeed Insights, web.dev, locally in Node, and in the extension. The scores seemed very consistent in PSI, web.dev, and Node (35-45ish), but in the extension it gets >60. Are you running from the MBP with the extension? If you run with the Node CLI, does it still vary widely?

The CPU/Memory Power numbers are an estimated debug statistic and shouldn't affect the score in these ways. When I run locally with power >1200, it generates results consistent with PSI/web.dev, which has power ~600.

@connorjclark
Collaborator

connorjclark commented Jan 2, 2019

Let's give the bot I wrote a spin.

LH Runner Go! https://www.makaan.com/delhi-property/uttam-nagar-flats-for-sale-51195

EDIT:

Just ignore the first two outputs. Bot is a bit bugged it seems.

@devtools-bot

I ran Lighthouse for https://www.makaan.com/delhi-property/uttam-nagar-flats-for-sale-51195, here's what I found.

index
output lighthouse@4.0.0-beta
output lighthouse@3.2.1
json output PSI
html json output lighthouse@master-4f16a6
html json output Extension@3.3.0.4001-Chrome72.0.3617.0

@connorjclark
Collaborator

connorjclark commented Jan 2, 2019

The above bot ran on my machine, and shows PSI scoring 84, node getting 53 and the chrome extension getting 50. So that's wild.

EDIT: OK, the bot was running PSI for a desktop target, instead of the expected mobile. Disregard that value. I'll update the bot to run PSI for a mobile device.

@patrickhulce
Collaborator

If PSI really does get down to ~600 CPU power, then it's probably worth pushing on #6162. That could have a meaningful impact on performance numbers.

@benschwarz
Contributor

Calibre is seeing very consistent benchmark indexes as well as metrics. I know that’s anecdotal “evidence”, but I’d be happy to share some specific details on request.

@iAmServerless
Author

iAmServerless commented Jan 3, 2019

@exterkamp I have seen the score vary from 32 to 60 on web.dev itself across multiple runs.
Results are completely different on webpagetest.org (https://www.webpagetest.org/lighthouse.php?test=190102_JM_b47f3be54e3a6f6ba3bd741de293dd64&run=2), where throttling is done at the OS level (catchpoint/WebPageTest#1156 (comment)) instead of through Lighthouse throttling.

Also, can you please share how CPU/Memory Power is calculated and what factors are considered? It would help us set up our system accordingly.

@brendankenny
Member

Just to be clear, we expect the CPU/Memory Power number to be different depending on the machine Lighthouse is run on (that's why Lighthouse measures it). The issue here (as I understand it) is the variance when run multiple times on a single machine. It is a benchmark so will be affected by the state of the machine and what else is running at the time, but the benchmark is meant to be relatively lightweight and quick, so it won't be perfect and we can definitely improve it further in the future.

Other clarifications:

  • the benchmark is run without any Lighthouse-provided throttling applied, so the particular throttling numbers shouldn't affect it
  • throttling provided outside Lighthouse will affect the number (like through WebPagetest), but ideally it should just appear to be a slower machine from Lighthouse's perspective and (again, ideally) should still be consistent across multiple runs of the same physical device and the same throttling settings.
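
To illustrate why a lightweight, quick benchmark like this moves around between runs on one machine, here is a toy sketch in Python. It is not Lighthouse's benchmark: the function name `benchmark_index`, the 0.5 s window, and the string-building unit of work are all assumptions chosen for illustration. It counts how many fixed units of work finish in a short window, so anything else competing for the CPU during that window lowers the number.

```python
import time


def benchmark_index(window_s: float = 0.5, unit_size: int = 100_000) -> int:
    """Toy CPU benchmark (hypothetical): units of fixed work completed per second."""
    start = time.perf_counter()
    iterations = 0
    while time.perf_counter() - start < window_s:
        # One unit of work: build a string by repeated concatenation.
        s = ""
        for _ in range(unit_size):
            s += "a"
        iterations += 1
    elapsed = time.perf_counter() - start
    return round(iterations / elapsed)


if __name__ == "__main__":
    # Repeated runs on the same machine rarely print identical values.
    print([benchmark_index() for _ in range(3)])
```

Because the window is short, the result reflects whatever the machine happened to be doing during that half second, which matches the run-to-run differences reported in this thread.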

@exterkamp
Member

Still looking into possible variance from our side, running this site 50 times yielded these results:

  • avg: 0.3516
  • std dev: 0.06478724852
  • min: 0.2199999988
  • max: 0.4799999893
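
For anyone who wants to produce the same kind of summary from their own runs, a minimal Python sketch follows. The `scores` list is made-up placeholder data, not the 50 results from this run, and using the population standard deviation (`pstdev`) is an assumption, since the comment does not say which variant was used.

```python
from statistics import mean, pstdev


def summarize(scores):
    """Return avg, std dev, min and max for a list of performance scores (0 to 1)."""
    return {
        "avg": mean(scores),
        "std dev": pstdev(scores),  # population std dev; an assumption
        "min": min(scores),
        "max": max(scores),
    }


if __name__ == "__main__":
    # Hypothetical scores for illustration only.
    scores = [0.22, 0.31, 0.35, 0.40, 0.48]
    for name, value in summarize(scores).items():
        print(f"{name}: {value:.4f}")
```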


We can add this site to the list of problematic performance sites. I'd be curious what DZL thinks of this site. @patrickhulce, is this point spread high or low on LR?

@patrickhulce
Collaborator

It's within the expected range for live sites. Normal-ishly distributed with a max range of ~25 is a fairly reasonable range for sites that aren't making explicit efforts to be stable for measurement. It's a tad high in our post-lantern world, but not shocking.

(Histogram below for those who might find it easier to interpret the data that way 😃)

[Image: histogram of the performance scores]

For reference, with DevTools throttling the 95% confidence interval for most sites was typically +/- 15 on the performance score. Lantern is roughly half that for most sites, at +/- 8. For curious visitors, you can read more about variance in our comprehensive doc.
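
As a rough way to apply those intervals when comparing a before and an after run, here is a Python sketch. The function name `exceeds_noise` is hypothetical, and the sketch assumes the two runs are independent, roughly normally distributed, and share the same +/- half-width, in which case the 95% interval on their difference is the half-width times sqrt(2). Treat it as a sanity check, not a statistical test.

```python
import math


def exceeds_noise(before: float, after: float, half_width: float = 8.0) -> bool:
    """True if the score change is larger than the expected run-to-run noise.

    half_width is the 95% confidence half-width of a single run
    (about 8 with Lantern, about 15 with DevTools throttling, per the thread).
    """
    # The difference of two independent runs has a wider interval than one run.
    threshold = half_width * math.sqrt(2)
    return abs(after - before) > threshold


if __name__ == "__main__":
    print(exceeds_noise(37, 62))  # a 25-point swing
    print(exceeds_noise(40, 45))  # a 5-point change
```

A single pair of runs that differ by only a few points says little either way; repeating the runs and comparing the distributions is more reliable.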

@iAmServerless
Author

Thanks, @patrickhulce, for sharing the document.
It will help us move forward with our performance goals.

@paulirish
Member

Plan is to set a fixed multiplier in PSI where we have a benchmarkindex of ~800, which is 1/2 of a typical laptop.
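
The arithmetic behind that comment can be sketched as follows. This is only an illustration, not what PSI or Lighthouse actually does: the function name `scaled_cpu_multiplier`, the linear scaling, and the reference index of 1600 (twice the ~800 quoted above, standing in for "a typical laptop") are all assumptions.

```python
def scaled_cpu_multiplier(
    benchmark_index: float,
    reference_index: float = 1600.0,     # assumed "typical laptop" index
    reference_multiplier: float = 4.0,   # the 4x CPU slowdown used on that laptop
) -> float:
    """Scale the CPU slowdown so a slower host is not throttled twice over."""
    if benchmark_index <= 0:
        raise ValueError("benchmark_index must be positive")
    scaled = reference_multiplier * benchmark_index / reference_index
    # Never speed the host up: 1.0 means no additional throttling.
    return max(1.0, scaled)


if __name__ == "__main__":
    for index in (1600, 800, 300):
        print(index, scaled_cpu_multiplier(index))
```

Under these assumptions a host that benchmarks at half the reference index needs only half the slowdown to land on the same simulated device.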

@exterkamp
Member

@patrickhulce is exploring this more and capturing some data on it in #9085.

@exterkamp
Member

Deduping for #9085

@kirthikasimi

Why does the CPU power differ every time we run a Lighthouse report in Chrome DevTools? Is it the user's machine CPU power (ranging from 1200-1500), and when we run PageSpeed Insights, is it the Google server's CPU power (800-1000)?
